We collected training data for MuseNet from many different sources. ClassicalArchives and BitMidi donated their large collections of MIDI files for this project, and we also found several collections online, including jazz, pop, African, Indian, and Arabic styles. Additionally, we used the MAESTRO dataset. The transformer is trained on sequential data: given a set of notes, we ask it to predict the upcoming note. We experimented with several different ways to encode the MIDI files into tokens suitable for this task.
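To make the "predict the upcoming note" setup concrete, here is a minimal sketch that flattens notes into tokens and then builds next-token training pairs. It is not MuseNet's actual encoding. The note format `(pitch, start, duration)`, the `note:<pitch>` and `wait:<ticks>` tokens, and the helpers `encode_notes` and `next_token_pairs` are all hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical note format: (MIDI pitch, start time in ticks, duration in ticks).
Note = Tuple[int, int, int]


def encode_notes(notes: List[Note]) -> List[str]:
    """Flatten notes into a token sequence (hypothetical scheme).

    Emits 'note:<pitch>' at each onset and a 'wait:<ticks>' token whenever
    time advances between onsets. Durations are ignored for simplicity.
    """
    tokens: List[str] = []
    current_time = 0
    # Order by onset time, then by pitch, so simultaneous notes are deterministic.
    for pitch, start, _duration in sorted(notes, key=lambda n: (n[1], n[0])):
        if start > current_time:
            tokens.append(f"wait:{start - current_time}")
            current_time = start
        tokens.append(f"note:{pitch}")
    return tokens


def next_token_pairs(tokens: List[str], context: int) -> List[Tuple[List[str], str]]:
    """Build (preceding window, next token) pairs for next-token prediction.

    Each window holds at most `context` tokens immediately before the target.
    """
    pairs: List[Tuple[List[str], str]] = []
    for i in range(1, len(tokens)):
        window = tokens[max(0, i - context):i]
        pairs.append((window, tokens[i]))
    return pairs


if __name__ == "__main__":
    # A short C major arpeggio ending in a two-note chord.
    notes = [(60, 0, 4), (64, 4, 4), (67, 8, 4), (72, 8, 4)]
    tokens = encode_notes(notes)
    print(tokens)
    # ['note:60', 'wait:4', 'note:64', 'wait:4', 'note:67', 'note:72']
    for window, target in next_token_pairs(tokens, context=3):
        print(window, "->", target)
```

A real encoding would also need to capture details like instrument, volume, and note endings; the choice of what to fold into each token is exactly the design space the paragraph above refers to.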