TY - JOUR
T1 - Audio Conditioning for Music Generation via Discrete Bottleneck Features
AU - Rouard, Simon
AU - Adi, Yossi
AU - Copet, Jade
AU - Roebel, Axel
AU - Défossez, Alexandre
N1 - Publisher Copyright: © S. Rouard, Y. Adi, J. Copet, A. Roebel, A. Défossez.
PY - 2024
Y1 - 2024
N2 - While most music generation models use textual or parametric conditioning (e.g. tempo, harmony, musical genre), we propose to condition a language-model-based music generation system with audio input. Our exploration involves two distinct strategies. The first strategy, termed textual inversion, leverages a pre-trained text-to-music model to map audio input to corresponding "pseudowords" in the textual embedding space. For the second model, we train a music language model from scratch jointly with a text conditioner and a quantized audio feature extractor. At inference time, we can mix textual and audio conditioning and balance them thanks to a novel double classifier-free guidance method. We conduct automatic and human studies that validate our approach. We will release the code, and we provide music samples at musicgenstyle.github.io to show the quality of our model.
AB - While most music generation models use textual or parametric conditioning (e.g. tempo, harmony, musical genre), we propose to condition a language-model-based music generation system with audio input. Our exploration involves two distinct strategies. The first strategy, termed textual inversion, leverages a pre-trained text-to-music model to map audio input to corresponding "pseudowords" in the textual embedding space. For the second model, we train a music language model from scratch jointly with a text conditioner and a quantized audio feature extractor. At inference time, we can mix textual and audio conditioning and balance them thanks to a novel double classifier-free guidance method. We conduct automatic and human studies that validate our approach. We will release the code, and we provide music samples at musicgenstyle.github.io to show the quality of our model.
UR - http://www.scopus.com/inward/record.url?scp=85215463580&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85215463580
SN - 3006-3094
VL - 2024
SP - 146
EP - 153
JO - Proceedings of the International Society for Music Information Retrieval Conference
JF - Proceedings of the International Society for Music Information Retrieval Conference
ER -