TY - JOUR
T1 - Latent Watermarking of Audio Generative Models
AU - San Roman, Robin
AU - Fernandez, Pierre
AU - Deleforge, Antoine
AU - Adi, Yossi
AU - Serizel, Romain
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - The advancements in audio generative models have opened up new challenges in their responsible disclosure and the detection of their misuse. To address this, watermarking techniques have recently been developed, enabling the detection of content generated by a deployed model. For such techniques to be useful, the watermark must resist typical modifications applied to the model or its outputs. The use case of an open-source model trained on proprietary data is challenging, as post-hoc watermarks can then be trivially removed. In response, we introduce a method that watermarks latent audio generative models by directly watermarking their training data. We show the method to be robust against a broad range of audio edits, including filtering and compression, and even to changing the model’s decoder, maintaining high detection rates with very few false positives. Interestingly, we show that even fine-tuning the model on another dataset can significantly lower the detection rate only at the cost of degrading generation performance to near the level of re-training the model without the protected training data.
AB - The advancements in audio generative models have opened up new challenges in their responsible disclosure and the detection of their misuse. To address this, watermarking techniques have recently been developed, enabling the detection of content generated by a deployed model. For such techniques to be useful, the watermark must resist typical modifications applied to the model or its outputs. The use case of an open-source model trained on proprietary data is challenging, as post-hoc watermarks can then be trivially removed. In response, we introduce a method that watermarks latent audio generative models by directly watermarking their training data. We show the method to be robust against a broad range of audio edits, including filtering and compression, and even to changing the model’s decoder, maintaining high detection rates with very few false positives. Interestingly, we show that even fine-tuning the model on another dataset can significantly lower the detection rate only at the cost of degrading generation performance to near the level of re-training the model without the protected training data.
KW - audio
KW - generative models
KW - watermarking
UR - https://www.scopus.com/pages/publications/105009603171
U2 - 10.1109/icassp49660.2025.10889782
DO - 10.1109/icassp49660.2025.10889782
M3 - Conference article
AN - SCOPUS:105009603171
SN - 1520-6149
JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
T2 - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Y2 - 6 April 2025 through 11 April 2025
ER -