Abstract
Speech language models have recently demonstrated great potential as universal speech processing systems. Such models can model the rich acoustic information present in audio signals beyond the spoken content, such as emotion and background noise. Despite this, benchmarks that evaluate awareness of a wide range of acoustic aspects are lacking. To help bridge this gap, we introduce SALMON, a novel evaluation suite encompassing background noise, sentiment, speaker identity and room impulse response. The proposed benchmarks evaluate both the consistency of the inspected element and how well it matches the spoken text. We follow a modelling-based approach, measuring whether a model gives correct samples higher scores than incorrect ones. This approach makes the benchmark fast to compute even for large models. We evaluate several speech language models on SALMON, highlighting the strengths and weaknesses of each evaluated method. We make the code and data publicly available at pages.cs.huji.ac.il/adiyoss-lab/salmon/.
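The scoring idea described in the abstract (a model should give a correct sample a higher score than an incorrect one) can be illustrated with a minimal sketch. This is not the SALMON code: the function name `pairwise_accuracy`, the `score` callable, the toy sample identifiers and scores, and the choice to count ties as errors are all assumptions made for illustration. In practice `score` might be the likelihood a speech language model assigns to an audio sample.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Sample = TypeVar("Sample")


def pairwise_accuracy(
    pairs: Iterable[Tuple[Sample, Sample]],
    score: Callable[[Sample], float],
) -> float:
    """Fraction of (correct, incorrect) pairs where the correct sample scores higher.

    `score` is a hypothetical stand-in for whatever score a model assigns
    to a sample. Ties are counted as errors (an assumption of this sketch).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    wins = sum(1 for correct, incorrect in pairs if score(correct) > score(incorrect))
    return wins / len(pairs)


# Toy example: made-up scores keyed by made-up sample identifiers.
toy_scores = {
    "a_correct": -1.0, "a_incorrect": -2.0,
    "b_correct": -3.0, "b_incorrect": -2.5,
    "c_correct": -0.5, "c_incorrect": -4.0,
}
toy_pairs = [
    ("a_correct", "a_incorrect"),
    ("b_correct", "b_incorrect"),
    ("c_correct", "c_incorrect"),
]
accuracy = pairwise_accuracy(toy_pairs, toy_scores.__getitem__)
```

Because each pair needs only two scoring passes and no generation step, a metric of this shape stays cheap to compute as model size grows, which is consistent with the abstract's remark about speed.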
Original language | English |
---|---|
Title of host publication | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings |
Editors | Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9798350368741 |
State | Published - 2025 |
Event | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India; Duration: 6 Apr 2025 → 11 Apr 2025 |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
ISSN (Print) | 1520-6149 |
Conference
Conference | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 |
---|---|
Country/Territory | India |
City | Hyderabad |
Period | 6/04/25 → 11/04/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Keywords
- Acoustic Modelling
- Speech Language Models