TY - JOUR
T1 - On The Landscape of Spoken Language Models
T2 - A Comprehensive Survey
AU - Arora, Siddhant
AU - Chang, Kai-Wei
AU - Chien, Chung-Ming
AU - Peng, Yifan
AU - Wu, Haibin
AU - Adi, Yossi
AU - Dupoux, Emmanuel
AU - Lee, Hung-Yi
AU - Livescu, Karen
AU - Watanabe, Shinji
N1 - Publisher Copyright:
© 2025, Transactions on Machine Learning Research. All rights reserved.
PY - 2025
Y1 - 2025
N2 - The field of spoken language processing is undergoing a shift from training custom-built, task-specific models toward using and optimizing spoken language models (SLMs) which act as universal speech processing systems. This trend is similar to the progression toward universal language models that has taken place in the field of (text) natural language processing. SLMs include both “pure” language models of speech—models of the distribution of tokenized speech sequences—and models that combine speech encoders with text language models, often including both spoken and written input or output. Work in this area is very diverse, with a range of terminology and evaluation settings. This paper aims to contribute an improved understanding of SLMs via a unifying literature survey of recent work in the context of the evolution of the field. Our survey categorizes the work in this area by model architecture, training, and evaluation choices, and describes some key challenges and directions for future work.
UR - https://www.scopus.com/pages/publications/105026955633
M3 - Article
AN - SCOPUS:105026955633
SN - 2835-8856
VL - October-2025
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -