Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

Amitay Sicherman*, Yossi Adi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

18 Scopus citations

Abstract

This work profoundly analyzes discrete self-supervised speech representations (units) through the eyes of Generative Spoken Language Modeling (GSLM). Following the findings of such an analysis, we propose practical improvements to the discrete unit for the GSLM. First, we start comprehending these units by analyzing them in three axes: interpretation, visualization, and resynthesis. Our analysis finds a high correlation between the speech units to phonemes and phoneme families, while their correlation with speaker or gender is weaker. Additionally, we found redundancies in the extracted units and claim that one reason may be the units' context. Following this analysis, we propose a new, unsupervised metric to measure unit redundancies. Finally, we use this metric to develop new methods that improve the robustness of units' clustering and show significant improvement considering zero-resource speech metrics such as ABX. Code and analysis tools are available under the following link.

Original languageEnglish
Title of host publicationICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9781728163277
DOIs
StatePublished - 2023
Event48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: 4 Jun 202310 Jun 2023

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)1520-6149

Conference

Conference48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/TerritoryGreece
CityRhodes Island
Period4/06/2310/06/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • generative spoken language modeling
  • self supervised learning
  • speech LM
  • textless NLP

Fingerprint

Dive into the research topics of 'Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling'. Together they form a unique fingerprint.

Cite this