A Language Modeling Approach to Diacritic-Free Hebrew TTS

Amit Roth, Arnon Turetzky, Yossi Adi

Research output: Contribution to journal › Conference article › peer-review

Abstract

We tackle the task of text-to-speech (TTS) in Hebrew. Traditional Hebrew contains diacritics, which dictate how words should be pronounced; modern Hebrew, however, rarely uses them. Without diacritics, readers are expected to infer the correct pronunciation, and which phonemes to use, from context. This poses a fundamental challenge for TTS systems, which must accurately map text to speech. In this work, we propose a diacritic-free language modeling approach to Hebrew TTS. The model operates on discrete speech representations and is conditioned on a word-piece tokenizer. We optimize the proposed method using in-the-wild, weakly supervised data and compare it to several diacritic-based TTS systems. Results suggest the proposed method is superior to the evaluated baselines in both content preservation and naturalness of the generated speech. Samples can be found at the following link: pages.cs.huji.ac.il/adiyoss-lab/HebTTS/.

Original language: English
Pages (from-to): 2775-2779
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sep 2024 to 5 Sep 2024

Bibliographical note

Publisher Copyright:
© 2024 International Speech Communication Association. All rights reserved.

Keywords

  • Diacritic
  • Hebrew speech
  • Text-to-Speech
