Abstract
Multilingual models have been widely used for cross-lingual transfer to low-resource languages. However, performance on these languages is hindered by their underrepresentation in the pretraining data. To alleviate this problem, we propose a novel multilingual training technique based on teacher-student knowledge distillation. In this setting, we utilize monolingual teacher models optimized for their language. We use those teachers along with balanced (sub-sampled) data to distill the teachers' knowledge into a single multilingual student. Our method outperforms standard training methods in low-resource languages and retains performance on high-resource languages.
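The abstract gives only the outline of the method. As a rough illustration of teacher-student distillation with one monolingual teacher per language and batches drawn from a balanced (sub-sampled) corpus, here is a minimal PyTorch-style sketch. The `teachers` mapping, the `student` model, the batch format, and the temperature-scaled KL loss are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target loss: KL divergence between the teacher's and the
    student's temperature-scaled token distributions (Hinton et al., 2015)."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # kl_div expects log-probabilities as input and probabilities as target;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t ** 2)

def train_step(student, teachers, batch, optimizer):
    """One update: the batch comes from a single language of the balanced
    (sub-sampled) data, so the matching monolingual teacher is queried.
    Models are assumed to be callables returning logit tensors."""
    teacher = teachers[batch["lang"]]      # hypothetical {lang: model} map
    with torch.no_grad():                  # teachers are frozen
        teacher_logits = teacher(batch["input_ids"])
    student_logits = student(batch["input_ids"])
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Iterating this step over batches sampled uniformly across languages (rather than proportionally to corpus size) is what keeps low-resource languages from being drowned out during distillation.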
| Field | Value |
| --- | --- |
| Original language | English |
| Title of host publication | SIGTYP 2023 - 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Proceedings of the Workshop |
| Editors | Lisa Beinborn, Koustava Goswami, Saliha Muradoglu, Alexey Sorokin, Ritesh Kumar, Andreas Shcherbakov, Edoardo M. Ponti, Ryan Cotterell, Ekaterina Vylomova |
| Publisher | Association for Computational Linguistics |
| Pages | 1-11 |
| Number of pages | 11 |
| ISBN (Electronic) | 9781959429562 |
| State | Published - 2023 |
| Event | 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, SIGTYP 2023, co-located with the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Hybrid, Dubrovnik, Croatia. Duration: 6 May 2023 → … |
Publication series

| Field | Value |
| --- | --- |
| Name | SIGTYP 2023 - 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Proceedings of the Workshop |
Conference

| Field | Value |
| --- | --- |
| Conference | 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, SIGTYP 2023, co-located with the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 |
| Country/Territory | Croatia |
| City | Hybrid, Dubrovnik |
| Period | 6/05/23 → … |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.