Abstract
Building on recent advances in semantic parsing and text simplification, we investigate the use of semantic splitting of the source sentence as preprocessing for machine translation. We experiment with a Transformer model and evaluate using large-scale crowd-sourcing experiments. Results show a significant increase in fluency on long sentences on an English-to-French setting with a training corpus of 5M sentence pairs, while retaining comparable adequacy. We also perform a manual analysis which explores the tradeoff between adequacy and fluency in the case where all sentence lengths are considered.
Original language | English |
---|---|
Title of host publication | Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics |
Editors | Iryna Gurevych, Marianna Apidianaki, Manaal Faruqui |
Place of Publication | Barcelona, Spain (Online) |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 50-57 |
Number of pages | 8 |
ISBN (Electronic) | 978-1-952148-32-3 |
State | Published - Dec 2020 |
Event | 9th Joint Conference on Lexical and Computational Semantics, Barcelona, Spain (Online). Duration: 12 Nov 2020 → 13 Nov 2020. Conference number: 9. https://aclanthology.org/volumes/2020.starsem-1/ |
Conference
Conference | 9th Joint Conference on Lexical and Computational Semantics |
---|---|
Country/Territory | Spain |
City | Barcelona |
Period | 12/11/20 → 13/11/20 |
Keywords
- semantic parsing
- text simplification
- semantic splitting
- crowd-sourcing
- Neural Machine Translation
- Lexical Semantics
- Computational Semantics