Abstract
Distinguishing between singular and plural "you" in English is a challenging task with potential downstream applications, such as machine translation and coreference resolution. While formal written English does not distinguish between these cases, other languages (such as Spanish), as well as other dialects of English (via phrases such as "y'all"), do make this distinction. We exploit this to obtain distantly supervised labels for the task at a large scale in two domains. We then train a model to distinguish between singular and plural "you", finding that although in-domain training achieves reasonable accuracy (≥ 77%), there is still substantial room for improvement, especially in the domain-transfer scenario, which proves extremely challenging. Our code and data are publicly available.
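The dialect-based distant supervision described above can be illustrated with a minimal sketch. It assumes a hypothetical list of English plural markers (`PLURAL_MARKERS`) and a hypothetical helper `distant_label`; it is not the authors' pipeline and covers only the English-dialect signal, not labels derived from other languages such as Spanish. A sentence containing a plural marker is labeled "plural", and the marker is rewritten as plain "you" so that a classifier cannot read the label directly off the surface form.

```python
import re

# Hypothetical plural second-person markers (an assumption, not the authors' list).
PLURAL_MARKERS = ["y'all", "y\u2019all", "you all", "you guys", "youse"]

# Longest markers first so multi-word phrases win over shorter alternatives.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(m) for m in sorted(PLURAL_MARKERS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def _to_you(match):
    # Keep sentence-initial capitalization when replacing the marker.
    return "You" if match.group(0)[0].isupper() else "you"


def distant_label(sentence):
    """Return (masked_sentence, "plural") if a plural marker occurs, else None.

    Sentences without a marker are left unlabeled here: plain English "you"
    alone does not reveal whether the addressee is singular or plural.
    """
    if not _PATTERN.search(sentence):
        return None
    return _PATTERN.sub(_to_you, sentence), "plural"
```

For example, `distant_label("Are y'all coming tonight?")` yields `("Are you coming tonight?", "plural")`, while `distant_label("Can you help me?")` yields `None`. The word boundaries keep phrases such as "you allow" from being mistaken for "you all".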
Original language | English |
---|---|
Title of host publication | W-NUT@EMNLP 2019 - 5th Workshop on Noisy User-Generated Text, Proceedings |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 375-380 |
Number of pages | 6 |
ISBN (Electronic) | 9781950737840 |
State | Published - 2019 |
Externally published | Yes |
Event | 5th Workshop on Noisy User-Generated Text, W-NUT@EMNLP 2019 - Hong Kong, China Duration: 4 Nov 2019 → … |
Publication series
Name | W-NUT@EMNLP 2019 - 5th Workshop on Noisy User-Generated Text, Proceedings |
---|---|
Conference
Conference | 5th Workshop on Noisy User-Generated Text, W-NUT@EMNLP 2019 |
---|---|
Country/Territory | China |
City | Hong Kong |
Period | 4/11/19 → … |
Bibliographical note
Publisher Copyright: © 2019 Association for Computational Linguistics