Abstract
We present a method for exploring regions around individual points in a contextualized vector space (particularly, BERT space), as a way to investigate how these regions correspond to word senses. By inducing a contextualized “pseudoword” as a stand-in for a static embedding in the input layer, and then performing masked prediction of a word in the sentence, we are able to investigate the geometry of BERT space in a controlled manner around individual instances. Using our method on a set of carefully constructed sentences targeting ambiguous English words, we find substantial regularity in the contextualized space, with regions that correspond to distinct word senses; between these regions, however, there are occasionally “sense voids”: regions that do not correspond to any intelligible sense.
| Original language | English |
|---|---|
| Title of host publication | EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 10300-10313 |
| Number of pages | 14 |
| ISBN (Electronic) | 9781955917094 |
| State | Published - 2021 |
| Event | 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Hybrid, Punta Cana, Dominican Republic. Duration: 7 Nov 2021 → 11 Nov 2021 |
Publication series
| Name | EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings |
|---|---|
Conference
| Conference | 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 |
|---|---|
| Country/Territory | Dominican Republic |
| City | Hybrid, Punta Cana |
| Period | 7 Nov 2021 → 11 Nov 2021 |
Bibliographical note
Publisher Copyright: © 2021 Association for Computational Linguistics