Abstract
Most of the text in a computer program is composed of the names of variables and functions. These names are selected by one developer, and need to be understood by others. This is similar to the role of words written in natural language. But there are several marked differences between the names in a program and the words in a book. First, names are frequently composed of multiple existing words, in an attempt to capture nuanced meanings and intents. Second, because of the use of multiple words, names can be rather long. Third, conventions may also allow names to be very short, and many single-letter names are used. But despite these differences, the general statistics of names are rather similar to the statistics of words. Like words, the distribution of names is close to a Zipf distribution. Also, popular names tend to be shorter than rarely used names. However, the underlying vocabulary is different. Composing names from multiple words leads to a more diverse vocabulary that can grow without bounds. But if we look at the individual words used in compound names, we find a rather limited vocabulary. These properties help explain the predictability of software, and how it can coincide with the large variability of names. They also suggest that it may be beneficial to model programs at the level of individual words rather than at the level of source code tokens.
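As a rough illustration of the distinction the abstract draws between whole names and the individual words inside them, the following is a minimal sketch, not the authors' method. It assumes names follow underscore or camelCase conventions; the sample list `NAMES` and the helpers `split_name` and `rank_frequency` are hypothetical and introduced only for this example. A distribution close to Zipf would show frequency falling roughly in inverse proportion to rank in the resulting tables.

```python
import re
from collections import Counter

# Hypothetical sample of identifiers; this is NOT data from the paper.
NAMES = ["i", "i", "i", "fileName", "file_name", "getFileSize",
         "max_file_size", "HTTPServer", "i"]


def split_name(name):
    """Split an identifier into lowercase words.

    Assumes underscore and camelCase conventions; other naming styles
    would need different rules.
    """
    words = []
    for part in name.split("_"):
        # Acronym runs (HTTP in HTTPServer), capitalized or lowercase
        # words, and digit runs are each treated as one word.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
    return [w.lower() for w in words]


def rank_frequency(items):
    """Return (rank, item, count) tuples, most frequent first."""
    counts = Counter(items)
    return [(rank, item, count)
            for rank, (item, count) in enumerate(counts.most_common(), start=1)]


# Frequencies at the level of whole names versus constituent words.
name_counts = Counter(NAMES)
word_counts = Counter(w for n in NAMES for w in split_name(n))

if __name__ == "__main__":
    print("names:", rank_frequency(NAMES))
    print("words:", rank_frequency(w for n in NAMES for w in split_name(n)))
```

On a real code base, the same two tables could be compared to see how much smaller the set of distinct words is than the set of distinct names; the toy sample here is too small to show that effect.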
Original language | English |
---|---|
Title of host publication | Proceedings - 2022 29th Asia-Pacific Software Engineering Conference, APSEC 2022 |
Publisher | IEEE Computer Society |
Pages | 21-30 |
Number of pages | 10 |
ISBN (Electronic) | 9781665455374 |
State | Published - 2022 |
Event | 29th Asia-Pacific Software Engineering Conference, APSEC 2022 - Virtual, Online, Japan; Duration: 6 Dec 2022 → 9 Dec 2022 |
Publication series
Name | Proceedings - Asia-Pacific Software Engineering Conference, APSEC |
---|---|
Volume | 2022-December |
ISSN (Print) | 1530-1362 |
Conference
Conference | 29th Asia-Pacific Software Engineering Conference, APSEC 2022 |
---|---|
Country/Territory | Japan |
City | Virtual, Online |
Period | 6/12/22 → 9/12/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE.
Keywords
- program lexicon
- variable name
- words distribution