Historical corpora meet the digital humanities: the Jerusalem Corpus of Emergent Modern Hebrew

Aynat Rubinstein*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



The paper describes the creation of the first open access, multi-genre historical corpus of Emergent Modern Hebrew, made possible by applying digital humanities methods to corpus curation, encoding, and dissemination. Corpus contents originate in the Ben-Yehuda Project, an open access online repository of Hebrew literature, and in digital images curated from the collections of the National Library of Israel. A selection of these images has been transcribed through a dedicated crowdsourcing task whose output feeds back into the library's online catalog. Texts in the corpus are encoded following best practices in the digital humanities, including metadata markup that enables time-sensitive research on the corpus, linguistic and otherwise. An evaluation of morphological analysis based on Modern Hebrew language models reveals differences between genres in the historical variety, highlighting the importance of ephemeral materials for linguistic research and the potential for collaboration with libraries and cultural institutions in corpus creation. We demonstrate the use of the corpus in diachronic linguistic research and suggest ways in which the association it provides between digital images and texts can be used to support automatic language processing and to enhance resources in the digital humanities.

Original language: American English
Pages (from-to): 807-835
Number of pages: 29
Journal: Language Resources and Evaluation
Issue number: 4
State: Published - 1 Dec 2019

Bibliographical note

Funding Information:
I wish to thank the three anonymous reviewers of this manuscript for their helpful comments. For invaluable discussion and feedback during all stages of the project, I am grateful to Sinai Rusinek. Thanks also to Meni Adler, Maayan Almagor, Yael Netzer, Avigail Tsirkin-Sadan, and Amir Zeldes. This research was supported by the Mandel Scholion Interdisciplinary Research Center in the Humanities and Jewish Studies at the Hebrew University of Jerusalem. I thank researchers at the Center for their support, especially Yael Reshef for enabling me to train research assistants of the “Emergence of Modern Hebrew” research group in the TEI format. Programming support by Itay Zandbank of The Research Software Company (https://www.chelem.co.il) is also gratefully acknowledged.

Publisher Copyright:
© 2019, Springer Nature B.V.


Keywords

  • Citizen science
  • Crowdsourcing
  • Digital humanities
  • Ephemera
  • Hebrew
  • Historical corpora
  • Language change


