Optical Character Recognition and Parsing of Typeset Mathematics

Richard J. Fateman*, Taku Tokuyasu, Benjamin P. Berman, Nicholas Mitchell

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

58 Scopus citations

Abstract

There is a wealth of mathematical knowledge that could potentially be very useful in many computational applications, but is not available in electronic form. This knowledge comes in the form of mechanically typeset books and journals going back more than 100 years. Besides these older sources, there are a great many current publications, filled with useful mathematical information, which are difficult if not impossible to obtain in electronic form. Our work intends to encode, for use by computer algebra systems, integral tables and other documents currently available in hardcopy only. Our strategy is to extract character information from these documents, which is then passed to higher-level parsing routines for further extraction of mathematical content (or any other useful two-dimensional semantic content). This information can then be output as, for example, a Lisp or TeX expression. We have also developed routines for rapid access to this information, specifically for finding matches with formulas in a table of integrals. This paper reviews our current efforts and summarizes our results and the problems we have encountered.

Original language: English
Pages (from-to): 2-15
Number of pages: 14
Journal: Journal of Visual Communication and Image Representation
Volume: 7
Issue number: 1
State: Published - Mar 1996
Externally published: Yes

Bibliographical note

Funding Information:
This work was supported in part by NSF Grants CCR-9214963 and IRI-9411334 and by NSF Infrastructure Grant CDA-8722788.
