There is a wealth of mathematical knowledge that could be very useful in many computational applications but is not available in electronic form. This knowledge comes in the form of mechanically typeset books and journals going back more than 100 years. Besides these older sources, a great many current publications are filled with useful mathematical information yet are difficult, if not impossible, to obtain in electronic form. Our work aims to encode, for use by computer algebra systems, integral tables and other documents currently available only in hard copy. Our strategy is to extract character information from these documents and pass it to higher-level parsing routines that extract the mathematical content (or any other useful two-dimensional semantic content). This information can then be output as, for example, a Lisp or TeX expression. We have also developed routines for rapid access to this information, specifically for finding matches with formulas in a table of integrals. This paper reviews our current efforts and summarizes our results and the problems we have encountered.
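To illustrate the final output step mentioned in the abstract, the following minimal sketch shows how one parsed formula could be written out both as a Lisp expression and as a TeX expression. It is not the paper's implementation: the classes `Sym`, `Num`, and `Op`, the functions `to_lisp` and `to_tex`, and the operator names `expt` and `integrate` are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Sym:
    """A variable name such as x (hypothetical node type)."""
    name: str


@dataclass(frozen=True)
class Num:
    """An integer constant (hypothetical node type)."""
    value: int


@dataclass(frozen=True)
class Op:
    """An operator applied to arguments, e.g. expt or integrate."""
    head: str
    args: Tuple["Expr", ...]


Expr = Union[Sym, Num, Op]


def to_lisp(e: Expr) -> str:
    """Render the tree as a prefix-notation Lisp expression."""
    if isinstance(e, Sym):
        return e.name
    if isinstance(e, Num):
        return str(e.value)
    return "(" + " ".join([e.head] + [to_lisp(a) for a in e.args]) + ")"


def to_tex(e: Expr) -> str:
    """Render the tree as TeX; only a few operators are handled here."""
    if isinstance(e, Sym):
        return e.name
    if isinstance(e, Num):
        return str(e.value)
    a = [to_tex(x) for x in e.args]
    if e.head == "expt":
        return f"{a[0]}^{{{a[1]}}}"
    if e.head == "integrate":
        return f"\\int {a[0]} \\, d{a[1]}"
    if e.head == "+":
        return " + ".join(a)
    raise ValueError(f"unsupported operator: {e.head}")


# The integrand x^2 integrated with respect to x, as a parser might produce it.
example = Op("integrate", (Op("expt", (Sym("x"), Num(2))), Sym("x")))
print(to_lisp(example))
print(to_tex(example))
```

Keeping a single tree and rendering it two ways means that the same parsed formula can feed a computer algebra system (the Lisp form) and a typesetting check (the TeX form) without being parsed twice.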
|Original language|American English|
|Number of pages|14|
|Journal|Journal of Visual Communication and Image Representation|
|State|Published - Mar 1996|
Bibliographical note
Funding Information:
This work was supported in part by NSF Grants CCR-9214963 and IRI-9411334 and by NSF Infrastructure Grant CDA-8722788.