Hierarchical indexing and document matching in BoW

Maayan Geffet, Dror G. Feitelson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

BoW is an on-line bibliographical repository based on a hierarchical concept index to which entries are linked. Searching in the repository should therefore return matching topics from the hierarchy, rather than just a list of entries. Likewise, when new entries are inserted, a search for relevant topics to which they should be linked is required. We develop a vector-based algorithm that creates keyword vectors for the set of competing topics at each node in the hierarchy, and show how its performance improves when domain-specific features are added (such as special handling of topic titles and author names). A 7-fold cross validation on a corpus of some 3,500 entries with a 5-level index yields hit ratios in the range of 89-95%, and most of the misclassifications are indeed ambiguous to begin with.
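The general idea of matching against competing topics at each node can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `Topic` class, the `match` function, the raw term-count weighting, the cosine similarity measure, and the `title_weight` parameter are all assumptions made for illustration, and the tiny hierarchy is invented sample data. The extra weight on title words only gestures at the "special handling of topic titles" mentioned in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Topic:
    """Hypothetical node of a concept hierarchy with linked entries."""

    def __init__(self, title, entries=(), children=()):
        self.title = title
        self.entries = list(entries)
        self.children = list(children)

    def keyword_vector(self, title_weight=3):
        """Term counts for this topic and its whole subtree.

        Title words count title_weight times (an assumed heuristic).
        """
        vec = Counter()
        for word in tokenize(self.title):
            vec[word] += title_weight
        for entry in self.entries:
            vec.update(tokenize(entry))
        for child in self.children:
            vec.update(child.keyword_vector(title_weight))
        return vec


def match(root, text, title_weight=3):
    """Descend from the root, choosing the most similar child at each node.

    Returns the list of topic titles on the chosen path; stops early
    when no child shares any keyword with the text.
    """
    query = Counter(tokenize(text))
    path, node = [], root
    while node.children:
        scored = [(cosine(query, c.keyword_vector(title_weight)), c)
                  for c in node.children]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score == 0.0:
            break
        path.append(best.title)
        node = best
    return path


# Invented sample hierarchy, for illustration only.
root = Topic("root", children=[
    Topic("Information Retrieval",
          entries=["vector space model for text retrieval"],
          children=[
              Topic("Text Classification",
                    entries=["hierarchical classification of documents"]),
              Topic("Indexing",
                    entries=["inverted index construction and compression"]),
          ]),
    Topic("Operating Systems",
          entries=["virtual memory paging"],
          children=[
              Topic("Scheduling",
                    entries=["gang scheduling of parallel jobs"]),
          ]),
])

print(match(root, "inverted index compression"))
print(match(root, "gang scheduling"))
```

Because only the children of the current node compete at each step, a topic's vector needs to distinguish it only from its siblings rather than from every topic in the index. A fuller implementation would presumably cache the vectors instead of recomputing them per query.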

Original language: English
Title of host publication: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2001
Publisher: Association for Computing Machinery
Pages: 259-267
Number of pages: 9
ISBN (Print): 1581133456, 9781581133455
DOIs
State: Published - 2001
Event: 1st ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2001 - Roanoke, VA, United States
Duration: 24 Jun 2001 to 28 Jun 2001

Publication series

Name: Proceedings of the ACM International Conference on Digital Libraries

Conference

Conference: 1st ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2001
Country/Territory: United States
City: Roanoke, VA
Period: 24/06/01 to 28/06/01
