Hierarchical indexing and document matching in BoW

Maayan Geffet*, Dror G. Feitelson

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

BoW is an on-line bibliographical repository based on a hierarchical concept index to which entries are linked. Searching in the repository should therefore return matching topics from the hierarchy, rather than just a list of entries. Likewise, when new entries are inserted, a search for relevant topics to which they should be linked is required. We develop a vector-based algorithm that creates keyword vectors for the set of competing topics at each node in the hierarchy, and show how its performance improves when domain-specific features are added (such as special handling of topic titles and author names). The results of a 7-fold cross validation on a corpus of some 3,500 entries with a 5-level index are hit ratios in the range of 89-95%, and most of the misclassifications are indeed ambiguous to begin with.
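
The top-down matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain term-frequency keyword vectors and cosine similarity, and it omits the domain-specific features the abstract mentions (special handling of topic titles and author names). The names `Topic`, `keyword_vector`, `cosine`, and `match`, as well as the toy hierarchy, are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def keyword_vector(texts):
    """Build a term-frequency vector from a list of strings (assumed representation)."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class Topic:
    """A node in a hypothetical concept hierarchy with entries linked to it."""

    def __init__(self, title, entries=(), children=()):
        self.title = title
        self.entries = list(entries)
        self.children = list(children)

    def all_texts(self):
        # A topic is described by its title, its own entries, and its whole subtree.
        texts = [self.title] + self.entries
        for child in self.children:
            texts.extend(child.all_texts())
        return texts


def match(root, query):
    """Descend from the root, at each node picking the competing child topic
    whose keyword vector is most similar to the query. Returns the path of titles."""
    q = keyword_vector([query])
    path = [root.title]
    node = root
    while node.children:
        scored = [(cosine(q, keyword_vector(c.all_texts())), c) for c in node.children]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score == 0.0:
            break  # no child shares any keyword with the query; stop here
        path.append(best.title)
        node = best
    return path


# Toy hierarchy, invented for this example only.
root = Topic("root", children=[
    Topic("operating systems", children=[
        Topic("scheduling", entries=["gang scheduling for parallel jobs"]),
        Topic("file systems", entries=["log structured file system design"]),
    ]),
    Topic("information retrieval", entries=["vector space model for document ranking"]),
])

print(match(root, "parallel job scheduling"))
```

The same routine covers both uses named in the abstract under these assumptions: a search query and a new entry to be inserted are each reduced to a keyword vector and routed down the hierarchy. Recomputing child vectors on every call keeps the sketch short; a real repository would presumably cache them per node.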

Original language: English
Title of host publication: Proceedings of First ACM/IEEE-CS Joint Conference on Digital Libraries
Pages: 259-267
Number of pages: 9
State: Published - 2001
Event: Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries - Roanoke, VA, United States
Duration: 24 Jun 2001 → 28 Jun 2001

Publication series

Name: Proceedings of First ACM/IEEE-CS Joint Conference on Digital Libraries

Conference

Conference: Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries
Country/Territory: United States
City: Roanoke, VA
Period: 24/06/01 → 28/06/01
