Language models for keyword search over data graphs

Yosi Mass*, Yehoshua Sagiv

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

16 Scopus citations

Abstract

In keyword search over data graphs, an answer is a nonredundant subtree that includes the given keywords. This paper focuses on improving the effectiveness of that type of search. A novel approach that combines language models with structural relevance is described. The proposed approach consists of three steps. First, language models are used to assign dynamic, query-dependent weights to the graph. Those weights complement static weights that are pre-assigned to the graph. Second, an existing algorithm returns candidate answers based on their weights. Third, the candidate answers are re-ranked by creating a language model for each one. The effectiveness of the proposed approach is verified on a benchmark of three datasets: IMDB, Wikipedia and Mondial. The proposed approach outperforms all existing systems on the three datasets, which is a testament to its robustness. It is also shown that the effectiveness can be further improved by augmenting keyword queries with very basic knowledge about the structure.
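The abstract does not specify how the per-answer language models in the third step are built or scored. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that a candidate answer's text is the concatenation of its nodes' texts, that tokenization is lowercase whitespace splitting, and that each answer is scored by query likelihood with Dirichlet smoothing against a background model of the whole graph. The names `CollectionModel`, `query_log_likelihood`, `rerank` and the smoothing parameter `mu` are hypothetical. Steps 1 and 2 (query-dependent graph weights and the existing candidate-generation algorithm) are not modeled; the candidates are simply given.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class CollectionModel:
    """Background unigram model estimated over all node texts in the graph."""

    def __init__(self, node_texts: Sequence[str]):
        self.counts: Counter = Counter()
        for text in node_texts:
            self.counts.update(tokenize(text))
        self.total = sum(self.counts.values())

    def prob(self, term: str) -> float:
        # Add-one estimate with one extra vocabulary slot for unseen terms,
        # so a query term absent from the graph never produces log(0).
        vocab = len(self.counts) + 1
        return (self.counts[term] + 1) / (self.total + vocab)


def query_log_likelihood(text: str, query: Sequence[str],
                         background: CollectionModel, mu: float = 100.0) -> float:
    """log P(query | LM of `text`), Dirichlet-smoothed toward the background."""
    tf = Counter(tokenize(text))
    doc_len = sum(tf.values())
    score = 0.0
    for term in query:
        p = (tf[term] + mu * background.prob(term)) / (doc_len + mu)
        score += math.log(p)
    return score


def rerank(candidates: Dict[str, List[str]], query: Sequence[str],
           background: CollectionModel, mu: float = 100.0) -> List[str]:
    """Build one LM per candidate answer (its concatenated node texts)
    and return the answer ids sorted by decreasing query likelihood."""
    scored = {
        answer_id: query_log_likelihood(" ".join(node_texts), query, background, mu)
        for answer_id, node_texts in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy graph: each string is the text of one node.
    nodes = ["Star Wars movie", "George Lucas director", "Harrison Ford actor",
             "Titanic movie", "James Cameron director"]
    background = CollectionModel(nodes)
    # Hypothetical candidate answers, as produced by some step-2 algorithm.
    candidates = {
        "a1": ["Star Wars movie", "George Lucas director"],
        "a2": ["Titanic movie", "James Cameron director"],
    }
    print(rerank(candidates, tokenize("lucas star wars"), background))  # ['a1', 'a2']
```

Dirichlet smoothing is used here only because it is a common choice for query-likelihood ranking; it lets longer answers rely less on the background model, while short answers such as single nodes still receive nonzero probability for keywords they lack.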

Original language: English
Title of host publication: WSDM 2012 - Proceedings of the 5th ACM International Conference on Web Search and Data Mining
Pages: 363-372
Number of pages: 10
State: Published - 2012
Event: 5th ACM International Conference on Web Search and Data Mining, WSDM 2012 - Seattle, WA, United States
Duration: 8 Feb 2012 - 12 Feb 2012

Publication series

Name: WSDM 2012 - Proceedings of the 5th ACM International Conference on Web Search and Data Mining

Conference

Conference: 5th ACM International Conference on Web Search and Data Mining, WSDM 2012
Country/Territory: United States
City: Seattle, WA
Period: 8/02/12 - 12/02/12

Keywords

  • Data graphs
  • Language models
  • Ranking
  • Semantic weights
