TY - GEN
T1 - Keyword proximity search in complex data graphs
AU - Golenberg, Konstantin
AU - Kimelfeld, Benny
AU - Sagiv, Yehoshua
PY - 2008
Y1 - 2008
AB - In keyword search over data graphs, an answer is a non-redundant subtree that includes the given keywords. An algorithm for enumerating answers is presented within an architecture that has two main components: an engine that generates a set of candidate answers and a ranker that evaluates their score. To be effective, the engine must have three fundamental properties. It should not miss relevant answers, has to be efficient and must generate the answers in an order that is highly correlated with the desired ranking. It is shown that none of the existing systems has implemented an engine that has all of these properties. In contrast, this paper presents an engine that generates all the answers with provable guarantees. Experiments show that the engine performs well in practice. It is also shown how to adapt this engine to queries under the OR semantics. In addition, this paper presents a novel approach for implementing rankers destined for eliminating redundancy. Essentially, an answer is ranked according to its individual properties (relevancy) and its intersection with the answers that have already been presented to the user. Within this approach, experiments with specific rankers are described.
KW - Approximate top-k answers
KW - Information retrieval on graphs
KW - Keyword proximity search
KW - Redundancy elimination
KW - Subtree enumeration by height
UR - http://www.scopus.com/inward/record.url?scp=57149144917&partnerID=8YFLogxK
U2 - 10.1145/1376616.1376708
DO - 10.1145/1376616.1376708
M3 - Conference contribution
AN - SCOPUS:57149144917
SN - 9781605581026
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 927
EP - 940
BT - SIGMOD 2008
T2 - 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD'08
Y2 - 9 June 2008 through 12 June 2008
ER -