TY - CONF
T1 - A scalable and effective full-text search in P2P networks
AU - Mass, Yosi
AU - Sagiv, Yehoshua
AU - Shmueli-Scheuer, Michal
PY - 2009
Y1 - 2009
AB - We consider the problem of full-text search involving multi-term queries in a network of self-organizing, autonomous peers. Existing approaches do not scale well with respect to the number of peers, because they either require access to a large number of peers or incur a high communication cost in order to achieve good query results. In this paper, we present a novel algorithmic framework for processing multi-term queries in P2P networks that achieves high recall while using (per-query) a small number of peers and a low communication cost, thereby enabling high query throughput. Our approach is based on a per-query peer-selection strategy using two-dimensional histograms of score distributions. Full utilization of the histograms incurs a high communication cost. We show how to drastically reduce this cost by employing a two-phase peer-selection algorithm. We also describe an adaptive approach to peer selection that further increases the recall. Experiments on a large real-world collection show that the recall is indeed high while the number of involved peers and the communication cost are low.
KW - Clustering
KW - DHT
KW - Histograms
KW - P2P search
UR - http://www.scopus.com/inward/record.url?scp=74549197627&partnerID=8YFLogxK
U2 - 10.1145/1645953.1646281
DO - 10.1145/1645953.1646281
M3 - Conference contribution
AN - SCOPUS:74549197627
SN - 9781605585123
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1979
EP - 1982
BT - ACM 18th International Conference on Information and Knowledge Management, CIKM 2009
T2 - ACM 18th International Conference on Information and Knowledge Management, CIKM 2009
Y2 - 2 November 2009 through 6 November 2009
ER -