TY - GEN
T1 - KMV-peer
T2 - 4th ACM International Conference on Web Search and Data Mining, WSDM 2011
AU - Mass, Yosi
AU - Sagiv, Yehoshua
AU - Shmueli-Scheuer, Michal
PY - 2011
Y1 - 2011
N2 - The problem of fully decentralized search over many collections is considered. The objective is to approximate the results of centralized search (namely, using a central index) while controlling the communication cost and involving only a small number of collections. The proposed solution is couched in a peer-to-peer (P2P) network, but can also be applied in other setups. Peers publish per-term summaries of their collections. Specifically, for each term, the range of document scores is divided into intervals; and for each interval, a KMV (K Minimal Values) synopsis of its documents is created. A new peer-selection algorithm uses the KMV synopses and two scoring functions in order to adaptively rank the peers, according to the relevance of their documents to a given query. The proposed method achieves high-quality results while meeting the above criteria of efficiency. In particular, experiments are done on two large, real-world datasets; one is blogs and the other is web data. These experiments show that the algorithm outperforms the state-of-the-art approaches and is robust over different collections, various scoring functions and multi-term queries.
KW - Algorithms
KW - Experimentation
KW - Performance
UR - http://www.scopus.com/inward/record.url?scp=79952406890&partnerID=8YFLogxK
U2 - 10.1145/1935826.1935860
DO - 10.1145/1935826.1935860
M3 - Conference contribution
AN - SCOPUS:79952406890
SN - 9781450304931
T3 - Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011
SP - 157
EP - 166
BT - Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011
Y2 - 9 February 2011 through 12 February 2011
ER -