TY - JOUR
T1 - PHIRST
T2 - A distributed architecture for P2P information retrieval
AU - Rosenfeld, Avi
AU - Goldman, Claudia V.
AU - Kaminka, Gal A.
AU - Kraus, Sarit
PY - 2009/4
Y1 - 2009/4
N2 - Recent progress in peer to peer (P2P) search algorithms has presented viable structured and unstructured approaches for full-text search. We posit that these existing approaches are each best suited for different types of queries. We present PHIRST, the first system to facilitate effective full-text search within P2P databases. PHIRST works by effectively leveraging between the relative strengths of these approaches. Similar to structured approaches, agents first publish terms within their stored documents. However, frequent terms are quickly identified and not exhaustively stored, resulting in a significant reduction in the system's storage requirements. During query lookup, agents use unstructured search to compensate for the lack of fully published terms. Additionally, they explicitly weigh between the costs involved in structured and unstructured approaches, allowing for a significant reduction in query costs. Finally, we address how node failures can be effectively addressed through storing multiple copies of selected data. We evaluated the effectiveness of our approach using both real-world and artificial queries. We found that in most situations our approach yields near perfect recall. We discuss the limitations of our system, as well as possible compensatory strategies.
AB - Recent progress in peer to peer (P2P) search algorithms has presented viable structured and unstructured approaches for full-text search. We posit that these existing approaches are each best suited for different types of queries. We present PHIRST, the first system to facilitate effective full-text search within P2P databases. PHIRST works by effectively leveraging between the relative strengths of these approaches. Similar to structured approaches, agents first publish terms within their stored documents. However, frequent terms are quickly identified and not exhaustively stored, resulting in a significant reduction in the system's storage requirements. During query lookup, agents use unstructured search to compensate for the lack of fully published terms. Additionally, they explicitly weigh between the costs involved in structured and unstructured approaches, allowing for a significant reduction in query costs. Finally, we address how node failures can be effectively addressed through storing multiple copies of selected data. We evaluated the effectiveness of our approach using both real-world and artificial queries. We found that in most situations our approach yields near perfect recall. We discuss the limitations of our system, as well as possible compensatory strategies.
KW - Distributed databases
KW - Information retrieval
KW - Peer to peer systems
UR - https://www.scopus.com/pages/publications/56949104175
U2 - 10.1016/j.is.2008.08.002
DO - 10.1016/j.is.2008.08.002
M3 - Article
AN - SCOPUS:56949104175
SN - 0306-4379
VL - 34
SP - 290
EP - 303
JO - Information Systems
JF - Information Systems
IS - 2
ER -