TY - JOUR
T1 - Location-based algorithms for finding sets of corresponding objects over several geo-spatial data sets
AU - Safra, Eliyahu
AU - Kanza, Yaron
AU - Sagiv, Yehoshua
AU - Beeri, Catriel
AU - Doytsher, Yerach
PY - 2010/1
Y1 - 2010/1
N2 - When integrating geo-spatial data sets, a join algorithm is used for finding sets of corresponding objects (i.e., objects that represent the same real-world entity). This article investigates location-based join algorithms for integration of several data sets. First, algorithms for integration of two data sets are presented and their performances, in terms of recall and precision, are compared. Then, two approaches for integration of more than two data sets are described. In one approach, all the integrated data sets are processed simultaneously. In the second approach, a join algorithm for two data sets is applied sequentially, either in a serial manner, where in each join at least one of the joined data sets is a single source, or in a hierarchical manner, where two join results can be joined. For the two approaches, join algorithms are given. The algorithms are designed to perform well even when the locations of objects are imprecise and each data set represents only some of the real-world entities. Results of extensive experiments with the different approaches are provided and analyzed. The experiments show the differences, in accuracy and efficiency, between the approaches, under different circumstances. The results also show that all our algorithms have much better accuracy than applying the commonly used one-sided nearest-neighbor join.
AB - When integrating geo-spatial data sets, a join algorithm is used for finding sets of corresponding objects (i.e., objects that represent the same real-world entity). This article investigates location-based join algorithms for integration of several data sets. First, algorithms for integration of two data sets are presented and their performances, in terms of recall and precision, are compared. Then, two approaches for integration of more than two data sets are described. In one approach, all the integrated data sets are processed simultaneously. In the second approach, a join algorithm for two data sets is applied sequentially, either in a serial manner, where in each join at least one of the joined data sets is a single source, or in a hierarchical manner, where two join results can be joined. For the two approaches, join algorithms are given. The algorithms are designed to perform well even when the locations of objects are imprecise and each data set represents only some of the real-world entities. Results of extensive experiments with the different approaches are provided and analyzed. The experiments show the differences, in accuracy and efficiency, between the approaches, under different circumstances. The results also show that all our algorithms have much better accuracy than applying the commonly used one-sided nearest-neighbor join.
KW - Corresponding objects
KW - Geospatial data sets
KW - Integration
KW - Location-based join
KW - Multiple sources
KW - Spatial join
UR - http://www.scopus.com/inward/record.url?scp=76649096679&partnerID=8YFLogxK
U2 - 10.1080/13658810802275560
DO - 10.1080/13658810802275560
M3 - Article
AN - SCOPUS:76649096679
SN - 1365-8816
VL - 24
SP - 69
EP - 106
JO - International Journal of Geographical Information Science
JF - International Journal of Geographical Information Science
IS - 1
ER -