TY - GEN
T1 - Maximally joining probabilistic data
AU - Kimelfeld, Benny
AU - Sagiv, Yehoshua
PY - 2007
Y1 - 2007
N2 - Conceptually, the common approach to manipulating probabilistic data is to evaluate relational queries and then calculate the probability of each tuple in the result. This approach ignores the possibility that the probabilities of complete answers are too low and, hence, partial answers (with sufficiently high probabilities) become important. Therefore, we consider the semantics in which answers are maximal (i.e., have the smallest degree of incompleteness), subject tothe constraint that the probability is still above a given threshold. We investigate the complexity of joining relations under the above semantics. In contrast to the deterministic case, this approach gives rise to two different enumeration problems. The first is finding all maximal sets of tuples that are join consistent, connected and have a joint probability above the threshold. The second is computing all maximal tuples that are answers of partial joins and have a probability above the threshold. Both problems are tractable under data complexity. We also consider query-and-data complexity, which rules out as efficient the following naive algorithm: compute all partial answers and then choose the maximal ones among those with probabilities above the threshold. We give efficient algorithms for several, important special cases. We also show that, in general, the first problem is NP-hard whereas the secondis #P-hard.
AB - Conceptually, the common approach to manipulating probabilistic data is to evaluate relational queries and then calculate the probability of each tuple in the result. This approach ignores the possibility that the probabilities of complete answers are too low and, hence, partial answers (with sufficiently high probabilities) become important. Therefore, we consider the semantics in which answers are maximal (i.e., have the smallest degree of incompleteness), subject tothe constraint that the probability is still above a given threshold. We investigate the complexity of joining relations under the above semantics. In contrast to the deterministic case, this approach gives rise to two different enumeration problems. The first is finding all maximal sets of tuples that are join consistent, connected and have a joint probability above the threshold. The second is computing all maximal tuples that are answers of partial joins and have a probability above the threshold. Both problems are tractable under data complexity. We also consider query-and-data complexity, which rules out as efficient the following naive algorithm: compute all partial answers and then choose the maximal ones among those with probabilities above the threshold. We give efficient algorithms for several, important special cases. We also show that, in general, the first problem is NP-hard whereas the secondis #P-hard.
KW - Maximal answers
KW - Natural join
KW - Probabilistic databases
KW - Query-and-data complexity
UR - http://www.scopus.com/inward/record.url?scp=35448980379&partnerID=8YFLogxK
U2 - 10.1145/1265530.1265572
DO - 10.1145/1265530.1265572
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:35448980379
SN - 1595936858
SN - 9781595936851
T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
SP - 303
EP - 312
BT - Proceedings of the Twenty-sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2007
T2 - 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2007
Y2 - 11 June 2007 through 13 June 2007
ER -