Reverse-engineering conjunctive queries from provenance examples

Daniel Deutch, Amir Gilad

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

13 Scopus citations

Abstract

The provenance of a query result details relevant parts of the input data as well as the computation leading to each output tuple. Multiple lines of work have studied the tracking and presentation of provenance, showing its effectiveness in explaining and justifying query results. The willingness of application owners to share provenance information for these purposes may however be hindered by the resulting exposure of the underlying query logic, which may be proprietary and confidential. We therefore formalize and study the following problem: when a (small) subset of the query results along with their provenance is given, what information is revealed about the underlying query? Our model is based on the provenance semiring framework and applies to many previously proposed provenance models. We analyze two flavors of the problem: (1) how many queries may be consistent with a given provenance example? and (2) what is the complexity of inferring a consistent query, or one that is a “best fit"? Our theoretical analysis shows that there may be many (for some models, even infinitely many in presence of self-joins) consistent queries, yet we provide practically efficient algorithms to find (best-fit) such queries. We experimentally show that the algorithms are generally successful in correctly reverse engineering queries, even when given only a few output examples and their provenance.

Original languageEnglish
Title of host publicationAdvances in Database Technology - EDBT 2019
Subtitle of host publication22nd International Conference on Extending Database Technology, Proceedings
EditorsZoi Kaoudi, Melanie Herschel, Carsten Binnig, Berthold Reinwald, Helena Galhardas, Irini Fundulaki
PublisherOpenProceedings.org
Pages277-288
Number of pages12
ISBN (Electronic)9783893180813
DOIs
StatePublished - 2019
Externally publishedYes
Event22nd International Conference on Extending Database Technology, EDBT 2019 - Lisbon, Portugal
Duration: 26 Mar 201929 Mar 2019

Publication series

NameAdvances in Database Technology - EDBT
Volume2019-March
ISSN (Electronic)2367-2005

Conference

Conference22nd International Conference on Extending Database Technology, EDBT 2019
Country/TerritoryPortugal
CityLisbon
Period26/03/1929/03/19

Bibliographical note

Publisher Copyright:
© 2019 Copyright held by the owner/author(s).

Fingerprint

Dive into the research topics of 'Reverse-engineering conjunctive queries from provenance examples'. Together they form a unique fingerprint.

Cite this