Reverse engineering SPJ-queries from examples

Yaacov Y. Weiss, Sara Cohen

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

32 Scopus citations

Abstract

This paper investigates the problem of reverse engineering, i.e., learning, select-project-join (SPJ) queries from a user-provided example set, containing positive and negative tuples. The goal is then to determine whether there exists a query returning all the positive tuples, but none of the negative tuples, and furthermore, to find such a query, if it exists. These are called the satisfiability and learning problems, respectively. The ability to solve these problems is an important step in simplifying the querying process for non-expert users. This paper thoroughly investigates the satisfiability and learning problems in a variety of settings. In particular, we consider several classes of queries, which allow different combinations of the operators select, project and join. In addition, we compare the complexity of satisfiability and learning, when the query is, or is not, of bounded size. We note that bounded-size queries are of particular interest, as they can be used to avoid over-fitting (i.e., tailoring a query precisely to only the seen examples). In order to fully understand the underlying factors which make satisfiability and learning (in) tractable, we consider different components of the problem, namely, the size of a query to be learned, the size of the schema and the number of examples. We study the complexity of our problems, when considering these as part of the input, as constants or as parameters (i.e., as in parameterized complexity analysis). Depending on the setting, the complexity of satisfiability and learning can vary significantly. Among other results, our analysis also provides new problems that are complete for W [3], for which few natural problems are known. Finally, by considering a variety of settings, we derive insight on how the different facets of our problem interplay with the size of the database, thereby providing the theoretical foundations necessary for a future implementation of query learning from examples.

Original languageEnglish
Title of host publicationPODS 2017 - Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
PublisherAssociation for Computing Machinery
Pages151-166
Number of pages16
ISBN (Electronic)9781450341981
DOIs
StatePublished - 9 May 2017
Event36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017 - Chicago, United States
Duration: 14 May 201719 May 2017

Publication series

NameProceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
VolumePart F127745

Conference

Conference36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017
Country/TerritoryUnited States
CityChicago
Period14/05/1719/05/17

Bibliographical note

Publisher Copyright:
© 2017 ACM.

Fingerprint

Dive into the research topics of 'Reverse engineering SPJ-queries from examples'. Together they form a unique fingerprint.

Cite this