TY - JOUR
T1 - The complexity of learning tree patterns from example graphs
AU - Cohen, Sara
AU - Weiss, Yaacov Y.
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/5
Y1 - 2016/5
AB - This article investigates the problem of learning tree patterns that return nodes with a given set of labels, from example graphs provided by the user. Example graphs are annotated by the user as being either positive or negative. The goal is then to determine whether there exists a tree pattern returning tuples of nodes with the given labels in each of the positive examples, but in none of the negative examples, and furthermore, to find one such pattern if it exists. These are called the satisfiability and learning problems, respectively. This article thoroughly investigates the satisfiability and learning problems in a variety of settings. In particular, we consider example sets that (1) may contain only positive examples, or both positive and negative examples, (2) may contain directed or undirected graphs, and (3) may have multiple occurrences of labels or be uniquely labeled (to some degree). In addition, we consider tree patterns of different types that can allow, or prohibit, wildcard labeled nodes and descendant edges. We also consider two different semantics for mapping tree patterns to graphs. The complexity of satisfiability is determined for the different combinations of settings. For cases in which satisfiability is polynomial, it is also shown that learning is polynomial. (This is nontrivial as satisfying patterns may be exponential in size.) Finally, the minimal learning problem, that is, that of finding a minimal-sized satisfying pattern, is studied for cases in which satisfiability is polynomial.
KW - Graphs
KW - Learning
KW - Tree patterns
UR - http://www.scopus.com/inward/record.url?scp=84969931952&partnerID=8YFLogxK
U2 - 10.1145/2890492
DO - 10.1145/2890492
M3 - Article
AN - SCOPUS:84969931952
SN - 0362-5915
VL - 41
JO - ACM Transactions on Database Systems
JF - ACM Transactions on Database Systems
IS - 2
M1 - 14
ER -