TY - GEN

T1 - Certain and possible XPath answers

AU - Cohen, Sara

AU - Weiss, Yaacov Y.

PY - 2013

Y1 - 2013

N2 - Formulating an XPath query over an XML document is a difficult chore for a non-expert user. This paper introduces a novel approach to ease the querying process. Instead of specifying a query, the user simply marks positive examples X+ of nodes that fit her information need. She may also mark negative examples X- of undesirable nodes. A deductive method, to suggest additional nodes that will interest the user, is developed in this paper. To be precise, a node y is a certain answer if every query returning all positive examples X+, and not returning any negative example from X-, must also return y. Similarly, y is a possible answer if there exists a query returning X+ and y, while not returning any node in X-. Thus, y is likely to be of interest to the user if y is a certain answer, and unlikely to be of interest if y is not even a possible answer. The complexity of finding certain and possible answers, with respect to various classes of XPath, is studied. It is shown that for a wide variety of XPath queries (including child and descendant axes, wildcards, branching and attribute constraints), certain and possible answers can be found efficiently, provided that X+ and X- are of bounded size. To prove this result a novel algorithm is developed.

AB - Formulating an XPath query over an XML document is a difficult chore for a non-expert user. This paper introduces a novel approach to ease the querying process. Instead of specifying a query, the user simply marks positive examples X+ of nodes that fit her information need. She may also mark negative examples X- of undesirable nodes. A deductive method, to suggest additional nodes that will interest the user, is developed in this paper. To be precise, a node y is a certain answer if every query returning all positive examples X+, and not returning any negative example from X-, must also return y. Similarly, y is a possible answer if there exists a query returning X+ and y, while not returning any node in X-. Thus, y is likely to be of interest to the user if y is a certain answer, and unlikely to be of interest if y is not even a possible answer. The complexity of finding certain and possible answers, with respect to various classes of XPath, is studied. It is shown that for a wide variety of XPath queries (including child and descendant axes, wildcards, branching and attribute constraints), certain and possible answers can be found efficiently, provided that X+ and X- are of bounded size. To prove this result a novel algorithm is developed.

UR - http://www.scopus.com/inward/record.url?scp=84875584414&partnerID=8YFLogxK

U2 - 10.1145/2448496.2448525

DO - 10.1145/2448496.2448525

M3 - Conference contribution

AN - SCOPUS:84875584414

SN - 9781450315982

T3 - ACM International Conference Proceeding Series

SP - 237

EP - 248

BT - ICDT 2013 - 16th International Conference on Database Theory, Proceedings

T2 - 16th International Conference on Database Theory, ICDT 2013

Y2 - 18 March 2013 through 22 March 2013

ER -