TY - GEN

T1 - Running tree automata on probabilistic XML

AU - Cohen, Sara

AU - Kimelfeld, Benny

AU - Sagiv, Yehoshua

PY - 2009

Y1 - 2009

AB - Tree automata (specifically, bottom-up and unranked) form a powerful tool for querying and maintaining validity of XML documents. XML with uncertain data can be modeled as a probability space of labeled trees, and that space is often represented by a tree with distributional nodes. This paper investigates the problem of evaluating a tree automaton over such a representation, where the goal is to compute the probability that the automaton accepts a random possible world. This problem is generally intractable, but for the case where the tree automaton is deterministic (and its transitions are defined by deterministic string automata), an efficient algorithm is presented. The paper discusses the applications of this result, including the ability to sample and to evaluate queries (e.g., in monadic second-order logic) while requiring a priori conformance to a schema (e.g., DTD). XML schemas also include attribute constraints, and the complexity of key, foreign-key, and inclusion constraints is studied in the context of probabilistic XML. Finally, the paper discusses the generalization of the results to an extended data model, where distributional nodes can repeatedly sample the same subtree, thereby adding another exponent to the size of the probability space.

KW - Probabilistic XML

KW - Probabilistic trees

KW - Tree automata

KW - XML constraints

KW - XML query evaluation

KW - XML schema

UR - http://www.scopus.com/inward/record.url?scp=70349097023&partnerID=8YFLogxK

DO - 10.1145/1559795.1559831

M3 - Conference contribution

AN - SCOPUS:70349097023

SN - 9781605585536

T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

SP - 227

EP - 236

BT - PODS'09 - Proceedings of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems

T2 - 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '09

Y2 - 29 June 2009 through 1 July 2009

ER -