Running tree automata on probabilistic XML

Sara Cohen*, Benny Kimelfeld, Yehoshua Sagiv

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

27 Scopus citations

Abstract

Tree automata (specifically, bottom-up and unranked) form a powerful tool for querying and maintaining validity of XML documents. XML with uncertain data can be modeled as a probability space of labeled trees, and that space is often represented by a tree with distributional nodes. This paper investigates the problem of evaluating a tree automaton over such a representation, where the goal is to compute the probability that the automaton accepts a random possible world. This problem is generally intractable, but for the case where the tree automaton is deterministic (and its transitions are defined by deterministic string automata), an efficient algorithm is presented. The paper discusses the applications of this result, including the ability to sample and to evaluate queries (e.g., in monadic second-order logic) while requiring a priori conformance to a schema (e.g., a DTD). XML schemas also include attribute constraints, and the complexity of key, foreign-key and inclusion constraints is studied in the context of probabilistic XML. Finally, the paper discusses the generalization of the results to an extended data model, where distributional nodes can repeatedly sample the same subtree, thereby adding another exponent to the size of the probability space.
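
To make the evaluation problem concrete, the following is a minimal Python sketch written for this page, not the algorithm from the paper. It assumes a simplified model in which every child subtree is present independently with a given probability (a stand-in for the paper's distributional nodes), and a toy deterministic bottom-up automaton whose tree states are "yes" and "no", where "yes" means the subtree contains a node labeled "b". The names Node, h_start, h_step, h_output, state_distribution and acceptance_probability are all hypothetical. For each node, the sketch propagates a probability distribution over the states of a deterministic string automaton that reads the states of the children from left to right.

```python
from collections import defaultdict


class Node:
    """Ordinary node; each child is a (probability, Node) pair.

    Assumption: every child subtree is present independently with its
    probability. This is a simplification of distributional nodes.
    """

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


# Toy deterministic automaton (hypothetical, for illustration only).
# Horizontal (string) automaton state: True once a "b" has been seen,
# either as the node's own label or in some child subtree.
def h_start(label):
    return label == "b"


def h_step(h, child_state):
    return h or child_state == "yes"


def h_output(h):
    # Map the final horizontal state to the tree state of the node.
    return "yes" if h else "no"


def state_distribution(node):
    """Return {tree_state: probability} for the random subtree at node."""
    h_dist = {h_start(node.label): 1.0}
    for p, child in node.children:
        child_dist = state_distribution(child)
        nxt = defaultdict(float)
        for h, w in h_dist.items():
            # Child absent: the horizontal state does not change.
            nxt[h] += w * (1.0 - p)
            # Child present: read the child's (random) tree state.
            for q, pq in child_dist.items():
                nxt[h_step(h, q)] += w * p * pq
        h_dist = nxt
    out = defaultdict(float)
    for h, w in h_dist.items():
        out[h_output(h)] += w
    return dict(out)


def acceptance_probability(root):
    # The toy automaton accepts when the root is in state "yes".
    return state_distribution(root).get("yes", 0.0)
```

For example, for a root labeled "a" with a child "b" present with probability 0.5 and a child "c" present with probability 0.5, where "c" itself has a child "b" present with probability 0.4, acceptance_probability returns approximately 0.6, which matches 1 - 0.5 * (1 - 0.5 * 0.4). Determinism is what keeps the per-node distribution small here: each possible world drives the automaton into exactly one state, so the probabilities can simply be summed.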

Original language: English
Title of host publication: PODS'09 - Proceedings of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
Pages: 227-236
Number of pages: 10
State: Published - 2009
Event: 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '09 - Providence, RI, United States
Duration: 29 Jun 2009 to 1 Jul 2009

Publication series

Name: Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

Conference

Conference: 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '09
Country/Territory: United States
City: Providence, RI
Period: 29/06/09 to 1/07/09

Keywords

  • Probabilistic XML
  • Probabilistic trees
  • Tree automata
  • XML constraints
  • XML query evaluation
  • XML schema
