TY - GEN
T1 - Querying parse trees of stochastic context-free grammars
AU - Cohen, Sara
AU - Kimelfeld, Benny
PY - 2010
Y1 - 2010
AB - Stochastic context-free grammars (SCFGs) have long been recognized as useful for a large variety of tasks including natural language processing, morphological parsing, speech recognition, information extraction, Web-page wrapping and even analysis of RNA. A string and an SCFG jointly represent a probabilistic interpretation of the meaning of the string, in the form of a (possibly infinite) probability space of parse trees. The problem of evaluating a query over this probability space is considered under the conventional semantics of querying a probabilistic database. For general SCFGs, extremely simple queries may have results that include irrational probabilities. But, for a large subclass of SCFGs (that includes all the standard studied subclasses of SCFGs) and the language of tree-pattern queries with projection (and child/descendant edges), it is shown that query results have rational probabilities with a polynomial-size bit representation and, more importantly, an efficient query-evaluation algorithm is presented.
KW - probabilistic databases
KW - querying
KW - stochastic context-free grammars
UR - http://www.scopus.com/inward/record.url?scp=77954517660&partnerID=8YFLogxK
U2 - 10.1145/1804669.1804680
DO - 10.1145/1804669.1804680
M3 - Conference contribution
AN - SCOPUS:77954517660
SN - 9781605589473
T3 - ACM International Conference Proceeding Series
SP - 62
EP - 75
BT - Database Theory - ICDT 2010
T2 - 13th International Conference on Database Theory, ICDT'10
Y2 - 23 March 2010 through 25 March 2010
ER -