TY - GEN
T1 - Unsupervised induction of labeled parse trees by clustering with syntactic features
AU - Reichart, Roi
AU - Rappoport, Ari
PY - 2008
Y1 - 2008
N2 - We present an algorithm for unsupervised induction of labeled parse trees. The algorithm has three stages: bracketing, initial labeling, and label clustering. Bracketing is done from raw text using an unsupervised incremental parser. Initial labeling is done using a merging model that aims at minimizing the grammar description length. Finally, labels are clustered to a desired number of labels using syntactic features extracted from the initially labeled trees. The algorithm obtains 59% labeled f-score on the WSJ10 corpus, as compared to 35% in previous work, and substantial error reduction over a random baseline. We report results for English, German and Chinese corpora, using two label mapping methods and two label set sizes.
AB - We present an algorithm for unsupervised induction of labeled parse trees. The algorithm has three stages: bracketing, initial labeling, and label clustering. Bracketing is done from raw text using an unsupervised incremental parser. Initial labeling is done using a merging model that aims at minimizing the grammar description length. Finally, labels are clustered to a desired number of labels using syntactic features extracted from the initially labeled trees. The algorithm obtains 59% labeled f-score on the WSJ10 corpus, as compared to 35% in previous work, and substantial error reduction over a random baseline. We report results for English, German and Chinese corpora, using two label mapping methods and two label set sizes.
UR - http://www.scopus.com/inward/record.url?scp=80053386994&partnerID=8YFLogxK
U2 - 10.3115/1599081.1599172
DO - 10.3115/1599081.1599172
M3 - Conference contribution
AN - SCOPUS:80053386994
SN - 9781905593446
T3 - Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference
SP - 721
EP - 728
BT - Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 22nd International Conference on Computational Linguistics, Coling 2008
Y2 - 18 August 2008 through 22 August 2008
ER -