Abstract
We introduce a set of biologically and computationally motivated design choices for modeling the learning of language, or of other types of sequential, hierarchically structured experience and behavior, and describe an implemented system that conforms to these choices and is capable of unsupervised learning from raw natural-language corpora. Given a stream of linguistic input, our model incrementally learns a grammar that captures its statistical patterns, which can then be used to parse or generate new data. The grammar constructed in this manner takes the form of a directed weighted graph, whose nodes are recursively (hierarchically) defined patterns over the elements of the input stream. We evaluated the model in seventeen experiments, grouped into five studies, which examined, respectively, (a) the generative ability of a grammar learned from a corpus of natural language, (b) the characteristics of the learned representation, (c) sequence segmentation and chunking, (d) artificial grammar learning, and (e) certain types of structure dependence. The model's performance largely vindicates our design choices, suggesting that progress in modeling language acquisition can be made on a broad front, ranging from issues of generativity to the replication of human experimental findings, by bringing biological and computational considerations, as well as lessons from prior efforts, to bear on the modeling approach.
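To make the abstract's central representational claim concrete, the sketch below shows one way a directed weighted graph over recursively defined patterns might look in Python. The names (`PatternNode`, `GrammarGraph`, `observe_transition`) and the simple count-based edge weighting are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: the learned grammar is described as a directed
# weighted graph whose nodes are recursively (hierarchically) defined
# patterns over elements of the input stream. All names here are
# hypothetical, not taken from the paper's system.
from collections import defaultdict


class PatternNode:
    """A grammar node: a terminal token or a pattern over child nodes."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []  # recursive (hierarchical) structure


class GrammarGraph:
    """Directed weighted graph over pattern nodes."""

    def __init__(self):
        self.edges = defaultdict(dict)  # edges[src][dst] = weight

    def observe_transition(self, src, dst):
        # Incremental learning step (assumed): strengthen the edge each
        # time pattern `dst` follows pattern `src` in the input stream.
        self.edges[src][dst] = self.edges[src].get(dst, 0.0) + 1.0


# Usage: feed a token stream and accumulate transition weights.
g = GrammarGraph()
nodes = [PatternNode(t) for t in ["the", "dog", "barked"]]
for a, b in zip(nodes, nodes[1:]):
    g.observe_transition(a, b)
```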
| Original language | English |
|---|---|
| Pages (from-to) | 227-267 |
| Number of pages | 41 |
| Journal | Cognitive Science |
| Volume | 39 |
| Issue number | 2 |
| DOIs | |
| State | Published - 1 Mar 2015 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2014 Cognitive Science Society, Inc.
Keywords
- Generative grammar
- Grammar of behavior
- Graph-based representation
- Incremental learning
- Language learning
- Learning
- Linguistic experience
- Statistical learning