Abstract
Despite tremendous advances in computational models of human analogy, a persistent challenge has been scaling up to find useful analogies in large, messy, real-world data. The availability of large idea repositories (e.g., the U.S. patent database) could significantly accelerate innovation and discovery in a way never previously possible. Previous approaches have been limited by relying on hand-created databases that have high relational structure but are very sparse (e.g., predicate calculus representations). Traditional machine-learning/information-retrieval similarity metrics (e.g., LSA) can scale to large, natural-language datasets; however, while these methods are good at detecting surface similarity, they struggle to account for structural similarity. In this paper, we propose to leverage crowdsourcing techniques to construct a dataset with rich "analogy-tuning" signals, which are used to guide machine-learning models toward matches based on relations rather than surface features. We demonstrate our approach with a crowdsourced analogy identification task, whose results are used to train deep learning algorithms. Our initial results suggest two things. First, a deep learning model trained on positive and negative example analogies from the task can find more analogous matches than an LSA baseline. Second, incorporating behavioral signals (such as the queries used to retrieve an analogy) can further boost its performance.
Original language | English |
---|---|
Pages (from-to) | 31-40 |
Number of pages | 10 |
Journal | CEUR Workshop Proceedings |
Volume | 1815 |
State | Published - 2016 |
Event | 24th International Conference on Case-Based Reasoning Workshops, ICCBR-WS 2016, Atlanta, United States |
Duration | 31 Oct 2016 → 2 Nov 2016 |
Bibliographical note
Publisher Copyright: Copyright © 2016 for this paper by its authors.
Keywords
- Analogy
- Crowdsourcing
- Machine learning