Scaling up analogy with crowdsourcing and machine learning

Joel Chan, Tom Hope, Dafna Shahaf, Aniket Kittur

Research output: Contribution to journal, Conference article, peer-review

1 Scopus citation


Despite tremendous advances in computational models of human analogy, a persistent challenge has been scaling up to find useful analogies in large, messy, real-world data. The availability of large idea repositories (e.g., the U.S. patent database) could significantly accelerate innovation and discovery in a way never previously possible. Previous approaches have been limited by relying on hand-created databases that have high relational structure but are very sparse (e.g., predicate calculus representations). Traditional machine-learning/information-retrieval similarity metrics (e.g., LSA) can scale to large, natural-language datasets; however, while these methods are good at detecting surface similarity, they struggle to account for structural similarity. In this paper, we propose to leverage crowdsourcing techniques to construct a dataset with rich "analogy-tuning" signals, used to guide machine learning models towards matches based on relations rather than surface features. We demonstrate our approach with a crowdsourced analogy identification task, whose results are used to train deep learning algorithms. Our initial results suggest that a deep learning model trained on positive/negative example analogies from the task can find more analogous matches than an LSA baseline, and that incorporating behavioral signals (such as queries used to retrieve an analogy) can further boost its performance.
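The abstract contrasts LSA, which rewards surface (lexical) overlap, with the structural similarity that analogy requires. The block below is a minimal sketch of an LSA-style similarity baseline, not the authors' configuration. It assumes raw term counts, lowercase whitespace tokenization, and NumPy's `numpy.linalg.svd`. The three-document corpus and the function names `term_document_matrix` and `lsa_similarities` are hypothetical illustrations. The third document (a toaster) arguably shares the same relation as the query (an electric heating element cooks food) but uses entirely different words.

```python
import numpy as np
from collections import Counter


def term_document_matrix(docs):
    """Build a raw term-count matrix X with shape (n_terms, n_docs)."""
    # Assumption: lowercase whitespace tokenization, no stemming or weighting.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            X[index[term], j] = count
    return X


def lsa_similarities(docs, query_index, k=2):
    """Cosine similarity of docs[query_index] to every doc in a rank-k LSA space."""
    X = term_document_matrix(docs)
    # Thin SVD: X = U @ diag(s) @ Vt, with singular values in descending order.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Each row is one document's coordinates in the k-dimensional latent space.
    doc_vecs = (s[:k, None] * Vt[:k]).T
    norms = np.linalg.norm(doc_vecs, axis=1)
    q = doc_vecs[query_index]
    return doc_vecs @ q / (norms * norms[query_index])


# Hypothetical toy corpus: doc 0 is the query.
DOCS = [
    "kettle with heating coil boils water quickly",  # query
    "kettle with heating coil warms milk",           # high word overlap
    "toaster element browns bread",                  # shared relation, no shared words
]

sims = lsa_similarities(DOCS, query_index=0, k=2)
ranking = [int(i) for i in np.argsort(-sims) if i != 0]
print(ranking)
```

In this toy corpus the toaster description shares no terms with the query, so its LSA score stays near 0 for any choice of `k`, while the near-duplicate kettle description scores near 1. This is the surface-versus-structure gap that the crowdsourced "analogy-tuning" signals described in the abstract are meant to address.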

Original language: American English
Pages (from-to): 31-40
Number of pages: 10
Journal: CEUR Workshop Proceedings
State: Published - 2016
Event: 24th International Conference on Case-Based Reasoning Workshops, ICCBR-WS 2016, Atlanta, United States
Duration: 31 Oct 2016 to 2 Nov 2016

Bibliographical note

Publisher Copyright:
Copyright © 2016 for this paper by its authors.


  • Analogy
  • Crowdsourcing
  • Machine learning

