Extraction and approximation of numerical attributes from the web

Dmitry Davidov, Ari Rappoport

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

27 Scopus citations

Abstract

We present a novel framework for automated extraction and approximation of numerical object attributes, such as height and weight, from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation-defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, together with information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms in the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.
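The final combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `approximate_attribute`, the use of the median as the combining rule, the optional `bounds` range, and the toy numbers in the usage note are all assumptions made for illustration, and the Web and WordNet retrieval steps are replaced by a plain dictionary.

```python
from statistics import median


def approximate_attribute(target, similar_terms, retrieved_values, bounds=None):
    """Select or approximate a numerical attribute value for `target`.

    Hypothetical sketch: `retrieved_values` maps each term to the list of
    numbers found for it (standing in for Web extraction), and `bounds` is an
    optional (low, high) range standing in for comparison-based evidence.
    """
    # If exact values were found for the target itself, select among them.
    direct = retrieved_values.get(target)
    if direct:
        return median(direct)

    # Otherwise pool the values found for the comparable objects.
    pooled = [v for term in similar_terms for v in retrieved_values.get(term, [])]
    if not pooled:
        return None  # nothing to approximate from

    estimate = median(pooled)  # assumed combining rule, not from the paper

    # Clip the estimate to the inferred value range, if one is available.
    if bounds is not None:
        low, high = bounds
        estimate = min(max(estimate, low), high)
    return estimate
```

With made-up values such as `{"car a": [], "car b": [1200.0, 1250.0], "car c": [1300.0]}`, calling `approximate_attribute("car a", ["car b", "car c"], values)` pools the three numbers found for the comparable objects and returns their median, since no exact value exists for the target.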

Original language: American English
Title of host publication: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Conference Proceedings
Editors: Jan Hajic, Sandra Carberry, Stephen Clark
Publisher: Association for Computational Linguistics (ACL)
Pages: 1308-1317
Number of pages: 10
ISBN (Electronic): 1932432663, 9781932432664
State: Published - 2010
Event: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010 - Uppsala, Sweden
Duration: 11 Jul 2010 to 16 Jul 2010

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 2010-July
ISSN (Print): 0736-587X

Conference

Conference: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
Country/Territory: Sweden
City: Uppsala
Period: 11/07/10 to 16/07/10

Bibliographical note

Publisher Copyright:
© 2010 Association for Computational Linguistics.
