TEG - A hybrid approach to information extraction

Benjamin Rosenfeld*, Ronen Feldman, Moshe Fresko, Jonathan Schler, Yonatan Aumann

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

10 Scopus citations

Abstract

This paper describes a hybrid statistical and knowledge-based information extraction model that can extract entities and relations at the sentence level. The model attempts to retain and improve on the high accuracy of knowledge-based systems while drastically reducing the amount of manual labor by relying on statistics drawn from a training corpus. The implementation of the model, called TEG (Trainable Extraction Grammar), can be adapted to any IE domain by writing a suitable set of rules in an extraction language based on SCFG (Stochastic Context-Free Grammar) and training them on an annotated corpus. The system does not contain any purely linguistic components, such as a PoS tagger or a parser. We demonstrate the performance of the system on several named entity extraction and relation extraction tasks. The experiments show that our hybrid approach outperforms both purely statistical and purely knowledge-based systems, while requiring orders of magnitude less manual rule writing and a smaller amount of training data. The improvement in accuracy is slight for the named entity extraction tasks and more pronounced for relation extraction.
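As a rough illustration of what training SCFG rules on an annotated corpus can mean, the sketch below estimates rule probabilities by relative frequency: each rule's count is divided by the total count of rules sharing its left-hand side. This is a generic textbook estimator, not TEG's rule language or its actual training procedure; the function name `estimate_rule_probs`, the nonterminals, and the toy derivations are all hypothetical.

```python
from collections import Counter, defaultdict

def estimate_rule_probs(derivations):
    """Relative-frequency estimate of SCFG rule probabilities.

    derivations: list of derivations, each a list of (lhs, rhs) rule
    applications, where rhs is a tuple of symbols. (Hypothetical format,
    assumed to be read off an annotated corpus.)
    """
    # Count how often each rule was applied across all derivations.
    counts = Counter(rule for d in derivations for rule in d)

    # Total applications per left-hand-side nonterminal.
    lhs_totals = defaultdict(int)
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c

    # P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

# Toy data (invented for illustration): four annotated person mentions.
derivations = [
    [("Person", ("FirstName", "LastName"))],
    [("Person", ("FirstName", "LastName"))],
    [("Person", ("Title", "LastName"))],
    [("Person", ("FirstName", "LastName"))],
]

probs = estimate_rule_probs(derivations)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}: {p:.2f}")
```

In this toy setting the probabilities of all rules expanding `Person` sum to one, which is the defining constraint of a stochastic context-free grammar.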

Original language: American English
Pages: 589-596
Number of pages: 8
State: Published - 2004
Externally published: Yes
Event: CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management - Washington, DC, United States
Duration: 8 Nov 2004 to 13 Nov 2004

Conference

Conference: CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management
Country/Territory: United States
City: Washington, DC
Period: 8/11/04 to 13/11/04

Keywords

  • HMM
  • Information Extraction
  • Rules Based Systems
  • Text Mining
