TY - JOUR
T1 - A modular information extraction system
AU - Feldman, Ronen
AU - Regev, Yizhar
AU - Gorodetsky, Maya
PY - 2008
Y1 - 2008
N2 - In today's information age, the amount of text documents available electronically (on the Web, on corporate intranets, on news wires and elsewhere) is overwhelming. Search engines and information retrieval, while useful to find documents that satisfy a certain query, offer little help with analyzing the unstructured documents themselves. Text Mining is the automated process of analyzing unstructured, natural language text in order to discover information and knowledge that are difficult to retrieve. Information Extraction (IE) centers on finding entities and relations in free text and provides a solid foundation for text mining. In this paper we present a modular IE system, based on the DIAL language. DIAL allows users to implement IE solutions for various domains rapidly, based on a common Natural Language Processing (NLP) infrastructure. We demonstrate in detail an implementation of a system for extracting relations in the intelligence news domain. We present an evaluation of our system and discuss enhancements for other domains, such as emails.
KW - Information extraction
KW - Natural language processing
KW - Text mining applications
UR - http://www.scopus.com/inward/record.url?scp=51849135749&partnerID=8YFLogxK
U2 - 10.3233/ida-2008-12104
DO - 10.3233/ida-2008-12104
M3 - Article
AN - SCOPUS:51849135749
SN - 1088-467X
VL - 12
SP - 51
EP - 71
JO - Intelligent Data Analysis
JF - Intelligent Data Analysis
IS - 1
ER -