Practical text mining

Ronen Feldman*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. In this tutorial we will present the general theory of Text Mining and will demonstrate several systems that use these principles to enable interactive exploration of large textual collections. We view Text Mining as a combination of Information Retrieval methods and Data Mining methods. We will describe generic techniques for text categorization and information extraction that are used by these systems. The systems that will be presented are KDT, which is a system for Knowledge Discovery in Texts; FACT, which discovers associations among keywords labeling the items in a collection of textual documents; and Text Explorer, which is a system that provides a high-level language for interactive exploration of textual collections. We will present a general architecture for text mining and will outline the algorithms and data structures behind the systems. We will give special emphasis to incremental algorithms and to efficient data structures. The tutorial will cover the state of the art in this rapidly growing area of research.

Original language: American English
Title of host publication: Principles of Data Mining and Knowledge Discovery - 2nd European Symposium, PKDD 1998, Proceedings
Editors: Jan M. Zytkow, Mohamed Quafafou
Publisher: Springer Verlag
ISBN (Print): 3540650687, 9783540650683
DOIs
State: Published - 1998
Externally published: Yes
Event: 2nd European Symposium on Principles of Data Mining and Knowledge Discovery in Databases, PKDD 1998 - Nantes, France
Duration: 23 Sep 1998 to 26 Sep 1998

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1510
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd European Symposium on Principles of Data Mining and Knowledge Discovery in Databases, PKDD 1998
Country/Territory: France
City: Nantes
Period: 23/09/98 to 26/09/98

Bibliographical note

Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 1998.
