Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. This paper describes the KDT system for Knowledge Discovery in Text, in which documents are labeled by keywords, and knowledge discovery is performed by analyzing the co-occurrence frequencies of the various keywords labeling the documents. We show how this keyword-frequency approach supports a range of KDD operations, providing a suitable foundation for knowledge discovery and exploration for collections of unstructured text.
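As a rough illustration of the keyword co-occurrence idea described above, the following minimal sketch counts how often keywords label the same document and derives a simple conditional distribution from those counts. It is not the KDT system's implementation: the toy documents, the function names `cooccurrence_counts` and `conditional_distribution`, and the choice of unordered keyword pairs are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy collection: each document is represented only by its keyword labels.
docs = [
    {"iran", "oil", "usa"},
    {"iran", "oil"},
    {"usa", "trade"},
    {"oil", "usa", "trade"},
]

def cooccurrence_counts(labelled_docs):
    """Count documents per keyword and per unordered keyword pair."""
    single = Counter()
    pairs = Counter()
    for labels in labelled_docs:
        single.update(labels)
        # Sort so each pair has one canonical (alphabetical) ordering.
        pairs.update(combinations(sorted(labels), 2))
    return single, pairs

def conditional_distribution(single, pairs, given):
    """Fraction of documents labelled `given` that also carry each other keyword."""
    total = single[given]
    dist = {}
    for (a, b), n in pairs.items():
        if a == given:
            dist[b] = n / total
        elif b == given:
            dist[a] = n / total
    return dist

single, pairs = cooccurrence_counts(docs)
oil_dist = conditional_distribution(single, pairs, "oil")
```

Here `oil_dist` maps each keyword that co-occurs with "oil" to the fraction of "oil" documents that also carry it. Distributions of this kind, computed for different keywords or different time slices of a collection, are one plausible starting point for the distribution comparison and trend analysis operations named in the keyword list.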
Bibliographical note
Funding Information:
This research was supported by NSF grant IRI-9509819 and by grant 8615-1-96 from the Israeli Ministry of Science. The authors would like to thank the reviewers for helpful comments given on drafts of this paper.
Keywords
- Data mining
- Distribution comparison
- Text categorization
- Text mining
- Trend analysis