Pattern based browsing in document collections

Ronen Feldman, Willi Klösgen, Yaniv Ben-Yehuda, Gil Kedar, Vladimir Reznikov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

We present Document Explorer, a data mining system that searches for patterns in document collections. These patterns provide knowledge about the application domain represented by the collection. A pattern can also be seen as a query that retrieves a set of documents, so the data mining tools can be used to identify interesting queries for browsing the collection. The main pattern types the system can search for are frequent sets of concepts, association rules, concept distributions, and concept graphs. To let the user specify explicit bias, the system provides several types of constraints for searching the vast implicit spaces of patterns that exist in the collection. Patterns that have been verified as interesting are structured and presented in a visual user interface, which lets the user operate on the results to refine and redirect search tasks or to access the associated documents. The system offers preprocessing tools to construct or refine a knowledge base of domain concepts and to create an internal representation of the document collection that is used by all subsequent data mining operations. In this paper, we give an overview of the Document Explorer system and summarize our methodical approaches and solutions for the special requirements of document mining.
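
To make the idea of a pattern doubling as a query concrete, the following is a minimal Python sketch. It is not the Document Explorer implementation, which the abstract does not specify. It assumes each document has already been reduced to a set of concept labels, uses brute-force enumeration instead of an optimized mining algorithm, and works on a made-up toy collection. All names (`DOCS`, `retrieve`, `frequent_sets`, `confidence`) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy collection: each document is reduced to the set of
# domain concepts it mentions (labels and data are made up for illustration).
DOCS = {
    "d1": {"iran", "oil", "usa"},
    "d2": {"iran", "oil"},
    "d3": {"oil", "usa"},
    "d4": {"iran", "oil", "usa"},
    "d5": {"gold"},
}


def retrieve(concepts, docs):
    """Use a concept set as a conjunctive query: ids of documents containing every concept."""
    return {doc_id for doc_id, doc in docs.items() if concepts <= doc}


def frequent_sets(docs, min_support, max_size=3):
    """Brute-force search for concept sets that occur in at least min_support documents."""
    vocabulary = sorted(set().union(*docs.values()))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocabulary, size):
            hits = retrieve(frozenset(combo), docs)
            if len(hits) >= min_support:
                # The pattern maps directly to the documents it retrieves.
                result[frozenset(combo)] = hits
    return result


def confidence(lhs, rhs, docs):
    """Confidence of the association rule lhs => rhs (0.0 if lhs matches nothing)."""
    lhs_hits = retrieve(lhs, docs)
    if not lhs_hits:
        return 0.0
    return len(retrieve(lhs | rhs, docs)) / len(lhs_hits)


if __name__ == "__main__":
    for pattern, hits in frequent_sets(DOCS, min_support=3).items():
        print(sorted(pattern), "->", sorted(hits))
    print(confidence({"oil"}, {"iran"}, DOCS))
```

In this sketch, every frequent concept set is stored together with the documents it matches, which is the sense in which a discovered pattern can serve as a browsing query. The `min_support` and `max_size` parameters stand in for the kind of user-specified constraints the abstract mentions, under the assumptions stated above.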

Original language: English
Title of host publication: Principles of Data Mining and Knowledge Discovery - 1st European Symposium, PKDD 1997, Proceedings
Editors: Jan Komorowski, Jan Zytkow
Publisher: Springer Verlag
Pages: 112-122
Number of pages: 11
ISBN (Print): 3540632239, 9783540632238
State: Published - 1997
Externally published: Yes
Event: 1st European Symposium on Principles of Data Mining and Knowledge Discovery, PKDD 1997 - Trondheim, Norway
Duration: 24 Jun 1997 - 27 Jun 1997

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1263
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st European Symposium on Principles of Data Mining and Knowledge Discovery, PKDD 1997
Country/Territory: Norway
City: Trondheim
Period: 24/06/97 - 27/06/97

Bibliographical note

Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 1997.
