Indexing for subtree similarity-search using edit distance

Sara Cohen*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

Given a tree Q and a large set of trees T = {T1, …, Tn}, the subtree similarity-search problem is that of finding the subtrees of trees in T that are most similar to Q under the tree edit distance metric. Determining similarity using tree edit distance has proven useful in a variety of application areas. While subtree similarity-search has been studied in the past, previous solutions required a traversal of all of T, which poses a severe bottleneck in processing time as T grows larger. This paper proposes the first index structure for subtree similarity-search, provided that the unit cost function is used. Extensive experimentation and comparison to previous work show the huge improvement gained when using the proposed index structure and processing algorithm.
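To make the problem statement concrete, the following is a minimal sketch of the brute-force baseline that the abstract describes as a bottleneck: every subtree of every tree in T is compared against Q. It is not the index structure proposed in the paper. The tree representation, the helper names (`node`, `forest_distance`, `top_k_subtrees`), and the choice of a simple memoized forest recursion for ordered, labeled trees are all assumptions made for illustration; the unit cost function charges 1 for each insertion, deletion, or relabeling.

```python
from functools import lru_cache
import heapq

# Assumed representation: a tree is (label, children), children a tuple of trees.
def node(label, *children):
    return (label, tuple(children))

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        return forest_distance(f, g[:-1] + g[-1][1]) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        return forest_distance(f[:-1] + f[-1][1], g) + 1
    (f_label, f_children), (g_label, g_children) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + f_children, g) + 1
    insert = forest_distance(f, g[:-1] + g_children) + 1
    # Match the two rightmost roots, relabeling if the labels differ.
    match = (forest_distance(f_children, g_children)
             + forest_distance(f[:-1], g[:-1])
             + (f_label != g_label))
    return min(delete, insert, match)

def tree_distance(a, b):
    return forest_distance((a,), (b,))

def subtrees(t):
    """Yield t and every subtree rooted at one of its descendants."""
    yield t
    for child in t[1]:
        yield from subtrees(child)

def top_k_subtrees(query, trees, k):
    """Scan every subtree of every tree; return the k closest as
    (distance, index of containing tree, subtree) triples."""
    scored = ((tree_distance(query, s), i, s)
              for i, t in enumerate(trees)
              for s in subtrees(t))
    return heapq.nsmallest(k, scored, key=lambda x: (x[0], x[1]))

# Hypothetical example data.
Q = node("a", node("b"), node("c"))
T = [
    node("r", node("a", node("b"), node("c")), node("x")),
    node("a", node("b"), node("d")),
]
results = top_k_subtrees(Q, T, 2)
```

Because the scan visits every subtree in T, its cost grows with the total number of nodes in the collection, which is the traversal cost an index is meant to avoid.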

Original language: English
Title of host publication: SIGMOD 2013 - International Conference on Management of Data
Pages: 49-60
Number of pages: 12
State: Published - 2013
Event: 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013 - New York, NY, United States
Duration: 22 Jun 2013 → 27 Jun 2013

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013
Country/Territory: United States
City: New York, NY
Period: 22/06/13 → 27/06/13

Keywords

  • Edit distance
  • Indexing
