Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach

Kutty, Sangeetha, Tran, Tien, Nayak, Richi, & Li, Yuefeng (2008) Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach. In Fuhr, Norbert, Kamps, Jaap, Lalmas, Mounia, Malik, Saadia, & Trotman, Andrew (Eds.) 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, Dagstuhl Castle, Germany, December 17-19, 2007.

This paper presents an experimental study conducted on the INEX 2007 Document Mining Challenge corpus using a frequent subtree-based incremental clustering approach. Closed frequent subtrees are generated from the structural information of the XML documents. A matrix is then constructed that represents the distribution of these closed frequent subtrees across the documents, and this matrix is used to cluster the XML documents progressively. Despite the large number of documents in the INEX 2007 Wikipedia dataset, the proposed approach was able to cluster the documents successfully.
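The pipeline in the abstract (closed frequent subtrees, then a document-by-subtree matrix, then progressive clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: it assumes the closed frequent subtrees have already been mined and are identified by placeholder IDs, and the Jaccard similarity, the OR-of-members cluster representative, the `threshold` value and all function names are hypothetical choices made only for this sketch.

```python
from typing import Dict, List, Set


def build_matrix(docs: Dict[str, Set[str]], subtrees: List[str]) -> Dict[str, List[int]]:
    """Binary document-by-subtree matrix: 1 if the closed frequent subtree occurs in the document."""
    return {d: [1 if s in present else 0 for s in subtrees] for d, present in docs.items()}


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two binary rows (hypothetical choice of measure)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def incremental_cluster(matrix: Dict[str, List[int]], threshold: float = 0.5) -> List[List[str]]:
    """Single pass over the documents: join the most similar cluster or open a new one."""
    clusters: List[List[str]] = []
    reps: List[List[int]] = []  # hypothetical representative: bitwise OR of member rows
    for doc, row in matrix.items():
        best, best_sim = -1, -1.0
        for i, rep in enumerate(reps):
            sim = jaccard(row, rep)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(doc)
            reps[best] = [x | y for x, y in zip(reps[best], row)]
        else:
            clusters.append([doc])
            reps.append(list(row))
    return clusters


# Toy input: placeholder subtree IDs per document (not INEX data).
docs = {"d1": {"t1", "t2"}, "d2": {"t1", "t2", "t3"}, "d3": {"t4"}, "d4": {"t4", "t5"}}
subtrees = ["t1", "t2", "t3", "t4", "t5"]
print(incremental_cluster(build_matrix(docs, subtrees)))  # [['d1', 'd2'], ['d3', 'd4']]
```

Because each document is compared only against the current cluster representatives, one pass suffices, which is one reason an incremental scheme is attractive for a large corpus; the result does, however, depend on the order in which documents arrive.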

Impact and interest:

4 citations in Scopus
4 citations in Web of Science®

Full-text downloads:

267 since deposited on 26 Feb 2009
9 in the past twelve months

ID Code: 18356
Item Type: Conference Paper
Refereed: Yes
DOI: 10.1007/978-3-540-85902-4_17
ISBN: 978-3-540-85901-7
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2008 Springer
Deposited On: 26 Feb 2009 05:45
Last Modified: 09 Jun 2010 13:27