Fast and Effective Clustering of XML Data Utilizing their Structural Information

Nayak, Richi (2008) Fast and Effective Clustering of XML Data Utilizing their Structural Information. Knowledge and Information Systems (KAIS), 14(2), pp. 197-215.



This paper presents an incremental clustering algorithm, XML documents Clustering with Level Similarity (XCLS), that groups XML documents according to their structural similarity. A level structure format is introduced to represent the structure of XML documents for efficient processing. A global criterion function is developed that measures the similarity between a new document and the existing clusters. This avoids the need to compute pair-wise similarity between individual documents and hence saves a large amount of computation. XCLS is further modified to incorporate the semantic meanings of XML tags in order to investigate the trade-off between accuracy and efficiency. The empirical analysis shows that structural similarity plays a greater role than semantic similarity in clustering structured data such as XML, and that the XCLS method is fast and accurate in clustering heterogeneous documents by structure.
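As a rough illustration of the idea described in the abstract, the sketch below represents each document by the set of tag names found at each depth and compares every incoming document against cluster summaries instead of against other documents. It is not the XCLS algorithm itself: the abstract does not give the level similarity formula, so the function names, the level-by-level Jaccard overlap, the merging rule and the `threshold` value are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def level_structure(xml_text):
    """Return a list of sets: the tag names found at each depth of the document."""
    levels = []
    frontier = [ET.fromstring(xml_text)]
    while frontier:
        levels.append({node.tag for node in frontier})
        frontier = [child for node in frontier for child in node]
    return levels


def level_similarity(doc_levels, cluster_levels):
    """Assumed stand-in measure: mean Jaccard overlap of tags, level by level."""
    depth = max(len(doc_levels), len(cluster_levels))
    total = 0.0
    for i in range(depth):
        a = doc_levels[i] if i < len(doc_levels) else set()
        b = cluster_levels[i] if i < len(cluster_levels) else set()
        total += len(a & b) / len(a | b)
    return total / depth


def cluster_incrementally(documents, threshold=0.5):
    """Assign each document to the most similar cluster, or start a new one.

    Each document is compared only with cluster summaries, never with
    other individual documents. The threshold is an assumed parameter.
    """
    clusters = []  # each cluster: {"levels": [set, ...], "members": [int, ...]}
    for index, xml_text in enumerate(documents):
        levels = level_structure(xml_text)
        best, best_score = None, 0.0
        for cluster in clusters:
            score = level_similarity(levels, cluster["levels"])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best["members"].append(index)
            # Fold the new document's tags into the cluster summary.
            for i, tags in enumerate(levels):
                if i < len(best["levels"]):
                    best["levels"][i] |= tags
                else:
                    best["levels"].append(set(tags))
        else:
            clusters.append({"levels": levels, "members": [index]})
    return [cluster["members"] for cluster in clusters]


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<order><item><price/></item></order>",
]
print(cluster_incrementally(docs))
```

With these three toy documents, the two `book` documents share most tags at every level and end up together, while the `order` document shares none and starts its own cluster. Because each document is scored once per cluster, the number of comparisons grows with the number of clusters and not with the number of document pairs, which is the saving the abstract refers to.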

Impact and interest:

45 citations in Scopus
Search Google Scholar™
34 citations in Web of Science®

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citation counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

Full-text downloads:

216 since deposited on 08 Jul 2008
12 in the past twelve months

Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.

ID Code: 13993
Item Type: Journal Article
Refereed: Yes
Additional Information: For more information, please refer to the journal's website (see hypertext link) or contact the author.
DOI: 10.1007/s10115-007-0080-8
ISSN: 0219-3116
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2008 Springer
Deposited On: 08 Jul 2008 00:00
Last Modified: 01 Mar 2012 04:00
