Document Clustering Using Incremental and Pairwise Approaches

Tran, Tien, Nayak, Richi, & Bruza, Peter D. (2008) Document Clustering Using Incremental and Pairwise Approaches. In 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, 17-19 December 2007, Dagstuhl Castle, Germany.


This paper presents the experiments and results of an approach to clustering the large Wikipedia dataset in the INEX 2007 Document Mining Challenge. The approach combines an incremental clustering method with a pairwise clustering method. The incremental method first reduces the dimension of the dataset to a number of clusters that is not fixed in advance, which makes the clustering task feasible on a large dataset. The pairwise method then clusters this lower-dimension dataset into the required number of clusters. In this way, the large number of documents is clustered successfully and an accurate clustering solution is obtained.
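
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how documents are represented or compared, so the feature vectors, the cosine similarity, the centroid representation, the `threshold` parameter and all function names below are assumptions made only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def incremental_stage(docs, threshold):
    """Single pass: put each document in its most similar cluster,
    or open a new cluster when no similarity reaches the threshold.
    The number of clusters produced is not fixed in advance."""
    clusters = []   # each cluster is a list of document indices
    centroids = []  # one centroid per cluster
    for i, doc in enumerate(docs):
        best, best_sim = None, threshold
        for c, cen in enumerate(centroids):
            sim = cosine(doc, cen)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
            centroids.append(list(doc))
        else:
            clusters[best].append(i)
            centroids[best] = centroid([docs[j] for j in clusters[best]])
    return clusters


def pairwise_stage(docs, clusters, k):
    """Compare all pairs of clusters and merge the most similar pair,
    repeating until only k clusters remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > k:
        cents = [centroid([docs[j] for j in c]) for c in clusters]
        best_pair, best_sim = (0, 1), -1.0
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(cents[a], cents[b])
                if sim > best_sim:
                    best_pair, best_sim = (a, b), sim
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


def two_stage_clustering(docs, threshold, k):
    """Incremental reduction followed by pairwise merging down to k clusters."""
    return pairwise_stage(docs, incremental_stage(docs, threshold), k)
```

In this sketch the pairwise stage only compares cluster centroids, so its quadratic cost applies to the reduced set of clusters rather than to every document, which is the reason the abstract gives for running the incremental step first.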

Impact and interest:

9 citations in Scopus
8 citations in Web of Science®

ID Code: 18354
Item Type: Conference Paper
Refereed: Yes
Keywords: Clustering, Structure, Content, XML, INEX 2007
DOI: 10.1007/978-3-540-85902-4_20
ISBN: 978-3-540-85901-7
ISSN: 0302-9743 (Print) 1611-3349 (Online)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2008 Springer
Deposited On: 26 Feb 2009 05:26
Last Modified: 17 Jul 2014 03:35
