QUT ePrints

Document clustering with K-tree

De Vries, Christopher M. & Geva, Shlomo (2009) Document clustering with K-tree. Lecture Notes in Computer Science, 5631/2, pp. 420-431.

Abstract

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.

Impact and interest:

3 citations in Scopus
3 citations in Web of Science®

Citation counts are sourced monthly from the Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods, so the citation count from each usually differs. Some works are in neither database, in which case no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citation counts from the Google Scholar™ indexing service can be viewed through a Google Scholar™ search.

Full-text downloads:

54 since deposited on 05 Oct 2009
31 in the past twelve months

The full-text downloads figure shows the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints, as well as the number of downloads in the previous 365 days. The count includes downloads of all files if a work has more than one.

ID Code: 27756
Item Type: Journal Article
Additional Information: See additional URL to download software.
Additional URLs:
Keywords: INEX, XML Mining, Clustering, K-tree, Tree, Vector Quantization, Text Classification, Support Vector Machine
DOI: 10.1007/978-3-642-03761-0_43
ISBN: 9783642037603
ISSN: 0302-9743
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Pattern Recognition and Data Mining (080109)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > COMPUTATION THEORY AND MATHEMATICS (080200) > Analysis of Algorithms and Complexity (080201)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > LIBRARY AND INFORMATION STUDIES (080700) > Information Retrieval and Web Search (080704)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2009 Springer
Deposited On: 05 Oct 2009 13:00
Last Modified: 18 Jul 2014 16:08