QUT ePrints

Overview of the INEX 2010 XML mining track : clustering and classification of XML documents

De Vries, Christopher Michael, Nayak, Richi, Kutty, Sangeetha, Geva, Shlomo, & Tagarelli, Andrea (2011) Overview of the INEX 2010 XML mining track : clustering and classification of XML documents. In Lecture Notes in Computer Science, Springer, Amsterdam. (In Press)

View at publisher

Abstract

The XML Document Mining track was launched for exploring two main ideas: (1) identifying key problems and new challenges of the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for dealing with generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This track has run for six editions during INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions have been summarized in previous editions and we focus here on the 2010 edition.

INEX 2010 included two tasks in the XML Mining track: (1) unsupervised clustering task and (2) semi-supervised classification task where documents are organized in a graph. The clustering task requires the participants to group the documents into clusters without any knowledge of category labels using an unsupervised learning algorithm. On the other hand, the classification task requires the participants to label the documents in the dataset into known categories using a supervised learning algorithm and a training set. This report gives the details of clustering and classification tasks.

Impact and interest:

1 citations in Scopus
Search Google Scholar™
0 citations in Web of Science®

Citation countsare sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

Full-text downloads:

200 since deposited on 11 Apr 2011
36 in the past twelve months

Full-text downloadsdisplays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.

ID Code: 41223
Item Type: Conference Paper
Keywords: XML document mining, INEX, Wikipedia, Structure, Content, Clustering, Classification
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Pattern Recognition and Data Mining (080109)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > LIBRARY AND INFORMATION STUDIES (080700) > Information Retrieval and Web Search (080704)
Divisions: Past > Schools > Computer Science
Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2010 [Please consult the authors]
Deposited On: 12 Apr 2011 07:11
Last Modified: 18 Jul 2014 22:10

Export: EndNote | Dublin Core | BibTeX

Repository Staff Only: item control page