Knowledge discovery using pattern taxonomy model in text mining

Wu, Sheng-Tang (2007) Knowledge discovery using pattern taxonomy model in text mining. PhD thesis, Queensland University of Technology.


In the last decade, many data mining techniques have been proposed for fulfilling various knowledge discovery tasks in order to achieve the goal of retrieving useful information for users. Various types of patterns can then be generated using these techniques, such as sequential patterns, frequent itemsets, and closed and maximum patterns. However, how to effectively exploit the discovered patterns is still an open research issue, especially in the domain of text mining. Most of the text mining methods adopt the keyword-based approach to construct text representations which consist of single words or single terms, whereas other methods have tried to use phrases instead of keywords, based on the hypothesis that the information carried by a phrase is considered more than that by a single term. Nevertheless, these phrase-based methods did not yield significant improvements due to the fact that the patterns with high frequency (normally the shorter patterns) usually have a high value on exhaustivity but a low value on specificity, and thus the specific patterns encounter the low frequency problem.

This thesis presents the research on the concept of developing an effective Pattern Taxonomy Model (PTM) to overcome the aforementioned problem by deploying discovered patterns into a hypothesis space. PTM is a pattern-based method which adopts the technique of sequential pattern mining and uses closed patterns as features in the representative. A PTM-based information filtering system is implemented and evaluated by a series of experiments on the latest version of the Reuters dataset, RCV1. The pattern evolution schemes are also proposed in this thesis with the attempt of utilising information from negative training examples to update the discovered knowledge. The results show that the PTM outperforms not only all up-to-date data mining-based methods, but also the traditional Rocchio and the state-of-the-art BM25 and Support Vector Machines (SVM) approaches.

Impact and interest:

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

Full-text downloads:

3,304 since deposited on 03 Dec 2008
171 in the past twelve months

Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.

ID Code: 16675
Item Type: QUT Thesis (PhD)
Supervisor: Li, Yuefeng, Xu, Yue, & Chen, Yi
Additional Information: Recipient of 2007 Outstanding Doctoral Thesis Award
Keywords: pattern taxonomy model, information retrieval, text mining, data mining, association rules, sequential pattern mining, closed sequential patterns, pattern deploying, pattern evolving, ODTA
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Past > Schools > School of Software Engineering & Data Communications
Department: Faculty of Information Technology
Institution: Queensland University of Technology
Copyright Owner: Copyright Sheng-Tang Wu
Deposited On: 03 Dec 2008 04:08
Last Modified: 17 Jun 2013 06:18

Export: EndNote | Dublin Core | BibTeX

Repository Staff Only: item control page