Knowledge discovery using pattern taxonomy model in text mining
Wu, Sheng-Tang (2007) Knowledge discovery using pattern taxonomy model in text mining. PhD thesis, Queensland University of Technology.
In the last decade, many data mining techniques have been proposed for fulfilling various knowledge discovery tasks in order to achieve the goal of retrieving useful information for users. Various types of patterns can then be generated using these techniques, such as sequential patterns, frequent itemsets, and closed and maximum patterns. However, how to effectively exploit the discovered patterns is still an open research issue, especially in the domain of text mining. Most of the text mining methods adopt the keyword-based approach to construct text representations which consist of single words or single terms, whereas other methods have tried to use phrases instead of keywords, based on the hypothesis that the information carried by a phrase is considered more than that by a single term. Nevertheless, these phrase-based methods did not yield significant improvements due to the fact that the patterns with high frequency (normally the shorter patterns) usually have a high value on exhaustivity but a low value on specificity, and thus the specific patterns encounter the low frequency problem.
This thesis presents the research on the concept of developing an effective Pattern Taxonomy Model (PTM) to overcome the aforementioned problem by deploying discovered patterns into a hypothesis space. PTM is a pattern-based method which adopts the technique of sequential pattern mining and uses closed patterns as features in the representative. A PTM-based information filtering system is implemented and evaluated by a series of experiments on the latest version of the Reuters dataset, RCV1. The pattern evolution schemes are also proposed in this thesis with the attempt of utilising information from negative training examples to update the discovered knowledge. The results show that the PTM outperforms not only all up-to-date data mining-based methods, but also the traditional Rocchio and the state-of-the-art BM25 and Support Vector Machines (SVM) approaches.
Impact and interest:
Citation countsare sourced monthly fromand citation databases.
These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.
Citations counts from theindexing service can be viewed at the linked Google Scholar™ search.
Full-text downloadsdisplays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.
|Item Type:||QUT Thesis (PhD)|
|Supervisor:||Li, Yuefeng, Xu, Yue, & Chen, Yi|
|Additional Information:||Recipient of 2007 Outstanding Doctoral Thesis Award|
|Keywords:||pattern taxonomy model, information retrieval, text mining, data mining, association rules, sequential pattern mining, closed sequential patterns, pattern deploying, pattern evolving, ODTA|
|Divisions:||Past > QUT Faculties & Divisions > Faculty of Science and Technology|
Past > Schools > School of Software Engineering & Data Communications
|Department:||Faculty of Information Technology|
|Institution:||Queensland University of Technology|
|Copyright Owner:||Copyright Sheng-Tang Wu|
|Deposited On:||03 Dec 2008 14:08|
|Last Modified:||17 Jun 2013 16:18|
Repository Staff Only: item control page