Effective pattern discovery for text mining

Zhong, Ning, Li, Yuefeng, & Wu, Sheng-Tang (2012) Effective pattern discovery for text mining. IEEE Transactions on Knowledge and Data Engineering, 24(1), pp. 30-44.

Abstract

Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, it has often been hypothesized that pattern-based (or phrase-based) approaches should perform better than term-based ones, but many experiments have not supported this hypothesis. This paper presents an innovative technique, effective pattern discovery, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on the RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.
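The abstract names "pattern deploying" but does not describe how it works. As a rough illustration only, the Python sketch below shows one simple way discovered patterns could be mapped back onto individual terms so that documents can be scored term by term. It assumes each pattern is a set of terms with a numeric support. The function names `deploy_patterns` and `score_document` and the even-split weighting rule are hypothetical choices made for this example, not the formula used in the paper.

```python
from collections import defaultdict


def deploy_patterns(patterns):
    """Map discovered patterns onto a normalised term-weight dictionary.

    `patterns` is a list of (terms, support) pairs, where `terms` is a
    collection of strings. Assumption for this sketch: each pattern's
    support is split evenly across its distinct terms, and the summed
    weights are normalised so that they add up to 1.
    """
    weights = defaultdict(float)
    for terms, support in patterns:
        unique_terms = set(terms)
        if not unique_terms:
            continue  # an empty pattern carries no term information
        share = support / len(unique_terms)
        for term in unique_terms:
            weights[term] += share
    total = sum(weights.values())
    if total == 0:
        return {}
    return {term: w / total for term, w in weights.items()}


def score_document(doc_terms, term_weights):
    """Score a document as the sum of deployed weights of its distinct terms."""
    return sum(term_weights.get(t, 0.0) for t in set(doc_terms))


if __name__ == "__main__":
    # Hypothetical patterns with made-up supports, for illustration only.
    patterns = [({"text", "mining"}, 4), ({"pattern", "mining"}, 2), ({"text"}, 2)]
    weights = deploy_patterns(patterns)
    print(weights)  # text: 0.5, mining: 0.375, pattern: 0.125 (key order may vary)
    print(score_document(["text", "pattern", "retrieval"], weights))  # 0.625
```

Splitting a pattern's support across its terms means each pattern contributes the same total weight regardless of its length, so long patterns do not automatically dominate the term vector; other weighting rules are equally plausible.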

Impact and interest:

51 citations in Scopus
33 citations in Web of Science®

Full-text downloads:

1,241 since deposited on 22 Jun 2011
162 in the past twelve months

ID Code: 42066
Item Type: Journal Article
Refereed: Yes
Keywords: Communities, Computational modeling, Databases, Electronic mail, Noise measurement, Text mining, data mining, information filtering, pattern evolving, pattern mining
DOI: 10.1109/TKDE.2010.211
ISSN: 1041-4347
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > INFORMATION SYSTEMS (080600)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2010 IEEE
Copyright Statement: Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Deposited On: 22 Jun 2011 03:06
Last Modified: 09 Jan 2015 06:44