QUT ePrints

Mining positive and negative patterns for relevance feature discovery

Li, Yuefeng, Algarni, Abdulmohsen, & Zhong, Ning (2010) Mining positive and negative patterns for relevance feature discovery. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC, pp. 753-762.

Published Version (PDF, 564 kB): access restricted to repository administrators.


    Abstract

    It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in this paper makes a breakthrough on this difficulty. This technique discovers both positive and negative patterns in text documents as higher-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the higher-level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both state-of-the-art term-based methods (underpinned by Okapi BM25, Rocchio, or Support Vector Machines) and pattern-based methods on precision, recall, and F measures.
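As a rough illustration of the general idea of weighting low-level terms by their distribution in higher-level pattern features, here is a minimal sketch. It is not the authors' algorithm: the even split of a pattern's support across its terms, the `penalty` factor for negative patterns, the function name `deploy_patterns`, and the example terms and support values are all hypothetical assumptions chosen only to show the shape of the computation.

```python
from collections import defaultdict


def deploy_patterns(positive, negative, penalty=0.5):
    """Hypothetical sketch: derive term weights from pattern supports.

    positive, negative: lists of (frozenset_of_terms, support) pairs.
    penalty: assumed scaling for evidence from negative patterns
    (an illustrative choice, not a value from the paper).
    """
    weights = defaultdict(float)
    # Each positive pattern spreads its support evenly over its terms,
    # so a term in a longer pattern receives a smaller share.
    for terms, support in positive:
        share = support / len(terms)
        for term in terms:
            weights[term] += share
    # Terms occurring in negative patterns lose a penalized share.
    for terms, support in negative:
        share = support / len(terms)
        for term in terms:
            weights[term] -= penalty * share
    return dict(weights)


if __name__ == "__main__":
    pos = [(frozenset({"oil", "price"}), 0.6), (frozenset({"oil"}), 0.8)]
    neg = [(frozenset({"price", "stock"}), 0.4)]
    print(deploy_patterns(pos, neg))
```

With these made-up inputs, "oil" ends near 1.1, "price" near 0.2 (pulled down by the negative pattern), and "stock" near -0.1, showing how negative patterns can lower the weight of terms that are not specific to relevant documents.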

    Impact and interest:

    15 citations in Scopus

    ID Code: 42068
    Item Type: Conference Paper
    Keywords: user preferences, text mining, polysemy, synonymy
    DOI: 10.1145/1835804.1835900
    ISBN: 9781450300551
    Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > INFORMATION SYSTEMS (080600)
    Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
    Copyright Owner: Copyright 2010 ACM
    Deposited On: 22 Jun 2011 13:08
    Last Modified: 12 Dec 2014 11:27
