A pattern based two-stage text classifier

Bijaksana, Moch Arif, Li, Yuefeng, & Algarni, Abdulmohsen (2013) A pattern based two-stage text classifier. Lecture Notes in Computer Science : Machine Learning and Data Mining in Pattern Recognition, 7988, pp. 169-182.


A classification problem typically poses two challenges: negative documents have diverse characteristics, and many negative documents can be close to positive documents. It is therefore hard for a single classifier to assign incoming documents to classes cleanly. This paper proposes a novel gradual problem-solving approach that yields a two-stage classifier. The first stage identifies reliable negatives (negative documents with weak positive characteristics). It is recall-oriented: it concentrates on minimizing the number of false negative documents. We use Rocchio, an existing recall-based classifier, for this stage. The second stage is a precision-oriented “fine tuning” that concentrates on minimizing the number of false positive documents by applying pattern mining techniques, where a pattern is a statistical phrase. In this stage, pattern-based scoring is followed by threshold setting (thresholding). Experiments show that our statistical-phrase-based two-stage classifier is promising.
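The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the term weighting, the pattern mining algorithm, or the thresholds. The sketch assumes raw term-frequency vectors, a Rocchio prototype with hypothetical weights `beta` and `gamma`, frequent co-occurring term pairs as a simple stand-in for mined patterns, and hypothetical thresholds `t1` and `t2`. All function names and the toy documents are illustrative.

```python
from collections import Counter
from itertools import combinations
import math

def tf_vector(doc):
    """Term-frequency vector of a whitespace-tokenized document."""
    return Counter(doc.lower().split())

def centroid(vectors):
    """Average of a list of sparse term-frequency vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def rocchio_prototype(pos_docs, neg_docs, beta=1.0, gamma=0.25):
    """Positive centroid minus a down-weighted negative centroid.
    beta and gamma are assumed values, not taken from the paper."""
    pos_c = centroid([tf_vector(d) for d in pos_docs])
    neg_c = centroid([tf_vector(d) for d in neg_docs])
    terms = set(pos_c) | set(neg_c)
    return {t: beta * pos_c.get(t, 0.0) - gamma * neg_c.get(t, 0.0) for t in terms}

def rocchio_score(doc, proto):
    """Length-normalized dot product between a document and the prototype."""
    v = tf_vector(doc)
    dot = sum(w * proto.get(t, 0.0) for t, w in v.items())
    norm = math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def mine_patterns(pos_docs, min_support=2, size=2):
    """Stand-in for pattern mining: term sets of a fixed size that co-occur
    in at least min_support positive documents, mapped to relative support."""
    counts = Counter()
    for d in pos_docs:
        counts.update(combinations(sorted(set(d.lower().split())), size))
    return {frozenset(p): c / len(pos_docs)
            for p, c in counts.items() if c >= min_support}

def pattern_score(doc, patterns):
    """Sum of the supports of all patterns fully contained in the document."""
    terms = set(doc.lower().split())
    return sum(s for p, s in patterns.items() if p <= terms)

def classify(doc, proto, patterns, t1=0.0, t2=0.5):
    """Stage 1 (recall-oriented): reject only clear negatives via Rocchio.
    Stage 2 (precision-oriented): keep survivors whose pattern score
    reaches the threshold t2. Both thresholds are hypothetical."""
    if rocchio_score(doc, proto) <= t1:
        return False
    return pattern_score(doc, patterns) >= t2

# Toy training data, invented for illustration only.
pos_docs = ["pattern mining text classifier",
            "pattern mining for text",
            "text classifier with pattern mining"]
neg_docs = ["football match results",
            "stock market results",
            "text of football rules"]

proto = rocchio_prototype(pos_docs, neg_docs)
patterns = mine_patterns(pos_docs)

for doc in ["pattern mining in text", "football match results", "text of football rules"]:
    print(doc, "->", classify(doc, proto, patterns))
```

In this toy setup, the document "text of football rules" shares a term with the positive class and so passes the lenient first stage, but it contains no frequent term pair and is rejected by the second stage. This is the kind of near-positive negative that the abstract says a single classifier struggles with.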

Impact and interest:

1 citation in Scopus

ID Code: 61989
Item Type: Journal Article
Refereed: Yes
Keywords: Two-stage classification, Text classification, Pattern mining, Scoring, Thresholding
DOI: 10.1007/978-3-642-39712-7_13
ISSN: 0302-9743
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2013 Springer-Verlag Berlin Heidelberg
Deposited On: 22 Aug 2013 21:45
Last Modified: 26 Aug 2013 02:54
