Factors influencing robustness and effectiveness of conditional random fields in active learning frameworks
Kholghi, Mahnoosh, Sitbon, Laurianne, Zuccon, Guido, & Nguyen, Anthony (2014) Factors influencing robustness and effectiveness of conditional random fields in active learning frameworks. In Nayak, Richi, Li, Xue, Liu, Lin, Ong, Kok-Leong, Zhao, Yanchang, & Kennedy, Paul (Eds.) AusDM 2014 : The Twelfth Australasian Data Mining Conference, 27-28 November 2014, Queensland University of Technology, Gardens Point Campus, Brisbane, Australia.
Active learning approaches reduce the annotation cost required by traditional supervised approaches to reach the same effectiveness by actively selecting informative instances during the learning phase. However, the effectiveness and robustness of the learnt models are influenced by a number of factors. In this paper we investigate the factors that affect the effectiveness, more specifically in terms of stability and robustness, of active learning models built using conditional random fields (CRFs) for information extraction applications. Stability, defined as a small variation in performance when a small variation in the training data or in the parameters occurs, is a major issue for machine learning models, but even more so in the active learning framework, which aims to minimise the amount of training data required. The factors we investigate are a) the choice of incremental vs. standard active learning, b) the feature set used as a representation of the text (i.e., morphological features, syntactic features, or semantic features) and c) the Gaussian prior variance as one of the important CRF parameters. Our empirical findings show that incremental learning and the Gaussian prior variance lead to more stable and robust models across iterations. Our study also demonstrates that orthographical, morphological and contextual features, as a group of basic features, play an important role in learning effective models across all iterations.
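The two ideas the abstract relies on, selecting informative instances and measuring stability, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a query strategy or a stability metric, so least-confidence selection and the standard deviation of scores across repeated runs are assumptions chosen for illustration, and the function names and sample values are hypothetical.

```python
from statistics import pstdev


def select_least_confident(confidences, batch_size):
    """Return indices of the pool items the model is least confident about.

    Assumption: confidences[i] is the probability the model assigns to its
    best label sequence for unlabelled sentence i.
    """
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:batch_size]


def performance_spread(scores):
    """Population standard deviation of scores from repeated runs.

    Assumption: a smaller spread is read here as a more stable model.
    """
    return pstdev(scores)


# Hypothetical confidences for five unlabelled sentences in the pool.
pool_confidences = [0.91, 0.42, 0.77, 0.35, 0.88]
to_annotate = select_least_confident(pool_confidences, batch_size=2)
```

In an active learning loop, the selected sentences would be annotated, added to the training set, and the model retrained (from scratch in the standard setting, or updated from the previous model in the incremental setting) before the next selection round.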
|Item Type:||Conference Paper|
|Keywords:||active learning, robustness, effectiveness, conditional random fields, Gaussian prior variance, concept extraction|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000)|
|Divisions:||Current > Schools > School of Electrical Engineering & Computer Science; Current > Schools > School of Information Systems; Current > QUT Faculties and Divisions > Science & Engineering Faculty|
|Copyright Owner:||Copyright 2014 [please consult the authors]|
|Deposited On:||11 Dec 2014 23:51|
|Last Modified:||30 Mar 2016 11:12|