Active learning: A step towards automating medical concept extraction

Kholghi, Mahnoosh, Sitbon, Laurianne, Zuccon, Guido, & Nguyen, Anthony (2015) Active learning: A step towards automating medical concept extraction. Journal of the American Medical Informatics Association, 23(2), pp. 289-296.


Objective

This paper presents an automatic active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, it determines (1) the contribution of active learning in reducing the annotation effort, and (2) the robustness of the incremental active learning framework across different selection criteria and datasets.

Materials and methods

The comparative performance of an active learning framework and a fully supervised approach was investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional Random Fields were used as the supervised method, and least confidence and information density as the two selection criteria for the active learning framework. The effect of incremental learning vs. standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. Two clinical datasets were used for evaluation: the i2b2/VA 2010 NLP challenge and the ShARe/CLEF 2013 eHealth Evaluation Lab.
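The two selection criteria named above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the sequence model (for example a CRF) already supplies the probability of its most likely labelling for each unlabelled sequence, and it uses a bag-of-words cosine similarity as a stand-in for whatever similarity measure the paper used. All function names and the `beta` parameter are hypothetical.

```python
import math
from collections import Counter


def least_confidence(best_path_prob):
    """Uncertainty of a sequence: 1 - P(most likely labelling | sequence)."""
    return 1.0 - best_path_prob


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def information_density(pool, best_path_probs, beta=1.0):
    """Weight each sequence's least-confidence score by its average
    similarity to the other sequences in the pool (raised to beta).

    Assumption: similarity is bag-of-words cosine over tokens.
    """
    bags = [Counter(tokens) for tokens in pool]
    scores = []
    for i, bag in enumerate(bags):
        sims = [cosine(bag, other) for j, other in enumerate(bags) if j != i]
        density = sum(sims) / len(sims) if sims else 0.0
        scores.append(least_confidence(best_path_probs[i]) * density ** beta)
    return scores


def select_batch(scores, batch_size):
    """Indices of the batch_size highest-scoring sequences."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Hypothetical pool of tokenised sentences and model confidences.
pool = [["chest", "pain"], ["chest", "pain", "noted"], ["fracture"]]
probs = [0.5, 0.9, 0.2]

lc_scores = [least_confidence(p) for p in probs]
print(select_batch(lc_scores, 1))
print(select_batch(information_density(pool, probs), 1))
```

In this toy pool, least confidence picks the sequence the model is least sure about, even though it resembles nothing else in the pool. Information density instead favours an uncertain sequence that is also representative of the other sequences.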


Results

The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared to the random sampling baseline, the saving is at least doubled.
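One plausible way to compute such a saving is sketched below, as an assumption rather than the paper's exact procedure: walk along a learning curve, find the first point at which the model reaches the fully supervised effectiveness, and report the fraction of the annotation budget that was not needed. The function name, the curve format, and the numbers are hypothetical and are not results from the paper.

```python
def annotation_saving(curve, target_f1, total_units):
    """Fraction of annotation effort saved once target_f1 is first reached.

    curve: list of (units_annotated, f1) pairs in annotation order, where
    units may be sequences, tokens, or concepts.
    Returns 0.0 if the target is never reached.
    """
    for units, f1 in curve:
        if f1 >= target_f1:
            return 1.0 - units / total_units
    return 0.0


# Hypothetical learning curve: (sequences annotated, F1 on held-out data).
curve = [(100, 0.60), (200, 0.75), (300, 0.82), (400, 0.83)]
saving = annotation_saving(curve, target_f1=0.82, total_units=1000)
print(f"{saving:.0%} of sequences did not need annotation")
```

The same function can be applied separately to sequence, token, and concept counts, which is why a single experiment can yield three different saving figures.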


Discussion

Incremental active learning guarantees robustness across all selection criteria and datasets. The reduction in annotation effort always exceeds that of the random sampling and longest sequence baselines.


Conclusion

Incremental active learning is a promising approach for building effective and robust medical concept extraction models, while significantly reducing the burden of manual annotation.


ID Code: 85672
Item Type: Journal Article
Refereed: Yes
Keywords: Medical Concept Extraction, Clinical Free Text, Active Learning, Conditional Random Fields, Robustness Analysis
DOI: 10.1093/jamia/ocv069
ISSN: 1527-974X
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Natural Language Processing (080107)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > LIBRARY AND INFORMATION STUDIES (080700) > Health Informatics (080702)
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > Schools > School of Information Systems
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2015 Oxford University Press
Copyright Statement: This is a pre-copyedited, author-produced PDF of an article accepted for publication in Journal of the American Medical Informatics Association following peer review. The version of record J Am Med Inform Assoc. 2016 Mar;23(2):289-96 is available online at:
Deposited On: 19 Jul 2015 22:58
Last Modified: 20 May 2016 09:13
