Active learning: A step towards automating medical concept extraction
Kholghi, Mahnoosh, Sitbon, Laurianne, Zuccon, Guido, & Nguyen, Anthony (2015) Active learning: A step towards automating medical concept extraction. Journal of the American Medical Informatics Association, 23(2), pp. 289-296.
Objective
This paper presents an automatic, active learning-based system for extracting medical concepts from clinical free-text reports. Specifically, it determines (1) the contribution of active learning to reducing annotation effort, and (2) the robustness of an incremental active learning framework across different selection criteria and datasets.
Materials and methods
The performance of an active learning framework was compared with that of a fully supervised approach to study how far active learning reduces annotation effort while achieving the same effectiveness as the supervised approach. Conditional Random Fields were used as the supervised method, and least confidence and information density as the two selection criteria for the active learning framework. The effect of incremental learning vs. standard learning on the robustness of the models within the active learning framework was also investigated for each selection criterion. Two clinical datasets were used for evaluation: the i2b2/VA 2010 NLP challenge and the ShARe/CLEF 2013 eHealth Evaluation Lab.
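The two selection criteria named above can be illustrated with a minimal sketch. It assumes the textbook definitions: least confidence scores a sequence as one minus the probability of its most likely labelling, and information density weights that score by the sequence's average similarity to the rest of the unlabelled pool. The function names, the cosine similarity measure, the `beta` exponent, and the toy values are illustrative assumptions, not details taken from the paper.

```python
import math

def least_confidence(best_path_prob):
    """Uncertainty of one sequence: 1 - P(most likely labelling | sequence)."""
    return 1.0 - best_path_prob

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def information_density(best_path_probs, vectors, beta=1.0):
    """Least confidence weighted by average similarity to the whole pool."""
    n = len(vectors)
    scores = []
    for i in range(n):
        avg_sim = sum(cosine(vectors[i], vectors[j]) for j in range(n)) / n
        scores.append(least_confidence(best_path_probs[i]) * avg_sim ** beta)
    return scores

def select_batch(scores, batch_size):
    """Indices of the highest-scoring sequences, to be sent for annotation."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]

# Hypothetical pool of three unlabelled sequences (toy values only).
probs = [0.9, 0.5, 0.6]               # model confidence in its best labelling
vectors = [[1, 0], [0, 1], [1, 0]]    # stand-in feature vectors per sequence

lc_scores = [least_confidence(p) for p in probs]
id_scores = information_density(probs, vectors)
lc_choice = select_batch(lc_scores, 1)
id_choice = select_batch(id_scores, 1)
```

On these toy values, least confidence selects sequence 1 (the least certain one), whereas information density selects sequence 2, which is slightly more certain but more representative of the pool. In a real system the probabilities would come from the trained Conditional Random Fields model.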
Results
Active learning achieves the same effectiveness as supervised learning while saving up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared to the random sampling baseline, the saving is at least doubled.
Incremental active learning guarantees robustness across all selection criteria and datasets. Its reduction of annotation effort always exceeds that of the random sampling and longest sequence baselines.
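As a worked illustration of how a saving figure of this kind can be read, the sketch below assumes that the saving is the fraction of the full training set that did not need annotating by the time the actively trained model matched the supervised model. That definition, the function name, and the counts are assumptions for illustration; the counts are hypothetical and are not data from the paper.

```python
def annotation_saving(units_annotated, units_in_full_set):
    """Fraction of annotation units (sequences, tokens, or concepts) not needed."""
    if units_in_full_set <= 0:
        raise ValueError("units_in_full_set must be positive")
    return 1.0 - units_annotated / units_in_full_set

# Hypothetical counts: a 1000-sequence training set in which active learning
# matched the supervised model after 230 annotated sequences, and random
# sampling after 650.
active_saving = annotation_saving(230, 1000)
random_saving = annotation_saving(650, 1000)
saving_ratio = active_saving / random_saving
```

With these made-up counts the active learning saving is 77% and the random sampling saving is 35%, so the saving is a little more than doubled.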
Conclusion
Incremental active learning is a promising approach for building effective and robust medical concept extraction models, while significantly reducing the burden of manual annotation.
|Item Type:||Journal Article|
|Keywords:||Medical Concept Extraction, Clinical Free Text, Active Learning, Conditional Random Fields, Robustness Analysis|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Natural Language Processing (080107); Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > LIBRARY AND INFORMATION STUDIES (080700) > Health Informatics (080702)|
|Divisions:||Current > Schools > School of Electrical Engineering & Computer Science; Current > Schools > School of Information Systems; Current > QUT Faculties and Divisions > Science & Engineering Faculty|
|Copyright Owner:||Copyright 2015 Oxford University Press|
|Copyright Statement:||This is a pre-copyedited, author-produced PDF of an article accepted for publication in Journal of the American Medical Informatics Association following peer review. The version of record J Am Med Inform Assoc. 2016 Mar;23(2):289-96 is available online at: http://dx.doi.org/10.1093/jamia/ocv069|
|Deposited On:||19 Jul 2015 22:58|
|Last Modified:||20 May 2016 09:13|