The impact of OCR accuracy on automated cancer classification of pathology reports
Zuccon, Guido, Nguyen, Anthony, Bergheim, Anton, Wickman, Sandra, & Grayson, Narelle (2012) The impact of OCR accuracy on automated cancer classification of pathology reports. In Studies in Health Technology and Informatics : Health Informatics: Building a Healthcare Future Through Trusted Information, IOS Press, Sydney, NSW, pp. 250-256.
This study evaluates the effect of Optical Character Recognition (OCR) errors on the automatic cancer classification of pathology reports.
Scanned images of pathology reports were converted to electronic free text using a commercial OCR system. A state-of-the-art cancer classification system, the Medical Text Extraction (MEDTEX) system, was used to automatically classify the OCR reports. The classifications produced by MEDTEX on the OCR versions of the reports were compared with the classifications obtained from human-amended versions of the same OCR reports.
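The comparison described above can be illustrated with a minimal sketch. It assumes the classifier has already been run on both versions of each report and that each run yields one label per report; the function names and the label strings below are hypothetical and are not part of MEDTEX.

```python
def agreement_rate(ocr_labels, amended_labels):
    """Fraction of reports that receive the same label in both versions."""
    if len(ocr_labels) != len(amended_labels):
        raise ValueError("both label lists must cover the same reports")
    if not ocr_labels:
        raise ValueError("at least one report is required")
    matches = sum(o == a for o, a in zip(ocr_labels, amended_labels))
    return matches / len(ocr_labels)


def disagreements(report_ids, ocr_labels, amended_labels):
    """List (report id, OCR label, amended label) for every mismatch."""
    return [
        (rid, o, a)
        for rid, o, a in zip(report_ids, ocr_labels, amended_labels)
        if o != a
    ]


# Hypothetical labels, used only to show the calling convention.
ids = ["r1", "r2", "r3", "r4"]
ocr = ["notifiable", "non-notifiable", "notifiable", "notifiable"]
amended = ["notifiable", "non-notifiable", "non-notifiable", "notifiable"]

rate = agreement_rate(ocr, amended)
changed = disagreements(ids, ocr, amended)
```

Listing the mismatching reports alongside the overall rate makes it possible to check by hand whether a given disagreement traces back to an OCR error.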
The OCR system recognised scanned pathology reports with up to 99.12% character accuracy and up to 98.95% word accuracy. OCR errors were found to have minimal impact on the automatic classification of scanned pathology reports into notifiable groups. However, the impact of OCR errors is not negligible when considering the extraction of cancer notification items, such as primary site and histological type.
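This record does not state how character and word accuracy were computed. One common convention, assumed here purely for illustration, is one minus the edit distance between the OCR output and the human-amended reference, divided by the reference length. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an item from ref
                curr[j - 1] + 1,     # insert an item from hyp
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """1 - edit_distance / len(ref), floored at 0 (assumed definition)."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def character_accuracy(reference, ocr_text):
    return accuracy(reference, ocr_text)


def word_accuracy(reference, ocr_text):
    # Whitespace tokenisation is an assumption made for this sketch.
    return accuracy(reference.split(), ocr_text.split())


reference = "invasive ductal carcinoma"
ocr_text = "invasive ductal carcin0ma"  # one misrecognised character
char_acc = character_accuracy(reference, ocr_text)
word_acc = word_accuracy(reference, ocr_text)
```

In this toy case a single wrong character out of 25 costs one whole word out of three, which shows why word accuracy is typically the lower of the two figures.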
The automatic cancer classification system used in this work, MEDTEX, proved robust to the errors introduced when free-text pathology reports are acquired from scanned images through OCR software. However, issues emerge when considering the extraction of cancer notification items.
|Item Type:||Conference Paper|
|Divisions:||Current > Schools > School of Information Systems; Current > QUT Faculties and Divisions > Science & Engineering Faculty|
|Copyright Owner:||Copyright 2012 IOS Press|
|Deposited On:||17 Jun 2014 22:59|
|Last Modified:||21 Jun 2014 16:34|