Automatic classification of free-text radiology reports to identify limb fractures using machine learning and the SNOMED CT ontology

Zuccon, Guido, Wagholikar, Amol S., Nguyen, Anthony N., Butt, Luke, Chu, Kevin, Martin, Shane, & Greenslade, Jaimi (2013) Automatic classification of free-text radiology reports to identify limb fractures using machine learning and the SNOMED CT ontology. In AMIA Summits on Translational Science Proceedings, American Medical Informatics Association, San Francisco, California, pp. 300-304.




Objective

To develop and evaluate machine learning techniques that identify limb fractures and other abnormalities (e.g. dislocations) from radiology reports.

Materials and Methods

Ninety-nine free-text reports of limb radiology examinations were acquired from an Australian public hospital. Two clinicians identified fractures and abnormalities in the reports; a third, senior clinician resolved disagreements. The assessors found that 48 of the 99 reports referred to fractures or abnormalities of limb structures. Automated methods were then used to extract features from the reports for use in automatic classification. The Naive Bayes classification algorithm and two implementations of the support vector machine algorithm were formally evaluated using cross-validation over the 99 reports.
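As a rough illustration of the evaluation setup described above, the following is a minimal sketch of a multinomial Naive Bayes classifier scored by k-fold cross-validation. It is not the authors' implementation: the add-one smoothing, the round-robin fold assignment, the function names, and the toy reports are all assumptions made for this example, and the number of folds is left as a parameter because the abstract does not state it.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and tokens per class (docs are token lists)."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-probability under the model."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_tokens = sum(token_counts[label].values())
        score = math.log(n_docs / total_docs)  # class prior
        for tok in tokens:
            if tok in vocab:  # tokens unseen in training are ignored
                # Add-one (Laplace) smoothing: an assumption of this sketch.
                count = token_counts[label][tok] + 1
                score += math.log(count / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(docs, labels, k):
    """Accuracy over k folds; report i is held out in fold i % k (assumed scheme)."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        test = [i for i in range(len(docs)) if i % k == fold]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test)
    return correct / len(docs)


# Toy reports invented for illustration; they are not from the study data.
toy_docs = [
    ["fracture", "distal", "radius"],
    ["no", "fracture", "seen"],
    ["displaced", "fracture", "tibia"],
    ["normal", "alignment", "no", "fracture"],
]
toy_labels = ["abnormal", "normal", "abnormal", "normal"]
toy_model = train_nb(toy_docs, toy_labels)
```

With a data set as small as the 99 reports described here, a stratified fold assignment would usually be preferable so that each fold contains both classes; the round-robin scheme is used only to keep the sketch short.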


Results

The Naive Bayes classifier accurately identified fractures and other abnormalities in the radiology reports. These results were achieved with stemmed token bigram and negation features, used either on their own or in combination with SNOMED CT concepts related to abnormalities and disorders. The SNOMED CT concept features have not been used in previous work on classifying free-text radiology reports.
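To make the feature types named above more concrete, here is a minimal sketch of extracting stemmed token bigrams and a negation flag from a report. The suffix-stripping stemmer, the trigger word list, and the function names are hypothetical simplifications introduced for this example; the paper's actual stemmer and negation detection are not specified in this abstract, and SNOMED CT concept mapping is omitted because it needs an external terminology service.

```python
import re
from collections import Counter

# Hypothetical trigger list; real negation detection is more involved.
NEGATION_TRIGGERS = {"no", "not", "without"}


def crude_stem(token):
    """Strip one common suffix; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_features(report):
    """Return counts of stemmed bigrams plus a NEGATION flag if triggered."""
    tokens = [crude_stem(t) for t in re.findall(r"[a-z]+", report.lower())]
    features = Counter(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    if any(t in NEGATION_TRIGGERS for t in tokens):
        features["NEGATION"] = 1
    return features


negated = extract_features("No fracture seen.")
positive = extract_features("Fractured distal radius")
```

In this sketch "fracture" and "Fractured" both reduce to the stem "fractur", so the two example reports share vocabulary while the negation flag separates them.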


Discussion

Automated classification methods have proven effective at identifying fractures and other abnormalities from radiology reports (F-Measure up to 92.31%). Key to the success of these techniques are features such as stemmed token bigrams, negations, and SNOMED CT concepts associated with morphologic abnormalities and disorders.
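For readers unfamiliar with the metric, the sketch below shows how an F-Measure is computed from classification counts. It assumes the balanced form (beta = 1, the harmonic mean of precision and recall), which the abstract does not state explicitly, and the example counts are hypothetical rather than taken from the study.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only (not the paper's results):
# 40 abnormal reports found, 5 false alarms, 8 abnormal reports missed.
example_f1 = f_measure(tp=40, fp=5, fn=8)
```

Because the F-Measure combines precision and recall, it penalises both missed abnormalities and false alarms, which makes it more informative than accuracy alone on a data set where roughly half the reports are abnormal.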


Conclusion

This investigation shows promising early results; future work will further validate and strengthen the proposed approaches.



ID Code: 69317
Item Type: Conference Paper
Refereed: No
Divisions: Current > Institutes > Institute for Future Environments
Current > Schools > School of Information Systems
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2013 [please consult the author]
Deposited On: 07 May 2014 00:29
Last Modified: 08 May 2014 13:57

