Harnessing information from injury narrative in the 'big data' era: Understanding and applying machine learning for injury surveillance

Vallmuur, Kirsten, Marucci-Wellman, Helen R., Taylor, Jennifer A., Lehto, Mark, Corns, Helen L., & Smith, Gordon S. (2016) Harnessing information from injury narrative in the 'big data' era: Understanding and applying machine learning for injury surveillance. Injury Prevention, 22, i34-i42.




Vast numbers of injury narratives are collected daily and are available electronically in real time, giving them great potential for use in injury surveillance and evaluation. Machine learning algorithms have been developed to help identify cases and classify the mechanisms leading to injury far more quickly than is possible with manual coding of narratives. The aim of this paper is to describe the background, growth, value, challenges and future directions of machine learning as applied to injury surveillance.


This paper reviews key aspects of machine learning using injury narratives and provides a case study demonstrating the application of an established human-machine learning approach.


The range of applications and utility of narrative text has increased greatly with advancements in computing techniques over time. Practical and feasible methods now exist for semi-automatic classification of injury narratives, and these methods are accurate, efficient and meaningful. The human-machine learning approach described in the case study achieved high sensitivity and positive predictive value and reduced the need for human coding to less than one-third of cases in one large occupational injury database.
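As a rough illustration of what a human-machine approach of this kind can look like, the sketch below trains a small classifier on coded narratives, accepts its code when it is confident, and routes the remaining narratives to a human coder. It also shows how sensitivity and positive predictive value are computed for one category. The abstract does not name the algorithm, features or filtering rule used in the case study, so the naive Bayes model, the whitespace tokeniser, the 0.8 threshold, the toy narratives and every function name here are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def train(narratives, labels):
    """Fit a multinomial naive Bayes model (assumed model, not the paper's)."""
    word_counts = defaultdict(Counter)  # label -> token frequencies
    class_counts = Counter(labels)      # label -> number of narratives
    vocab = set()
    for text, label in zip(narratives, labels):
        tokens = text.lower().split()   # naive whitespace tokeniser
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return (most probable label, posterior probability of that label)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in text.lower().split():
            if tok in vocab:            # unseen words are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    # Normalise log scores into posterior probabilities.
    peak = max(log_scores.values())
    weights = {k: math.exp(v - peak) for k, v in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def triage(model, narratives, threshold=0.8):
    """Auto-code confident predictions; send the rest for human review."""
    auto, manual = [], []
    for text in narratives:
        label, prob = predict(model, text)
        (auto if prob >= threshold else manual).append((text, label, prob))
    return auto, manual


def sensitivity_ppv(true_labels, predicted_labels, category):
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP) for one category."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sens, ppv


# Toy, made-up narratives and codes (not data from the paper).
training_text = [
    "worker fell from ladder",
    "slipped on wet floor and fell",
    "hand caught in machine",
    "finger caught in press machine",
]
training_codes = ["fall", "fall", "caught", "caught"]
model = train(training_text, training_codes)
auto_coded, needs_human = triage(model, ["fell from ladder", "burned by steam"])
```

In this setup the threshold controls the trade-off: raising it sends more narratives to human coders and should raise the accuracy of the automatically coded share, while lowering it reduces manual workload at some cost in accuracy.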


The last 20 years have seen a dramatic change in the potential for technological advancements in injury surveillance. Machine learning of ‘big injury narrative data’ opens up many possibilities for expanded sources of data that can provide more comprehensive, ongoing and timely surveillance to inform future injury prevention policy and practice.


ID Code: 91289
Item Type: Journal Article
Refereed: Yes
Keywords: injury surveillance, machine learning, narrative text, coding
DOI: 10.1136/injuryprev-2015-041813
ISSN: 1353-8047
Subjects: Australian and New Zealand Standard Research Classification > MEDICAL AND HEALTH SCIENCES (110000) > PUBLIC HEALTH AND HEALTH SERVICES (111700) > Health Information Systems (incl. Surveillance) (111711)
Divisions: Current > Research Centres > Centre for Accident Research & Road Safety - Qld (CARRS-Q)
Current > QUT Faculties and Divisions > Faculty of Health
Current > Institutes > Institute of Health and Biomedical Innovation
Current > Schools > School of Psychology & Counselling
Copyright Owner: Copyright 2015 The Author(s)
Deposited On: 17 Dec 2015 00:43
Last Modified: 16 May 2016 14:44
