Text mining for screening efficiency? Testing within a Cochrane public health review

Weightman, Alison, Baker, Philip, Thomas, James, Lovie-Toon, Yolanda, Francis, Daniel, & O'Mara-Eves, Alison (2014) Text mining for screening efficiency? Testing within a Cochrane public health review. In 2014 Cochrane Colloquium, 21 - 26 October 2014, Hyderabad, India. (Unpublished)

The requirement for dual screening of titles and abstracts to select papers for full-text examination can create a huge workload, particularly when the topic is complex and a broad search strategy retrieves a large number of records. An automated system that reduces this burden while still assuring high accuracy could provide substantial efficiency savings within the review process.


To undertake a direct comparison of manual screening with a semi‐automated process (priority screening) using a machine classifier. The research is being carried out as part of the current update of a population‐level public health review.


The authors have hand-selected studies for the review update, in duplicate, using the standard Cochrane Handbook methodology. A retrospective analysis simulating a quasi‐‘active learning’ process (in which a classifier is repeatedly retrained on ‘manually’ labelled data) will be completed using different starting parameters. Tests will examine how the choice and the size of the training set affect classification performance, i.e. what percentage of papers would need to be screened manually to locate 100% of the papers included through the traditional manual method.
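To make the simulation design concrete, the sketch below replays priority screening over records whose include/exclude decisions are already known. It starts from a random seed set, retrains a classifier on everything "screened" so far, re-ranks the rest, screens the next batch, and stops once every include has been found. This is not the authors' software or classifier: the multinomial naive Bayes scorer, the whitespace tokenizer, and the names `seed_size`, `batch_size` and `simulate_priority_screening` are all hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Deliberately simple bag-of-words tokenizer (an assumption, not the review's method).
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial naive Bayes scorer; labels are True (include) or False (exclude)."""
    counts = {True: Counter(), False: Counter()}
    for doc, label in zip(docs, labels):
        counts[label].update(tokenize(doc))
    vocab = set(counts[True]) | set(counts[False])
    # Laplace-smoothed denominators for each class.
    totals = {c: sum(counts[c].values()) + alpha * len(vocab) for c in (True, False)}
    n_inc = sum(labels)
    prior = math.log((n_inc + alpha) / (len(labels) - n_inc + alpha))

    def score(doc):
        # Log-odds of inclusion; tokens never seen in training are ignored.
        s = prior
        for tok in tokenize(doc):
            if tok in vocab:
                s += math.log((counts[True][tok] + alpha) / totals[True])
                s -= math.log((counts[False][tok] + alpha) / totals[False])
        return s

    return score


def simulate_priority_screening(records, labels, seed_size, batch_size, rng_seed=0):
    """Return the fraction of records screened before every include is found (100% recall).

    batch_size must be at least 1.
    """
    rng = random.Random(rng_seed)
    order = list(range(len(records)))
    rng.shuffle(order)
    screened = order[:seed_size]  # random seed set, screened 'manually'
    remaining = order[seed_size:]
    total_includes = sum(labels)
    found = sum(labels[i] for i in screened)

    while found < total_includes:
        train_labels = [labels[i] for i in screened]
        if 0 < sum(train_labels) < len(train_labels):
            # Retrain on everything screened so far and rank the rest by predicted relevance.
            score = train_nb([records[i] for i in screened], train_labels)
            remaining.sort(key=lambda i: score(records[i]), reverse=True)
        # Otherwise only one class has been seen, so the random order is kept.
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for i in batch:
            screened.append(i)
            found += labels[i]
            if found == total_includes:
                break

    return len(screened) / len(records)


# Hypothetical toy corpus: 5 includes and 95 excludes with disjoint vocabularies.
records = ["community intervention physical activity trial"] * 5 + ["laboratory cell culture assay"] * 95
labels = [True] * 5 + [False] * 95
for seed_size in (10, 20, 50):
    frac = simulate_priority_screening(records, labels, seed_size, batch_size=5)
    print(seed_size, f"{frac:.0%}")
```

Varying `seed_size` (and the random seed) in the final loop mirrors the idea of testing different starting parameters; on real data the returned fraction is the quantity of interest, and one minus it is the share of records that would not need manual screening.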


From a search retrieval set of 9555 papers, the authors excluded 9494 papers at the title/abstract stage and 52 at full text, leaving 9 papers for inclusion in the review update. The ability of the machine classifier to reduce the percentage of papers that must be screened manually to identify all included studies, under different training conditions, will be reported.
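The counts above can be checked with a few lines of arithmetic, alongside the outcome measure the study proposes: the proportion of a classifier-ranked list that must be screened before the last included paper appears. The helper names below are hypothetical, the doubling for dual screening assumes two independent screeners per record, and no classifier result for this review is implied.

```python
def screening_flow(total, excluded_title_abstract, excluded_full_text):
    """Return (papers assessed at full text, papers included)."""
    full_text = total - excluded_title_abstract
    included = full_text - excluded_full_text
    return full_text, included


def proportion_screened_at_full_recall(ranked_labels):
    """ranked_labels: include flags in the order the classifier presents records.

    Raises ValueError if the list contains no includes.
    """
    last = max(i for i, inc in enumerate(ranked_labels) if inc)
    return (last + 1) / len(ranked_labels)


full_text, included = screening_flow(9555, 9494, 52)
print(full_text, included)          # 61 9
print(f"{included / 9555:.3%}")     # 0.094% of retrieved records were included
print(2 * 9555)                     # 19110 title/abstract decisions under dual screening
```

For example, `proportion_screened_at_full_recall([True, False, True, False, False])` is 0.6, because the second include sits at position 3 of 5.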


The findings of this study will be presented together with an estimate of the efficiency gains the author team could achieve if the screening process were semi‐automated using text mining, and a discussion of the implications for text mining in screening papers for complex health reviews.

ID Code: 82726
Item Type: Conference Item (Poster)
Refereed: Yes
Keywords: systematic review, methods, public health, Cochrane review, screening, text mining
Subjects: Australian and New Zealand Standard Research Classification > MEDICAL AND HEALTH SCIENCES (110000) > PUBLIC HEALTH AND HEALTH SERVICES (111700)
Divisions: Current > QUT Faculties and Divisions > Faculty of Health
Current > Institutes > Institute of Health and Biomedical Innovation
Current > Schools > School of Public Health & Social Work
Copyright Owner: Copyright 2014 The Author(s)
Deposited On: 24 Mar 2015 03:24
Last Modified: 24 Mar 2015 03:24
