Retrieval of health advice on the web: AEHRC at ShARe/CLEF eHealth Evaluation Lab Task 3

Zuccon, G., Koopman, B., & Nguyen, A. (2013) Retrieval of health advice on the web: AEHRC at ShARe/CLEF eHealth Evaluation Lab Task 3. In Proceedings of CLEF Workshop on Cross-Language Evaluation of Methods, Applications, and Resources for eHealth Document Analysis, Valencia, Spain.

View at publisher


This paper details the participation of the Australian e- Health Research Centre (AEHRC) in the ShARe/CLEF 2013 eHealth Evaluation Lab { Task 3. This task aims to evaluate the use of information retrieval (IR) systems to aid consumers (e.g. patients and their relatives) in seeking health advice on the Web. Our submissions to the ShARe/CLEF challenge are based on language models generated from the web corpus provided by the organisers. Our baseline system is a standard Dirichlet smoothed language model. We enhance the baseline by identifying and correcting spelling mistakes in queries, as well as expanding acronyms using AEHRC's Medtex medical text analysis platform. We then consider the readability and the authoritativeness of web pages to further enhance the quality of the document ranking. Measures of readability are integrated in the language models used for retrieval via prior probabilities. Prior probabilities are also used to encode authoritativeness information derived from a list of top-100 consumer health websites.

Empirical results show that correcting spelling mistakes and expanding acronyms found in queries signi cantly improves the e ectiveness of the language model baseline. Readability priors seem to increase retrieval e ectiveness for graded relevance at early ranks (nDCG@5, but not precision), but no improvements are found at later ranks and when considering binary relevance. The authoritativeness prior does not appear to provide retrieval gains over the baseline: this is likely to be because of the small overlap between websites in the corpus and those in the top-100 consumer-health websites we acquired.

Impact and interest:

0 citations in Scopus
Search Google Scholar™

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

Full-text downloads:

270 since deposited on 25 Sep 2013
8 in the past twelve months

Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.

ID Code: 62874
Item Type: Conference Paper
Refereed: No
Divisions: Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2013 Please consult the authors
Deposited On: 25 Sep 2013 00:17
Last Modified: 12 Jul 2017 14:43

Export: EndNote | Dublin Core | BibTeX

Repository Staff Only: item control page