Compositional data analysis (CoDA) approaches to distance in information retrieval

Thomas, P. & Lovell, D. R. (2014) Compositional data analysis (CoDA) approaches to distance in information retrieval. In SIGIR '14 Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, Association for Computing Machinery, Gold Coast, Qld., pp. 991-994.


Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole: term frequencies are a familiar example. Proportions carry only relative information and are not free to vary independently of one another: for the proportion of one term to increase, one or more others must decrease. These constraints are hallmarks of compositional data. While there has long been discussion in other fields of how such data should be analysed, to our knowledge, Compositional Data Analysis (CoDA) has not been considered in IR. In this work we explore compositional data in IR through the lens of distance measures, and demonstrate that common measures, naïve to compositions, have some undesirable properties which can be avoided with composition-aware measures. As a practical example, these measures are shown to improve clustering. Copyright 2014 ACM.
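
The contrast between composition-naive and composition-aware distances can be illustrated with a minimal Python sketch. This record does not give the paper's exact measures, so the code below is an assumption rather than the authors' implementation: it uses the standard definition of Aitchison's distance (named in the keywords) as the Euclidean distance between centred log-ratio (clr) transformed vectors. The function names and the toy term proportions are hypothetical, and every part is assumed strictly positive, since the logarithm is undefined at zero.

```python
import math

def closure(counts):
    """Normalise positive counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(parts):
    """Centred log-ratio: log of each part minus the mean log (log geometric mean)."""
    logs = [math.log(p) for p in parts]  # assumes every part > 0
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of x and y."""
    return math.dist(clr(x), clr(y))

def euclidean_distance(x, y):
    """Composition-naive distance applied directly to the vectors."""
    return math.dist(x, y)

# Hypothetical term proportions for two pairs of documents.
# In the first pair a rare term doubles (0.01 -> 0.02);
# in the second a common term shifts by the same absolute amount (0.40 -> 0.41).
rare_a, rare_b = [0.01, 0.49, 0.50], [0.02, 0.49, 0.49]
common_a, common_b = [0.40, 0.30, 0.30], [0.41, 0.30, 0.29]

naive_rare = euclidean_distance(rare_a, rare_b)
naive_common = euclidean_distance(common_a, common_b)
aware_rare = aitchison_distance(rare_a, rare_b)
aware_common = aitchison_distance(common_a, common_b)

# Raw counts and their proportions describe the same composition.
raw_a, raw_b = [10, 20, 70], [30, 30, 40]
aware_raw = aitchison_distance(raw_a, raw_b)
aware_closed = aitchison_distance(closure(raw_a), closure(raw_b))
```

In this toy setting the Euclidean distance treats the two pairs as essentially equally far apart, while the clr-based distance separates the pair in which a rare term doubles much more strongly, because it responds to ratios rather than absolute differences. The clr-based distance is also unchanged when counts are rescaled to proportions, which is consistent with the abstract's point that such data carry only relative information.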

ID Code: 79872
Item Type: Conference Paper
Refereed: Yes
Keywords: Aitchison's distance, Compositions, Distance, Ratio, Similarity, Chemical analysis, Compositional data, Compositional data analysis, Relative information, Through the lens, Information retrieval
DOI: 10.1145/2600428.2609492
ISBN: 9781450322591
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: ACM
Deposited On: 07 Jan 2015 02:54
Last Modified: 12 Jan 2015 22:39
