An algorithm to cluster documents based on relevance

Desai, Monica & Spink, Amanda H. (2005) An algorithm to cluster documents based on relevance. Information Processing and Management, 41(5), pp. 1035-1049.



Search engines do not clearly distinguish between items of varying relevance when presenting search results. Instead, they leave it to the user to judge which items are relevant, partially relevant, or not relevant. This burden often hinders access to relevant or partially relevant documents, particularly when the result set is large and documents of varying relevance are scattered throughout it. In this paper, we present a clustering scheme that groups the documents returned for a given search into relevant, partially relevant, and not relevant regions. The clusters were evaluated by end-users, who issued categorical, interval, and descriptive relevance judgments for the documents returned from a search. The degree of overlap between users and the system for each of the clustered regions was measured to determine the overall effectiveness of the algorithm. This research showed that clustering documents on the Web by regions of relevance is highly necessary and quite feasible.
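
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea: place each returned document in one of three relevance regions, then measure how often a user's judgment agrees with the system's region. The score thresholds, the function names (assign_region, cluster_by_relevance, overlap_by_region), and the sample data are all assumptions made for illustration; none of them are taken from the paper.

```python
# Illustrative sketch only: thresholds, names, and data are hypothetical,
# not the method described in the paper.

REGIONS = ("relevant", "partially relevant", "not relevant")


def assign_region(score, high=0.66, low=0.33):
    """Map a relevance score in [0, 1] to one of three regions.

    The cut-offs `high` and `low` are assumed values for this sketch.
    """
    if score >= high:
        return "relevant"
    if score >= low:
        return "partially relevant"
    return "not relevant"


def cluster_by_relevance(scores):
    """Group document ids into the three relevance regions."""
    clusters = {region: [] for region in REGIONS}
    for doc_id, score in scores.items():
        clusters[assign_region(score)].append(doc_id)
    return clusters


def overlap_by_region(clusters, user_judgments):
    """For each region, return the fraction of judged documents that the
    user placed in the same region as the system (None if none were judged)."""
    overlap = {}
    for region, doc_ids in clusters.items():
        judged = [d for d in doc_ids if d in user_judgments]
        if not judged:
            overlap[region] = None
            continue
        agree = sum(1 for d in judged if user_judgments[d] == region)
        overlap[region] = agree / len(judged)
    return overlap


# Hypothetical system scores and categorical user judgments.
scores = {"d1": 0.9, "d2": 0.7, "d3": 0.5, "d4": 0.4, "d5": 0.1}
user = {
    "d1": "relevant",
    "d2": "partially relevant",
    "d3": "partially relevant",
    "d4": "not relevant",
    "d5": "not relevant",
}

clusters = cluster_by_relevance(scores)
overlap = overlap_by_region(clusters, user)
print(clusters)
print(overlap)
```

A per-region agreement fraction is one simple way to express "degree of overlap"; the paper may define the measure differently, and it also collected interval and descriptive judgments that this sketch does not model.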

Impact and interest:

6 citations in Scopus
6 citations in Web of Science®


Full-text downloads:

211 since deposited on 09 Aug 2006
5 in the past twelve months


ID Code: 4753
Item Type: Journal Article
Refereed: Yes
Keywords: information retrieval
DOI: 10.1016/j.ipm.2004.05.003
ISSN: 0306-4573
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > LIBRARY AND INFORMATION STUDIES (080700) > Information Retrieval and Web Search (080704)
Divisions: Past > Research Centres > Office of Education Research
Current > QUT Faculties and Divisions > Faculty of Education
Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2005 Elsevier
Copyright Statement: Reproduced in accordance with the copyright policy of the publisher.
Deposited On: 09 Aug 2006 00:00
Last Modified: 24 Jun 2017 14:38
