A study of results overlap and uniqueness among major web search engines

Spink, Amanda H., Jansen, Bernard J., Blakely, Chris, & Koshman, Sherry (2006) A study of results overlap and uniqueness among major web search engines. Information Processing and Management, 42(5), pp. 1379-1391.

View at publisher


The performance and capabilities of Web search engines is an important and significant area of research. Millions of people world wide use Web search engines very day. This paper reports the results of a major study examining the overlap among results retrieved by multiple Web search engines for a large set of more than 10,000 queries. Previous smaller studies have discussed a lack of overlap in results returned by Web search engines for the same queries. The goal of the current study was to conduct a large-scale study to measure the overlap of search results on the first result page (both non-sponsored and sponsored) across the four most popular Web search engines, at specific points in time using a large number of queries. The Web search engines included in the study were MSN Search, Google, Yahoo! and Ask Jeeves. Our study then compares these results with the first page results retrieved for the same queries by the metasearch engine Dogpile.com. Two sets of randomly selected user-entered queries, one set was 10,316 queries and the other 12,570 queries, from Infospace’s Dogpile.com search engine [the first set was from Dogpile, the second was from across the Infospace Network of search properties were submitted to the four single Web search engines. Findings show that the percent of total results unique to only one of the four Web search engines was 84.9%, shared by two of the three Web search engines was 11.4%, shared by three of the Web search engines was 2.6%, and shared by all four Web search engines was 1.1%. This small degree of overlap shows the significant difference in the way major Web search engines retrieve and rank results in response to given queries. Results point to the value of metasearch engines in Web retrieval to overcome the biases of individual search engines.

Impact and interest:

78 citations in Scopus
53 citations in Web of Science®
Search Google Scholar™

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

Full-text downloads:

2,932 since deposited on 25 Aug 2006
77 in the past twelve months

Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.

ID Code: 4755
Item Type: Journal Article
Refereed: Yes
Keywords: information retrieval
DOI: 10.1016/j.ipm.2005.11.001
ISSN: 0306-4573
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > LIBRARY AND INFORMATION STUDIES (080700) > Information Retrieval and Web Search (080704)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2006 Elsevier
Copyright Statement: Reproduced in accordance with the copyright policy of the publisher.
Deposited On: 25 Aug 2006 00:00
Last Modified: 29 Feb 2012 13:25

Export: EndNote | Dublin Core | BibTeX

Repository Staff Only: item control page