Automated gathering of Web information: An in-depth examination of agents interacting with search engines

Jansen, Bernard, Mullen, T., Spink, Amanda H., & Pedersen, J. (2006) Automated gathering of Web information: An in-depth examination of agents interacting with search engines. ACM Transactions on Internet Technology, 6(4), pp. 442-464.

The Web has become a worldwide repository of information which individuals, companies, and organizations utilize to solve or address various information problems. Many of these Web users utilize automated agents to gather this information for them. Some assume that this approach represents a more sophisticated method of searching. However, there is little research investigating how Web agents search for online information. In this research, we first provide a classification for information agents using stages of information gathering, gathering approaches, and agent architecture. We then examine an implementation of one of the resulting classifications in detail, investigating how agents search for information on Web search engines, including the session, query, term, duration, and frequency of interactions. For this temporal study, we analyzed three data sets of queries and page views from agents interacting with the Excite and AltaVista search engines from 1997 to 2002, examining approximately 900,000 queries submitted by over 3,000 agents. Findings include: (1) agent sessions are extremely interactive, with sometimes hundreds of interactions per second, (2) agent queries are comparable to those of human searchers, with little use of query operators, (3) Web agents are searching for a relatively limited variety of information, wherein only 18% of the terms used are unique, and (4) the duration of agent-Web search engine interaction typically spans several hours. We discuss the implications for Web information agents and search engines.
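As a rough illustration of the session-, query-, and term-level measures named in the abstract, the following Python sketch summarizes a small query log. It is not the authors' method: the record layout (agent identifier, timestamp in seconds, query string), the operator set, the function name summarize, and the definition of the unique-term share (distinct terms divided by total term occurrences) are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical record layout: (agent_id, timestamp_in_seconds, query_string).
# Assumed operator set, for illustration only.
OPERATORS = {"AND", "OR", "NOT"}


def summarize(log):
    """Return session-, query-, and term-level summaries for a query log."""
    if not log:
        raise ValueError("log must contain at least one record")

    # Group interactions by agent; one agent is treated as one session.
    by_agent = defaultdict(list)
    for agent_id, ts, query in log:
        by_agent[agent_id].append((ts, query))

    sessions = {}
    for agent_id, events in by_agent.items():
        times = [ts for ts, _ in events]
        # Count interactions falling within the same whole second.
        per_second = Counter(int(ts) for ts in times)
        sessions[agent_id] = {
            "queries": len(events),
            "duration_s": max(times) - min(times),
            "peak_per_second": max(per_second.values()),
        }

    queries = [q for _, _, q in log]
    terms = [t for q in queries for t in q.split()]
    # A query "uses operators" if any token is a Boolean keyword
    # or starts with +, -, or a double quote (assumed convention).
    with_ops = sum(
        1 for q in queries
        if any(t in OPERATORS or t[0] in '+-"' for t in q.split())
    )
    return {
        "sessions": sessions,
        "mean_terms_per_query": len(terms) / len(queries),
        "operator_query_share": with_ops / len(queries),
        # Distinct terms over total term occurrences (assumed definition).
        "unique_term_share": len({t.lower() for t in terms}) / len(terms) if terms else 0.0,
    }


if __name__ == "__main__":
    sample = [
        ("a1", 0.0, "cheap flights"),
        ("a1", 0.4, "cheap hotels"),
        ("a1", 7200.0, "cheap flights AND hotels"),
        ("a2", 10.0, "weather"),
    ]
    print(summarize(sample))
```

Treating every agent as a single session keeps the sketch short; a real analysis would likely need a rule for splitting one agent's activity into separate sessions.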

ID Code: 47854
Item Type: Journal Article
Refereed: Yes
Keywords: Agent searching, Search engines, Web searching
DOI: 10.1145/1183463.1183468
Divisions: Current > Research Centres > Office of Education Research
Current > QUT Faculties and Divisions > Faculty of Education
Copyright Owner: © 2006 ACM.
Deposited On: 20 Dec 2011 07:22
Last Modified: 29 Feb 2012 13:25
