Using Web Search Logs to Identify Query Classification Terms

Taksa, Isak, Zelikovitz, Sarah, & Spink, Amanda H. (2007) Using Web Search Logs to Identify Query Classification Terms. International Journal of Web Information Systems, 3(4), pp. 315-327.


Purpose – The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users.

Design/methodology/approach – The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration.
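
The iterative loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `search` callable (a stand-in for whatever web search is used to gather background text), the `iterations` and `top_k` parameters, and the document-frequency ranking are all assumptions made for the example. The abstract does not state how terms are actually ranked.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def label_queries(queries: Iterable[str], terms: Set[str]) -> List[str]:
    """Return the queries that contain at least one labeled term."""
    return [q for q in queries if terms & set(q.lower().split())]


def expand_terms(
    queries: List[str],
    seed_terms: Set[str],
    search: Callable[[str], List[str]],  # hypothetical: query -> result texts
    iterations: int = 2,
    top_k: int = 2,
) -> Set[str]:
    """Grow a set of classification terms from a small manually labeled seed set."""
    terms = set(seed_terms)
    for _ in range(iterations):
        # Step 1: label log queries using the current term set.
        labeled = label_queries(queries, terms)
        # Step 2: build background knowledge by pooling search results.
        background = [doc for q in labeled for doc in search(q)]
        # Step 3 (assumed ranking): count how many background documents
        # contain each candidate term that is not already in the set.
        counts: Counter = Counter()
        for doc in background:
            for word in set(doc.lower().split()):
                if word not in terms:
                    counts[word] += 1
        if not counts:
            break  # nothing new to add
        # Step 4: keep the top-ranked terms (ties broken alphabetically)
        # and use them to begin the next iteration.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        terms.update(word for word, _ in ranked[:top_k])
    return terms
```

A practical version would at least filter stop words and score candidates against the target user property rather than by raw frequency; the sketch only shows the shape of the label, gather, rank, and repeat cycle.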

Findings – The authors identify the difficulties of classifying web logs by approaching the problem from a machine learning perspective. Applying the approach developed, they show that many queries in a large query log can be classified.

Research limitations/implications – Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluating the classification results by comparing classified queries to well-known age-related sites is a direction that is currently being explored.

Practical implications – This research is background work that can be incorporated in search engines or other web-based applications, to help marketing companies and advertisers.

Originality/value – This research enhances the current state of knowledge in short-text classification and query log learning.

Keywords: Classification schemes, Computer networks, Information retrieval, Man-machine systems, User interfaces

Impact and interest:

6 citations in Scopus

ID Code: 47872
Item Type: Journal Article
Refereed: Yes
DOI: 10.1108/17440080710848107
ISSN: 1744-0084
Copyright Owner: Emerald
Deposited On: 20 Dec 2011 08:47
Last Modified: 29 Feb 2012 13:35
