Automatic query expansion : a structural linguistic perspective

Symonds, Michael, Bruza, Peter D., Zuccon, Guido, Koopman, Bevan, Sitbon, Laurianne, & Turner, Ian (2014) Automatic query expansion : a structural linguistic perspective. Journal of the American Society for Information Science and Technology.

Abstract

A user’s query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques ignore information about the dependencies that exist between words in natural language. However, more recent approaches have demonstrated that explicitly modeling associations between terms can achieve significant improvements in retrieval effectiveness over approaches that ignore these dependencies. State-of-the-art dependency-based approaches have been shown to primarily model syntagmatic associations. A syntagmatic association between two terms indicates that they co-occur more often than would be expected by chance. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval, it demonstrates significant improvements in retrieval effectiveness over a strong baseline system based on a commercial search engine.
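
The distinction between the two kinds of association can be illustrated with a small sketch. This is not the article's tensor encoding model: as stand-ins, it assumes positive pointwise mutual information for syntagmatic association (co-occurring more often than chance) and cosine similarity of co-occurrence vectors for paradigmatic association (sharing the same neighbours). The toy corpus, the function names, and the mixing weight `gamma` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; each document is treated as one context window.
DOCS = [
    "coffee drink hot morning",
    "tea drink hot morning",
    "coffee cup hot",
    "tea cup hot",
    "coffee bean roast",
    "tea leaf brew",
]


def build_counts(docs):
    """Count in how many documents each term and each term pair occurs."""
    term_df, pair_df = Counter(), Counter()
    for doc in docs:
        terms = sorted(set(doc.split()))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    return term_df, pair_df


def cooc(pair_df, a, b):
    """Number of documents containing both a and b (order-independent)."""
    return pair_df[tuple(sorted((a, b)))]


def syntagmatic(term_df, pair_df, n_docs, a, b):
    """Positive PMI: above zero when a and b co-occur more than chance predicts."""
    joint = cooc(pair_df, a, b)
    if joint == 0:
        return 0.0
    pmi = math.log((joint * n_docs) / (term_df[a] * term_df[b]))
    return max(pmi, 0.0)


def paradigmatic(term_df, pair_df, a, b):
    """Cosine similarity of the co-occurrence vectors of a and b.

    The dimensions for a and b themselves are left out, so the score
    reflects shared neighbours rather than direct co-occurrence.
    """
    vocab = [t for t in term_df if t not in (a, b)]
    va = [cooc(pair_df, a, t) for t in vocab]
    vb = [cooc(pair_df, b, t) for t in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def expansion_scores(docs, query_term, gamma=0.5):
    """Score candidate expansion terms; gamma weights the paradigmatic part."""
    term_df, pair_df = build_counts(docs)
    scores = {}
    for term in term_df:
        if term == query_term:
            continue
        syn = syntagmatic(term_df, pair_df, len(docs), query_term, term)
        par = paradigmatic(term_df, pair_df, query_term, term)
        scores[term] = (1 - gamma) * syn + gamma * par
    return scores


if __name__ == "__main__":
    for gamma in (0.0, 1.0):
        ranked = sorted(expansion_scores(DOCS, "coffee", gamma).items(),
                        key=lambda kv: kv[1], reverse=True)
        print(gamma, ranked[:3])
```

In this toy corpus "coffee" and "tea" never appear in the same document, so a purely syntagmatic score cannot suggest "tea" as an expansion term for "coffee", whereas the paradigmatic score ranks it first because the two words share most of their neighbours.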

ID Code: 61104
Item Type: Journal Article
Refereed: Yes
Keywords: Information Retrieval, Automatic Query Expansion, Tensor Encoding Model, Web Search, Structural Linguistics
DOI: 10.1002/asi.23065
ISSN: 1532-2882
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > INFORMATION SYSTEMS (080600)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > INFORMATION SYSTEMS (080600) > Information Systems not elsewhere classified (080699)
Divisions: Past > Schools > Computer Science
Current > Schools > School of Information Systems
Current > Schools > School of Mathematical Sciences
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2013 American Society for Information Science and Technology
Copyright Statement: This is a preprint of an article accepted for publication in Journal of the American Society for Information Science and Technology, copyright (C) 2013 (American Society for Information Science and Technology).
Deposited On: 04 Jul 2013 00:24
Last Modified: 01 Mar 2015 14:10
