Integrating and evaluating neural word embeddings in information retrieval

Zuccon, Guido, Koopman, Bevan, Bruza, Peter, & Azzopardi, Leif (2015) Integrating and evaluating neural word embeddings in information retrieval. In Proceedings of the 20th Australasian Document Computing Symposium, ACM, Western Sydney University, Parramatta, NSW, Article No. 12.

View at publisher

Abstract

Recent advances in neural language models have contributed new methods for learning distributed vector representations of words (also called word embeddings). Two such methods are the continuous bag-of-words model and the skipgram model. These methods have been shown to produce embeddings that capture higher order relationships between words that are highly effective in natural language processing tasks involving the use of word similarity and word analogy. Despite these promising results, there has been little analysis of the use of these word embeddings for retrieval.

Motivated by these observations, in this paper, we set out to determine how these word embeddings can be used within a retrieval model and what the benefit might be. To this aim, we use neural word embeddings within the well known translation language model for information retrieval. This language model captures implicit semantic relations between the words in queries and those in relevant documents, thus producing more accurate estimations of document relevance.

The word embeddings used to estimate neural language models produce translations that differ from previous translation language model approaches; differences that deliver improvements in retrieval effectiveness. The models are robust to choices made in building word embeddings and, even more so, our results show that embeddings do not even need to be produced from the same corpus being used for retrieval.

Impact and interest:

0 citations in Scopus
Search Google Scholar™

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

Full-text downloads:

92 since deposited on 20 Dec 2015
75 in the past twelve months

Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.

ID Code: 91418
Item Type: Conference Paper
Refereed: Yes
Additional URLs:
DOI: 10.1145/2838931.2838936
ISBN: 9781450340403
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Current > Schools > School of Information Systems
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2014 ACM
Deposited On: 20 Dec 2015 23:55
Last Modified: 08 Jan 2016 14:44

Export: EndNote | Dublin Core | BibTeX

Repository Staff Only: item control page