Efficient top-k retrieval with signatures

Chappell, Timothy, Geva, Shlomo, Nguyen, Anthony, & Zuccon, Guido (2013) Efficient top-k retrieval with signatures. In Culpepper, J. Shane, Sitbon, Laurianne, & Zuccon, Guido (Eds.) Proceedings of the 18th Australasian Document Computing Symposium, ACM, Brisbane, Australia, pp. 10-17.

Abstract

This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with an acceptable measure of similarity in numerous applications. However, performing a complete search with a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. This cost quickly becomes excessive for large collections, and the resulting scalability issues limit the applicability of signatures.
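The exhaustive baseline described above can be illustrated with a minimal sketch. This is not the paper's indexing method, only the brute-force search it aims to improve on. The function names `hamming` and `top_k_exhaustive`, the use of Python integers as signatures, and the 8-bit example values are all assumptions made for illustration.

```python
import heapq


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def top_k_exhaustive(query: int, collection: list, k: int) -> list:
    """Return the k nearest signatures as (distance, index) pairs.

    Every signature in the collection is compared with the query,
    so the cost grows linearly with the collection size.
    """
    scored = ((hamming(query, sig), i) for i, sig in enumerate(collection))
    return heapq.nsmallest(k, scored)


# Hypothetical 8-bit signatures; real signatures are typically much wider.
signatures = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
print(top_k_exhaustive(0b10110010, signatures, k=2))  # [(0, 0), (1, 1)]
```

Because each query touches every signature, doubling the collection doubles the work per search, which is the scalability problem the abstract refers to.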

Our method efficiently finds similar signatures in very large collections, trading memory use and precision for greatly improved search speed. Experimental results demonstrate that our approach is capable of finding a set of nearest signatures to a given search argument with a high degree of speed and fidelity.

ID Code: 66949
Item Type: Conference Paper
Refereed: Yes
Keywords: Document Signatures, Near-Duplicate Detection, Hamming Distance, Locality-Sensitive Hashing, Nearest Neighbour, Top-K
DOI: 10.1145/2537734.2537742
ISBN: 9781450325240
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Deposited On: 05 Feb 2014 22:37
Last Modified: 25 Mar 2014 07:30
