Scalable document hashing and retrieval

Chappell, Timothy A. (2015) Scalable document hashing and retrieval. PhD thesis, Queensland University of Technology.


This thesis studies document signatures, which are small representations of documents and other objects that can be stored compactly and compared for similarity. The research finds that document signatures can be used effectively and efficiently both to search and to understand relationships between documents in large collections. It also finds that signature search scales well enough to search a billion documents in a fraction of a second. Deliverables arising from the research include an investigation of the representational capacity of document signatures, the publication of an open-source signature search platform, and an approach for scaling signature retrieval to operate efficiently on collections containing hundreds of millions of documents.
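To make the idea of a compact, comparable signature concrete, here is a minimal Python sketch. It is not the method developed in the thesis. It assumes a generic SimHash-style random-projection scheme: each term is hashed to a pseudo-random +1/-1 vector, the vectors are summed with term-frequency weights, and the sign of each dimension becomes one bit. Documents are then compared by Hamming distance, which is the number of differing bits. The names `signature`, `hamming`, `search` and the 64-bit width `SIG_BITS` are hypothetical choices for illustration. The ranking is a brute-force scan, which is exactly the cost that scalable signature retrieval aims to avoid on large collections.

```python
import hashlib
from collections import Counter

# Hypothetical signature width; must be <= 256 because term bits come from one SHA-256 digest.
SIG_BITS = 64


def term_vector(term: str, bits: int = SIG_BITS) -> list[int]:
    """Map a term to a deterministic pseudo-random +1/-1 vector derived from its SHA-256 digest."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()  # 32 bytes = 256 bits
    return [1 if (digest[i // 8] >> (i % 8)) & 1 else -1 for i in range(bits)]


def signature(text: str, bits: int = SIG_BITS) -> int:
    """Sum frequency-weighted term vectors, then keep one bit per dimension (1 if the sum is positive)."""
    acc = [0] * bits
    for term, freq in Counter(text.lower().split()).items():
        for i, v in enumerate(term_vector(term, bits)):
            acc[i] += freq * v
    sig = 0
    for i, total in enumerate(acc):
        if total > 0:
            sig |= 1 << i
    return sig


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two signatures."""
    return bin(a ^ b).count("1")


def search(query: str, docs: dict[str, str], k: int = 3) -> list[tuple[str, int]]:
    """Brute-force ranking: return the k documents whose signatures are nearest to the query's."""
    q = signature(query)
    sigs = {doc_id: signature(text) for doc_id, text in docs.items()}
    ranked = sorted(sigs.items(), key=lambda item: hamming(q, item[1]))
    return [(doc_id, hamming(q, sig)) for doc_id, sig in ranked[:k]]


if __name__ == "__main__":
    # Toy collection; document texts are invented for illustration only.
    collection = {
        "d1": "document signatures compress documents into short bit strings",
        "d2": "bit strings from document signatures are compared by hamming distance",
        "d3": "the weather in brisbane is warm and humid",
    }
    print(search("document signatures hamming distance", collection, k=2))
```

Because each signature is a single integer, comparing two documents costs one XOR and a bit count, regardless of document length. This is the property that makes signatures cheap to store and compare.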

ID Code: 90044
Item Type: QUT Thesis (PhD)
Supervisor: Geva, Shlomo, Zuccon, Guido, Trotman, Andrew, Sitbon, Laurianne, & Nguyen, Anthony
Keywords: Information retrieval, Document signatures, Signature files, Relevance feedback, Superimposed coding, Locality-sensitive hashing, Topological signatures, Dimensionality reduction, Nearest-neighbour, Hamming distance problem
Divisions: Current > QUT Faculties and Divisions > Science & Engineering Faculty
Institution: Queensland University of Technology
Deposited On: 11 Jan 2016 00:09
Last Modified: 02 Jul 2017 14:44