Scalable document hashing and retrieval
Chappell, Timothy A. (2015) Scalable document hashing and retrieval. PhD thesis, Queensland University of Technology.
This thesis studies document signatures: small representations of documents and other objects that can be stored compactly and compared for similarity. The research finds that document signatures can be used effectively and efficiently both to search large collections and to understand relationships between the documents in them, and that the approach scales well enough to search a billion documents in a fraction of a second. Deliverables arising from the research include an investigation of the representational capacity of document signatures, the publication of an open-source signature search platform, and an approach for scaling signature retrieval to operate efficiently on collections containing hundreds of millions of documents.
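To make the idea of a compact, comparable signature concrete, the following is a minimal sketch in Python. It is not taken from the thesis: it assumes a generic random-projection style of locality-sensitive hashing, in which each term contributes a pseudo-random vector of +1/-1 values and the sign of each summed component becomes one bit of the signature. The names `SIGNATURE_BITS`, `term_vector`, `signature` and `hamming_distance` are hypothetical, as is the 64-bit width.

```python
import hashlib
import random

SIGNATURE_BITS = 64  # hypothetical signature width, chosen for illustration


def term_vector(term, bits=SIGNATURE_BITS):
    """Return a deterministic pseudo-random vector of +1/-1 values for a term."""
    # Seed a private generator from a stable hash of the term so the same
    # term always yields the same vector, independent of Python's hash seed.
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [1 if rng.random() < 0.5 else -1 for _ in range(bits)]


def signature(text, bits=SIGNATURE_BITS):
    """Collapse a text into a bit signature stored as a Python int."""
    totals = [0] * bits
    for term in text.lower().split():
        for i, value in enumerate(term_vector(term, bits)):
            totals[i] += value
    sig = 0
    for i, total in enumerate(totals):
        if total > 0:  # a positive sum sets bit i; zero or negative leaves it clear
            sig |= 1 << i
    return sig


def hamming_distance(a, b):
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    query = signature("scalable document hashing")
    other = signature("document hashing at scale")
    print(hamming_distance(query, other))
```

Under these assumptions, texts that share many terms tend to produce signatures that differ in fewer bits, so a search reduces to finding stored signatures at a small Hamming distance from the query signature.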
|Item Type:||QUT Thesis (PhD)|
|Supervisor:||Geva, Shlomo; Zuccon, Guido; Trotman, Andrew; Sitbon, Laurianne; & Nguyen, Anthony|
|Keywords:||Information retrieval, Document signatures, Signature files, Relevance feedback, Superimposed coding, Locality-sensitive hashing, Topological signatures, Dimensionality reduction, Nearest-neighbour, Hamming distance problem|
|Divisions:||Current > QUT Faculties and Divisions > Science & Engineering Faculty|
|Institution:||Queensland University of Technology|
|Deposited On:||11 Jan 2016 00:09|
|Last Modified:||11 Jan 2016 00:11|