Deep Web Collection Selection
King, John Douglas (2004) Deep Web Collection Selection. Masters by Research thesis, Queensland University of Technology.
The deep web contains a massive number of collections that are mostly invisible to search engines. These collections often contain high-quality, structured information that cannot be crawled using traditional methods.
An important problem is selecting which of these collections to search. Automatic collection selection methods try to solve this problem by suggesting the best subset of deep web collections to search based on a query.
A few methods for deep web collection selection have been proposed, notably in the Collection Retrieval Inference Network (CORI) system and the Glossary of Servers Server (GlOSS) system.
The drawback of these methods is that they require communication between the search broker and the collections, and they need metadata about each collection.
This thesis compares three different sampling methods that do not require communication with the broker or metadata about each collection. It also adapts some traditional information retrieval techniques to this area. In addition, the thesis tests these techniques on the INEX collection, using a total of 18 collections (12,232 XML documents) and 36 queries.
The experiments show that the performance of the sample-based techniques is satisfactory on average.
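To make the idea of sample-based collection selection concrete, the following is a minimal sketch, not the method evaluated in the thesis. It assumes that a small set of documents has already been sampled from each collection, and it ranks collections by the relative frequency of the query terms in each sample. The function names (`tokenize`, `build_sample_index`, `rank_collections`) and the scoring rule are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_sample_index(samples):
    """Summarise each collection by the term counts of its sampled documents.

    samples: dict mapping a collection name to a list of sampled document strings.
    Returns a dict mapping each name to (term_counts, total_token_count).
    """
    index = {}
    for name, docs in samples.items():
        counts = Counter()
        for doc in docs:
            counts.update(tokenize(doc))
        index[name] = (counts, sum(counts.values()))
    return index


def rank_collections(index, query, top_k=None):
    """Rank collections by the relative frequency of query terms in their samples.

    This scoring rule is an illustrative assumption, not the thesis's technique.
    Ties are broken alphabetically by collection name.
    """
    terms = tokenize(query)
    scores = {}
    for name, (counts, total) in index.items():
        hits = sum(counts[t] for t in terms)
        scores[name] = hits / total if total else 0.0
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top_k is None else ranked[:top_k]
```

A broker built this way needs only the sampled documents, so it requires neither cooperation from the collections nor metadata about them; the quality of the ranking then depends on how representative each sample is.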
|Item Type:||QUT Thesis (Masters by Research)|
|Supervisor:||Li, Yuefeng & Geva, Shlomo|
|Keywords:||information retrieval, deep web, collection selection, singular value decomposition, latent semantic analysis, sampling, query focused, probabilistic|
|Divisions:||Past > QUT Faculties & Divisions > Faculty of Science and Technology; Past > Schools > School of Software Engineering & Data Communications|
|Department:||Faculty of Information Technology|
|Institution:||Queensland University of Technology|
|Copyright Owner:||Copyright John Douglas King|
|Deposited On:||03 Dec 2008 03:54|
|Last Modified:||17 Oct 2013 22:56|