WebPut: Efficient web-based data imputation

Li, Zhixu, Sharaf, Mohamed A., Sitbon, Laurianne, Sadiq, Shazia, Indulska, Marta, & Zhou, Xiaofang (2012) WebPut: Efficient web-based data imputation. In Wang, Sean X., Cruz, Isabel, Delis, Alex, & Huang, Guangyan (Eds.) 13th International Conference on Web Information Systems Engineering - WISE 2012, Springer Berlin Heidelberg, Paphos, Cyprus, pp. 243-256.

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. To this end, WebPut uses the information available in an incomplete database together with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods to formulate web search queries that can retrieve missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. We also propose a greedy iterative algorithm that schedules the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, to improve both the accuracy and the efficiency of WebPut. Experiments on several real-world data collections demonstrate that WebPut outperforms existing approaches.
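The abstract does not specify WebPut's query formats, its confidence model, or the exact scheduling rule, so the following Python sketch only illustrates the general idea of confidence-based query selection combined with greedy imputation ordering, under stated assumptions. `QueryTemplate`, `best_query`, `greedy_impute` and the toy `search` function are hypothetical names; each template is assumed to carry a precomputed confidence and to be usable only when all of its keyword attributes are already known, and `search` is a dictionary lookup standing in for issuing a web query and extracting a value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, Optional[str]]  # one tuple of the incomplete table; None marks a missing value
Cell = Tuple[int, str]          # (row index, column name) of a missing value


@dataclass(frozen=True)
class QueryTemplate:
    """Hypothetical imputation query template (an assumption, not WebPut's actual format)."""
    target: str                # column whose missing value the query tries to retrieve
    keywords: Tuple[str, ...]  # columns whose known values are used as search keywords
    confidence: float          # assumed precomputed estimate of how often the query is correct


def best_query(row: Row, cell: Cell, templates: List[QueryTemplate],
               tried: Set[Tuple[Cell, QueryTemplate]]) -> Optional[QueryTemplate]:
    """Return the highest-confidence untried template usable for this cell, or None."""
    usable = [
        t for t in templates
        if t.target == cell[1]
        and (cell, t) not in tried
        and all(row.get(k) is not None for k in t.keywords)  # every keyword value must be known
    ]
    return max(usable, key=lambda t: t.confidence, default=None)


def greedy_impute(table: List[Row], templates: List[QueryTemplate],
                  search: Callable[[Row, QueryTemplate], Optional[str]]) -> List[Tuple[Cell, str]]:
    """Repeatedly issue the single most confident usable query; fill the table in place."""
    tried: Set[Tuple[Cell, QueryTemplate]] = set()
    imputed: List[Tuple[Cell, str]] = []
    while True:
        best: Optional[Tuple[Cell, QueryTemplate]] = None
        for i, row in enumerate(table):
            for col, value in row.items():
                if value is not None:
                    continue
                q = best_query(row, (i, col), templates, tried)
                if q is not None and (best is None or q.confidence > best[1].confidence):
                    best = ((i, col), q)
        if best is None:  # no usable, untried query remains
            return imputed
        cell, q = best
        tried.add((cell, q))  # never reissue the same query for the same cell
        answer = search(table[cell[0]], q)  # stands in for a web search plus extraction
        if answer is not None:
            table[cell[0]][cell[1]] = answer  # the imputed value can now serve as a keyword
            imputed.append((cell, answer))


# Toy stand-in for the web: (target column, keyword values...) -> extracted value.
fake_web = {("university", "Alice Smith"): "QUT", ("city", "QUT"): "Brisbane"}


def search(row: Row, q: QueryTemplate) -> Optional[str]:
    return fake_web.get((q.target,) + tuple(row[k] for k in q.keywords))


table = [{"name": "Alice Smith", "university": None, "city": None}]
templates = [
    QueryTemplate("university", ("name",), 0.9),
    QueryTemplate("city", ("university",), 0.8),
    QueryTemplate("city", ("name",), 0.5),
]
print(greedy_impute(table, templates, search))
# [((0, 'university'), 'QUT'), ((0, 'city'), 'Brisbane')]
```

In this toy run, imputing `university` first makes the higher-confidence `city` query usable, which illustrates why the order in which missing values are imputed can affect which queries are available and how reliable they are.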

Impact and interest:

7 citations in Scopus

Full-text downloads:

274 since deposited on 11 Jan 2013
25 in the past twelve months

ID Code: 56384
Item Type: Conference Paper
Refereed: Yes
Keywords: Web-based Data Imputation, WebPut, Incomplete Data, Data quality
DOI: 10.1007/978-3-642-35063-4_18
ISBN: 9783642350627
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Natural Language Processing (080107)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > INFORMATION SYSTEMS (080600) > Database Management (080604)
Divisions: Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2012 Springer-Verlag Berlin Heidelberg
Copyright Statement: Author retains, in addition to uses permitted by law, the right to communicate the content of the Contribution to other scientists, to share the Contribution with them in manuscript form, to perform or present the Contribution or to use the content for non-commercial internal and educational purposes, provided that the Springer publication is mentioned as the original source of publication in any printed or electronic materials. Author retains the right to republish the Contribution in any collection consisting solely of Author’s own works without charge but must ensure that the publication by Springer is properly credited and that the relevant copyright notice is repeated verbatim.
Author may self-archive an author-created version of his/her Contribution on his/her own website and/or in his/her institutional repository, as well as on a non-commercial archival repository such as ArXiv/CoRR and HAL, including his/her final version. Author may also deposit this version on his/her funder’s or funder’s designated repository at the funder’s request or as a result of a legal obligation. Author may not use the publisher’s PDF version, which is posted on www.springerlink.com, for the purpose of self-archiving or deposit. Furthermore, Author may only post his/her version provided acknowledgement is given to the original source of publication and a link is inserted to the published article on Springer’s website. The link should be accompanied by the following text: "The original publication is available at www.springerlink.com". Author retains the right to use his/her Contribution for his/her further scientific career by including the final published paper in his/her dissertation or doctoral thesis provided acknowledgement is given to the original source of publication. Author also retains the right to use, without having to pay a fee and without having to inform the publisher, parts of the Contribution (e.g. illustrations) for inclusion in future work, and to publish a substantially revised version (at least 30% new content) elsewhere, provided that the original Springer Contribution is properly cited.
Deposited On: 11 Jan 2013 02:06
Last Modified: 01 Jul 2017 04:50