WebPut : Efficient web-based data imputation
Li, Zhixu, Sharaf, Mohamed A., Sitbon, Laurianne, Sadiq, Shazia, Indulska, Marta, & Zhou, Xiaofang (2012) WebPut : Efficient web-based data imputation. In Wang, Sean X., Cruz, Isabel, Delis, Alex, & Huang, Guangyan (Eds.) 13th International Conference on Web Information Systems Engineering - WISE 2012, Springer Berlin Heidelberg, Paphos, Cyprus, pp. 243-256.
In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. To this end, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is also proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. Experiments based on several real-world data collections demonstrate that WebPut outperforms existing approaches.
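The abstract names a confidence-based query selection scheme and a greedy iterative scheduler without giving details. The following Python sketch illustrates one way such a loop could be organised; it is not the authors' algorithm. All names (`ImputationQuery`, `greedy_impute`, `min_conf`), the toy lookup tables, and the fixed confidence values are assumptions made purely for illustration, and the web search step is replaced by local stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Cell = Tuple[int, str]  # (row id, attribute name); hypothetical representation


@dataclass
class ImputationQuery:
    """Hypothetical stand-in for one kind of imputation query."""
    name: str
    # Estimated confidence in [0, 1], or None if the query cannot be formulated yet.
    estimate: Callable[[Cell, Dict[Cell, str]], Optional[float]]
    # Issues the query (a web search in a real system; a local stub here).
    run: Callable[[Cell, Dict[Cell, str]], Optional[str]]


def greedy_impute(known: Dict[Cell, str], missing: Set[Cell],
                  queries: List[ImputationQuery], min_conf: float = 0.5):
    """Repeatedly issue the single most confident (cell, query) pair.

    Confidences are re-estimated every round, because a newly filled
    value may make queries for other cells possible.
    """
    known = dict(known)
    pending = set(missing)
    tried: Set[Tuple[Cell, str]] = set()   # pairs already issued
    order: List[Tuple[Cell, str]] = []     # successful imputations, in order
    while pending:
        best = None
        for cell in sorted(pending):
            for q in queries:
                if (cell, q.name) in tried:
                    continue
                conf = q.estimate(cell, known)
                if conf is not None and conf >= min_conf:
                    if best is None or conf > best[0]:
                        best = (conf, cell, q)
        if best is None:
            break  # nothing left that is confident enough
        _, cell, q = best
        tried.add((cell, q.name))
        value = q.run(cell, known)
        if value is not None:
            known[cell] = value
            pending.discard(cell)
            order.append((cell, q.name))
    return known, order


# Toy data (made up): a zip can be looked up from a city, a city from a name.
CITY_ZIP = {"A-town": "Z1"}
NAME_CITY = {"Alice": "A-town"}

zip_from_city = ImputationQuery(
    "zip_from_city",
    lambda c, k: 0.9 if c[1] == "zip" and k.get((c[0], "city")) in CITY_ZIP else None,
    lambda c, k: CITY_ZIP.get(k.get((c[0], "city"))),
)
city_from_name = ImputationQuery(
    "city_from_name",
    lambda c, k: 0.6 if c[1] == "city" and k.get((c[0], "name")) in NAME_CITY else None,
    lambda c, k: NAME_CITY.get(k.get((c[0], "name"))),
)

known = {(0, "city"): "A-town", (1, "name"): "Alice"}
missing = {(0, "zip"), (1, "city"), (1, "zip")}
filled, order = greedy_impute(known, missing, [zip_from_city, city_from_name])
```

In this toy run the zip of row 1 cannot be queried until its city has been imputed, which is the kind of ordering dependency a scheduler of this sort has to handle.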
|Item Type:||Conference Paper|
|Keywords:||Web-based Data Imputation, WebPut, Incomplete Data, Data quality|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Natural Language Processing (080107)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > INFORMATION SYSTEMS (080600) > Database Management (080604)
|Divisions:||Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
|Copyright Owner:||Copyright 2012 Springer-Verlag Berlin Heidelberg|
|Copyright Statement:||Author retains, in addition to uses permitted by law, the right to communicate the content of the Contribution to other scientists, to share the Contribution with them in manuscript form, to perform or present the Contribution or to use the content for non-commercial internal and educational purposes, provided that the Springer publication is mentioned as the original source of publication in any printed or electronic materials. Author retains the right to republish the Contribution in any collection consisting solely of Author’s own works without charge but must ensure that the publication by Springer is properly credited and that the relevant copyright notice is repeated verbatim.
Author may self-archive an author-created version of his/her Contribution on his/her own website and/or in his/her institutional repository, as well as on a non-commercial archival repository such as ArXiv/CoRR and HAL, including his/her final version. Author may also deposit this version on his/her funder’s or funder’s designated repository at the funder’s request or as a result of a legal obligation. Author may not use the publisher’s PDF version, which is posted on www.springerlink.com, for the purpose of self-archiving or deposit. Furthermore, Author may only post his/her version provided acknowledgement is given to the original source of publication and a link is inserted to the published article on Springer’s website. The link should be accompanied by the following text: "The original publication is available at www.springerlink.com". Author retains the right to use his/her Contribution for his/her further scientific career by including the final published paper in his/her dissertation or doctoral thesis provided acknowledgement is given to the original source of publication. Author also retains the right to use, without having to pay a fee and without having to inform the publisher, parts of the Contribution (e.g. illustrations) for inclusion in future work, and to publish a substantially revised version (at least 30% new content) elsewhere, provided that the original Springer Contribution is properly cited.
|Deposited On:||11 Jan 2013 02:06|
|Last Modified:||17 Jan 2013 18:18|