Scalable and efficient data distribution for distributed computing of all-to-all comparison problems
Zhang, Yi-Fan, Tian, Yu-Chu, Kelly, Wayne, & Fidge, Colin (2016) Scalable and efficient data distribution for distributed computing of all-to-all comparison problems. Future Generation Computer Systems. (In Press)
Available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.
All-to-all comparison problems represent a class of big data processing problems found in many application domains. To achieve high performance in distributed computing of such problems, storage usage, data locality and load balancing should all be considered during the data distribution phase. Existing data distribution strategies, such as the one used by Hadoop, are designed for problems that follow the MapReduce pattern and do not consider comparison tasks at all. As a result, a huge amount of data must be re-arranged at runtime when the comparison tasks are executed, which significantly degrades overall computing performance. To address this problem, this paper presents a scalable and efficient data distribution strategy designed with comparison tasks in mind. Designed specifically for problems with the all-to-all comparison pattern, it not only saves storage space and data distribution time but also achieves load balancing and good data locality for all comparison tasks. Experiments are conducted to demonstrate the presented approach. They show that about 90% of the ideal performance capacity of the multiple machines can be achieved using the approach presented in this paper.
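To make the three criteria in the abstract concrete (storage usage, data locality and load balancing), the following minimal sketch scores a given placement of files on nodes for an all-to-all comparison workload. It is an illustration only and is not the distribution strategy from the paper, which the abstract does not describe. The function name `evaluate_placement`, the greedy task assignment, and the sample placement are all hypothetical.

```python
from itertools import combinations


def evaluate_placement(placement, n_files):
    """Score a file placement for an all-to-all comparison workload.

    placement: dict mapping a node id to the set of file ids stored on it.
    A comparison task (a, b) is "local" on a node that stores both files.
    Hypothetical helper for illustration, not the paper's algorithm.
    """
    load = {node: 0 for node in placement}
    non_local = []
    for a, b in combinations(range(n_files), 2):
        # Nodes that already hold both files can run the task without moving data.
        candidates = [n for n, files in placement.items() if a in files and b in files]
        if not candidates:
            non_local.append((a, b))
            continue
        # Greedy choice: give the task to the least-loaded eligible node.
        target = min(candidates, key=lambda n: (load[n], n))
        load[target] += 1
    return {
        "tasks": n_files * (n_files - 1) // 2,
        "storage": sum(len(files) for files in placement.values()),
        "load": load,
        "non_local": non_local,
    }


if __name__ == "__main__":
    # Assumed example: 4 files on 3 nodes, without copying every file everywhere.
    sample = {0: {0, 1, 2}, 1: {0, 3}, 2: {1, 2, 3}}
    print(evaluate_placement(sample, 4))
```

In this assumed example, every pair of files is co-located on at least one node while the nodes store 8 file copies in total, compared with 12 if every file were replicated to every node. The greedy assignment does not guarantee balanced load, which is one reason a placement has to be designed with the comparison tasks in mind.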
|Item Type:||Journal Article|
|Keywords:||Distributed computing, big data, all-to-all comparison, data distribution|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > COMPUTATION THEORY AND MATHEMATICS (080200) > Computation Theory and Mathematics not elsewhere classified (080299); Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > DISTRIBUTED COMPUTING (080500) > Distributed and Grid Systems (080501)|
|Divisions:||Current > Schools > School of Electrical Engineering & Computer Science; Current > QUT Faculties and Divisions > Science & Engineering Faculty|
|Copyright Owner:||Copyright 2016 Elsevier B.V.|
|Deposited On:||28 Aug 2016 22:51|
|Last Modified:||29 Aug 2016 22:59|