Distributed computing of all-to-all comparison problems in heterogeneous systems

Zhang, Yi-Fan, Tian, Yu-Chu, Kelly, Wayne, & Fidge, Colin J. (2015) Distributed computing of all-to-all comparison problems in heterogeneous systems. In Proceedings of the 41st Annual Conference of the IEEE Industrial Electronics Society, IEEE, Yokohama, Japan. (In Press)

Abstract

Distributed computing of all-to-all comparison (ATAC) problems in heterogeneous systems is increasingly important in many domains. Although Hadoop-based solutions are widely used, they are inefficient for the ATAC pattern, which is fundamentally different from the MapReduce pattern that Hadoop was designed for. These solutions exhibit poor data locality and an unbalanced allocation of comparison tasks, particularly in heterogeneous systems. This leads to massive data movement at runtime and ineffective use of computing resources, which significantly degrades overall computing performance. To address these problems, this paper presents a scalable and efficient data and task distribution strategy for processing large-scale ATAC problems in heterogeneous systems. The strategy not only saves storage space but also achieves load balancing and good data locality for all comparison tasks. Experiments on bioinformatics examples show that the approach achieves about 89% of the ideal performance capacity of the multiple machines.
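To make the ATAC pattern concrete: n data items produce n(n-1)/2 pairwise comparison tasks, and each task needs both of its items stored locally on the node that runs it. The sketch below is a minimal illustration under stated assumptions and is not the distribution strategy from the paper. It assumes each node's relative speed is known as a number (the hypothetical `capacities` list), splits tasks in proportion to those speeds for load balancing, and greedily places each pair on the open node that already stores most of its data, which is a simple data-locality heuristic. All function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def atac_tasks(n: int) -> List[Pair]:
    # Every unordered pair of the n data items is one comparison task: n*(n-1)/2 tasks.
    return list(combinations(range(n), 2))


def capacity_quotas(num_tasks: int, capacities: List[float]) -> List[int]:
    # Assumption: task quotas proportional to each node's relative speed.
    total = sum(capacities)
    quotas = [int(num_tasks * c / total) for c in capacities]
    # Give the rounding remainder (always fewer than len(capacities)) to the fastest nodes.
    order = sorted(range(len(capacities)), key=lambda k: -capacities[k])
    for k in order[: num_tasks - sum(quotas)]:
        quotas[k] += 1
    return quotas


def allocate(n: int, capacities: List[float]) -> Tuple[List[List[Pair]], List[Set[int]]]:
    # Hypothetical greedy heuristic: respect quotas, prefer nodes already holding the data.
    tasks = atac_tasks(n)
    quotas = capacity_quotas(len(tasks), capacities)
    assigned: List[List[Pair]] = [[] for _ in capacities]
    held: List[Set[int]] = [set() for _ in capacities]
    for i, j in tasks:
        # Quotas sum to len(tasks), so at least one node is always open here.
        open_nodes = [k for k in range(len(capacities)) if len(assigned[k]) < quotas[k]]
        # Cost = number of items in the pair that would have to be copied to node k.
        k = min(open_nodes, key=lambda k: ((i not in held[k]) + (j not in held[k]), k))
        assigned[k].append((i, j))
        held[k].update((i, j))
    return assigned, held


if __name__ == "__main__":
    assigned, held = allocate(8, [1.0, 1.0, 2.0])
    for k, (tasks_k, items_k) in enumerate(zip(assigned, held)):
        print(f"node {k}: {len(tasks_k)} tasks, stores {len(items_k)} of 8 items")
```

The number of items each node stores, compared with the full dataset size, gives a rough sense of how a locality-aware distribution can avoid replicating every item on every machine. A production strategy would also need to account for network cost, data item sizes, and node failures, none of which this sketch models.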

ID Code: 87374
Item Type: Conference Paper
Refereed: Yes
Keywords: Big data, distributed computing, all-to-all comparison, data distribution
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > DISTRIBUTED COMPUTING (080500) > Distributed Computing not elsewhere classified (080599)
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2015 IEEE
Deposited On: 09 Sep 2015 00:25
Last Modified: 22 Mar 2016 00:58