Data-aware task scheduling for all-to-all comparison problems in heterogeneous distributed systems
Zhang, Yi-Fan, Tian, Yu-Chu, Fidge, Colin, & Kelly, Wayne (2016) Data-aware task scheduling for all-to-all comparison problems in heterogeneous distributed systems. Journal of Parallel and Distributed Computing, 93-94, pp. 87-101.
Administrators only until July 2018 | Request a copy from author
Available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.
Solving large-scale all-to-all comparison problems using distributed computing is increasingly significant for various applications. Previous efforts to implement distributed all-to-all comparison frameworks have treated the two phases of data distribution and comparison task scheduling separately. This leads to high storage demands as well as poor data locality for the comparison tasks, thus creating a need to redistribute the data at runtime. Furthermore, most previous methods have been developed for homogeneous computing environments, so their overall performance is degraded even further when they are used in heterogeneous distributed systems. To tackle these challenges, this paper presents a data-aware task scheduling approach for solving all-to-all comparison problems in heterogeneous distributed systems. The approach formulates the requirements for data distribution and comparison task scheduling simultaneously as a constrained optimization problem. Then, metaheuristic data pre-scheduling and dynamic task scheduling strategies are developed along with an algorithmic implementation to solve the problem. The approach provides perfect data locality for all comparison tasks, avoiding rearrangement of data at runtime. It achieves load balancing among heterogeneous computing nodes, thus enhancing the overall computation time. It also reduces data storage requirements across the network. The effectiveness of the approach is demonstrated through experimental studies.
Impact and interest:
Citation counts are sourced monthly from and citation databases.
These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.
Citations counts from theindexing service can be viewed at the linked Google Scholar™ search.
|Item Type:||Journal Article|
|Keywords:||Distributed computing, all-to-all comparison, data distribution, task scheduling, big data|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > DISTRIBUTED COMPUTING (080500) > Distributed and Grid Systems (080501)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > DISTRIBUTED COMPUTING (080500) > Distributed Computing not elsewhere classified (080599)
|Divisions:||Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
|Copyright Owner:||Copyright 2016 Elsevier|
|Copyright Statement:||This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/|
|Deposited On:||20 Apr 2016 00:32|
|Last Modified:||23 Aug 2016 04:38|
Repository Staff Only: item control page