Data-aware task scheduling for all-to-all comparison problems in heterogeneous distributed systems

Zhang, Yi-Fan, Tian, Yu-Chu, Fidge, Colin, & Kelly, Wayne (2016) Data-aware task scheduling for all-to-all comparison problems in heterogeneous distributed systems. Journal of Parallel and Distributed Computing, 93-94, pp. 87-101.

Available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.

Abstract

Solving large-scale all-to-all comparison problems using distributed computing is increasingly significant for various applications. Previous efforts to implement distributed all-to-all comparison frameworks have treated the two phases of data distribution and comparison task scheduling separately. This leads to high storage demands as well as poor data locality for the comparison tasks, thus creating a need to redistribute the data at runtime. Furthermore, most previous methods have been developed for homogeneous computing environments, so their overall performance degrades even further when they are used in heterogeneous distributed systems. To tackle these challenges, this paper presents a data-aware task scheduling approach for solving all-to-all comparison problems in heterogeneous distributed systems. The approach formulates the requirements for data distribution and comparison task scheduling simultaneously as a constrained optimization problem. Metaheuristic data pre-scheduling and dynamic task scheduling strategies are then developed, along with an algorithmic implementation, to solve the problem. The approach provides perfect data locality for all comparison tasks, avoiding rearrangement of data at runtime. It achieves load balancing among heterogeneous computing nodes, thus reducing the overall computation time. It also reduces data storage requirements across the network. The effectiveness of the approach is demonstrated through experimental studies.
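
To make the idea of placing data and comparison tasks together more concrete, the following is a minimal sketch and not the method from the paper: it swaps the paper's metaheuristic pre-scheduling for a simple greedy rule. The function name `greedy_schedule`, the `speeds` parameter (relative node processing rates) and the tie-breaking rule are all assumptions made for illustration. Each pairwise comparison is assigned to the node that would finish it earliest given its speed, and both data items of the pair are then stored on that node, so every task finds its inputs locally.

```python
from itertools import combinations


def greedy_schedule(num_items, speeds):
    """Illustrative greedy placement (an assumption, not the paper's algorithm).

    num_items: number of data items to compare all-to-all.
    speeds:    hypothetical relative processing speed of each node.
    Returns (tasks, storage): the pairs assigned to each node and the
    set of data items each node must hold.
    """
    nodes = range(len(speeds))
    tasks = {n: [] for n in nodes}
    storage = {n: set() for n in nodes}

    for i, j in combinations(range(num_items), 2):
        def cost(n):
            # Estimated finish time if this node takes one more task.
            finish = (len(tasks[n]) + 1) / speeds[n]
            # Extra items the node would have to store for this pair.
            new_items = len({i, j} - storage[n])
            # Prefer early finish, then less new storage, then lower node id.
            return (finish, new_items, n)

        best = min(nodes, key=cost)
        tasks[best].append((i, j))
        # Store both items where the task runs, so no data moves at runtime.
        storage[best].update((i, j))

    return tasks, storage


if __name__ == "__main__":
    tasks, storage = greedy_schedule(6, [1.0, 2.0])
    for n in tasks:
        print(n, len(tasks[n]), sorted(storage[n]))
```

In this sketch the faster node receives more comparison tasks, and a node stores only the items its own tasks need. A real scheduler would also have to account for comparison tasks of unequal cost and for storage limits, which the greedy rule above ignores.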

Impact and interest:

0 citations in Scopus

ID Code: 94975
Item Type: Journal Article
Refereed: Yes
Keywords: Distributed computing, all-to-all comparison, data distribution, task scheduling, big data
DOI: 10.1016/j.jpdc.2016.04.008
ISSN: 0743-7315
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > DISTRIBUTED COMPUTING (080500) > Distributed and Grid Systems (080501)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > DISTRIBUTED COMPUTING (080500) > Distributed Computing not elsewhere classified (080599)
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2016 Elsevier
Copyright Statement: This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Deposited On: 20 Apr 2016 00:32
Last Modified: 23 Aug 2016 04:38
