Publication

Two-level Dynamic Load Balancing for High Performance Scientific Applications

Type of publication Peer-reviewed
Publication form Proceedings (peer-reviewed)
Author Mohammed Ali, Cavelan Aurélien, Ciorba Florina M., Cabezón Rubén M., Banicescu Ioana
Project Multilevel Scheduling in Large Scale High Performance Computers
Proceedings (peer-reviewed)

Editor Meier Yang Ulrike; Biros George
ISBN 978-1-61197-613-7
Title of proceedings Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing
DOI 10.1137/1.9781611976137.7

Open Access

URL https://arxiv.org/pdf/1911.06714.pdf
Type of Open Access Repository (Green Open Access)

Abstract

Scientific applications are often complex, irregular, and computationally intensive. To accommodate the ever-increasing computational demands of scientific applications, high performance computing (HPC) systems have become larger and more complex, offering parallelism at multiple levels (e.g., nodes, cores per node, threads per core). Scientific applications need to exploit all the available multilevel hardware parallelism to harness the available computational power. The performance of applications executing on such HPC systems may be adversely affected by load imbalance at multiple levels, caused by problem, algorithmic, and systemic characteristics. Nevertheless, most existing load balancing methods do not simultaneously address load imbalance at multiple levels. This work investigates the impact of load imbalance on the performance of three scientific applications at the thread and process levels. We jointly apply and evaluate selected dynamic loop self-scheduling (DLS) techniques at both levels. Specifically, we employ the extended LaPeSD OpenMP runtime library at the thread level, and extend the DLS4LB MPI-based dynamic load balancing library at the process level. This approach is generic and applicable to any multiprocess-multithreaded computationally intensive application (programmed using MPI and OpenMP). We conduct an exhaustive set of experiments to assess and compare six DLS techniques at the thread level and eleven at the process level. The results show that improved application performance, by up to 21%, can only be achieved by jointly addressing load imbalance at the two levels. We offer insights into the performance of the selected DLS techniques and discuss the interplay of load balancing at the thread level and process level.
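
As a rough illustration of the two-level idea described in the abstract, the sketch below computes chunk sizes with guided self-scheduling (GSS), a well-known DLS technique in which each work request receives ceil(remaining / workers) iterations. The abstract does not name the specific techniques evaluated, so the choice of GSS here is an assumption, and the function names gss_chunks and two_level_chunks are hypothetical. The sketch only computes chunk sizes in plain Python; it does not use MPI, OpenMP, DLS4LB, or the LaPeSD runtime.

```python
import math


def gss_chunks(total_iterations, workers):
    """Return the chunk sizes handed out by guided self-scheduling.

    Each request receives ceil(remaining / workers) iterations, so chunks
    start large and shrink as the loop nears completion.
    """
    chunks = []
    remaining = total_iterations
    while remaining > 0:
        size = math.ceil(remaining / workers)
        chunks.append(size)
        remaining -= size
    return chunks


def two_level_chunks(total_iterations, processes, threads_per_process):
    """Apply GSS twice: once across processes, then across threads.

    Returns a list of (process_chunk, thread_chunks) pairs, where
    thread_chunks subdivides process_chunk among the threads of the
    process that obtained it. This is an illustrative assumption about
    how the two levels could compose, not the paper's implementation.
    """
    return [
        (outer, gss_chunks(outer, threads_per_process))
        for outer in gss_chunks(total_iterations, processes)
    ]


# Example: 100 loop iterations, 2 processes, 4 threads per process.
schedule = two_level_chunks(100, 2, 4)
for process_chunk, thread_chunks in schedule:
    print(process_chunk, thread_chunks)
```

Each process-level chunk is subdivided again at the thread level, so imbalance inside a large process-level chunk can still be smoothed out by the threads that share it.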