Publication

LB4OMP: A Dynamic Load Balancing Library for Multithreaded Applications

Type of publication Peer-reviewed
Publication form Original article (peer-reviewed)
Author Müller Korndörfer Jonas Henrique, Eleliemy Ahmed, Mohammed Ali, Ciorba Florina M.
Project Multilevel Scheduling in Large Scale High Performance Computers

Original article (peer-reviewed)

Journal Transactions on Parallel and Distributed Systems (TPDS2021)
Title of proceedings Transactions on Parallel and Distributed Systems (TPDS2021)

Open Access

URL https://arxiv.org/abs/2106.05108
Type of Open Access Repository (Green Open Access)

Abstract

Exascale computing systems will exhibit high degrees of hierarchical parallelism, with thousands of computing nodes and hundreds of cores per node. Efficiently exploiting hierarchical parallelism is challenging due to load imbalance that arises at multiple levels. OpenMP is the most widely used standard for expressing and exploiting the ever-increasing node-level parallelism. The scheduling options in OpenMP are insufficient to address the load imbalance that arises during the execution of multithreaded applications. The limited scheduling options in OpenMP also hinder research on novel scheduling techniques, which requires comparison with other techniques from the literature. This work introduces LB4OMP, an open-source dynamic load balancing library that implements successful scheduling algorithms from the literature. LB4OMP is a research infrastructure designed to spur and support present and future scheduling research, for the benefit of multithreaded application performance. Through an extensive performance analysis campaign, we assess the effectiveness and demystify the performance of all loop scheduling techniques in the library. We show that, for numerous application-system pairs, the scheduling techniques in LB4OMP outperform the scheduling options in OpenMP. Node-level load balancing using LB4OMP leads to reduced cross-node load imbalance and to improved performance of MPI+OpenMP applications, which is critical for Exascale computing.
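
To make the "scheduling options in OpenMP" mentioned in the abstract concrete, the following is a minimal sketch of how the three standard loop schedule kinds (static, dynamic, guided) might split loop iterations into chunks. It is a simplified model for illustration only, not code from LB4OMP or from any OpenMP runtime: the function names and the `min_chunk` parameter are hypothetical, and the exact guided chunk sizes are implementation-defined in real runtimes.

```python
from math import ceil


def static_chunks(n_iters, n_threads):
    """Static: one near-equal block per thread, fixed before the loop runs."""
    base, extra = divmod(n_iters, n_threads)
    # The first `extra` threads receive one additional iteration.
    return [base + (1 if t < extra else 0) for t in range(n_threads)]


def dynamic_chunks(n_iters, chunk=1):
    """Dynamic: fixed-size chunks handed out to threads on demand."""
    full, rest = divmod(n_iters, chunk)
    return [chunk] * full + ([rest] if rest else [])


def guided_chunks(n_iters, n_threads, min_chunk=1):
    """Guided (assumed model): each request takes about remaining / n_threads."""
    chunks, remaining = [], n_iters
    while remaining > 0:
        size = min(remaining, max(min_chunk, ceil(remaining / n_threads)))
        chunks.append(size)
        remaining -= size
    return chunks


if __name__ == "__main__":
    print("static :", static_chunks(100, 4))
    print("dynamic:", dynamic_chunks(100, chunk=16))
    print("guided :", guided_chunks(100, 4))
```

In this model, static scheduling has the lowest overhead but cannot react to uneven iteration costs, while dynamic and guided scheduling trade extra scheduling operations for better balance. This trade-off is the design space that the additional techniques in a library such as LB4OMP explore.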