
Experimental Verification and Analysis of Dynamic Loop Scheduling in Scientific Applications

Type of publication Peer-reviewed
Publication type Proceedings (peer-reviewed)
Author Mohammed Ali, Eleliemy Ahmed, Ciorba Florina M., Kasielke Franziska, Banicescu Ioana
Project Multilevel Scheduling in Large Scale High Performance Computers

Proceedings (peer-reviewed)

Title of proceedings The International Symposium on Parallel and Distributed Computing (ISPDC2018)
Place Geneva, Switzerland

Open Access

Type of Open Access Repository (Green Open Access)


Scientific applications are often irregular and characterized by large, computationally-intensive parallel loops. Dynamic loop scheduling (DLS) techniques can improve the performance of such applications by load balancing their execution on high-performance computing (HPC) systems. Identifying the most suitable choices of data distribution strategies, system sizes, and DLS techniques for a given application requires intensive assessment and a large number of exploratory native experiments (using real applications on real systems), which may not always be feasible or practical due to the associated time and costs. In such cases, simulative experiments, which are faster and less costly, are more appropriate for studying and optimizing application performance while avoiding the execution of a large volume of exploratory native experiments. This motivates the question: ‘How realistic are simulations of scientific applications executing with DLS on HPC platforms?’

In the present work, a methodology is devised to answer this question. It involves the experimental verification and analysis of the performance of DLS in scientific applications. The proposed methodology is employed for a computer vision application executing with four DLS techniques on two different HPC platforms, both via native and simulative experiments. The evaluation and analysis of the native and simulative results indicate that the accuracy of the simulative experiments is strongly influenced by the approach chosen to extract the computational effort of the application (FLOP- or time-based), the representation of the application model in the simulation (data-parallel or task-parallel), and the HPC subsystem models available in the simulator (multi-core CPUs, memory hierarchy, and network topology).
Further insights are also presented and discussed: the native performance on the two HPC platforms compared against each other, the simulated performance obtained with the two SimGrid interfaces compared against each other, and the native versus the simulated performance for each of the simulated HPC platforms. The minimum and maximum percent errors between the native and simulative experiments are 0.95% and 8.03%, respectively.
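The verification step described above hinges on quantifying how closely simulated execution times track native ones via percent error. The following minimal sketch illustrates that comparison; the metric is standard, but the function name, the technique labels, and all timing values are illustrative assumptions, not the paper's measurements.

```python
# Sketch: comparing native vs. simulated execution times via percent error.
# All numbers and technique names below are hypothetical placeholders.

def percent_error(native_time: float, simulated_time: float) -> float:
    """Percent error of a simulated time relative to the native measurement."""
    return abs(simulated_time - native_time) / native_time * 100.0

# Hypothetical per-technique execution times (seconds) on one platform.
native = {"A": 120.0, "B": 98.5, "C": 101.2, "D": 99.8}
simulated = {"A": 121.5, "B": 95.0, "C": 103.0, "D": 100.9}

for technique in native:
    err = percent_error(native[technique], simulated[technique])
    print(f"technique {technique}: {err:.2f}% error")
```

A low percent error across techniques and platforms is what supports the claim that the simulation is a realistic stand-in for native exploratory experiments.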