Content and research objectives
Hardware parallelism ranges from machine instructions to global compute sites. Similarly, software parallelism ranges from scalar instructions to global job queues. Exploiting the available hardware parallelism, even at a single level, is notoriously challenging, partly because of the difficulty of exposing and expressing parallelism in applications.
The project will answer the question: Given massive parallelism, at multiple levels and of diverse forms and granularities, how can it be exposed, expressed, and exploited such that execution times are reduced, performance targets are achieved, and acceptable efficiency is maintained?
This project concentrates on scheduling and load balancing.
We propose a multilevel scheduling (MLS) approach for achieving scalable scheduling in large-scale high-performance computing systems across the multiple levels of parallelism, with a focus on software parallelism.
The MLS approach will leverage all available parallelism and address hardware heterogeneity in large-scale high-performance computers such that execution times are reduced, performance targets are achieved, and acceptable efficiency is maintained. The methodology for achieving these aims combines theoretical studies, simulation, and experiments.
Scientific and social context of the research project
This project builds on the most efficient existing scheduling solutions, extending them beyond the one or two levels of parallelism they currently address and scaling them out within single levels of parallelism.
The project aims to make a fundamental advance toward large-scale high-performance computing systems that are simpler to use, with impact not only in the computer science community but also in all computational science domains.