PhD position on Energy-aware job scheduling and feedback

Georges Da Costa and Patricia Stolf are looking for their next PhD student.

Context

Launched in 2023 for a duration of 6 years, the NumPEx PEPR aims to contribute to the design and development of numerical methods and software components that will equip future European Exascale and post-Exascale machines. NumPEx also aims to support scientific and industrial applications in fully exploiting the potential of these machines.

The ExaSoft project aims at consolidating the European Exascale software ecosystem by providing a coherent, exascale-ready software stack enabling HPC applications to efficiently exploit heterogeneous supercomputers featuring heavily accelerated compute nodes. The project will achieve breakthrough research advances in programming languages and models, code optimization, runtime systems, performance profiling and analysis, and numerical libraries to address major scientific challenges.

Mission

High Performance Computing usage is growing, from climate science studies to chemical research. The increasing impact of these computations opens research questions on how to manage and reduce their energy consumption. The PhD takes place within the NumPEx project, which aims at developing state-of-the-art skills and infrastructures in the field of exascale computing. One of the pillars of NumPEx focuses on making exascale computing sustainable.

HPC (High Performance Computing) systems are usually managed by an RJMS (Resource and Job Management System) that decides when and on which servers to execute the applications submitted by users. The RJMS is crucial, as its quality directly impacts the performance of the whole infrastructure. This performance is usually measured by the number of tasks finished each day or by the execution time. In our context we will also optimize the CO2 impact of these tasks (for example, by scheduling tasks when the carbon footprint is minimal). We will also leverage the capability of applications to run on different numbers of servers to optimize our metrics.
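As an illustration of the "schedule when the carbon footprint is minimal" idea, here is a minimal sketch in Python. It is not an OAR feature or the method to be developed during the PhD: the function name `best_start_slot`, the hourly forecast values and the constant job power are all hypothetical assumptions chosen for the example.

```python
def best_start_slot(carbon_intensity, duration, power_kw):
    """Pick the start slot that minimizes the emissions of one job.

    carbon_intensity: forecast in gCO2/kWh, one value per one-hour slot (assumed).
    duration: number of consecutive one-hour slots the job needs.
    power_kw: average power drawn by the job, assumed constant.
    Returns (start_slot, emissions_in_grams).
    """
    if duration <= 0 or duration > len(carbon_intensity):
        raise ValueError("duration must fit inside the forecast horizon")
    best_start, best_emissions = None, float("inf")
    # Try every window of `duration` consecutive slots.
    for start in range(len(carbon_intensity) - duration + 1):
        window = carbon_intensity[start:start + duration]
        # Each slot lasts one hour, so energy per slot is power_kw kWh.
        emissions = power_kw * sum(window)
        if emissions < best_emissions:
            best_start, best_emissions = start, emissions
    return best_start, best_emissions


# Hypothetical six-hour forecast (gCO2/kWh) and a two-hour job drawing 0.5 kW.
forecast = [300, 280, 120, 90, 110, 250]
start, grams = best_start_slot(forecast, duration=2, power_kw=0.5)
print(start, grams)
```

A real RJMS would have to combine such a criterion with resource availability, queue fairness and execution time, which is where the multi-objective aspect comes in.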

HPC infrastructures are exascale supercomputers with monitoring of resource usage and energy consumption.

HPC applications are usually described by a DAG (Directed Acyclic Graph) of tasks. Another lever is that different tasks have different power and resource impacts. Running the tasks that consume less power when electricity production emits a lot of CO2 is therefore more efficient.

User feedback (metrics showing the impact of the requested executions) is increasingly required to make these platforms sustainable.
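The lever described above can be sketched as follows, under strong simplifying assumptions that are not part of the position description: tasks are independent (DAG dependencies are ignored), each task lasts exactly one slot, and only one task runs per slot. The function name `assign_tasks_to_slots` and all numeric values are hypothetical.

```python
def assign_tasks_to_slots(task_power_kw, slot_intensity):
    """Pair the most power-hungry tasks with the least carbon-intensive slots.

    task_power_kw: dict mapping task name to its power draw in kW (assumed known).
    slot_intensity: list of carbon intensities in gCO2/kWh, one per time slot.
    Returns a dict mapping task name to slot index.
    """
    if len(task_power_kw) > len(slot_intensity):
        raise ValueError("not enough slots for all tasks")
    # Tasks from highest to lowest power.
    tasks = sorted(task_power_kw, key=task_power_kw.get, reverse=True)
    # Slot indices from lowest to highest carbon intensity.
    slots = sorted(range(len(slot_intensity)), key=lambda s: slot_intensity[s])
    # Pairing the two orderings minimizes the sum of power * intensity.
    return dict(zip(tasks, slots))


# Hypothetical tasks and a three-slot carbon intensity forecast.
tasks = {"solver": 2.0, "io": 0.3, "post": 0.8}
intensity = [400, 100, 250]
plan = assign_tasks_to_slots(tasks, intensity)
print(plan)
```

With precedence constraints between the tasks of a DAG, this simple pairing is no longer always feasible, so a scheduler would have to trade off carbon savings against dependency order and completion time.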

The PhD student will focus on studying energy-aware scheduling of HPC applications.

Main activities

The objectives of the PhD are the following:

  • Multi-objective task scheduling (energy, time), taking as inputs predictions of the energy mix, the available renewable energy, and the energy price
  • Using AI to predict the consumption of applications. These predictions will be used by the scheduling algorithms to optimize the efficiency of the platform (cf. https://hal.science/hal-04566184/document)
  • Evaluating the impact of different levers (capacity to adapt the resources required by applications as in https://hal.science/hal-02964970, heterogeneity of application requirements, heterogeneity of resources (CPU/GPU)…)
  • Proposing different forms of user feedback, as in https://www.sciencedirect.com/science/article/pii/S0167739X24000219
  • Experiments on actual HPC infrastructure (using the OAR Resource and Job Management System, RJMS)

The PhD structure will be as follows:

  • State of the art on HPC scheduling
  • Proposal of multi-objective task scheduling algorithms (energy, time)
  • Experiments to validate the algorithms and compare them with the state of the art
  • A demonstrator using OAR and the proposed scheduling algorithms to reduce the power consumption of HPC platforms

Monitoring software (such as MojitO/S) will be used during the PhD, and contributions may be made to it. A large-scale experimental platform (Grid’5000) will be used.

At the end of the PhD, the student will have acquired the following skills: scientific work and experimentation, expertise in HPC and sustainable computing, planning of long-term projects, scientific writing.

If they wish, the student will have the opportunity to teach in English or French.

Required skills

Candidates must have a Master’s degree in Computer Science, Applied Mathematics, or other relevant fields. A PhD degree and working experience in a relevant domain are appreciated. Good programming skills are required. A taste for experimental approaches, C or Rust programming, and Python or R data analysis is strongly recommended.

Language: English. Basic French is a plus.