HPC/HPDA research engineer positions

Applications should be sent to Yushan Wang and Benoît Martin.

Context

Launched in 2023 for a duration of 6 years, the NumPEx PEPR aims to contribute to the design and development of the numerical methods and software components that will equip future European Exascale and post-Exascale machines. NumPEx also aims to help scientific and industrial applications fully exploit the potential of these machines.

Exa-DoST will address the major data challenges by proposing operational solutions co-designed and validated in French and European applications. This will fill the gap left by previous international projects and ensure that French and European needs are taken into account in the roadmaps for building the data-oriented Exascale software stack.

Mission

With the increasing complexity of numerical simulation codes, new approaches are required to analyze the ever-growing amount of data. This requires coupling up-to-date data analysis libraries with the existing highly optimized numerical simulation codes. The PDI Data Interface code coupling library is designed to fulfill this goal.

The open-source PDI Data Interface library is designed and developed for process-local loose coupling in high-performance simulation codes. PDI supports the modularization of codes by mediating data exchange between the main simulation code and independent modules (plugins) based on various libraries. It is developed in modern C++ and offers C, Fortran, and Python application programming interfaces.

PDI offers a reference system, similar to Python references or C++'s shared_ptr, with locking to ensure coherent access by coupled modules. It provides a global namespace (the data store) to share references and implements the Observer pattern so that modules can react to data availability and modifications. Its metadata system can give references a dynamic type based on the value of other data (e.g., an array size based on the value of a shared integer). Codes using PDI's declarative API expose the buffers in which they store data and trigger notifications when significant steps in the simulation are reached. Third-party libraries such as HDF5, SIONlib, or FTI are wrapped in PDI plugins, and a YAML configuration file is used to interleave plugins and additional code without modifying the original application.
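
To make this declarative workflow concrete, here is a minimal sketch of how a C simulation code typically uses PDI. The configuration file name (conf.yml), its pdi subtree, and the data and event names (global_size, field, iteration) are illustrative assumptions; the exact calls should be checked against the PDI documentation.

  #include <paraconf.h>
  #include <pdi.h>

  int main(void) {
      /* Parse the YAML specification tree and initialize PDI with its "pdi"
         subtree (file name and key are assumptions for this sketch). */
      PC_tree_t conf = PC_parse_path("conf.yml");
      PDI_init(PC_get(conf, ".pdi"));

      int global_size = 100;
      double field[100];

      /* Expose metadata first so that plugins can resolve dynamic types
         (e.g., an array size that depends on this value). */
      PDI_expose("global_size", &global_size, PDI_OUT);

      for (int it = 0; it < 10; ++it) {
          /* ... compute into field ... */

          /* Share the buffers and raise the "iteration" event; whether the
             data is written with HDF5, forwarded to an in-situ analysis, or
             ignored is decided in the YAML configuration, not in this code. */
          PDI_multi_expose("iteration",
                           "it", &it, PDI_OUT,
                           "field", field, PDI_OUT,
                           NULL);
      }

      PDI_finalize();
      PC_tree_destroy(&conf);
      return 0;
  }

The same simulation code can later be redirected to another plugin, for instance Deisa, purely by editing the YAML configuration.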

Another aspect we explore with PDI is in-situ data analysis, which performs numerical analytics during the simulation. This is necessary due to the ever-growing gap between file system bandwidth and compute capacities. To this end, we are developing the Deisa plugin. This plugin is based on the open-source Dask framework and allows us to transfer data to dedicated processes to perform in-situ analysis.

One of our goals is to establish a feedback mechanism between the in-situ data analysis and the numerical simulation, which allows better resource allocation and on-the-fly simulation monitoring. In-situ analysis also enables the use of AI methods for HPC and HPDA: for instance, unsupervised detection of rare events during the simulation can greatly reduce the amount of data produced and thus the stress on the file system.

Main activities

As a member of the newly created PDI team, you will primarily focus on developing and maintaining the PDI library.

  • Develop core functionalities and new plugins for PDI
  • Develop the Deisa library
  • Provide user support
  • Organize training sessions
  • Package and deploy the library

The successful candidates will master the following skills and knowledge:

  • Proficiency in modern C++ (C++14 and above)
  • Software engineering and library design
  • Modern development environment (Linux, git, CMake, etc.)
  • Communication (writing, presenting, and training)
  • Teamwork and integration in an international environment

In addition, the following will be considered a plus:

  • Data analysis libraries such as Dask
  • Knowledge and experience with Python, Fortran and/or GPU computing
  • HPC and parallel libraries such as OpenMP and MPI
  • HPC parallel I/O libraries such as HDF5 or NetCDF
  • Experience with supercomputer tools (Slurm, sbatch, etc.), packaging, and deployment

Required qualifications

Candidates must have at least a Master’s degree or equivalent in Computer Science, Applied Mathematics, or another relevant field. A PhD and work experience in a relevant domain are appreciated. Good programming skills are required.

More information

For further information, please contact Yushan Wang ([email protected]) and Benoît Martin ([email protected]).