The successful candidate will join the NumPEx Exa-DI project.

If you're ready to take on the challenge, don't hesitate to apply!

Context

ASTRA is a 4-year project selected through a competitive call for projects of the French NumPEx (Numérique pour l’Exascale) PEPR research program. Focused on radio astronomy for next-generation instruments such as LOFAR2.0 and SKA, it aims to design and deploy a unified, container-based orchestration and data management platform that federates HPC, cloud, and distributed storage resources to run large-scale, reproducible workflows across heterogeneous infrastructures. This platform will form the backbone of the future French SKA Regional Centre (FR-SRC). Four representative use cases will serve as demonstrators: detection of gravitational waves with pulsar timing arrays, detection of the redshifted 21-cm signal from the Cosmic Dawn, mapping of galactic atomic gas, and wide-field continuum surveys.

To support this project, NumPEx is seeking an expert in integration, packaging, and CI/CD to lead the definition and implementation of a robust deployment and change-management framework for large-scale radio astronomy workflows running across HPC and cloud infrastructures.

Although the successful candidate will work specifically on the ASTRA project, they will be organizationally embedded within the broader NumPEx program, and more precisely within the Exa-DI team. Exa-DI works in close interaction with application communities to identify key algorithmic and communication patterns in exascale applications, develop representative mini-applications based on the NumPEx software stack, and evaluate their performance and portability on large-scale HPC systems.

This environment provides direct access to high-level expertise in exascale computing, performance analysis, co-design methodologies, CI/CD for HPC, and large-scale benchmarking. The position therefore offers both the autonomy of leading integration and deployment efforts within ASTRA and the support of a national-level ecosystem dedicated to performance, scalability, and software engineering excellence for scientific computing.

Mission

The successful candidate will design and enforce a shared integration methodology among all ASTRA contributors, replacing fragmented and heterogeneous integration environments with unified, reproducible, container-based execution models and clearly defined release, versioning, and dependency management practices. They will act as the technical lead for deployment and evolution management of the system (workflows and underlying infrastructure), ensuring controlled change processes, cross-team alignment, and system-wide reproducibility.

Key responsibilities include:

  • Designing reproducible packaging and containerization strategies (Docker/Singularity/OCI) for hybrid HPC–cloud environments
  • Defining and implementing state-of-the-art CI/CD pipelines with automated workflow testing, validation, and performance benchmarking
  • Establishing integration guidelines, versioning policies, and change-management procedures
  • Coordinating releases and ensuring compatibility across multiple contributing teams
  • Training and aligning contributors on best practices in DevOps, testing, and managed system evolution

The role combines technical leadership, DevOps expertise, and cross-team coordination to ensure a sustainable, scalable, and production-grade workflow ecosystem.

Required Skills

  • Master’s degree, engineering degree, or PhD in computer science or another field related to scientific computing
  • Strong experience in DevOps and CI/CD design, including automated testing, release management, and versioning strategies (Git-based workflows, GitLab CI or equivalent)
  • Proven expertise in containerization and reproducible deployments: Docker, Singularity/Apptainer, Guix/Spack package managers, and hybrid HPC–cloud environments
  • Experience with orchestration platforms (e.g., Kubernetes) and service-oriented/microservice architectures
  • Solid understanding of HPC environments (batch schedulers such as SLURM, shared filesystems, GPU nodes, performance constraints)
  • Ability to define and enforce integration guidelines, dependency management, and controlled change processes across multiple teams
  • Experience implementing automated workflow validation, regression testing, and performance benchmarking
  • Familiarity with distributed storage systems (including object storage), infrastructure heterogeneity, and secure access management
  • Strong system-level thinking and architectural vision
  • Ability to align contributors toward shared engineering practices
  • Clear communication and training capabilities
  • Experience working in collaborative, multi-institutional environments

General Information
