NumPEx BOF@SC23
SuperComputing 2023 (SC23), the International Conference for High Performance Computing, Networking, Storage, and Analysis, will take place in Denver from November 12 to 17, 2023.
During this conference, a Birds of a Feather (BoF) session connected to the NumPEx program is scheduled, allowing conference attendees to openly discuss current topics of interest to the HPC community.
Several (trans-)national initiatives have recognized the crucial importance of co-design between hardware, software and application stakeholders on the path to Exascale and beyond. Co-design is seen as indispensable both for the efficient exploitation of Exascale computing resources in the development of large-scale application demonstrators and for preparing complex applications to exploit the full capacity of Exascale and post-Exascale systems. Among these initiatives, we can cite the French NumPEx project (41M€), the EuroHPC program, the ECP project in the US and the FugakuNEXT project in Japan. However, these efforts are somewhat disconnected, while the community would benefit from sharing lessons learned and common know-how, and from making progress on the new problems arising as exascale machines become more widely available.
Building on earlier efforts of the International Exascale Software Project (IESP), the European EXtreme Data and Computing Initiative (EXDCI), and the BDEC community, we will work on the implementation of an international, shared, high-quality computing environment that focuses on co-design principles and practices. US, European and Japanese partners have already met and identified a set of areas for international coordination, including:
- Software production and management: packaging, documentation, builds, results, catalogs, continuous integration, containerization, LLVM, parallel tools, etc.
- Software sustainability
- Future and disruptive SW&HW technologies and usages (investments and roadmaps)
- Mapping of missing capabilities (driven by both Apps and SW)
- Roadmap of near-term HW targets
- HPC/AI convergence: ML, open models and datasets for AI training
- FAIR data stewardship
- Digital Continuum and Data management
- Benchmarks and evaluation, co-design (HW, SW, applications)
- Energy and environmental impact and sustainability
- Collaboration/Partnership factory: establish collaborations at international level
- Training
The proposed BoF will offer an overview of the different Exascale programs and initiatives from the perspective of co-design (combining application, software-stack and hardware perspectives). International partners representing major computing centers from Europe, the USA and Japan will then present and discuss common issues and questions.
BoF leaders will seek feedback from attendees on the objectives of co-design and on the efficient exploitation and coordination of existing exascale efforts in Europe, the US and Japan. BoF participants will be invited to share their views on the problems and issues listed above. We will also discuss candidates for collaborative application demonstrators, of which CERN's LHC project, the IPCC climate models, and the SKA project are representative examples. Finally, we will call for contributions from participants.
At the BoF, a panel of experts from NumPEx, ECP, Riken-CC, BSC, JSC and selected application communities will raise issues and solicit input in their respective areas. Representatives of all the funding agencies involved will be invited to participate and contribute.
The ultimate goal of this BoF is to launch a new series of workshops dedicated to international collaboration between Europe, the USA and Japan on Exascale and post-Exascale computing.
NumPEx Launches into Action with an Ambitious Kick-Off Agenda in Perros-Guirec
In a series of dynamic sessions hosted from June 26th to 28th in the charming town of Perros-Guirec, NumPEx embarked on an intensive kick-off event, setting the stage for a transformative journey in Exascale computing. Leaders, experts, and collaborators convened to delve into an agenda rich with insights, workshops, and collaborative initiatives.
The kick-off began with a comprehensive introduction outlining the objectives and significance of the NumPEx program. The program aims to establish a common vision and foster collaboration in order to implement a coherent software stack and related processes by 2025, benefiting not only France but also Europe, in preparation for the Exascale machine. Key figures such as Jerome Bobin, Michel Dayde, and Jean-Yves Berthou elaborated on the program's goals and organizational structure. Board members then shared their perspectives on the Exascale vision and roadmaps:
GENCI's Exascale Vision and Roadmap:
- Presentation of GENCI's role and missions, including hosting the Exascale project for EuroHPC.
- European HPC initiative partnership with EuroHPC and others, leveraging PRACE and GEANT.
- Introduction of the Jules Verne consortium, highlighting international and industrial partnerships.
- Vision of the European Exascale machine: addressing societal challenges, fostering innovation, and emphasizing HPC/AI data-centric convergence.
- Collaboration plans with NumPEx, including building a functional program, benchmark development, and product promotion.
Eviden Exascale Vision and Roadmap:
- Eviden's complex approach involving HPC, HPDA, AI, and quantum technologies with a focus on sovereign and European components.
- Involvement in the European integrated processor for Exascale machines (SiPearl) and collaborations with various technology projects.
- Collaboration with CEPP for application support and participation in technology projects related to Exascale, quantum, cloud, and more.
National and European Ecosystem:
- Introduction of EUPEX, a 4-year project with a budget similar to NumPEx, aiming to deploy a modular Exascale system using the OpenSequana architecture.
- Collaboration with NumPEx, potential for shared experiments and results, and exploration of common dissemination.
- Presentation of DataDirect Networks (DDN) with a focus on AI and the Lustre parallel file system, highlighting challenges and the importance of understanding NumPEx applications.
The afternoon continued with a tour of the five targeted projects (PCs) within the NumPEx program:
- Exa-MA, which aims to design scalable algorithms and numerical methods for forthcoming exascale machines. Led by Christophe Prudhomme (Université de Strasbourg) and Helene Barucq (Inria).
- Exa-Soft, to develop a coherent, portable, efficient, and resilient software stack for exascale. Led by Raymond Namyst (Inria) and Alfredo Buttari (CNRS - Centre national de la recherche scientifique).
- Exa-DoST, to overcome challenges relating to data, notably storage, I/O, in situ processing, and smart analytics, in exascale supercomputers. Led by Gabriel Antoniu (Inria) and Julien Bigot (CEA).
- Exa-ATOW, to deal with large-scale workflows involving exascale machines. Led by François Bodin (Université de Rennes), Mark Asch (Université de Picardie Jules Verne (UPJV)), and Thierry Deutsch (CEA).
- Exa-DI, to ensure transverse co-design and software productivity for exascale supercomputers. Led by Jean-Pierre Vilotte (CNRS) and Valérie Brenner (CEA).
The day concluded with an emphasis on the collaborative efforts between NumPEx and other initiatives, with a focus on benchmark development, software-hardware links, and the overall goal of preparing for the challenges of the Exascale era.
The second day kicked off with an invigorating early morning jog along the seashore, setting a vibrant tone for a day filled with thematic workshops. Participants engaged in focused discussions on energy synergies, GPU integration, applications, co-design, gender/diversity/equity, software production integration, training, resilience, international collaborations, and artificial intelligence. Thematic workshops, led by domain experts, fostered collaboration within smaller groups, emphasizing the program's commitment to a transverse approach to Exascale challenges.
The final day commenced with a synthesis of workshop outcomes, highlighting the depth of discussions within each thematic area. Workshop leaders consolidated insights, offering a panoramic view of challenges and opportunities. Here is an overview of the key insights and strategic actions discussed during these workshops:
GPU Accelerators Workshop
In a dedicated workshop on GPU Accelerators, experts emphasized the pivotal role of Graphics Processing Units (GPUs) in achieving exascale computing. With 90-99% of large machine performance attributed to GPU acceleration, the workshop highlighted the need for applications to explore the potential of these powerful processors. Challenges discussed included new programming paradigms, code portability, data management, and the hardware landscape driven by gaming and artificial intelligence. The workshop outlined a comprehensive plan, including future workshops, analysis papers, tutorials, hackathons, and examples of successfully ported mini-apps.
Energy Workshop
The Energy Workshop focused on achieving Exascale computing within a power consumption limit of 20 MW. Experts delved into environmental, scientific, technical, and societal dimensions, providing a roadmap for the HPC community. Key challenges identified included modeling system consumption, real-time measurement tools, resource prioritization based on societal impact, and the broader environmental impact of research activities. The action plan involves developing a performance and consumption model, optimization strategies, tools for users, and fostering links with external entities to incorporate energy considerations.
Gender Equity and Diversity Seminar
The action plan includes the establishment of a Code of Conduct, assessment of gender distribution, creation of a web platform for resources, education and training initiatives, awareness and outreach programs, and dedication to accessibility and recognition. NumPEx aims to create an inclusive and collaborative future, inviting all stakeholders to contribute to the initiatives.
AI Workshop
The AI Workshop explored the critical intersection of HPC and AI, addressing challenges and outlining a strategic plan for collaborative exploration. Key discussions included decision support tools for AI applications in HPC, optimizing runtimes for AI models, and converging HPC and AI usages. The action plan involves establishing an AI Working Group, conducting transversal workshops, and developing fundamental building blocks for a convergent future.
Training Strategies Workshop
The Training Strategies Workshop addressed the complexities of training in the context of the impending exascale era. Discussions included the scope and subjects of training programs, the creation of sustainable training models, and economic considerations in training initiatives. The workshop emphasized collaborative and inclusive training initiatives to prepare the scientific community for the challenges and opportunities of exascale computing.
International Collaborations Workshop
The International Collaborations Workshop focused on identifying challenges and setting objectives for enhanced collaborative frameworks on a European and global scale. Discussions covered scientific and technological challenges, the design and development of the exascale software stack, and strategic action plans. The outlined roadmap includes hosting workshops, exchanging insights and experiences, and strengthening collaborations with international entities.
National Centers Integration Workshop
The National Centers Integration Workshop aimed to align NumPEx with HPC infrastructures, emphasizing operational elements between computing centers and NumPEx's targeted projects. Discussions covered operational assessment, cybersecurity, job profiling, and traceability. The workshop set a plan for regular video conferences, ensuring ongoing communication and collaboration.
Software Production Workshop
The Software Production Workshop focused on streamlining software development practices in the HPC domain. Challenges discussed included bridging silos, enforcing good practices, and amplifying impact. Insights and conclusions highlighted diverse development practices, sustainability models, and the deployment of continuous integration and certification. NumPEx's commitment to advancing software production practices aims to foster innovation, collaboration, and sustainable development in HPC.
Exascale Resilience Workshop
The Exascale Resilience Workshop navigated complexities associated with exascale application deployment. Discussions covered diverse approaches across NumPEx PCs, key challenges, and strategic choices. The action plan involves listing and analyzing application needs, analyzing barriers to library adoption, and scrutinizing international solutions. NumPEx aims to foster collaborative solutions for enhanced application resilience at a global scale.
Applications and Co-Design Workshop
The Applications and Co-Design Workshop promoted co-development strategies for advanced application development. Discussions included challenges in co-design, key questions for collective exploration, building connections, and initiatives for sustainability. The workshop set the stage for upcoming co-development project workshops, emphasizing collaboration and innovation.