# Data Transfer Energy is all that matters in post-Exa machines

Satoshi Matsuoka, Director, RIKEN R-CCS. SC23 InPEX BOF, 2023/11/15










(To be officially announced Nov. 1, 2023)

Catalyst



# **Overview of Study (System Research by RIKEN)**



#### **Project Overview**

The next-generation computational infrastructure is expected to become a platform for realizing the SDGs and Society 5.0 by **providing advanced digital twins** that will bring "Research DX" to science. Aiming to realize a versatile computing infrastructure that can execute entire workflows by making full use of a wide range of computational methods, simulation techniques, and big data at scale, we are conducting a holistic investigation of architecture, system software, and library technologies through co-design with applications.

As a basic principle of system design, we **practice the "FLOPS to Byte" concept** from architecture development through algorithm and application design, streamlining data transfer and computation under power constraints while taking the necessary computational accuracy into consideration. Under an all-Japan team structure, we will investigate system configurations and elemental technologies that improve the effective performance of the next-generation computing infrastructure.

#### **Subject of Investigation**

#### **Research on Architecture**

- Investigating technological possibilities (such as 3D-stacked memory, accelerators, and chip-to-chip direct optical links) and the performance of the entire system or its components, based on trends in semiconductor and packaging technologies
- Predicting future system performance based on performance analysis of the benchmark sets provided by the Application Research Group, and feeding the results back into next-generation application development

#### **Research on System Software and Library**

- Drawing a roadmap for future system software development in Japan, especially considering data utilization enhancement, integration of AI technology with first-principles simulation, real-time data processing, and assurance of high security

#### **Research on Applications**

- Building a broad benchmark set to evaluate multiple architecture choices while considering improvements in algorithms and application parameters, based on the results of architectural evaluations and exploratory "what-if" performance analysis
- Investigating what classes of algorithms are expected to evolve significantly for future systems

#### **Investigation Schedule**







Strawman processing element architecture









# **Organization Chart of System Research by RIKEN**



# **Key Research Item for Node Architecture Selection**

- Need for a power-efficient compute node
  → Exploration of accelerators
  - A truly useful accelerator for HPC and AI workloads
  - HPC and AI inference → memory-bound; AI training → compute-bound
- Characteristics of current processing elements
  - CPU: high generality, low latency, low compute density
  - GPU (SP): vector processing, medium compute density
  - Matrix: dedicated to dense linear algebra, high compute density (e.g., Tensor Core, XMM, SME, AMX, TPU, CGRA, ...)
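
The memory-bound versus compute-bound distinction above can be illustrated with a simple roofline-style estimate. This is a minimal sketch, not the study's methodology: the `ProcessingElement` class, the peak rate, the bandwidth, and the kernel intensities are all hypothetical values chosen only to show the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingElement:
    """Hypothetical processing element described by two numbers only."""
    name: str
    peak_flops: float     # peak compute rate in FLOP/s (assumed value)
    mem_bandwidth: float  # sustained memory bandwidth in byte/s (assumed value)

    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) at which the two limits meet."""
        return self.peak_flops / self.mem_bandwidth

    def attainable_flops(self, intensity: float) -> float:
        """Roofline estimate: the lower of the compute and memory limits."""
        return min(self.peak_flops, intensity * self.mem_bandwidth)

    def bound(self, intensity: float) -> str:
        """Classify a kernel by its arithmetic intensity."""
        if intensity < self.ridge_point():
            return "memory-bound"
        return "compute-bound"


# Assumed numbers for illustration: 100 TFLOP/s peak and 2 TB/s bandwidth.
pe = ProcessingElement("hypothetical-accelerator",
                       peak_flops=100e12, mem_bandwidth=2e12)

# Assumed arithmetic intensities (FLOP/byte) for two kinds of kernels.
for kernel, intensity in [("stencil-like", 0.5), ("dense-matmul-like", 200.0)]:
    rate = pe.attainable_flops(intensity)
    print(f"{kernel}: {pe.bound(intensity)}, {rate / 1e12:.1f} TFLOP/s attainable")
```

Under these assumed numbers, a low-intensity kernel is limited by bandwidth no matter how much compute density is added, which is why the balance between processing elements and memory matters.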

## • What to study in node architecture exploration

- Which processing elements to integrate, and how to integrate them
- Effective memory bandwidth + data movement with high programming productivity





Need to find the optimal balance

# **Performance Projection in Power Constrained Scenarios**

### • Estimated energy per operation on current and future technologies

- Based on historical trends obtained from publicly available data
- Not related to any partner vendor's perspective
- Case for a 30 MW power budget (10 MW for memory and 20 MW for compute)
  - Network is omitted for simplicity, although it is very important
  - May not be realistic due to other constraints such as cost and thermal issues
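
The arithmetic behind such a projection is a division of the power budget by the energy per operation. The sketch below uses the 20 MW compute and 10 MW memory split from the slide, but the energy-per-FLOP and energy-per-byte figures are placeholder assumptions, not the estimates used in the study; `sustained_rate` is a hypothetical helper name.

```python
PICOJOULE = 1e-12  # joules per picojoule


def sustained_rate(power_watts: float, energy_pj: float) -> float:
    """Operations (or bytes) per second sustainable at a given power
    when each operation (or byte moved) costs energy_pj picojoules."""
    if energy_pj <= 0:
        raise ValueError("energy per operation must be positive")
    return power_watts / (energy_pj * PICOJOULE)


# Power split taken from the slide: 20 MW compute, 10 MW memory.
COMPUTE_POWER_W = 20e6
MEMORY_POWER_W = 10e6

# Placeholder energy costs (assumptions for illustration only).
ENERGY_PER_FLOP_PJ = 10.0  # pJ per floating-point operation
ENERGY_PER_BYTE_PJ = 50.0  # pJ per byte moved to or from memory

flops = sustained_rate(COMPUTE_POWER_W, ENERGY_PER_FLOP_PJ)
bytes_per_s = sustained_rate(MEMORY_POWER_W, ENERGY_PER_BYTE_PJ)

print(f"compute: {flops / 1e18:.2f} EFLOP/s")
print(f"memory:  {bytes_per_s / 1e15:.0f} PB/s")
print(f"byte-per-FLOP ratio: {bytes_per_s / flops:.2f}")
```

With these placeholder values the memory system sustains far fewer bytes per second than the compute units sustain FLOP/s, which illustrates why data transfer energy dominates the design space.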



# **Implementation Approaches for Node Architectures**

## • Candidates of packaging technologies



# Next Steps in the Feasibility Study Project

- Selecting architecture/system candidates for a next-generation system
  - Accelerator, memory technology, photonics technology, and packaging
  - Considering effective accelerator architectures based on quantitative benchmarking analyses
  - Optimizing the balance or fusion between HPC and AI performance

## • Creating an R&D roadmap for system software

- Being strongly conscious of the software ecosystem
- Optimized workflow execution, especially for HPC and AI cooperation

## • Application first system design

- Designing a system targeted at scientific breakthroughs, NOT just at rankings such as the Top500
- Building a benchmark framework for fair architectural comparison
- Brushing up the future science roadmap, including a roadmap for "AI for Science"

## • Collaborating with the operation-technique and new computing-paradigm teams

- Data framework, real-time capability, carbon neutrality, ...
- Extending computable areas with HPC-Quantum hybrid platforms