Swiss National Supercomputing Centre
Peking University (181.9 Teraflops)
Tsinghua University (2.8 Teraflops)
Peking University and University of California, San Diego
Indy (1st Place):
Indy (2nd Place): Finnish IT Center for Science
Indy (3rd Place): Universidad Nacional de Córdoba
Colorado Convention Center, Denver
The SC23 Student Cluster Competition was an in-person event held in Denver on November 13-15, 2023. The competition was chaired by Jenett Tillotson of the National Center for Atmospheric Research (NCAR).
High-Performance Linpack (HPL)
The HPL benchmark solves a random dense linear system in double-precision arithmetic. It is often used to measure the floating-point performance that a computer or a high-performance computing (HPC) cluster can achieve. The TOP500 list ranks supercomputers by their performance on the HPL benchmark.
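The idea behind HPL can be sketched in a few lines of pure Python: build a random dense system, solve it with Gaussian elimination and partial pivoting, check the residual, and convert the run time into a floating-point rate. This is only an illustrative sketch, not the HPL code itself, which is a distributed, blocked LU factorization in C. The function names (`solve_dense`, `hpl_flops`, `run_mini_linpack`) and the problem size are assumptions made for this example; the operation count 2/3 n^3 + 3/2 n^2 is the one HPL uses when reporting its rate.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the inputs stay intact
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def hpl_flops(n):
    """Operation count used to turn a solve time into a FLOP rate."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def run_mini_linpack(n=100, seed=0):
    """Return (max residual, GFLOP/s) for one random n-by-n system."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return residual, hpl_flops(n) / elapsed / 1e9


if __name__ == "__main__":
    res, gflops = run_mini_linpack()
    print(f"max residual = {res:.2e}, rate = {gflops:.4f} GFLOP/s")
```

The rate printed by an interpreted loop like this is far below what the hardware can deliver; the sketch only shows how the reported number is derived from the problem size and the solve time.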
HPC Conjugate Gradient (HPCG)
The HPCG benchmark uses a preconditioned conjugate gradient (PCG) algorithm to measure the performance of HPC platforms with respect to frequently observed but challenging patterns of computing, communication, and memory access. While HPL provides an optimistic performance target for applications, HPCG can be considered as a lower bound on performance. Many of the top 500 supercomputers also provide their HPCG performance as a reference.
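A minimal preconditioned conjugate gradient loop shows the pattern that HPCG stresses: sparse matrix-vector products, dot products, and vector updates, all of which are limited by memory bandwidth rather than arithmetic. The sketch below is a simplification under stated assumptions: it uses a Jacobi (diagonal) preconditioner and a 1D Laplacian, whereas the real benchmark uses a 3D problem with a more elaborate preconditioner. All names (`pcg`, `laplacian_1d`) are hypothetical and chosen for this example.

```python
from math import sqrt


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def laplacian_1d(x):
    """Matrix-free product with the tridiagonal matrix (-1, 2, -1)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def pcg(matvec, diag, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    matvec(v) returns A v; diag holds the diagonal of A and serves as a
    Jacobi preconditioner (an assumption of this sketch).
    Returns (x, iterations).
    """
    n = len(b)
    x = [0.0] * n
    b_norm = sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0  # the zero vector already solves the system
    r = b[:]  # residual for the initial guess x = 0
    z = [ri / di for ri, di in zip(r, diag)]  # apply the preconditioner
    p = z[:]
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if sqrt(dot(r, r)) / b_norm < tol:
            return x, it
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    n = 50
    b = laplacian_1d([1.0] * n)  # right-hand side whose solution is all ones
    x, iters = pcg(laplacian_1d, [2.0] * n, b)
    print(f"converged in {iters} iterations, x[0] = {x[0]:.6f}")
```

Each iteration performs only a handful of floating-point operations per value loaded from memory, which is why HPCG results are typically a small fraction of HPL results on the same machine.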
MLPerf Inference
Machine learning (ML) is increasingly used in many scientific domains to drive new discoveries. MLPerf Inference is a benchmark suite that measures how fast systems can run models in a variety of deployment scenarios. The key motivation behind this benchmark is to measure ML-system performance in an architecture-neutral, representative, and reproducible manner.
STREAM
The STREAM benchmark is a simple synthetic benchmark that measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels. It is designed to work with datasets that are much larger than the available cache on a processor and is therefore indicative of the performance of very large, vector-style applications.
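One of the STREAM kernels, Triad, computes `a[i] = b[i] + scalar * c[i]` and counts three 8-byte array accesses per element. The sketch below reproduces that accounting in pure Python; it is not the STREAM source, and because the loop runs in an interpreter the reported figure reflects Python overhead rather than the memory system. The function names and the default array size are assumptions for illustration; a real run would use arrays much larger than the last-level cache and a compiled kernel.

```python
import time
from array import array


def triad(a, b, c, scalar):
    """STREAM-style Triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bandwidth(n=200_000, scalar=3.0, repeats=3):
    """Run Triad several times and return (a, best rate in MB/s)."""
    a = array("d", [0.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [1.0]) * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        triad(a, b, c, scalar)
        best = min(best, time.perf_counter() - start)
    # Two arrays read and one written, each holding n doubles.
    bytes_moved = 3 * a.itemsize * n
    return a, bytes_moved / best / 1e6


if __name__ == "__main__":
    _, rate = triad_bandwidth()
    print(f"Triad best rate: {rate:.1f} MB/s (interpreter-bound)")
```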
OSU Micro-Benchmarks
The OSU micro-benchmarks are a collection of MPI benchmarks that measure the performance of various MPI operations. They are broadly grouped into three benchmark types: i) point-to-point, ii) collective, and iii) one-sided. We will focus on point-to-point MPI benchmarks such as osu_latency and osu_bandwidth.
MPAS (Atmosphere Core)
The Model for Prediction Across Scales—Atmosphere (MPAS-A) is an atmospheric simulation model for use in climate, regional climate, and weather research. MPAS-A supports global and limited-area domains with horizontal resolution from O(100) km down to O(1) km or less, and it employs unstructured meshes known as centroidal Voronoi tessellations (CVTs). The model consists of a dynamical core, which handles the resolved-scale equations of motion, as well as parameterizations of additional physical processes. MPAS-A is developed by the National Center for Atmospheric Research (NCAR), and it shares software infrastructure that was co-developed with the Los Alamos National Laboratory.
Key software characteristics of MPAS-A:
- Runs on hardware as limited as a Raspberry Pi or as powerful as the largest systems on the Top500 list
- Primarily Fortran 2008 code, with some C
- Parallelization with MPI and OpenMP by horizontal domain decomposition
- Support (in a separate code branch) for executing parts of the model on GPUs via OpenACC
3DMHD (Three-Dimensional Magneto Hydro Dynamic)
This is a numerical simulation, written in Fortran with MPI, that studies the descent of cold, dense plumes in a stratified layer. Such simulations are important for understanding how thermal and magnetic forces shape plume development inside stars.
Reproducibility Challenge
The Reproducibility Challenge is based on the SC22 paper "Symmetric Block-Cyclic Distribution: Fewer Communications Leads to Faster Dense Cholesky Factorization". In this paper, the authors study the Cholesky factorization of large dense matrices performed in parallel on distributed memory. Inspired by recent progress on asymptotic lower bounds on the total number of communications required for this operation, they present an original data distribution, Symmetric Block Cyclic (SBC), as an alternative to the standard 2D Block Cyclic (2DBC) distribution implemented in ScaLAPACK. SBC is designed to take advantage of the symmetry of the matrix to reduce inter-process communication. It is implemented within the paradigm of task-based runtime systems, using the dense linear algebra library Chameleon together with the StarPU runtime system. Experiments were carried out on the experimental platform PlaFRIM using homogeneous CPU-only nodes. The factorization of several synthetic test matrices demonstrates that the SBC distribution reduces the total volume of inter-process communication by a factor of sqrt(2) compared to the standard 2DBC distribution, as predicted by the theoretical analysis. The results show that SBC achieves better performance and scalability than the 2DBC distribution in all tested configurations.
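For orientation, the baseline 2DBC distribution can be sketched as a simple mapping from a block's coordinates to a process on a P x Q grid. The example below shows only that baseline, restricted to the lower triangle that a Cholesky factorization touches; it does not implement SBC, whose definition is given in the paper. The function names and the row-major rank numbering are assumptions made for this sketch.

```python
from collections import Counter


def owner_2dbc(i, j, p, q):
    """Rank owning block (i, j) on a p-by-q process grid.

    Assumes row-major rank numbering and no source offset.
    """
    return (i % p) * q + (j % q)


def lower_triangle_load(num_blocks, p, q):
    """Count how many lower-triangular blocks each rank owns."""
    load = Counter()
    for i in range(num_blocks):
        for j in range(i + 1):  # only blocks with j <= i are stored
            load[owner_2dbc(i, j, p, q)] += 1
    return load


def print_layout(num_blocks, p, q):
    """Print the owner of each lower-triangular block, '.' elsewhere."""
    for i in range(num_blocks):
        row = [
            str(owner_2dbc(i, j, p, q)) if j <= i else "."
            for j in range(num_blocks)
        ]
        print(" ".join(row))


if __name__ == "__main__":
    print_layout(num_blocks=6, p=2, q=3)
    print(dict(lower_triangle_load(6, 2, 3)))
```

Printing the layout makes it visible that 2DBC assigns blocks without regard to the matrix's symmetry, which is the property the paper's SBC distribution is designed to exploit.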