
Preparing for the deployment of large scientific and engineering codes on upcoming exascale systems with GPU-dense nodes is made challenging by the unprecedented diversity of device architectures and heterogeneous programming models. In this work, we evaluate the process of porting a massively parallel multi-physics code written in CUDA to SYCL, HIP, and Kokkos with a range of backends, using a combination of automated tools and manual tuning. We use a proxy application alongside a custom performance model to inform results and identify additional optimization strategies. At-scale performance of the programming-model variants is evaluated on pre-production GPU node architectures for Frontier and Aurora, as well as on the current NVIDIA device-based systems Summit and Polaris. Real-world workloads representing 3D flow calculations in complex geometries of densely packed flows are assessed. Our analysis highlights critical trade-offs among code performance, portability, and development time.
