Speaker
Description
This case study summarizes the evolution of an existing CUDA code base in DBCSR and CP2K, including the introduction of HIP and of an OpenCL-based implementation. The offload interface in CP2K, for instance, is already an evolution of DBCSR's offload interface; both focus on primitives commonly supported across stream programming models, i.e., only a handful of easy-to-interface C functions.
With the introduction of HIP, the idea was extended to limiting the kernel language to a well-supported (sub-)set of primitives. On the OpenCL side, the work lifted limitations of the existing implementation in DBCSR through an improved auto-tuning infrastructure, generalized kernels and tuning parameters, out-of-the-box support for all vendors, and reuse of the OpenCL backend in the CP2K molecular dynamics application.
The talk closes with a collection of results achieved on current HPC installations (DBCSR-MM and CP2K-DBM distributed block-sparse matrix multiplication).
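To illustrate what "a handful of easy-to-interface C functions" can look like, the following is a minimal sketch of a backend-neutral offload interface. All names (`demo_stream_t`, `demo_stream_create`, `demo_dev_mem_allocate`, `demo_memcpy_h2d`, and so on) are hypothetical and are not taken from the DBCSR or CP2K sources. The backend shown here is a host-only stand-in that uses `malloc` and `memcpy`, so the sketch compiles without any GPU toolkit; it only assumes that streams, device memory, and copies are the kind of primitives such an interface would cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interface for illustration; not the DBCSR or CP2K API. */
/* Every function returns 0 on success and a non-zero value on failure. */

/* Opaque-style stream handle; this host stand-in only counts queued work. */
typedef struct demo_stream {
  int pending; /* number of operations enqueued since the last sync */
} demo_stream_t;

/* Create a stream (a real backend would create a CUDA/HIP stream or an
 * OpenCL command queue here). */
int demo_stream_create(demo_stream_t **stream) {
  assert(NULL != stream);
  *stream = (demo_stream_t *)calloc(1, sizeof(demo_stream_t));
  return (NULL != *stream) ? 0 : -1;
}

/* Release a stream. */
int demo_stream_destroy(demo_stream_t *stream) {
  free(stream);
  return 0;
}

/* Allocate "device" memory (host memory in this stand-in). */
int demo_dev_mem_allocate(void **dev_mem, size_t nbytes) {
  assert(NULL != dev_mem);
  *dev_mem = malloc(nbytes);
  return (NULL != *dev_mem || 0 == nbytes) ? 0 : -1;
}

/* Free "device" memory. */
int demo_dev_mem_deallocate(void *dev_mem) {
  free(dev_mem);
  return 0;
}

/* Enqueue a host-to-device copy on the given stream. */
int demo_memcpy_h2d(const void *host_mem, void *dev_mem, size_t nbytes,
                    demo_stream_t *stream) {
  assert(NULL != stream);
  memcpy(dev_mem, host_mem, nbytes); /* executed eagerly in this stand-in */
  ++stream->pending;
  return 0;
}

/* Enqueue a device-to-host copy on the given stream. */
int demo_memcpy_d2h(const void *dev_mem, void *host_mem, size_t nbytes,
                    demo_stream_t *stream) {
  assert(NULL != stream);
  memcpy(host_mem, dev_mem, nbytes); /* executed eagerly in this stand-in */
  ++stream->pending;
  return 0;
}

/* Wait until all work enqueued on the stream has completed. */
int demo_stream_sync(demo_stream_t *stream) {
  assert(NULL != stream);
  stream->pending = 0; /* nothing is actually outstanding on the host */
  return 0;
}
```

Under these assumptions, a CUDA, HIP, or OpenCL backend would provide the same set of functions with different bodies, and calling code written in C or Fortran (via C interoperability) would not need to change when the backend is swapped.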