"Expanding GGML Hardware Support using the Vulkan API" (2025)

Sunday at 15:40, 20 minutes, UB2.252A (Lameere), Low-level AI Engineering and Hacking, Ruben Ortlam, slides, video

Most machine learning applications are accelerated using vendor-specific APIs like CUDA and ROCm. While alternatives like OpenCL and SYCL exist, they are not as well supported. What if we could harness the broad driver support that gaming already receives and use Vulkan compute shaders instead? In this talk, I will present the advantages and disadvantages of this approach and the difficulties I had to overcome to create a Vulkan API backend for llama.cpp.