FOSDEM 2026

"Expanding GGML Hardware Support using the Vulkan API" (2025)

Sunday at 15:40, 20 minutes, UB2.252A (Lameere), Low-level AI Engineering and Hacking, Ruben Ortlam, slides, video

Most machine learning applications are accelerated using vendor-specific APIs like CUDA and ROCm. While alternatives like OpenCL and SYCL exist, they are not as well supported. What if we could harness the broad driver support that is being put into gaming and use Vulkan compute shaders instead? In this talk, I will present the advantages and disadvantages of this approach and the difficulties I had to overcome to create a Vulkan API backend for llama.cpp.

2026

0.70 "Vulkan API for Machine Learning? Competing with CUDA and ROCm in llama.cpp"
0.52 "0 A.D.: Vulkan and its obstacles in open-source game"
0.52 "Single-source cross-platform GPU LLM inference with Slang and Rust"
0.51 "GPU Offloading in LLVM: Architecture, API, and Plugins"
0.50 "API Remoting for llama.cpp: Near-Native GPU Speed in macOS Containers"

2025

0.56 "Building a new GGML backend: How, Challenges and Opportunities with Novel Accelerators"
0.53 "GPUStack: Building a Simple and Scalable Management Experience for Diverse AI Models"
0.51 "The bare metal perspective on AMD's GPU ASICs"
0.50 "Accelerating AI with open source hardware and software"

2024

0.50 "ML Guided Optimizations in LLVM"

Related: