As one of the main contributors to the llama.cpp project, I’ve explored ways to bring its capabilities to the web through WebAssembly, enabling on-device inference directly in the browser with no servers or external APIs. This talk shares my journey implementing wllama, a lightweight TypeScript/JavaScript library designed to push llama.cpp’s limits in a web context. I’ll discuss my motivations, the implementation details, the challenges I faced, and the future roadmap, offering insight into the technical and creative decisions behind the project.