"wllama: bringing llama.cpp to the web" (2025)

Sunday at 16:20, 20 minutes, UB2.252A (Lameere), Low-level AI Engineering and Hacking, Xuan-Son Nguyen, video

As one of the main contributors to the llama.cpp project, I’ve explored ways to bring its capabilities to the web through WebAssembly, creating a frontend solution for on-device inference without the need for servers or external APIs. This talk shares my journey in implementing wllama, a lightweight TypeScript/JavaScript library designed to push llama.cpp’s limits in a web context. I’ll discuss my motivations, the implementation details, the challenges faced, and the future roadmap, offering insights into the technical and creative decisions behind the project.