FOSDEM 2026

"Self-hosted LLMs at a scale with Paddler" (2025)

Sunday at 12:40, 20 minutes, UB2.252A (Lameere), Low-level AI Engineering and Hacking, Mateusz Charytoniuk, slides, video

Paddler is an open-source llama.cpp load balancer designed to address the unique challenges that Large Language Models pose.

Typical load-balancing algorithms such as round-robin or least-connections are not the most efficient approaches for this kind of workload.

To introduce predictability into your infrastructure, Paddler reaches for alternative solutions that account for unpredictable response times while being able to scale services up and down at any moment.
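To make the problem concrete, here is a minimal toy simulation of why round-robin struggles when response times vary widely. It is an illustrative sketch only, not Paddler's implementation: the function names, the single-slot backends, and the "dispatch to whichever backend frees up first" policy (equivalent to holding requests in one shared FIFO queue) are all assumptions made for this example.

```python
def simulate(durations, n_backends, pick):
    """Return the mean queueing delay when one request arrives per time unit.

    Each backend is assumed to handle one request at a time (hypothetical
    single-slot model).
    """
    free_at = [0.0] * n_backends  # time at which each backend becomes idle
    total_wait = 0.0
    for i, duration in enumerate(durations):
        arrival = float(i)
        backend = pick(i, free_at)
        start = max(arrival, free_at[backend])  # wait if the backend is busy
        total_wait += start - arrival
        free_at[backend] = start + duration
    return total_wait / len(durations)


def round_robin(i, free_at):
    # Ignores backend state entirely: request i goes to backend i mod n.
    return i % len(free_at)


def first_to_free_up(i, free_at):
    # Models a shared FIFO queue: the next request runs on whichever
    # backend becomes idle first.
    return min(range(len(free_at)), key=free_at.__getitem__)


# Alternating long and short generations, two backends.
durations = [8, 1, 8, 1, 8, 1]
print(simulate(durations, 2, round_robin))       # 3.0
print(simulate(durations, 2, first_to_free_up))  # 2.5
```

In this toy workload round-robin sends every long request to the same backend, so that backend's queue keeps growing while the other sits mostly idle. A balancer that reacts to actual availability spreads the load more evenly.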

This talk will demonstrate Paddler's general design concepts (the "why") and some primary use cases (the "how").

2026

0.54 "From Infrastructure to Production: A Year of Self-Hosted LLMs"
0.45 "Supercharging LLM serving with Dynamo"
0.45 "Taming the LLM Zoo with Docker Model Runner: Inference with OCI Artifacts, llama.cpp, and vLLM"

2025

0.48 "How Llamagator helps to implement LLM-as-a-Judge concept on your local machine"
0.47 "Building Your (Local) LLM Second Brain"
0.46 "The Local AI Rebellion"
0.46 "wllama: bringing llama.cpp to the web"
0.45 "RamaLama: Making working with AI Models Boring"
0.44 "Building a new GGML backend: How, Challenges and Opportunities with Novel Accelerators"

2024

0.44 "From OpenLLM-France to OpenLLM-Europe: Paving the way to sovereign and open source AI"

Related: