"One GPU, Many Models: What Works and What Segfaults" (2026)

Saturday at 13:55, 20 minutes, UD2.120 (Chavanne), AI Plumbers. Speaker: Yash Panchal. Slides, video.

Serving multiple models on a single GPU sounds great until something segfaults.

Two approaches dominate parallel inference on NVIDIA GPUs: MIG (Multi-Instance GPU, hardware partitioning) and MPS (Multi-Process Service, software sharing). Both promise efficient GPU sharing.

I tested both strategies by running video generation workloads in parallel.

This talk digs into what actually happened: where things worked, where memory isolation fell apart, which configs crashed, and what survives under load.

By the end, you'll know:

  1. How to utilize unused GPU capacity.
  2. How to set up MIG and MPS.
  3. Which memory issues, crashes, and failures to expect.
  4. Workload-specific configurations.
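
The setup in point 2 can be sketched with the standard `nvidia-smi` and MPS tooling. This is a hedged outline, not the speaker's exact commands: the GPU index, the MIG profile IDs, and the two-instance layout are assumptions that vary by GPU model and driver version.

```shell
# Assumptions: MIG-capable GPU (e.g. A100-class) at index 0, recent driver, root access.

# --- MIG: hardware partitioning ---
sudo nvidia-smi -i 0 -mig 1          # enable MIG mode (may require a GPU reset)
nvidia-smi mig -lgip                 # list the GPU instance profiles your GPU supports
sudo nvidia-smi mig -cgi 9,9 -C     # example: two instances of profile ID 9, with compute instances
nvidia-smi -L                        # shows MIG device UUIDs to target via CUDA_VISIBLE_DEVICES

# --- MPS: software sharing ---
export CUDA_VISIBLE_DEVICES=0
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS # route all clients through the MPS daemon
nvidia-cuda-mps-control -d           # start the MPS control daemon
# ... launch inference processes; they now share the GPU through MPS ...
echo quit | nvidia-cuda-mps-control  # shut down MPS when done
```

Note the design difference this exposes: MIG carves the GPU into isolated slices with their own memory, while MPS lets processes share the whole GPU through one daemon, which is where memory isolation can fall apart.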