"Optimizing Resource Utilization for Interactive GPU Workloads with Transparent Container Checkpointing" (2025)

Sunday at 09:00, 25 minutes, UB5.132, HPC, Big Data & Data Science. Adrian Reber, Radostin Stoyanov, Viktória Spišaková

Interactive GPU workloads, such as Jupyter notebooks and generative AI inference, are becoming increasingly popular in scientific research and data analysis. However, efficiently allocating expensive GPU resources in multi-tenant environments like Kubernetes clusters is challenging because the usage patterns of these workloads are unpredictable. Container checkpointing was recently introduced as a beta feature in Kubernetes and has been extended to support GPU-accelerated applications. In this talk, we present a novel approach to optimizing resource utilization for interactive GPU workloads using container checkpointing. This approach enables dynamic reallocation of GPU resources based on real-time workload demands, without requiring changes to existing applications. We demonstrate the effectiveness of our approach through experimental evaluations with a variety of interactive GPU workloads and present preliminary results that highlight its potential.