Katz: Efficient Workflow Serving for Diffusion Models with Many Adapters

Suyi Li; Lingyun Yang; Xiaoxiao Jiang; Hanfeng Lu; Dakai An; Zhipeng Di; Weiyi Lu; Jiawei Chen; Kan Liu; Yinghao Yu; Tao Lan; Guodong Yang; Lin Qu; Liping Zhang; Wei Wang

Authors:

Suyi Li, Lingyun Yang, Xiaoxiao Jiang, Hanfeng Lu, and Dakai An, Hong Kong University of Science and Technology; Zhipeng Di, Weiyi Lu, Jiawei Chen, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, and Liping Zhang, Alibaba Group; Wei Wang, Hong Kong University of Science and Technology

Abstract:

Text-to-image (T2I) generation using diffusion models has become a blockbuster service in today's AI cloud. A production T2I service typically involves a serving workflow in which a base diffusion model is augmented with many ControlNet and LoRA adapters that control details of the output image, such as shapes, outlines, poses, and styles. In this paper, we present Katz, a system that efficiently serves a T2I workflow with many adapters. Katz distinguishes compute-heavy ControlNets from compute-light LoRAs: the former introduce significant computational overhead, while the latter are bottlenecked by loading. Katz takes ControlNets off the critical path with a ControlNet-as-a-Service design, in which ControlNets are decoupled from the base model and deployed as a separate, independently scalable service on dedicated GPUs, thus enabling ControlNet caching, parallelization, and sharing. To hide the high LoRA loading overhead, Katz employs bounded asynchronous loading, which overlaps LoRA loading with the initial base model execution for at most K steps while maintaining the same image quality. Katz further accelerates base model execution across multiple GPUs with latent parallelism. Collectively, these designs enable Katz to outperform state-of-the-art T2I serving systems, achieving up to 7.8× latency reduction and 1.7× throughput improvement in serving SDXL models on H800 GPUs, without compromising image quality.
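To make the "bounded" part of bounded asynchronous loading concrete, here is a minimal sketch of the scheduling idea only, not Katz's implementation. It assumes a background thread that loads LoRA weights while the denoising loop runs. Steps before the weights arrive use the base model alone, and at step K the loop blocks until loading finishes, so at most K steps ever run without the LoRA. The function `denoise_with_bounded_async_lora` and its `load_lora` / `step_fn` callbacks are hypothetical placeholders for real weight loading and a real UNet step.

```python
import threading
from typing import Callable, List, Optional


def denoise_with_bounded_async_lora(
    num_steps: int,
    k: int,
    load_lora: Callable[[], dict],
    step_fn: Callable[[int, Optional[dict]], None],
) -> List[bool]:
    """Hypothetical sketch: overlap LoRA loading with at most k base-only steps.

    Returns, for each denoising step, whether the LoRA weights were applied.
    """
    box: dict = {}
    done = threading.Event()

    def _worker() -> None:
        # Load the adapter in the background; always signal completion so the
        # denoising loop never deadlocks if loading fails.
        try:
            box["lora"] = load_lora()
        except BaseException as exc:
            box["error"] = exc
        finally:
            done.set()

    loader = threading.Thread(target=_worker, daemon=True)
    loader.start()

    used: List[bool] = []
    lora: Optional[dict] = None
    for step in range(num_steps):
        if lora is None:
            if step >= k:
                done.wait()  # bound reached: block until the LoRA is loaded
            if done.is_set():
                if "error" in box:
                    raise box["error"]
                lora = box["lora"]  # apply the LoRA from this step onward
        step_fn(step, lora)  # lora is None means a base-model-only step
        used.append(lora is not None)

    loader.join()
    return used
```

With a slow loader and `k=3`, the first steps run on the base model and every step from index 3 onward is guaranteed to use the adapter. Whether step 2 already sees the LoRA depends on timing, which is exactly the slack the bound allows.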

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.
