Jiahao Wang, Jinbo Han, and Xingda Wei, Shanghai Jiao Tong University; Sijie Shen, Alibaba Group; Dingyan Zhang, Shanghai Jiao Tong University; Chenguang Fang, Alibaba Group; Rong Chen, Shanghai Jiao Tong University; Wenyuan Yu, Alibaba Group; Haibo Chen, Shanghai Jiao Tong University
Serving large language models (LLMs) is important for cloud providers, and caching intermediate results (KV$) after processing each request substantially improves serving throughput and reduces latency. However, how LLM serving benefits from KV$ caching is not well understood, and system design decisions such as cache eviction policies are highly workload-dependent.
In this paper, we present the first systematic characterization of KV$ workload patterns from one of the leading LLM service providers. We draw observations not covered by previous studies that focus on synthetic workloads, including: KV$ reuse is skewed across requests, and reuse across single-turn requests is as important as reuse within multi-turn requests; reuse times and probabilities vary widely across all requests, yet within a specific request category the pattern tends to be predictable; and the overall cache size required for a near-ideal cache hit ratio is moderate. Based on this characterization, we further propose a workload-aware cache eviction policy that improves serving performance under real-world traces, especially with limited cache capacity.
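To make the idea concrete, below is a minimal, hypothetical sketch (in Python) of what a workload-aware KV$ eviction policy could look like. It assumes each cached prefix carries a request-category tag and a per-category reuse probability estimated offline from traces, and it evicts the entry whose estimated reuse value (reuse probability discounted by idle time) is lowest. The class name `WorkloadAwareKVCache` and all parameters are illustrative assumptions, not the paper's actual design.

```python
import time


class WorkloadAwareKVCache:
    """Minimal sketch of a workload-aware KV$ eviction policy.

    Assumption (not from the paper): each cached prefix is tagged with a
    request category (e.g., multi-turn chat vs. single-turn API call), and
    the serving system supplies an estimated reuse probability per category
    learned offline from traces. Entries with the lowest estimated value of
    keeping (reuse probability discounted by idle time) are evicted first.
    """

    def __init__(self, capacity_tokens, category_reuse_prob):
        self.capacity = capacity_tokens
        self.used = 0
        self.reuse_prob = category_reuse_prob   # category -> P(reuse), assumed given
        self.entries = {}                       # prefix_id -> (category, size, last_access)

    def _score(self, category, last_access):
        # Lower score = better eviction candidate: unlikely-to-reuse,
        # long-idle entries go first.
        age = time.monotonic() - last_access
        return self.reuse_prob.get(category, 0.1) / (1.0 + age)

    def insert(self, prefix_id, category, size_tokens):
        # Evict lowest-scored entries until the new prefix fits.
        while self.used + size_tokens > self.capacity and self.entries:
            victim = min(
                self.entries,
                key=lambda pid: self._score(self.entries[pid][0],
                                            self.entries[pid][2]),
            )
            self.used -= self.entries[victim][1]
            del self.entries[victim]
        self.entries[prefix_id] = (category, size_tokens, time.monotonic())
        self.used += size_tokens

    def lookup(self, prefix_id):
        # On a hit, refresh the entry's last-access time.
        entry = self.entries.get(prefix_id)
        if entry is None:
            return False
        self.entries[prefix_id] = (entry[0], entry[1], time.monotonic())
        return True
```

Compared with plain LRU, this sketch only differs in the scoring function: recency is weighted by a per-category reuse estimate, so categories that rarely see reuse give up cache space sooner under pressure.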
