

Poster

Learning to Compress Long Contexts by Dropping-In Convolutions

Ruisi Cai · Yuandong Tian · Zhangyang “Atlas” Wang · Beidi Chen


Abstract:

This paper tackles the challenge of processing long context sequences in Large Language Models (LLMs), a task that poses substantial computational demands due to transformers’ quadratic memory requirements. We present a novel approach, Learning to Compress Long Context via Learning Convolutions (LC3), which improves efficiency in both inference and fine-tuning by using a fixed-size Key-Value (KV) cache to bound memory usage. Unlike prior methods that selectively drop KV pairs based on fixed heuristics, LC3 uses a data-driven adaptive fusion technique that blends previous KV pairs with incoming tokens to minimize the loss of contextual information and preserve accurate attention modeling. This is achieved with one-dimensional convolutional kernels that dynamically calculate mixing weights for each KV cache slot. LC3 is designed for broad compatibility with existing LLM frameworks, allowing straightforward “drop-in” integration without architectural modifications and with minimal tuning overhead. Experiments show that LC3 maintains strong performance across various context lengths and achieves a high context compression rate during both inference and fine-tuning. During inference, we compress up to 3482 tokens into a KV cache of size 128 while maintaining performance comparable to the full sequence, which yields an accuracy improvement of up to 0.2791 over baseline methods. Furthermore, we extend the context length from 4K to 32K using a KV cache of size 512, with performance similar to fine-tuning on the entire sequence. Code will be publicly released.
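The fusion idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the per-slot feature (a scaled dot product between each cached key and the incoming key), the softmax normalization, the fixed kernel values, and the names `conv_mixing_weights` and `fuse_token` are all assumptions made for illustration. In LC3 the kernels are learned; here the kernel is a hand-picked constant. The sketch only shows how a 1-D convolution over the cache-slot axis could produce one mixing weight per slot, and how the cache stays the same size no matter how many tokens are streamed in.

```python
import numpy as np

def conv_mixing_weights(features, kernel):
    """Turn per-slot features into per-slot mixing weights.

    A 1-D convolution runs along the cache-slot axis, then a softmax
    normalizes the result so the weights are non-negative and sum to 1.
    (Assumption: the abstract does not specify the normalization.)
    """
    logits = np.convolve(features, kernel, mode="same")
    logits = logits - logits.max()  # for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def fuse_token(cache_k, cache_v, new_k, new_v, kernel):
    """Blend one incoming (key, value) pair into a fixed-size cache.

    cache_k, cache_v: arrays of shape (num_slots, dim)
    new_k, new_v:     arrays of shape (dim,)
    Returns updated caches with the same shape as the inputs.
    """
    dim = cache_k.shape[1]
    # Hypothetical per-slot feature: similarity to the incoming key.
    features = cache_k @ new_k / np.sqrt(dim)
    alpha = conv_mixing_weights(features, kernel)[:, None]
    # Each slot becomes a convex combination of its old content and the new token.
    fused_k = (1.0 - alpha) * cache_k + alpha * new_k
    fused_v = (1.0 - alpha) * cache_v + alpha * new_v
    return fused_k, fused_v

# Stream 100 tokens through an 8-slot cache; the cache never grows.
rng = np.random.default_rng(0)
num_slots, dim = 8, 4
cache_k = rng.normal(size=(num_slots, dim))
cache_v = rng.normal(size=(num_slots, dim))
kernel = np.array([0.25, 0.5, 0.25])  # fixed here; learned in the actual method
for _ in range(100):
    cache_k, cache_v = fuse_token(
        cache_k, cache_v, rng.normal(size=dim), rng.normal(size=dim), kernel
    )
print(cache_k.shape, cache_v.shape)
```

Because every update is a convex combination, no slot is ever discarded outright; information from older tokens is attenuated rather than dropped, which is the contrast with heuristic eviction that the abstract draws.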
