Repository for "A Simple and Effective L2 Norm-Based Method for KV Cache Compression", presented at EMNLP 2024. TL;DR: Tokens with low $L_2$ norm in their key embeddings ...