Skip to content

This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

License

Notifications You must be signed in to change notification settings

machilusZ/FastGen

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 

Repository files navigation

FastGen

This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

For now, please check the community reproduction here: https://github.com/AnswerDotAI/cold-compress/tree/main The repo is develope by answer.ai and contains a reproduction of FastGen algorithm, and a wide range of other compression strategies.

Meanwhile, Microsoft Research Asia releases MInference https://github.com/microsoft/MInference/tree/main, which is a close implementation of the FastGen algorithm. The core part of the algorithm, for each attention head and each data instance, at the end of prefilling stage, profile the attention matrix, and pick a hybrid portfolio of eviction strategies(special tokens, local, topk...) based on the profiling.

About

This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published