🐙 GitHub Detail
dipampaul17/KVSplit
By dipampaul17
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
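The 59% figure can be roughly sanity-checked with simple arithmetic. The sketch below is not taken from the KVSplit code. It assumes block-quantized storage modeled on llama.cpp's q8_0 and q4_0 formats (32 elements per block plus one 16-bit scale) and an FP16 baseline for both keys and values. The function names are hypothetical.

```python
# Assumption: each quantized block holds 32 elements plus one 16-bit scale,
# as in llama.cpp's q8_0 and q4_0 formats. KVSplit's actual layout may differ.
BLOCK_SIZE = 32
SCALE_BITS = 16


def bits_per_element(quant_bits: int) -> float:
    """Effective storage cost per element, including the per-block scale."""
    return quant_bits + SCALE_BITS / BLOCK_SIZE


def kv_cache_reduction(key_bits: int, value_bits: int, baseline_bits: int = 16) -> float:
    """Fraction of KV cache memory saved relative to storing keys and
    values at baseline_bits each (FP16 by default)."""
    quantized = bits_per_element(key_bits) + bits_per_element(value_bits)
    baseline = 2 * baseline_bits
    return 1 - quantized / baseline


# 8-bit keys, 4-bit values, compared against FP16 keys and values.
reduction = kv_cache_reduction(key_bits=8, value_bits=4)
print(f"K8V4 reduction vs FP16: {reduction:.1%}")
```

Under these assumptions the keys cost 8.5 bits and the values 4.5 bits per element, 13 bits in total against 32 for FP16, which lands close to the 59% stated above. The estimate covers only the KV cache, not model weights or activations.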
Live Snapshot
⭐ Stars: 361
🍴 Forks: 13
📄 License: Other
🧩 Type: Python
About this open-source project
Live information fetched from GitHub.
Default Branch: main
Open Issues: 0
Watchers: 361