dipampaul17/KVSplit

By dipampaul17

Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
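
As a rough check on the 59% figure, the arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the project's own calculation: it assumes an FP16 baseline for both keys and values, and block-wise quantization where every 32 elements share one 16-bit scale (the layout used by llama.cpp's q8_0 and q4_0 formats). The description above does not say which formats KVSplit uses, and the function names are hypothetical.

```python
def bits_per_element(payload_bits, block_size=32, scale_bits=16):
    """Effective bits per element when each block shares one scale factor.

    Assumption: block_size=32 and scale_bits=16, as in llama.cpp's
    q8_0/q4_0 layouts. The source does not confirm these formats.
    """
    return payload_bits + scale_bits / block_size


def kv_cache_reduction(key_bits, value_bits, baseline_bits=16.0):
    """Fraction of KV cache memory saved relative to a uniform baseline.

    Keys and values have the same element count, so total memory is
    proportional to key_bits + value_bits.
    """
    return 1.0 - (key_bits + value_bits) / (2 * baseline_bits)


k8 = bits_per_element(8)  # 8-bit keys plus shared scale
v4 = bits_per_element(4)  # 4-bit values plus shared scale
reduction = kv_cache_reduction(k8, v4)

print(f"keys: {k8} bits/element, values: {v4} bits/element")
print(f"memory reduction vs FP16: {reduction:.1%}")
```

Under these assumptions the saving is about 59.4%, in line with the 59% quoted above. Without the per-block scale overhead, 8-bit keys and 4-bit values would save 62.5%.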

GitHub · Python · Other · Updated 07 Jun 2026

Live Snapshot

Stars: 361
Forks: 13
License: Other
Type: Python

About this open-source project

Live information fetched from GitHub.

Default Branch: main
Open Issues: 0
Watchers: 361