DEV Community

NexusQuant series articles, by João André Gomes Marques

Compress your LLM's KV cache 33x with zero training
2 min read
Why E8 lattice quantization beats scalar quantization for KV caches
2 min read
Longer contexts are easier to compress (not harder)
2 min read
NexusQuant benchmarks: every number, honestly
5 min read
NexusQuant vs KVTC vs TurboQuant vs CommVQ — honest comparison
4 min read
How to deploy NexusQuant in production (and what's missing)
4 min read