
NVFP4 KV cache, part 2: SGLang
TL;DR
This technical deep dive explains how a native 4-bit NVFP4 KV cache is integrated into SGLang, and how the RadixAttention and head-dimension challenges are handled for the Gemma 4 model family on Blackwell hardware.





