VRAM (Video RAM)

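Dedicated memory on a GPU. During LLM inference it holds the model weights, the activations, and the KV (key-value) cache. VRAM capacity is usually the primary constraint when running large language models locally: a model whose weights and cache do not fit must be quantized, partially offloaded to system RAM, or split across multiple GPUs.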
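As a rough illustration, the two largest consumers of VRAM can be estimated with simple arithmetic. Weights take the parameter count times the bytes per parameter. The KV cache takes 2 (one key and one value) × layers × KV heads × head dimension × context length × batch size × bytes per element. The sketch below is a back-of-the-envelope estimate, not a measurement. The function names and the model configuration (a hypothetical 7B-parameter model with 32 layers, 32 KV heads, head dimension 128) are assumptions chosen for illustration. Activations and framework overhead are deliberately ignored, so real usage will be higher.

```python
# Rough VRAM estimate for LLM inference: model weights + KV cache only.
# Assumption: activations and framework/runtime overhead are ignored,
# so actual VRAM usage will be somewhat higher than these figures.

GIB = 1024 ** 3  # bytes in one gibibyte


def weights_bytes(n_params: float, bytes_per_param: float) -> float:
    """Memory for model weights: parameter count x bytes per parameter
    (e.g. 2 for FP16/BF16, 1 for 8-bit, 0.5 for 4-bit quantization)."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: float) -> float:
    """Memory for the KV cache. The factor of 2 covers the key tensor and
    the value tensor stored for every layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical 7B-parameter model with standard multi-head attention.
    w = weights_bytes(7e9, 2)                     # FP16 weights
    kv = kv_cache_bytes(32, 32, 128, 4096, 1, 2)  # 4096-token context, FP16 cache
    print(f"weights: {w / GIB:.1f} GiB, KV cache: {kv / GIB:.1f} GiB")
```

With these assumed values the script reports about 13.0 GiB for weights and 2.0 GiB for the KV cache. Setting `bytes_per_param` to 0.5 (4-bit quantization) shrinks the weight term to about 3.3 GiB, which is why quantization is the usual first step when a model does not fit.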

Related Terms

Quantization