ai-gpu
VRAM (Video RAM)
Memory on a GPU used to store model weights, activations, and KV cache during LLM inference. VRAM is the primary constraint when running large language models locally.
Memory on a GPU used to store model weights, activations, and KV cache during LLM inference. VRAM is the primary constraint when running large language models locally.