ai-gpu

Quantization

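A technique for reducing a model's memory usage by storing its weights at lower precision, for example INT8 or INT4 instead of FP16. The Q4 variants of the GGUF file format are 4-bit schemes of this kind. Quantization trades a small loss in accuracy for a large reduction in VRAM: relative to FP16, INT8 roughly halves the memory needed for weights and INT4 roughly quarters it.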
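The idea can be illustrated with a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. This is only one simple scheme, not the method of any particular library. The function names (`quantize_int8`, `dequantize`, `weight_memory_gib`) and the sample weights are hypothetical, chosen for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]


def weight_memory_gib(num_params, bits_per_weight):
    """Memory for the weights alone, ignoring scales, activations and caches."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical weights, for illustration only.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", q)
print("max round-trip error:", max_error)

# Weight memory for a hypothetical 7-billion-parameter model.
for bits in (16, 8, 4):
    print(bits, "bits:", round(weight_memory_gib(7e9, bits), 2), "GiB")
```

Under these assumptions, a 7-billion-parameter model needs roughly 13 GiB for its weights at 16 bits and roughly 3.3 GiB at 4 bits. Real quantization formats usually store scales per block or per channel rather than per tensor, which adds a small overhead and reduces the accuracy loss.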

Related Terms

VRAM (Video RAM)