DeepSeek R1 VRAM Calculator

Calculate VRAM for DeepSeek R1 (671B MoE) and its distilled variants (1.5B–70B). The full R1 requires massive multi-GPU setups; distilled versions run on consumer hardware.

DeepSeek R1: Full Model vs Distilled Variants

DeepSeek R1 is a reasoning-first LLM from DeepSeek AI, trained using reinforcement learning to excel at math, code, and logical reasoning. The full 671B MoE model rivals GPT-4o, but the distilled variants are what most engineers will actually run.

VRAM by Model Size at INT4

| Model | Params | VRAM (INT4) | Minimum GPU |
|---|---|---|---|
| R1 full | 671B MoE | ~335 GB | 8× A100 80GB |
| R1-Distill-70B | 70B dense | ~38 GB | A100 40GB |
| R1-Distill-32B | 32B dense | ~18 GB | RTX 4090 24GB (tight) |
| R1-Distill-14B | 14B dense | ~8 GB | RTX 3070 8GB |
| R1-Distill-7B | 7B dense | ~4.5 GB | Any 6GB+ GPU |
| R1-Distill-1.5B | 1.5B dense | ~1 GB | CPU-only feasible |
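
The INT4 column can be sanity-checked with simple arithmetic: parameters × bits per weight ÷ 8. The sketch below computes the weights-only figure for each model. The function name is illustrative. The table's values for the distilled models sit a little above these numbers, presumably to leave room for the KV cache and runtime overhead; the page does not state the exact allowance, so none is modeled here.

```python
# Weights-only memory estimate: parameters x bits per weight / 8.
# Parameter counts are taken from the table; everything else is illustrative.

def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters x bytes per parameter = gigabytes
    return params_billion * bytes_per_weight

MODELS = {
    "R1 full": 671,
    "R1-Distill-70B": 70,
    "R1-Distill-32B": 32,
    "R1-Distill-14B": 14,
    "R1-Distill-7B": 7,
    "R1-Distill-1.5B": 1.5,
}

for name, params in MODELS.items():
    print(f"{name:16s} {weight_memory_gb(params, 4):7.2f} GB of weights at INT4")
```

Treat the result as a lower bound: context length, batch size, and the serving framework all add memory on top of the weights.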

Why the Distilled Models Are Remarkable

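The distilled variants inherit R1's chain-of-thought reasoning style through knowledge distillation: smaller dense models are fine-tuned on reasoning outputs generated by the full R1. DeepSeek reports that R1-Distill-Qwen-7B outperforms GPT-4o on math reasoning benchmarks such as AIME 2024 and MATH-500, and at INT4 it fits on a consumer RTX 3070.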

MoE Memory Note

The full R1 671B is a Mixture of Experts (MoE) model: only ~37B parameters are active per token, but all 671B parameters must still be resident in VRAM, because any expert may be selected for the next token. INT4 cuts the weight footprint from ~1.34 TB (FP16) to ~335 GB, which still requires serious multi-GPU hardware.
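
A small sketch makes the gap between resident and active memory concrete. It uses only the 671B and ~37B figures from the note above; the constant and function names are illustrative, and the numbers cover weights only.

```python
# MoE memory: every expert must be loaded, even though few are used per token.
TOTAL_PARAMS_B = 671   # all parameters, must be resident in VRAM
ACTIVE_PARAMS_B = 37   # approximate parameters used for any single token

BYTES_PER_WEIGHT = {"FP16": 2.0, "INT4": 0.5}

def footprint_gb(params_billion: float, precision: str) -> float:
    """Weights-only footprint in GB for the given precision."""
    return params_billion * BYTES_PER_WEIGHT[precision]

for precision in BYTES_PER_WEIGHT:
    resident = footprint_gb(TOTAL_PARAMS_B, precision)
    active = footprint_gb(ACTIVE_PARAMS_B, precision)
    print(f"{precision}: {resident:.1f} GB resident, {active:.1f} GB active per token")
```

The active share determines how much compute each token costs; the resident total determines how many GPUs are needed.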

Frequently Asked Questions

How much VRAM does DeepSeek R1 671B need?

The full DeepSeek R1 is a 671B Mixture of Experts model. At INT4, the weights alone need ~335 GB of VRAM: at least five 80 GB GPUs (A100 or H100) by capacity, and more commonly a full 8-GPU node such as 8× A100 80GB. In practice, most users run the distilled variants (7B–70B), which offer strong reasoning on consumer hardware.
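
The GPU counts in this answer can be reproduced with a rough capacity calculation. The sketch below rests on two assumptions that the page does not state: only about 90% of each GPU's memory is usable for the model, and tensor-parallel deployments are rounded up to a power-of-two GPU count. Both helper names are hypothetical.

```python
import math

def min_gpus_by_capacity(model_gb: float, gpu_gb: float,
                         usable_fraction: float = 0.9) -> int:
    """Smallest GPU count whose combined usable memory holds the model.

    usable_fraction is an assumed allowance for runtime overhead.
    """
    return math.ceil(model_gb / (gpu_gb * usable_fraction))

def next_power_of_two(n: int) -> int:
    """Round n up to the nearest power of two (assumed tensor-parallel sizing)."""
    return 1 << (n - 1).bit_length()

by_capacity = min_gpus_by_capacity(335, 80)
print(f"By capacity alone: {by_capacity} x 80 GB GPUs")
print(f"Rounded for tensor parallelism: {next_power_of_two(by_capacity)} x 80 GB GPUs")
```

Under these assumptions, five GPUs is the floor set by capacity and eight is the practical node size. The calculation ignores the KV cache, which grows with context length and batch size.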

What are the DeepSeek R1 distilled models?

DeepSeek released six smaller dense models distilled from R1: R1-Distill-Qwen-1.5B, 7B, 14B, and 32B, plus R1-Distill-Llama-8B and 70B. The 7B distill runs as a GGUF Q4 quantization on any 8GB GPU. The 70B distill runs at INT4 on an A100 40GB, with reasoning quality close to that of the full 671B model.

Is DeepSeek R1 better than GPT-4 for reasoning?

DeepSeek R1 matches or exceeds GPT-4o on the AIME 2024, Codeforces, and MATH benchmarks, reportedly at a fraction of the training cost. For open-source local deployment, R1-Distill-70B at INT4 is among the strongest reasoning models that can be run for under $2/hr in cloud cost.

How do I run DeepSeek R1 locally?

Use Ollama: `ollama run deepseek-r1:7b` (7B distill, ~4.5GB VRAM) or `ollama run deepseek-r1:70b` (70B distill at Q4, ~40GB VRAM). The full 671B model needs a multi-GPU cluster; serve it with vLLM using tensor parallelism.
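
Once a model has been pulled, it can also be queried from a script through Ollama's local HTTP API. The following is a minimal sketch using only the Python standard library. It assumes Ollama is listening on its default address (`localhost:11434`); the helper names `build_generate_request` and `ask` are illustrative.

```python
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes a default install).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the 7B distill pulled, a call such as `ask("deepseek-r1:7b", "What is 17 * 24?")` returns the model's answer as a string. The generous timeout reflects that reasoning models can emit long chains of thought before the final answer.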
