GPU Hosting Cost Calculator
Compare GPU cloud rental costs across RunPod, Lambda Labs, and Vast.ai. Calculate monthly spend for LLM inference, fine-tuning, and ML training workloads.
Choosing a GPU Cloud for LLM Inference
GPU cloud pricing varies significantly between providers and GPU models. Understanding these differences can save 40–60% on inference costs.
Provider Overview
| Provider | Strength | Weakness |
|---|---|---|
| RunPod | Always available, large marketplace | Slightly higher prices |
| Lambda Labs | Cheapest A100/H100 on-demand | Limited availability |
| Vast.ai | Cheapest spot prices | Variable reliability |
On-Demand vs Reserved vs Spot
- On-demand: Full price, always available; best for inference serving
- Reserved (Lambda): 1-month commitment, 20–40% discount; best for 24/7 workloads
- Spot/Interruptible: Lowest price, can be interrupted; best for training with checkpoints
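The tiers above can be compared by applying their discount ranges to an on-demand rate. The sketch below is illustrative only: it assumes the 20–40% reserved discount listed above and a 30–60% spot discount, and the function name `tier_hourly_rates` and the $2.00/hr input are hypothetical. Actual discounts vary by provider and GPU.

```python
def tier_hourly_rates(on_demand_rate,
                      reserved_discount=(0.20, 0.40),
                      spot_discount=(0.30, 0.60)):
    """Return a (low, high) hourly price range for each pricing tier.

    Discount ranges are assumptions taken from the ranges quoted in this
    guide, not live provider pricing.
    """
    def discounted(discounts):
        smallest, largest = discounts
        # The largest discount gives the lowest price, and vice versa.
        low = round(on_demand_rate * (1 - largest), 4)
        high = round(on_demand_rate * (1 - smallest), 4)
        return (low, high)

    return {
        "on_demand": (on_demand_rate, on_demand_rate),
        "reserved": discounted(reserved_discount),
        "spot": discounted(spot_discount),
    }


if __name__ == "__main__":
    # Hypothetical on-demand rate of $2.00/hr.
    for tier, (low, high) in tier_hourly_rates(2.00).items():
        print(f"{tier}: ${low:.2f}-${high:.2f}/hr")
```

The ranges overlap, so a reserved instance can cost about the same as a spot instance while avoiding interruptions.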
Right-Sizing for LLM Inference
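Match VRAM to your model + quantization. The figures below cover weights only (parameters × bytes per parameter); leave headroom for KV cache and activations:
- 7B FP16 → 14 GB → RTX 4090 (24 GB)
- 13B INT4 → 6.5 GB → RTX 4090 (24 GB)
- 70B INT4 → 35 GB → A100 40 GB
- 70B FP16 → 140 GB → 2× A100 80 GB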
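The sizes in this list follow from parameters × bits per parameter ÷ 8. A minimal sketch of that arithmetic, assuming weights only (no KV cache, activations, or framework overhead) and 1 GB = 10^9 bytes; the helper names `weight_vram_gb` and `gpus_needed` are hypothetical:

```python
import math

# Bits stored per parameter at each precision.
BITS_PER_PARAM = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_vram_gb(params_billions, precision):
    """Approximate VRAM for model weights only, in GB (10^9 bytes)."""
    bits = BITS_PER_PARAM[precision]
    # billions of params * bytes per param = GB
    return params_billions * bits / 8


def gpus_needed(required_gb, gpu_vram_gb):
    """Minimum number of identical GPUs whose combined VRAM covers the weights."""
    return math.ceil(required_gb / gpu_vram_gb)


if __name__ == "__main__":
    for size, precision in [(7, "FP16"), (13, "INT4"), (70, "INT4"), (70, "FP16")]:
        print(f"{size}B {precision}: {weight_vram_gb(size, precision)} GB")
    # 70B FP16 on 80 GB cards
    print(gpus_needed(weight_vram_gb(70, "FP16"), 80))
```

Because this counts weights alone, a result that nearly fills a card (such as 35 GB on a 40 GB GPU) leaves little room for long contexts or large batches.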
Cost Optimization Tips
- Use INT4 quantization (e.g., 4-bit GGUF) for inference: roughly 4× less VRAM than FP16, usually with minimal accuracy loss
- Batch requests to keep GPU utilization >70%
- Use spot instances for batch inference jobs, backed by a queue and retry logic
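The queue-and-retry pattern from the last tip can be sketched as follows. This is a simplified, single-process illustration, not any provider's API: the `Interrupted` exception, `run_queue`, and the `process` callback are hypothetical stand-ins for however your worker detects that a spot instance was reclaimed.

```python
from collections import deque


class Interrupted(Exception):
    """Hypothetical signal that the spot instance was reclaimed mid-job."""


def run_queue(jobs, process, max_attempts=3):
    """Run each job, re-queueing interrupted ones up to max_attempts times.

    Returns (results, failed): a dict of job -> result, and a list of jobs
    that were interrupted on every attempt.
    """
    queue = deque((job, 1) for job in jobs)
    results, failed = {}, []
    while queue:
        job, attempt = queue.popleft()
        try:
            results[job] = process(job)
        except Interrupted:
            if attempt < max_attempts:
                # Put the job at the back so other work proceeds first.
                queue.append((job, attempt + 1))
            else:
                failed.append(job)
    return results, failed


if __name__ == "__main__":
    interrupted_once = set()

    def flaky(job):
        # Simulate one interruption for job "b", then succeed.
        if job == "b" and job not in interrupted_once:
            interrupted_once.add(job)
            raise Interrupted()
        return job.upper()

    print(run_queue(["a", "b", "c"], flaky))
```

In a real deployment the queue would live outside the instance (so it survives reclamation), and jobs should be idempotent so a retry after a partial run is safe.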
Frequently Asked Questions
Which GPU cloud is cheapest for running Llama 3 70B?
For Llama 3 70B at INT4, an A100 40 GB is sufficient. Lambda Labs typically offers the lowest on-demand A100 rate (~$1.29/hr) when available. Vast.ai spot instances can be 30–50% cheaper but are interruptible.
Is RunPod or Lambda Labs better for LLM inference?
Lambda Labs is often cheaper for A100s but has limited availability. RunPod has a larger GPU marketplace and on-demand availability. For production inference, Lambda Labs Reserved Instances offer the best $/hr. For dev and experimentation, RunPod is more flexible.
What is the difference between on-demand and spot GPU instances?
On-demand instances are billed at full price and are not interrupted once running. Spot instances (called "interruptible" on RunPod, "spot" on Vast.ai) can be 30–60% cheaper but may be reclaimed by the provider on short notice. Use spot for training jobs with checkpointing and on-demand for inference serving.
How much does it cost to run a 70B LLM 24/7 on A100?
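An A100 80 GB running 24/7 on RunPod costs ~$1,793/month ($2.49/hr × 720 hours). Two A100 40 GB instances for 70B INT4 would cost ~$1,361/month at Lambda Labs pricing, which implies about $0.95/hr per GPU: below the ~$1.29/hr on-demand rate and within the range of a 20–40% reserved discount. A dedicated bare-metal A100 server is cheaper at scale, typically $800–1,200/month.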
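Monthly figures like these come from hourly rate × GPU count × hours. A small sketch, assuming a 720-hour (30-day) month, which is the basis that reproduces the $1,793 figure; the function names are hypothetical:

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours


def monthly_cost(hourly_rate, gpu_count=1, hours=HOURS_PER_MONTH):
    """Total monthly spend for gpu_count GPUs running continuously."""
    return hourly_rate * gpu_count * hours


def implied_hourly_rate(monthly_total, gpu_count=1, hours=HOURS_PER_MONTH):
    """Per-GPU hourly rate implied by a quoted monthly total."""
    return monthly_total / (gpu_count * hours)


if __name__ == "__main__":
    # One A100 80 GB at $2.49/hr, 24/7.
    print(round(monthly_cost(2.49)))
    # Per-GPU rate behind a $1,361/month quote for two GPUs.
    print(round(implied_hourly_rate(1361, gpu_count=2), 3))
```

Running the inverse calculation on any quoted monthly price is a quick way to check which hourly tier (on-demand, reserved, or spot) it corresponds to.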