Minimize Vast.ai GPU cloud costs by choosing the right GPU for your workload, leveraging interruptible (spot) instances, and eliminating idle compute time. Vast.ai is a GPU marketplace with highly variable pricing: RTX 4090 ($0.15-0.30/hr), A100 80GB ($1.00-2.00/hr), H100 (~$2.50-4.00/hr).
Prerequisite: the `vastai` CLI installed.

```yaml
# GPU selection by workload type
inference_7b_model:
  recommended: RTX 3090 (24GB VRAM)
  cost: "$0.10-0.20/hr"
  why: "Cheapest GPU with enough VRAM for 7B models"
inference_70b_model:
  recommended: A100 40GB or 2x RTX 3090
  cost: "$0.80-1.50/hr"
  why: "Need 40GB+ VRAM for quantized 70B models"
training_small:
  recommended: RTX 4090 (24GB VRAM)
  cost: "$0.15-0.30/hr"
  why: "Best price/performance for fine-tuning up to 13B"
training_large:
  recommended: A100 80GB
  cost: "$1.00-2.00/hr"
  why: "Need 80GB VRAM for full-precision large model training"
```
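The VRAM guidance above can be turned into a quick selection rule. A minimal sketch, assuming a rough rule of thumb of bytes-per-parameter by precision plus ~10% overhead for activations and KV cache (the function names, overhead factor, and GPU tiers are illustrative, not from the Vast.ai API):

```python
# Rough VRAM estimate: parameter count * bytes per parameter, plus ~10%
# overhead for activations/KV cache. A rule of thumb, not a precise figure.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def estimate_vram_gb(params_billions, precision="fp16"):
    weights_gb = params_billions * BYTES_PER_PARAM[precision]
    return round(weights_gb * 1.1, 1)  # 10% overhead

def pick_gpu(vram_needed_gb):
    # Cheapest-first tiers matching the table above
    tiers = [(24, "RTX 3090 / RTX 4090 (24GB)"),
             (40, "A100 40GB"),
             (80, "A100 80GB")]
    for vram, name in tiers:
        if vram_needed_gb <= vram:
            return name
    return "multi-GPU required"

print(pick_gpu(estimate_vram_gb(7)))           # 7B fp16 ~15.4GB -> 24GB card
print(pick_gpu(estimate_vram_gb(70, "int4")))  # 70B int4 ~38.5GB -> A100 40GB
```

This reproduces the table's pairings: a 7B model in fp16 fits comfortably on a 24GB card, while a 4-bit-quantized 70B model needs the 40GB tier.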
```bash
# Interruptible (spot) instances are 30-60% cheaper.
# Search for the cheapest interruptible A100:
vastai search offers 'gpu_name=A100 num_gpus=1 reliability>0.9 interruptible=true' \
  --order 'dph_total' --limit 5

# Create an interruptible instance (must implement checkpointing!)
vastai create instance OFFER_ID --interruptible \
  --image pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel \
  --onstart-cmd "cd /workspace && python train.py --resume-from-checkpoint"
```
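The `--onstart-cmd` above assumes `train.py` can resume from a saved checkpoint. A minimal framework-agnostic sketch of that pattern (the path and state layout are illustrative; in a real PyTorch job you would serialize model and optimizer state dicts with `torch.save` instead of `pickle`):

```python
import os
import pickle
import tempfile
import time

CKPT_PATH = "/workspace/checkpoint.pkl"  # hypothetical path in the container

def save_checkpoint(state, path=CKPT_PATH):
    # Write to a temp file, then rename. os.replace is atomic on POSIX,
    # so a preemption mid-write never leaves a corrupt checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path=CKPT_PATH):
    # Returns the saved state, or None for a fresh start.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return None

# Training loop: resume if a checkpoint exists, save every 30 minutes.
state = load_checkpoint() or {"step": 0}
last_save = time.monotonic()
for step in range(state["step"], 1000):
    # ... one training step ...
    if time.monotonic() - last_save > 1800:  # 30 minutes
        save_checkpoint({"step": step + 1})
        last_save = time.monotonic()
```

The atomic-rename detail matters on interruptible instances: a preemption can land at any moment, including mid-write.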
```bash
#!/bin/bash
# Cron job (every 15 minutes): destroy instances idle for more than 1 hour.
vastai show instances --raw |
  jq -r '.[] | select(.gpu_utilization < 5 and ((.cur_state_time - .start_time) > 3600)) | .id' |
  while read -r id; do
    echo "Destroying idle instance $id (GPU util <5% for >1hr)"
    vastai destroy instance "$id"
  done
```
```python
# Set a maximum runtime to prevent runaway costs.
import subprocess
import time

MAX_HOURS = 8            # budget: 8 hours max
INSTANCE_ID = "12345"    # example instance ID

start_time = time.time()
while True:
    elapsed_hours = (time.time() - start_time) / 3600
    if elapsed_hours > MAX_HOURS:
        print(f"Time limit reached ({MAX_HOURS}h). Saving checkpoint and terminating.")
        subprocess.run(["vastai", "destroy", "instance", INSTANCE_ID])
        break
    time.sleep(300)  # check every 5 minutes
```
```bash
# Always compare offers before creating an instance.
# Price varies 2-3x for the same GPU depending on host, region, and demand.
vastai search offers 'gpu_name=RTX_4090 num_gpus=1 reliability>0.95 inet_down>200' \
  --order 'dph_total' --limit 10 |
  head -5

# Calculate total cost before starting.
echo "Job estimate: 4x A100 for 12 hours"
# NR==2 skips the header row; adjust the awk field if the CLI output format changes.
echo "Cheapest offer: \$$(vastai search offers 'gpu_name=A100 num_gpus=4' --order 'dph_total' --limit 1 | awk 'NR==2{print $6}')/hr"
```
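For a back-of-the-envelope total before provisioning, the arithmetic is just rate x GPUs x hours, minus any interruptible discount. A small sketch (the $1.50/hr rate and 40% discount are illustrative figures drawn from the price ranges above, not live quotes):

```python
def job_cost(rate_per_gpu_hr, num_gpus, hours, interruptible_discount=0.0):
    """Total job cost in USD; discount is the spot saving (e.g. 0.4 = 40%)."""
    return round(rate_per_gpu_hr * num_gpus * hours * (1 - interruptible_discount), 2)

on_demand = job_cost(1.50, 4, 12)       # 4x A100 at $1.50/hr for 12h -> $72.00
spot = job_cost(1.50, 4, 12, 0.4)       # same job, 40% spot discount -> $43.20
print(f"on-demand ${on_demand}, interruptible ${spot}")
```

Running the numbers before launch makes the interruptible trade-off concrete: here the spot run saves about $29 on a 12-hour job, which is what pays for the effort of implementing checkpointing.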
| Issue | Cause | Solution |
|---|---|---|
| Instance preempted mid-training | Using interruptible without checkpointing | Implement checkpoint saving every 30 minutes |
| Overpaying for GPU | Not comparing offers | Always search and sort by price before provisioning |
| Idle GPU burning money | Job finished but instance still running | Add auto-terminate script to training pipeline |
| Insufficient VRAM | Wrong GPU selected | Check model VRAM requirements before provisioning |
Basic usage: pick the cheapest GPU tier that fits your model's VRAM requirement, sort offers by price before provisioning, and attach the idle-termination script to every job.
Advanced scenario: for production training, combine interruptible instances with frequent checkpointing and a hard runtime cap, so preemptions and budget overruns are both handled automatically.