Best VPS for Ollama in 2026 — Top 5 Tested & Compared
We tested 12 VPS providers running Ollama with Llama 3 and Mistral. Here are the 5 best for speed, price, and inference performance. From $5/mo.
Best VPS for Ollama in 2026
Want to run LLMs like Llama, Mistral, or Phi on your own server? Ollama makes it dead simple, but you need the right VPS specs. For a broader comparison of LLM hosting options, see our best VPS for LLM hosting guide. Here’s what actually works.
What is Ollama?
Ollama is a tool that lets you run large language models locally with a single command:
ollama run llama3.2
That’s it. No Python environments, no dependency hell, no GPU drivers to wrestle with. It handles model downloads, quantization, and inference automatically.
Why self-host LLMs?
- Privacy — Your prompts never leave your server
- No rate limits — Use as much as you want
- No API costs — One-time VPS cost vs per-token pricing
- Customization — Fine-tune, modify, experiment
- Offline capable — Works without internet after model download
VPS Requirements for Ollama
Ollama can run on CPU or GPU. Here’s what you need:
Minimum (CPU-only, small models)
- CPU: 4+ cores (AVX2 support strongly recommended for usable speeds)
- RAM: 8GB (for 7B models)
- Storage: 20GB+ SSD (models are 4-8GB each)
Recommended (CPU, medium models)
- CPU: 8+ cores
- RAM: 16GB (for 13B models)
- Storage: 50GB+ NVMe
Optimal (GPU acceleration)
- GPU: NVIDIA with 8GB+ VRAM
- RAM: 16GB+ system RAM
- Storage: 100GB+ NVMe
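You can check a candidate VPS against the minimum tier above before committing. A quick sketch for Linux (thresholds match the minimum CPU tier; raise them if you're targeting the recommended tier):

```shell
#!/bin/sh
# Pre-flight check against the minimum CPU tier
# (4+ cores, 8GB RAM, AVX2). Linux-only: reads /proc/cpuinfo.

cores=$(nproc)
ram_gb=$(free -g | awk '/^Mem:/ {print $2}')
if grep -qm1 avx2 /proc/cpuinfo; then avx2=yes; else avx2=no; fi

echo "cores=$cores ram=${ram_gb}GB avx2=$avx2"

if [ "$cores" -ge 4 ] && [ "$ram_gb" -ge 8 ] && [ "$avx2" = yes ]; then
  echo "OK for 7-8B models"
else
  echo "below minimum - stick to 1-3B models like Phi-3 Mini"
fi
```

Run it on any shortlisted server before downloading multi-gigabyte models.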
Best VPS for Ollama (CPU)
Running LLMs on CPU is slower but works fine for personal use and testing.
1. Hetzner CPX41 (Best CPU Value)
€14.99/mo | 8 vCPU (AMD EPYC), 16GB RAM, 160GB NVMe
Hetzner’s AMD EPYC CPUs have excellent AVX2 performance. 16GB RAM handles 13B models comfortably.
Performance: ~10-15 tokens/sec with Llama 3.1 8B (Q4_K_M)
# Setup on Hetzner
curl -fsSL https://ollama.ai/install.sh | sh
ollama run llama3.2
2. Hostinger KVM8 (Budget Friendly)
$19.99/mo | 8 vCPU, 16GB RAM, 200GB NVMe
A bit pricier than Hetzner, but the specs are solid and the 200GB of storage is nice for keeping multiple models on disk.
3. Vultr High Frequency (Fastest CPU)
$48/mo | 4 vCPU (3GHz+), 16GB RAM, 256GB NVMe
Higher clock speeds mean faster single-threaded performance. Worth it if response latency matters.
Best GPU VPS for Ollama
GPU acceleration is 10-50x faster than CPU. For production-grade AI inference setups, see our dedicated guide. Here are your options:
1. Vultr Cloud GPU (Best Availability)
$90/mo | NVIDIA A16 (16GB VRAM), 6 vCPU, 16GB RAM
Vultr has the most accessible GPU instances. The A16's 16GB VRAM fits quantized models up to ~13B entirely on the GPU; larger models partially offload to CPU and slow down.
Performance: ~50-80 tokens/sec with Llama 3.1 8B
# Verify GPU is detected
nvidia-smi
# Ollama automatically uses GPU
ollama run llama3.2
2. Lambda Labs (Best for AI)
$0.50/hr (~$360/mo) | NVIDIA A10 (24GB VRAM)
Lambda specializes in AI workloads. Great for serious development, but pricier.
3. RunPod (Cheapest GPU)
$0.20/hr | NVIDIA RTX 4090 (24GB VRAM)
Spot pricing makes this cheapest for intermittent use. Not for 24/7 hosting.
4. Hetzner Dedicated GPU (Best Value)
€179/mo | NVIDIA RTX 4000 (8GB VRAM), 8 cores, 64GB RAM
A dedicated GPU server rather than a cloud instance. Best monthly rate if you need an always-on GPU.
Model Selection by VPS Specs
Pick your model based on available RAM/VRAM:
| Model | Size | Min RAM (CPU) | Min VRAM (GPU) | Speed |
|---|---|---|---|---|
| Phi-3 Mini | 2.2GB | 4GB | 4GB | Fastest |
| Llama 3.2 3B | 2GB | 4GB | 4GB | Fast |
| Mistral 7B | 4.1GB | 8GB | 8GB | Good |
| Llama 3.1 8B | 4.7GB | 8GB | 8GB | Good |
| Llama 2 13B | 7.4GB | 16GB | 16GB | Slower |
| Mixtral 8x7B | 26GB | 32GB | 24GB | Slow |
| Llama 3.1 70B | 40GB | 64GB | 48GB | Very slow |
Tip: Q4_K_M quantization (default in Ollama) gives the best quality/size balance.
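The sizes in the table follow a rough rule of thumb: at Q4_K_M, figure about 0.57GB per billion parameters, plus 1-2GB of headroom for the context/KV cache. A quick estimator (the 0.57 factor is fitted to the table above, not an official number):

```shell
# Rough Q4_K_M download size: ~0.57GB per billion parameters.
# Fitted to the table above (8B -> ~4.7GB, 13B -> ~7.4GB); budget
# an extra 1-2GB of RAM on top for the KV cache.
estimate_q4_gb() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b * 0.57 }'
}

estimate_q4_gb 8    # prints 4.6
estimate_q4_gb 13   # prints 7.4
estimate_q4_gb 70   # prints 39.9
```

Handy for sizing storage and RAM before you pick a plan.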
Complete Setup Guide
Step 1: Create Your VPS
For this guide, we’ll use Hetzner CPX41 (€14.99/mo, 8 vCPU, 16GB RAM):
- Sign up at Hetzner Cloud
- Create server → Ubuntu 22.04 → CPX41
- Add your SSH key
- Note the IP address
Step 2: Connect and Install Ollama
ssh root@your-server-ip
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama service
systemctl enable ollama
systemctl start ollama
Step 3: Run Your First Model
# Download and run Llama 3.2
ollama run llama3.2
# Or try smaller model first
ollama run phi3:mini
First run downloads the model (4-8GB). After that, it starts instantly.
Step 4: Expose API (Optional)
Ollama runs an API on port 11434:
# Test locally
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Hello!"
}'
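By default /api/generate streams the response as newline-delimited JSON chunks; setting "stream": false returns one JSON object, which is easier to script against. A sketch that builds and sanity-checks the payload locally first (assumes python3 is installed for the JSON check):

```shell
# Build the request body once; "stream": false makes Ollama return
# a single JSON object instead of newline-delimited chunks.
body='{
  "model": "llama3.2",
  "prompt": "Hello!",
  "stream": false
}'

# Validate the JSON locally before sending it (needs python3)
echo "$body" | python3 -m json.tool > /dev/null && echo "valid JSON"

# Send it once Ollama is running:
# curl http://localhost:11434/api/generate -d "$body"
```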
To expose externally (⚠️ add authentication — see our VPS security guide):
# Edit Ollama service
sudo systemctl edit ollama
# Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Restart
sudo systemctl restart ollama
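Once Ollama listens on 0.0.0.0, anyone who finds the port can run inference on your hardware. At minimum, firewall the port to trusted IPs. A sketch using ufw (203.0.113.5 is a documentation placeholder; substitute your client's real IP):

```shell
# Allow only one trusted client IP to reach the Ollama API;
# everyone else gets dropped. 203.0.113.5 is a placeholder.
sudo ufw allow from 203.0.113.5 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
sudo ufw status numbered
```

For anything multi-user, put a reverse proxy with authentication in front instead of exposing the port directly.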
Step 5: Use with Open WebUI
Open WebUI gives you a ChatGPT-like interface:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Access at http://your-server-ip:3000
Performance Optimization
1. Use Quantized Models
# Q4_K_M is the default and best balance
ollama run llama3.1:8b-instruct-q4_K_M
# Q5 for slightly better quality
ollama run llama3.1:8b-instruct-q5_K_M
2. Increase Context Length
# Create modelfile
cat << 'EOF' > Modelfile
FROM llama3.2
PARAMETER num_ctx 8192
EOF
ollama create llama3.2-8k -f Modelfile
ollama run llama3.2-8k
3. Enable Swap (CPU fallback)
fallocate -l 16G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile swap swap defaults 0 0' >> /etc/fstab
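After enabling it, confirm the swap is actually active and visible to the kernel:

```shell
# Verify swap is active; "free" should show a non-zero Swap total
swapon --show
free -h | awk '/^Swap:/ {print "swap total:", $2}'
```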
4. Pin CPU Affinity (AMD EPYC)
taskset -c 0-7 ollama serve
Cost Comparison: VPS vs API
Running your own Ollama instance makes sense financially:
| Option | Monthly Cost | Tokens/Month |
|---|---|---|
| OpenAI GPT-4 | $60 | ~1M tokens |
| Claude 3.5 | $45 | ~1M tokens |
| Hetzner VPS + Ollama | €15 | Unlimited |
| Vultr GPU + Ollama | $90 | Unlimited |
If you’re using more than 1-2M tokens/month, self-hosting pays for itself.
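By the per-token rates in the table alone, break-even actually arrives well before 1M tokens; the higher 1-2M threshold leaves margin for setup time and the quality gap versus frontier models. The raw arithmetic (assuming $60 per 1M API tokens versus a ~$16/mo VPS):

```shell
# Raw break-even: $16/mo VPS vs $60 per 1M API tokens.
# 16 / (60 / 1,000,000) = tokens/month where the costs are equal.
awk 'BEGIN { printf "break-even: %d tokens/month\n", 16 / (60 / 1000000) }'
# prints "break-even: 266666 tokens/month"
```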
FAQ
Can I run Ollama on 4GB RAM?
Barely. You can run Phi-3 Mini or Llama 3.2 1B, but larger models will crash or swap heavily.
Is GPU required for Ollama?
No! CPU works fine, just slower. 8 vCPU gives usable speeds for 7-8B models.
What’s the best model for coding?
DeepSeek Coder or CodeLlama. Both available via ollama run deepseek-coder or ollama run codellama.
Can I fine-tune models on a VPS?
Yes, but you’ll want a GPU VPS for that. CPU fine-tuning is painfully slow.
How do I update Ollama?
curl -fsSL https://ollama.ai/install.sh | sh
Same install command updates to latest version.
Recommended Setup
| Use Case | VPS | Cost | Model |
|---|---|---|---|
| Testing/Personal | Hetzner CPX21 | €8/mo | Phi-3 Mini |
| Daily Use | Hetzner CPX41 | €15/mo | Llama 3.1 8B |
| Fast Responses | Vultr GPU | $90/mo | Llama 3.1 8B |
| Heavy Workloads | Lambda A10 | $360/mo | Mixtral 8x7B |
For most users, Hetzner CPX41 at €15/mo running Llama 3.1 8B is the sweet spot. Fast enough for real use, cheap enough to leave running 24/7.
Ready to get started?
Get the best VPS hosting deal today. Hostinger offers 4GB RAM VPS starting at just $4.99/mo.
Get Hostinger VPS from $4.99/mo (up to 75% off + free domain included)
Related guides
Best VPS for AI Inference in 2026 (Benchmarked)
Deploy ML models in production on your own VPS. We benchmark GPU and CPU inference performance across 6 providers — latency, throughput, and pricing.
AWS EC2 Alternatives 2026: Cheaper, Simpler VPS Hosting
Best AWS EC2 alternatives for cheaper VPS hosting. Compare Hetzner, Vultr, DigitalOcean, and more — save 70%+ with simpler billing.
Cheapest VPS Hosting 2026 — Best Budget Servers From $2.50
We compared 10 budget VPS providers on price, specs, and support. Here are the cheapest worth using — from $2.50/mo with real performance data.
Best GPU VPS in 2026 — Cheapest NVIDIA Servers Compared
Rent GPU servers from $0.50/hr. We compare 8 GPU VPS providers for AI training, inference, and rendering — NVIDIA A100, H100, and RTX options.
Andrius Putna
I'm Andrius Putna. Geek. I've been tinkering with web technologies since the early 2000s; now it's AI. I bridge business and technology to drive meaningful impact, combining expertise in customer experience, technology, and business strategy. Father, open-source contributor, investor, 2x Ironman, MBA graduate.
Last updated: February 8, 2026. Disclosure: This article may contain affiliate links.