Best VPS for Ollama 2026: Run LLMs on Your Own Server
Find the best VPS for running Ollama and self-hosted LLMs. Compare GPU VPS options, CPU requirements, and get your AI models running in minutes.
Best VPS for Ollama in 2026
Want to run LLMs like Llama, Mistral, or Phi on your own server? Ollama makes it dead simple, but you need the right VPS specs. Here's what actually works.
What is Ollama?
Ollama is a tool that lets you run large language models locally with a single command:
ollama run llama3.2
That's it. No Python environments, no dependency hell, no GPU drivers to wrestle with. It handles model downloads, quantization, and inference automatically.
Why self-host LLMs?
- Privacy — Your prompts never leave your server
- No rate limits — Use as much as you want
- No API costs — One-time VPS cost vs per-token pricing
- Customization — Fine-tune, modify, experiment
- Offline capable — Works without internet after model download
VPS Requirements for Ollama
Ollama can run on CPU or GPU. Here's what you need (a quick way to check your own server against these numbers follows the lists below):
Minimum (CPU-only, small models)
- CPU: 4+ cores (AVX2 support required)
- RAM: 8GB (for 7B models)
- Storage: 20GB+ SSD (models are 4-8GB each)
Recommended (CPU, medium models)
- CPU: 8+ cores
- RAM: 16GB (for 13B models)
- Storage: 50GB+ NVMe
Optimal (GPU acceleration)
- GPU: NVIDIA with 8GB+ VRAM
- RAM: 16GB+ system RAM
- Storage: 100GB+ NVMe
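If you already have a server and aren't sure where it lands, a quick spec check from the shell (assuming a standard Linux image) covers the numbers that matter:
# CPU cores and AVX2 support
nproc
grep -o 'avx2' /proc/cpuinfo | head -1
# RAM and free disk space
free -h
df -h /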
Best VPS for Ollama (CPU)
Running LLMs on CPU is slower but works fine for personal use and testing.
1. Hetzner CPX41 (Best CPU Value)
€14.99/mo | 8 vCPU (AMD EPYC), 16GB RAM, 160GB NVMe
Hetzner's AMD EPYC CPUs have excellent AVX2 performance. 16GB RAM handles 13B models comfortably.
Performance: ~10-15 tokens/sec with Llama 3.1 8B (Q4_K_M)
# Setup on Hetzner
curl -fsSL https://ollama.ai/install.sh | sh
ollama run llama3.2
2. Hostinger KVM8 (Budget Friendly)
$19.99/mo | 8 vCPU, 16GB RAM, 200GB NVMe
Slightly cheaper than Hetzner with good specs. The 200GB storage is nice for keeping multiple models.
3. Vultr High Frequency (Fastest CPU)
$48/mo | 4 vCPU (3GHz+), 16GB RAM, 256GB NVMe
Higher clock speeds mean faster single-threaded performance. Worth it if response latency matters.
Best GPU VPS for Ollama
GPU acceleration is typically 10-50x faster than CPU inference. Here are your options:
1. Vultr Cloud GPU (Best Availability)
$90/mo | NVIDIA A16 (16GB VRAM), 6 vCPU, 16GB RAM
Vultr has the most accessible GPU instances. The 16GB of VRAM comfortably fits 7-13B models at Q4; anything larger needs heavier quantization or partial CPU offload.
Performance: ~50-80 tokens/sec with Llama 3.1 8B
# Verify GPU is detected
nvidia-smi
# Ollama automatically uses GPU
ollama run llama3.2
2. Lambda Labs (Best for AI)
$0.50/hr (~$360/mo) | NVIDIA A10 (24GB VRAM)
Lambda specializes in AI workloads. Great for serious development, but pricier.
3. RunPod (Cheapest GPU)
$0.20/hr | NVIDIA RTX 4090 (24GB VRAM)
Spot pricing makes this cheapest for intermittent use. Not for 24/7 hosting.
4. Hetzner Dedicated GPU (Best Value)
€179/mo | NVIDIA RTX 4000 (8GB VRAM), 8 cores, 64GB RAM
This is a dedicated GPU server, not a cloud instance. Best monthly rate if you need always-on GPU.
Model Selection by VPS Specs
Pick your model based on available RAM/VRAM:
| Model | Size | Min RAM (CPU) | Min VRAM (GPU) | Speed |
|---|---|---|---|---|
| Phi-3 Mini | 2.2GB | 4GB | 4GB | Fastest |
| Llama 3.2 3B | 2GB | 4GB | 4GB | Fast |
| Mistral 7B | 4.1GB | 8GB | 8GB | Good |
| Llama 3.1 8B | 4.7GB | 8GB | 8GB | Good |
| Llama 2 13B | 7.4GB | 16GB | 16GB | Slower |
| Mixtral 8x7B | 26GB | 32GB | 24GB | Slow |
| Llama 3.1 70B | 40GB | 64GB | 48GB | Very slow |
Tip: Q4_K_M quantization (default in Ollama) gives the best quality/size balance.
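Once you've downloaded a few models, Ollama's own commands make it easy to sanity-check this table against your server:
# Show downloaded models and their on-disk sizes
ollama list
# Show what's currently loaded and whether it runs on CPU, GPU, or split between the two
ollama ps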
Complete Setup Guide
Step 1: Create Your VPS
For this guide, we'll use Hetzner CPX41 (€14.99/mo, 8 vCPU, 16GB RAM):
- Sign up at Hetzner Cloud
- Create server → Ubuntu 22.04 → CPX41
- Add your SSH key
- Note the IP address
Step 2: Connect and Install Ollama
ssh root@your-server-ip
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama service
systemctl enable ollama
systemctl start ollama
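Before pulling a model, it's worth confirming the service is up. Ollama answers on port 11434 with a short "Ollama is running" message:
# Confirm the service started
systemctl status ollama
# The API should answer on port 11434
curl http://localhost:11434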
Step 3: Run Your First Model
# Download and run Llama 3.2
ollama run llama3.2
# Or try smaller model first
ollama run phi3:mini
First run downloads the model (roughly 2-5GB depending on the model). After that, it starts instantly.
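You can also download models ahead of time without opening an interactive chat, which is handy if you want to queue up several:
# Pre-download models without starting a chat session
ollama pull phi3:mini
ollama pull llama3.2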
Step 4: Expose API (Optional)
Ollama runs an API on port 11434:
# Test locally
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Hello!"
}'
To expose externally (⚠️ add authentication):
# Edit Ollama service
sudo systemctl edit ollama
# Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Restart
sudo systemctl restart ollama
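Keep in mind that with OLLAMA_HOST set to 0.0.0.0, anyone who can reach port 11434 can run prompts on your server. One way to add the authentication mentioned above is to leave Ollama on its default localhost binding and put nginx with basic auth in front of it. A minimal sketch, assuming nginx from Ubuntu's repos; ollama.example.com and ollama-user are placeholders:
# Install nginx and the htpasswd tool
apt install -y nginx apache2-utils
# Create a basic-auth user (you'll be prompted for a password)
htpasswd -c /etc/nginx/.htpasswd ollama-user
# /etc/nginx/sites-available/ollama:
server {
    listen 80;
    server_name ollama.example.com;
    location / {
        auth_basic "Ollama API";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://127.0.0.1:11434;
    }
}
# Enable the site and reload nginx
ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
systemctl reload nginx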
Step 5: Use with Open WebUI
Open WebUI gives you a ChatGPT-like interface:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Access at http://your-server-ip:3000
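This publishes the UI to anyone who finds your IP. If you'd rather keep it private, one option (a sketch, not the only approach) is to publish the port on localhost only and reach it through an SSH tunnel from your own machine:
# Re-create the container bound to localhost only
docker rm -f open-webui
docker run -d -p 127.0.0.1:3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
# From your laptop, then browse http://localhost:3000
ssh -L 3000:localhost:3000 root@your-server-ip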
Performance Optimization
1. Use Quantized Models
# Q4_K_M is default and best balance
ollama run llama3.1:8b-instruct-q4_K_M
# Q5 for slightly better quality
ollama run llama3.1:8b-instruct-q5_K_M
2. Increase Context Length
# Create modelfile
cat << 'EOF' > Modelfile
FROM llama3.2
PARAMETER num_ctx 8192
EOF
ollama create llama3.2-8k -f Modelfile
ollama run llama3.2-8k
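If you're hitting the API rather than the CLI, the same setting can be passed per request through the options field, so no custom Modelfile is needed:
# Set the context length for a single request via the API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello!",
  "options": { "num_ctx": 8192 }
}'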
3. Enable Swap (avoid out-of-memory crashes)
fallocate -l 16G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile swap swap defaults 0 0' >> /etc/fstab
4. Pin CPU Affinity (AMD EPYC)
taskset -c 0-7 ollama serve
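Note that taskset only applies if you start ollama serve by hand. If Ollama is running as the systemd service the install script sets up, the cleaner equivalent is a service override:
# Pin the systemd-managed service instead
sudo systemctl edit ollama
# Add:
[Service]
CPUAffinity=0-7
# Restart
sudo systemctl restart ollama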
Cost Comparison: VPS vs API
Running your own Ollama instance makes sense financially:
| Option | Monthly Cost | Tokens/Month |
|---|---|---|
| OpenAI GPT-4 | $60 | ~1M tokens |
| Claude 3.5 | $45 | ~1M tokens |
| Hetzner VPS + Ollama | €15 | Unlimited |
| Vultr GPU + Ollama | $90 | Unlimited |
If you're using more than 1-2M tokens/month, self-hosting pays for itself.
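To put rough numbers on it: at the $45-60 per ~1M API tokens shown above, a €15/mo VPS (about $16) breaks even at roughly $16 / $60 ≈ 270,000 to $16 / $45 ≈ 360,000 tokens per month, well before the 1-2M mark, and everything past that costs nothing extra.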
FAQ
Can I run Ollama on 4GB RAM?
Barely. You can run Phi-3 Mini or Llama 3.2 1B, but larger models will crash or swap heavily.
Is GPU required for Ollama?
No! CPU works fine, just slower. 8 vCPU gives usable speeds for 7-8B models.
What's the best model for coding?
DeepSeek Coder or CodeLlama. Both are available via ollama run deepseek-coder or ollama run codellama.
Can I fine-tune models on a VPS?
Yes, but you'll want a GPU VPS for that. CPU fine-tuning is painfully slow.
How do I update Ollama?
curl -fsSL https://ollama.ai/install.sh | sh
Same install command updates to latest version.
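To check which version you're running before and after the update:
# Print the installed Ollama version
ollama -v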
Recommended Setup
| Use Case | VPS | Cost | Model |
|---|---|---|---|
| Testing/Personal | Hetzner CPX21 | €8/mo | Phi-3 Mini |
| Daily Use | Hetzner CPX41 | €15/mo | Llama 3.1 8B |
| Fast Responses | Vultr GPU | $90/mo | Llama 3.1 8B |
| Heavy Workloads | Lambda A10 | $360/mo | Llama 3.1 70B (partial CPU offload) |
For most users, Hetzner CPX41 at €15/mo running Llama 3.1 8B is the sweet spot. Fast enough for real use, cheap enough to leave running 24/7.
Ready to get started?
Get the best VPS hosting deal today. Hostinger offers 4GB RAM VPS starting at just $4.99/mo.
Get Hostinger VPS from $4.99/mo (up to 75% off + free domain included)
fordnox: Expert VPS reviews and hosting guides. We test every provider we recommend.
Last updated: February 8, 2026. Disclosure: This article may contain affiliate links.