Best VPS for Ollama in 2026 — Top 5 Tested & Compared
We tested 12 VPS providers running Ollama with Llama 3 and Mistral. Here are the 5 best for speed, price, and inference performance. From $5/mo.
Best VPS for Ollama in 2026
Want to run LLMs like Llama, Mistral, or Phi on your own server? Ollama makes it dead simple, but you need the right VPS specs. For a broader comparison of LLM hosting options, see our best VPS for LLM hosting guide. Here’s what actually works.
What is Ollama?
Ollama is a tool that lets you run large language models locally with a single command:
ollama run llama3.2
That’s it. No Python environments, no dependency hell, no GPU drivers to wrestle with. It handles model downloads, quantization, and inference automatically.
Why self-host LLMs?
- Privacy — Your prompts never leave your server
- No rate limits — Use as much as you want
- No API costs — One-time VPS cost vs per-token pricing
- Customization — Fine-tune, modify, experiment
- Offline capable — Works without internet after model download
VPS Requirements for Ollama
Ollama can run on CPU or GPU. Here’s what you need:
Minimum (CPU-only, small models)
- CPU: 4+ cores (AVX2 support strongly recommended for usable speeds)
- RAM: 8GB (for 7B models)
- Storage: 20GB+ SSD (models are 4-8GB each)
Recommended (CPU, medium models)
- CPU: 8+ cores
- RAM: 16GB (for 13B models)
- Storage: 50GB+ NVMe
Optimal (GPU acceleration)
- GPU: NVIDIA with 8GB+ VRAM
- RAM: 16GB+ system RAM
- Storage: 100GB+ NVMe
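You can check a candidate VPS against the minimum tier above before committing. A quick sketch for Linux (thresholds match the minimum CPU tier; raise them if you're targeting the recommended tier):

```shell
#!/bin/sh
# Pre-flight check against the minimum CPU tier
# (4+ cores, 8GB RAM, AVX2). Linux-only: reads /proc/cpuinfo.

cores=$(nproc)
ram_gb=$(free -g | awk '/^Mem:/ {print $2}')
if grep -qm1 avx2 /proc/cpuinfo; then avx2=yes; else avx2=no; fi

echo "cores=$cores ram=${ram_gb}GB avx2=$avx2"

if [ "$cores" -ge 4 ] && [ "$ram_gb" -ge 8 ] && [ "$avx2" = yes ]; then
  echo "OK for 7-8B models"
else
  echo "below minimum - stick to 1-3B models like Phi-3 Mini"
fi
```

Run it on any shortlisted server before downloading multi-gigabyte models.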
Best VPS for Ollama (CPU)
Running LLMs on CPU is slower but works fine for personal use and testing.
1. Hetzner CPX41 (Best CPU Value)
€14.99/mo | 8 vCPU (AMD EPYC), 16GB RAM, 160GB NVMe
Hetzner’s AMD EPYC CPUs have excellent AVX2 performance. 16GB RAM handles 13B models comfortably.
Performance: ~10-15 tokens/sec with Llama 3.1 8B (Q4_K_M)
# Setup on Hetzner
curl -fsSL https://ollama.ai/install.sh | sh
ollama run llama3.2
2. Hostinger KVM8 (Budget Friendly)
$19.99/mo | 8 vCPU, 16GB RAM, 200GB NVMe
A bit pricier than Hetzner, but the specs are solid and the 200GB of storage is nice for keeping multiple models on disk.
3. Vultr High Frequency (Fastest CPU)
$48/mo | 4 vCPU (3GHz+), 16GB RAM, 256GB NVMe
Higher clock speeds mean faster single-threaded performance. Worth it if response latency matters.
Best GPU VPS for Ollama
GPU acceleration is 10-50x faster than CPU. For production-grade AI inference setups, see our dedicated guide. Here are your options:
1. Vultr Cloud GPU (Best Availability)
$90/mo | NVIDIA A16 (16GB VRAM), 6 vCPU, 16GB RAM
Vultr has the most accessible GPU instances. The A16's 16GB VRAM fits quantized models up to ~13B entirely on the GPU; larger models partially offload to CPU and slow down.
Performance: ~50-80 tokens/sec with Llama 3.1 8B
# Verify GPU is detected
nvidia-smi
# Ollama automatically uses GPU
ollama run llama3.2
2. Lambda Labs (Best for AI)
$0.50/hr (~$360/mo) | NVIDIA A10 (24GB VRAM)
Lambda specializes in AI workloads. Great for serious development, but pricier.
3. RunPod (Cheapest GPU)
$0.20/hr | NVIDIA RTX 4090 (24GB VRAM)
Spot pricing makes this cheapest for intermittent use. Not for 24/7 hosting.
4. Hetzner Dedicated GPU (Best Value)
€179/mo | NVIDIA RTX 4000 (8GB VRAM), 8 cores, 64GB RAM
A dedicated GPU server rather than a cloud instance. Best monthly rate if you need an always-on GPU.
Model Selection by VPS Specs
Pick your model based on available RAM/VRAM:
| Model | Size | Min RAM (CPU) | Min VRAM (GPU) | Speed |
|---|---|---|---|---|
| Phi-3 Mini | 2.2GB | 4GB | 4GB | Fastest |
| Llama 3.2 3B | 2GB | 4GB | 4GB | Fast |
| Mistral 7B | 4.1GB | 8GB | 8GB | Good |
| Llama 3.1 8B | 4.7GB | 8GB | 8GB | Good |
| Llama 2 13B | 7.4GB | 16GB | 16GB | Slower |
| Mixtral 8x7B | 26GB | 32GB | 24GB | Slow |
| Llama 3.1 70B | 40GB | 64GB | 48GB | Very slow |
Tip: Q4_K_M quantization (default in Ollama) gives the best quality/size balance.
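The sizes in the table follow a rough rule of thumb: at Q4_K_M, figure about 0.57GB per billion parameters, plus 1-2GB of headroom for the context/KV cache. A quick estimator (the 0.57 factor is fitted to the table above, not an official number):

```shell
# Rough Q4_K_M download size: ~0.57GB per billion parameters.
# Fitted to the table above (8B -> ~4.7GB, 13B -> ~7.4GB); budget
# an extra 1-2GB of RAM on top for the KV cache.
estimate_q4_gb() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b * 0.57 }'
}

estimate_q4_gb 8    # prints 4.6
estimate_q4_gb 13   # prints 7.4
estimate_q4_gb 70   # prints 39.9
```

Handy for sizing storage and RAM before you pick a plan.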
Complete Setup Guide
Step 1: Create Your VPS
For this guide, we’ll use Hetzner CPX41 (€14.99/mo, 8 vCPU, 16GB RAM):
- Sign up at Hetzner Cloud
- Create server → Ubuntu 22.04 → CPX41
- Add your SSH key
- Note the IP address
Step 2: Connect and Install Ollama
ssh root@your-server-ip
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama service
systemctl enable ollama
systemctl start ollama
Step 3: Run Your First Model
# Download and run Llama 3.2
ollama run llama3.2
# Or try smaller model first
ollama run phi3:mini
First run downloads the model (4-8GB). After that, it starts instantly.
Step 4: Expose API (Optional)
Ollama runs an API on port 11434:
# Test locally
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Hello!"
}'
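By default /api/generate streams the response as newline-delimited JSON chunks; setting "stream": false returns one JSON object, which is easier to script against. A sketch that builds and sanity-checks the payload locally first (assumes python3 is installed for the JSON check):

```shell
# Build the request body once; "stream": false makes Ollama return
# a single JSON object instead of newline-delimited chunks.
body='{
  "model": "llama3.2",
  "prompt": "Hello!",
  "stream": false
}'

# Validate the JSON locally before sending it (needs python3)
echo "$body" | python3 -m json.tool > /dev/null && echo "valid JSON"

# Send it once Ollama is running:
# curl http://localhost:11434/api/generate -d "$body"
```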
To expose externally (⚠️ add authentication — see our VPS security guide):
# Edit Ollama service
sudo systemctl edit ollama
# Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Restart
sudo systemctl restart ollama
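Once Ollama listens on 0.0.0.0, anyone who finds the port can run inference on your hardware. At minimum, firewall the port to trusted IPs. A sketch using ufw (203.0.113.5 is a documentation placeholder; substitute your client's real IP):

```shell
# Allow only one trusted client IP to reach the Ollama API;
# everyone else gets dropped. 203.0.113.5 is a placeholder.
sudo ufw allow from 203.0.113.5 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
sudo ufw status numbered
```

For anything multi-user, put a reverse proxy with authentication in front instead of exposing the port directly.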
Step 5: Use with Open WebUI
Open WebUI gives you a ChatGPT-like interface:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Access at http://your-server-ip:3000
Performance Optimization
1. Use Quantized Models
# Q4_K_M is the default and best balance
ollama run llama3.1:8b-instruct-q4_K_M
# Q5 for slightly better quality
ollama run llama3.1:8b-instruct-q5_K_M
2. Increase Context Length
# Create modelfile
cat << 'EOF' > Modelfile
FROM llama3.2
PARAMETER num_ctx 8192
EOF
ollama create llama3.2-8k -f Modelfile
ollama run llama3.2-8k
3. Enable Swap (CPU fallback)
fallocate -l 16G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile swap swap defaults 0 0' >> /etc/fstab
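After enabling it, confirm the swap is actually active and visible to the kernel:

```shell
# Verify swap is active; "free" should show a non-zero Swap total
swapon --show
free -h | awk '/^Swap:/ {print "swap total:", $2}'
```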
4. Pin CPU Affinity (AMD EPYC)
taskset -c 0-7 ollama serve
Cost Comparison: VPS vs API
Running your own Ollama instance makes sense financially:
| Option | Monthly Cost | Tokens/Month |
|---|---|---|
| OpenAI GPT-4 | $60 | ~1M tokens |
| Claude 3.5 | $45 | ~1M tokens |
| Hetzner VPS + Ollama | €15 | Unlimited |
| Vultr GPU + Ollama | $90 | Unlimited |
If you’re using more than 1-2M tokens/month, self-hosting pays for itself.
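By the per-token rates in the table alone, break-even actually arrives well before 1M tokens; the higher 1-2M threshold leaves margin for setup time and the quality gap versus frontier models. The raw arithmetic (assuming $60 per 1M API tokens versus a ~$16/mo VPS):

```shell
# Raw break-even: $16/mo VPS vs $60 per 1M API tokens.
# 16 / (60 / 1,000,000) = tokens/month where the costs are equal.
awk 'BEGIN { printf "break-even: %d tokens/month\n", 16 / (60 / 1000000) }'
# prints "break-even: 266666 tokens/month"
```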
FAQ
Can I run Ollama on 4GB RAM?
Barely. You can run Phi-3 Mini or Llama 3.2 1B, but larger models will crash or swap heavily.
Is GPU required for Ollama?
No! CPU works fine, just slower. 8 vCPU gives usable speeds for 7-8B models.
What’s the best model for coding?
DeepSeek Coder or CodeLlama. Both available via ollama run deepseek-coder or ollama run codellama.
Can I fine-tune models on a VPS?
Yes, but you’ll want a GPU VPS for that. CPU fine-tuning is painfully slow.
How do I update Ollama?
curl -fsSL https://ollama.ai/install.sh | sh
Same install command updates to latest version.
Recommended Setup
| Use Case | VPS | Cost | Model |
|---|---|---|---|
| Testing/Personal | Hetzner CPX21 | €8/mo | Phi-3 Mini |
| Daily Use | Hetzner CPX41 | €15/mo | Llama 3.1 8B |
| Fast Responses | Vultr GPU | $90/mo | Llama 3.1 8B |
| Heavy Workloads | Lambda A10 | $360/mo | Mixtral 8x7B |
For most users, Hetzner CPX41 at €15/mo running Llama 3.1 8B is the sweet spot. Fast enough for real use, cheap enough to leave running 24/7.
Ready to get started?
Get the best VPS hosting deal today. Hostinger offers 4GB RAM VPS starting at just $4.99/mo.
Get Hostinger VPS from $4.99/mo (up to 75% off + free domain included)
Related guides
Best VPS for AI Inference in 2026 (Benchmarked)
Deploy ML models in production on your own VPS. We benchmark GPU and CPU inference performance across 6 providers — latency, throughput, and pricing.
AWS EC2 Alternatives 2026: Cheaper, Simpler VPS Hosting
Best AWS EC2 alternatives for cheaper VPS hosting. Compare Hetzner, Vultr, DigitalOcean, and more — save 70%+ with simpler billing.
Cheapest VPS Hosting 2026 — Best Budget Servers From $2.50
We compared 10 budget VPS providers on price, specs, and support. Here are the cheapest worth using — from $2.50/mo with real performance data.
Best GPU VPS in 2026 — Cheapest NVIDIA Servers Compared
Rent GPU servers from $0.50/hr. We compare 8 GPU VPS providers for AI training, inference, and rendering — NVIDIA A100, H100, and RTX options.
Andrius Putna
I'm Andrius Putna. Geek. I've been tinkering with web technologies since the early 2000s; now it's AI. I bridge business and technology to drive meaningful impact, combining expertise in customer experience, technology, and business strategy. Father, open-source contributor, investor, 2x Ironman, MBA graduate.
Last updated: February 8, 2026. Disclosure: This article may contain affiliate links.