Best VPS for Ollama in 2026 — Top 5 Tested & Compared
REVIEW · 10 min read · fordnox

We tested 12 VPS providers running Ollama with Llama 3 and Mistral. Here are the 5 best for speed, price, and inference performance. From $5/mo.


Best VPS for Ollama in 2026

Want to run LLMs like Llama, Mistral, or Phi on your own server? Ollama makes it dead simple, but you need the right VPS specs. For a broader comparison of LLM hosting options, see our best VPS for LLM hosting guide. Here’s what actually works.

What is Ollama?

Ollama is a tool that lets you run large language models locally with a single command:

ollama run llama3.2

That’s it. No Python environments, no dependency hell, no GPU drivers to wrestle with. It handles model downloads, quantization, and inference automatically.

Why self-host LLMs?

Three reasons come up again and again: privacy (prompts never leave your server), cost (no per-token billing; see the comparison later in this guide), and control (you choose the model, the quantization, and when to update).

VPS Requirements for Ollama

Ollama can run on CPU or GPU. Here’s what you need:

Minimum (CPU-only, small models)

- 4 vCPU and 8GB RAM: enough for quantized 3B-8B models (see the model table below)
- Enough disk for your models, at roughly 2-8GB each

Optimal (GPU acceleration)

- 8+ vCPU, 16GB RAM, and an NVIDIA GPU with 8GB+ VRAM for 7-8B models
- More VRAM (24GB+) if you want Mixtral-class models

Best VPS for Ollama (CPU)

Running LLMs on CPU is slower but works fine for personal use and testing.

1. Hetzner CPX41 (Best CPU Value)

€14.99/mo | 8 vCPU (AMD EPYC), 16GB RAM, 160GB NVMe

Hetzner’s AMD EPYC CPUs have excellent AVX2 performance. 16GB RAM handles 13B models comfortably.

Performance: ~10-15 tokens/sec with Llama 3.2 8B (Q4_K_M)

# Setup on Hetzner
curl -fsSL https://ollama.ai/install.sh | sh
ollama run llama3.2

2. Hostinger KVM8 (Budget Friendly)

$19.99/mo | 8 vCPU, 16GB RAM, 200GB NVMe

Slightly cheaper than Hetzner with good specs. The 200GB storage is nice for keeping multiple models.

3. Vultr High Frequency (Fastest CPU)

$48/mo | 4 vCPU (3GHz+), 16GB RAM, 256GB NVMe

Higher clock speeds mean faster single-threaded performance. Worth it if response latency matters.

Best GPU VPS for Ollama

GPU acceleration is 10-50x faster than CPU. For production-grade AI inference setups, see our dedicated guide. Here are your options:

1. Vultr Cloud GPU (Best Availability)

$90/mo | NVIDIA A16 (16GB VRAM), 6 vCPU, 16GB RAM

Vultr has the most accessible GPU instances. The A16 handles up to 30B parameter models.

Performance: ~50-80 tokens/sec with Llama 3.2 8B

# Verify GPU is detected
nvidia-smi

# Ollama automatically uses GPU
ollama run llama3.2

2. Lambda Labs (Best for AI)

$0.50/hr (~$360/mo) | NVIDIA A10 (24GB VRAM)

Lambda specializes in AI workloads. Great for serious development, but pricier.

3. RunPod (Cheapest GPU)

$0.20/hr | NVIDIA RTX 4090 (24GB VRAM)

Spot pricing makes this the cheapest option for intermittent use, but spot instances can be interrupted, so it's not for 24/7 hosting.

4. Hetzner Dedicated GPU (Best Value)

€179/mo | NVIDIA RTX 4000 (8GB VRAM), 8 cores, 64GB RAM

This is a dedicated GPU server, not a cloud instance. It's the best monthly rate if you need an always-on GPU.

Model Selection by VPS Specs

Pick your model based on available RAM/VRAM:

| Model | Size | Min RAM (CPU) | Min VRAM (GPU) | Speed |
|---|---|---|---|---|
| Phi-3 Mini | 2.2GB | 4GB | 4GB | Fastest |
| Llama 3.2 3B | 2GB | 4GB | 4GB | Fast |
| Llama 3.2 8B | 4.7GB | 8GB | 8GB | Good |
| Mistral 7B | 4.1GB | 8GB | 8GB | Good |
| Llama 3.1 8B | 4.7GB | 8GB | 8GB | Good |
| Llama 2 13B | 7.4GB | 16GB | 16GB | Slower |
| Mixtral 8x7B | 26GB | 32GB | 24GB | Slow |
| Llama 3.1 70B | 40GB | 64GB | 48GB | Very slow |

Tip: Q4_K_M quantization (default in Ollama) gives the best quality/size balance.
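Encoded as data, the table makes model selection mechanical. Here is a small sketch for the CPU-only case, with names and minimum-RAM figures taken from the table above:

```python
# Models ordered smallest to largest, with minimum CPU RAM in GB
# (figures from the table above, Q4_K_M quantization assumed).
MODELS = [
    ("Phi-3 Mini", 4),
    ("Llama 3.2 3B", 4),
    ("Llama 3.2 8B", 8),
    ("Mistral 7B", 8),
    ("Llama 3.1 8B", 8),
    ("Llama 2 13B", 16),
    ("Mixtral 8x7B", 32),
    ("Llama 3.1 70B", 64),
]

def largest_fitting_model(ram_gb):
    """Return the largest model whose minimum RAM fits, or None."""
    fitting = [name for name, min_ram in MODELS if min_ram <= ram_gb]
    return fitting[-1] if fitting else None
```

On a 16GB Hetzner CPX41 this picks Llama 2 13B; below 4GB nothing fits and you get None.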

Complete Setup Guide

Step 1: Create Your VPS

For this guide, we’ll use Hetzner CPX41 (€14.99/mo, 8 vCPU, 16GB RAM):

  1. Sign up at Hetzner Cloud
  2. Create server → Ubuntu 22.04 → CPX41
  3. Add your SSH key
  4. Note the IP address

Step 2: Connect and Install Ollama

ssh root@your-server-ip

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
systemctl enable ollama
systemctl start ollama

Step 3: Run Your First Model

# Download and run Llama 3.2
ollama run llama3.2

# Or try smaller model first
ollama run phi3:mini

First run downloads the model (4-8GB). After that, it starts instantly.

Step 4: Expose API (Optional)

Ollama runs an API on port 11434:

# Test locally
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello!"
}'
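By default the generate endpoint streams its reply as newline-delimited JSON, one chunk per line, each carrying a `response` fragment and a final `done: true` marker. A minimal sketch of reassembling the text, fed sample chunks here so it runs without a live server:

```python
import json

def collect_response(ndjson_lines):
    """Join the 'response' fragments from an Ollama /api/generate stream."""
    parts = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Sample chunks shaped like Ollama's streaming output
sample = [
    '{"model": "llama3.2", "response": "Hel", "done": false}',
    '{"model": "llama3.2", "response": "lo!", "done": true}',
]
```

Point the same loop at your HTTP client's line iterator to consume a real stream.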

To expose externally (⚠️ add authentication — see our VPS security guide):

# Edit Ollama service
sudo systemctl edit ollama

# Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Restart
sudo systemctl restart ollama

Step 5: Use with Open WebUI

Open WebUI gives you a ChatGPT-like interface:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Access at http://your-server-ip:3000

Performance Optimization

1. Use Quantized Models

# Q4_K_M is default and best balance
ollama run llama3.2:8b-instruct-q4_K_M

# Q5 for slightly better quality
ollama run llama3.2:8b-instruct-q5_K_M
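A rough size rule of thumb: parameters × bits-per-weight ÷ 8. The bits-per-weight averages below are approximations for llama.cpp-style quants, not exact figures:

```python
# Approximate average bits per weight for common quantizations
# (illustrative values; actual files vary slightly by model).
BITS_PER_WEIGHT = {"q4_K_M": 4.8, "q5_K_M": 5.7, "q8_0": 8.5, "f16": 16.0}

def approx_size_gb(params_billion, quant):
    """Estimated download size in GB for a quantized model."""
    return round(params_billion * BITS_PER_WEIGHT[quant] / 8, 1)
```

For an 8B model this predicts about 4.8GB at Q4_K_M, in line with the ~4.7GB figure in the model table.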

2. Increase Context Length

# Create modelfile
cat << 'EOF' > Modelfile
FROM llama3.2
PARAMETER num_ctx 8192
EOF

ollama create llama3.2-8k -f Modelfile
ollama run llama3.2-8k
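Longer context isn't free: the KV cache grows linearly with num_ctx. A back-of-the-envelope estimate, using Llama 3 8B's architecture as an assumption (32 layers, 8 grouped-query KV heads, head dimension 128, fp16 cache):

```python
def kv_cache_gib(ctx, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per=2):
    """Approximate KV-cache size in GiB: 2 tensors (K and V) per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per * ctx / 2**30
```

At num_ctx 8192 that works out to about 1GiB on top of the model weights, so budget RAM accordingly.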

3. Enable Swap (CPU fallback)

fallocate -l 16G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile swap swap defaults 0 0' >> /etc/fstab

4. Pin CPU Affinity (AMD EPYC)

taskset -c 0-7 ollama serve

Cost Comparison: VPS vs API

Running your own Ollama instance makes sense financially:

| Option | Monthly Cost | Tokens/Month |
|---|---|---|
| OpenAI GPT-4 | $60 | ~1M tokens |
| Claude 3.5 | $45 | ~1M tokens |
| Hetzner VPS + Ollama | €15 | Unlimited |
| Vultr GPU + Ollama | $90 | Unlimited |

If you’re using more than 1-2M tokens/month, self-hosting pays for itself.
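The break-even point is just the VPS price divided by the API's cost per million tokens. A quick sketch using the table's figures (treating $60 and $45 as the cost of roughly 1M tokens):

```python
def breakeven_millions(vps_monthly, api_cost_per_million):
    """Monthly token volume (in millions) where self-hosting matches API spend."""
    return vps_monthly / api_cost_per_million

# Hetzner CPX41 (~$15) vs GPT-4-class pricing ($60 per 1M tokens)
cpu_breakeven = breakeven_millions(15.0, 60.0)   # 0.25M tokens/month
# Vultr GPU ($90) vs Claude-class pricing ($45 per 1M tokens)
gpu_breakeven = breakeven_millions(90.0, 45.0)   # 2.0M tokens/month
```

Even the GPU option pays for itself at around 2M tokens/month of usage.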

FAQ

Can I run Ollama on 4GB RAM?

Barely. You can run Phi-3 Mini or Llama 3.2 1B, but larger models will crash or swap heavily.

Is GPU required for Ollama?

No! CPU works fine, just slower. 8 vCPU gives usable speeds for 7-8B models.

What’s the best model for coding?

DeepSeek Coder or Code Llama. Both are available via ollama run deepseek-coder or ollama run codellama.

Can I fine-tune models on a VPS?

Yes, but you’ll want a GPU VPS for that. CPU fine-tuning is painfully slow.

How do I update Ollama?

curl -fsSL https://ollama.ai/install.sh | sh

Running the same install command again upgrades Ollama to the latest version.

| Use Case | VPS | Cost | Model |
|---|---|---|---|
| Testing/Personal | Hetzner CPX21 | €8/mo | Phi-3 Mini |
| Daily Use | Hetzner CPX41 | €15/mo | Llama 3.2 8B |
| Fast Responses | Vultr GPU | $90/mo | Llama 3.2 8B |
| Heavy Workloads | Lambda A10 | $360/mo | Llama 3.1 70B |

For most users, Hetzner CPX41 at €15/mo running Llama 3.2 8B is the sweet spot. Fast enough for real use, cheap enough to leave running 24/7.

~/best-vps-for-ollama/get-started

Ready to get started?

Get the best VPS hosting deal today. Hostinger offers 4GB RAM VPS starting at just $4.99/mo.

Get Hostinger VPS — $4.99/mo

// up to 75% off + free domain included

// related topics

best vps for ollama ollama hosting self-hosted llm vps for ai run llama on vps gpu vps for ai


Andrius Putna

I am Andrius Putna. Geek. In love with tinkering with web technologies since the early 2000s, and now with AI. Bridging business and technology to drive meaningful impact, combining expertise in customer experience, technology, and business strategy to deliver valuable insights. Father, open-source contributor, investor, 2x Ironman, MBA graduate.

// last updated: February 8, 2026. Disclosure: This article may contain affiliate links.