Best VPS for Ollama 2026: Run LLMs on Your Own Server

Find the best VPS for running Ollama and self-hosted LLMs. Compare GPU VPS options, CPU requirements, and get your AI models running in minutes.


Best VPS for Ollama in 2026

Want to run LLMs like Llama, Mistral, or Phi on your own server? Ollama makes it dead simple, but you need the right VPS specs. Here's what actually works.

What is Ollama?

Ollama is a tool that lets you run large language models locally with a single command:

ollama run llama3.2

That's it. No Python environments, no dependency hell, no GPU drivers to wrestle with. It handles model downloads, quantization, and inference automatically.
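
Beyond run, a handful of everyday commands cover most workflows. A quick sketch (the model names are just examples from the Ollama library):

# Download a model without starting a chat session
ollama pull mistral

# List models already on disk and their sizes
ollama list

# Remove a model you no longer need
ollama rm mistral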

Why self-host LLMs?

The short version: your prompts and data never leave your server, there are no per-token bills or rate limits, and you can switch between models (Llama, Mistral, Phi, and more) whenever you like. The cost comparison later in this guide shows where self-hosting starts to pay off.

VPS Requirements for Ollama

Ollama can run on CPU or GPU. Here's what you need:

Minimum (CPU-only, small models): 2 vCPU, 8GB RAM. Enough for Phi-3 Mini or Llama 3.2 3B; 4GB RAM works, but only barely.

Recommended (CPU, medium models): 8 vCPU, 16GB RAM, NVMe storage. Comfortable for 7-13B models at usable speeds.

Optimal (GPU acceleration): NVIDIA GPU with 8GB+ VRAM plus 16GB of system RAM. Needed for fast responses and anything larger than ~13B.

Best VPS for Ollama (CPU)

Running LLMs on CPU is slower but works fine for personal use and testing.

1. Hetzner CPX41 (Best CPU Value)

€14.99/mo | 8 vCPU (AMD EPYC), 16GB RAM, 160GB NVMe

Hetzner's AMD EPYC CPUs have excellent AVX2 performance. 16GB RAM handles 13B models comfortably.

Performance: ~10-15 tokens/sec with Llama 3.1 8B (Q4_K_M)

# Setup on Hetzner
curl -fsSL https://ollama.ai/install.sh | sh
ollama run llama3.2
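
If you want to verify the ~10-15 tokens/sec figure above on your own box, Ollama's --verbose flag prints generation stats after each reply. A quick sanity check, with a placeholder prompt:

# llama3.1 pulls the 8B model by default; --verbose prints the eval rate (tokens/sec)
ollama run llama3.1 --verbose "Explain what a VPS is in two sentences."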

2. Hostinger KVM8 (Budget Friendly)

$19.99/mo | 8 vCPU, 16GB RAM, 200GB NVMe

A bit more expensive than Hetzner, but with comparable specs. The 200GB of storage is nice for keeping several models on disk.

3. Vultr High Frequency (Fastest CPU)

$48/mo | 4 vCPU (3GHz+), 16GB RAM, 256GB NVMe

Higher clock speeds mean faster single-threaded performance. Worth it if response latency matters.

Best GPU VPS for Ollama

GPU acceleration is 10-50x faster than CPU. Here are your options:

1. Vultr Cloud GPU (Best Availability)

$90/mo | NVIDIA A16 (16GB VRAM), 6 vCPU, 16GB RAM

Vultr has the most accessible GPU instances. The A16 handles up to 30B parameter models.

Performance: ~50-80 tokens/sec with Llama 3.1 8B

# Verify GPU is detected
nvidia-smi

# Ollama automatically uses GPU
ollama run llama3.2
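
Once a model is loaded, it's worth confirming Ollama is actually using the GPU rather than silently falling back to CPU:

# Shows loaded models and whether they run on GPU, CPU, or a split of both
ollama ps

# Watch VRAM usage while a response is generating
watch -n 1 nvidia-smi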

2. Lambda Labs (Best for AI)

$0.50/hr (~$360/mo) | NVIDIA A10 (24GB VRAM)

Lambda specializes in AI workloads. Great for serious development, but pricier.

3. RunPod (Cheapest GPU)

$0.20/hr | NVIDIA RTX 4090 (24GB VRAM)

Spot pricing makes this cheapest for intermittent use. Not for 24/7 hosting.

4. Hetzner Dedicated GPU (Best Value)

€179/mo | NVIDIA RTX 4000 (8GB VRAM), 8 cores, 64GB RAM

A dedicated GPU server rather than a cloud instance. Best monthly rate if you need an always-on GPU.

Model Selection by VPS Specs

Pick your model based on available RAM/VRAM:

Model | Size | Min RAM (CPU) | Min VRAM (GPU) | Speed
Phi-3 Mini | 2.2GB | 4GB | 4GB | Fastest
Llama 3.2 3B | 2GB | 4GB | 4GB | Fast
Llama 3.1 8B | 4.7GB | 8GB | 8GB | Good
Mistral 7B | 4.1GB | 8GB | 8GB | Good
Llama 2 13B | 7.4GB | 16GB | 16GB | Slower
Mixtral 8x7B | 26GB | 32GB | 24GB | Slow
Llama 3.1 70B | 40GB | 64GB | 48GB | Very slow

Tip: Q4_K_M quantization (default in Ollama) gives the best quality/size balance.
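
Before committing to a model, check that it actually fits in memory. A rough sketch of the process (llama3.1 here is just an example):

# How much memory is free right now?
free -h

# Pull a candidate model and compare its size in the listing
ollama pull llama3.1
ollama list

# Rule of thumb: the model file size plus 1-2GB of context overhead
# should fit in free RAM (or VRAM) with room to spare.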

Complete Setup Guide

Step 1: Create Your VPS

For this guide, we'll use Hetzner CPX41 (€14.99/mo, 8 vCPU, 16GB RAM):

  1. Sign up at Hetzner Cloud
  2. Create server → Ubuntu 22.04 → CPX41
  3. Add your SSH key
  4. Note the IP address
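
If you prefer the command line to the dashboard, Hetzner's hcloud CLI can create the same server. A sketch assuming you've already generated an API token and uploaded an SSH key named my-key:

# Create a CPX41 running Ubuntu 22.04 with your SSH key attached
hcloud server create \
  --name ollama-server \
  --type cpx41 \
  --image ubuntu-22.04 \
  --ssh-key my-key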

Step 2: Connect and Install Ollama

ssh root@your-server-ip

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
systemctl enable ollama
systemctl start ollama
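
A couple of quick checks confirm the install worked before you pull any models:

# The binary and the service should both be in place
ollama --version
systemctl status ollama --no-pager

# The API answers on localhost by default
curl http://localhost:11434/api/version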

Step 3: Run Your First Model

# Download and run Llama 3.2
ollama run llama3.2

# Or try smaller model first
ollama run phi3:mini

First run downloads the model (roughly 2-5GB for the models above). After that, it starts instantly.

Step 4: Expose API (Optional)

Ollama runs an API on port 11434:

# Test locally
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello!"
}'
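
The generate endpoint streams JSON lines by default. For scripting, it's often easier to request a single response, and there's also a chat endpoint that accepts a message history. A sketch of both (prompts are placeholders):

# Single non-streaming completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello!",
  "stream": false
}'

# Chat-style request with a message history
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'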

To expose externally (⚠️ add authentication):

# Edit Ollama service
sudo systemctl edit ollama

# Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Restart
sudo systemctl restart ollama
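
Ollama has no built-in authentication, so don't leave port 11434 open to the whole internet. The simplest stopgap is a firewall rule that only allows your own IP (203.0.113.10 below is a placeholder); for anything more, put a reverse proxy with auth in front of it:

# Keep SSH reachable, then restrict the Ollama API to your IP
ufw allow OpenSSH
ufw allow from 203.0.113.10 to any port 11434 proto tcp
ufw enable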

Step 5: Use with Open WebUI

Open WebUI gives you a ChatGPT-like interface:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Access at http://your-server-ip:3000

Performance Optimization

1. Use Quantized Models

# Q4_K_M is the default and the best balance
ollama run llama3.1:8b-instruct-q4_K_M

# Q5 for slightly better quality at a larger size
ollama run llama3.1:8b-instruct-q5_K_M

2. Increase Context Length

# Create modelfile
cat << 'EOF' > Modelfile
FROM llama3.2
PARAMETER num_ctx 8192
EOF

ollama create llama3.2-8k -f Modelfile
ollama run llama3.2-8k
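
You can confirm the new context length stuck by inspecting the model:

# Prints the model's details, including the num_ctx parameter
ollama show llama3.2-8k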

3. Enable Swap (CPU fallback)

fallocate -l 16G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile swap swap defaults 0 0' >> /etc/fstab
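
Verify the swap is active afterwards. If models still get killed for lack of memory, they're simply too big for the instance:

# Confirm swap is enabled and see how much memory is left
swapon --show
free -h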

4. Pin CPU Affinity (AMD EPYC)

# Stop the ollama service first (systemctl stop ollama), then pin to cores 0-7
taskset -c 0-7 ollama serve

Cost Comparison: VPS vs API

Running your own Ollama instance makes sense financially:

Option | Monthly Cost | Tokens/Month
OpenAI GPT-4 | $60 | ~1M tokens
Claude 3.5 | $45 | ~1M tokens
Hetzner VPS + Ollama | €15 | Unlimited
Vultr GPU + Ollama | $90 | Unlimited

If you're using more than 1-2M tokens/month, self-hosting pays for itself.

FAQ

Can I run Ollama on 4GB RAM?

Barely. You can run Phi-3 Mini or Llama 3.2 1B, but larger models will crash or swap heavily.

Is GPU required for Ollama?

No! CPU works fine, just slower. 8 vCPU gives usable speeds for 7-8B models.

What's the best model for coding?

DeepSeek Coder or CodeLlama. Both available via ollama run deepseek-coder or ollama run codellama.

Can I fine-tune models on a VPS?

Yes, but you'll want a GPU VPS for that. CPU fine-tuning is painfully slow.

How do I update Ollama?

curl -fsSL https://ollama.ai/install.sh | sh

Same install command updates to latest version.

Recommended Setup

Use Case | VPS | Cost | Model
Testing/Personal | Hetzner CPX21 | €8/mo | Phi-3 Mini
Daily Use | Hetzner CPX41 | €15/mo | Llama 3.1 8B
Fast Responses | Vultr GPU | $90/mo | Llama 3.1 8B
Heavy Workloads | Lambda A10 | $360/mo | Llama 3.1 70B

For most users, the Hetzner CPX41 at €15/mo running Llama 3.1 8B is the sweet spot: fast enough for real use, cheap enough to leave running 24/7.


Ready to get started?

Get the best VPS hosting deal today. Hostinger offers 4GB RAM VPS starting at just $4.99/mo.

Get Hostinger VPS — $4.99/mo

// up to 75% off + free domain included


fordnox

Expert VPS reviews and hosting guides. We test every provider we recommend.

// last updated: February 8, 2026. Disclosure: This article may contain affiliate links.