Best VPS for Whisper 2026: Self-Host Speech-to-Text
Find the best VPS for running OpenAI Whisper. Compare GPU and CPU options for self-hosted speech-to-text transcription on your own server.
Best VPS for Whisper in 2026
Want to transcribe audio without sending it to third-party APIs? OpenAI’s Whisper runs entirely on your own server — giving you unlimited, private speech-to-text. Here’s what VPS specs you actually need.
What is Whisper?
Whisper is OpenAI’s open-source speech recognition model. It handles:
- Transcription — Audio to text in 99+ languages
- Translation — Translate any language to English
- Subtitle generation — Timestamped output for video
- Speaker diarization — With extensions like WhisperX
```
whisper audio.mp3 --model medium --language en
```
Why self-host Whisper?
- Privacy — Audio never leaves your server
- No per-minute costs — OpenAI charges $0.006/min, which adds up fast
- No file size limits — Process hours-long recordings
- Batch processing — Transcribe hundreds of files overnight
- Customization — Use faster-whisper, WhisperX, or fine-tuned models
VPS Requirements for Whisper
Whisper’s resource needs depend on model size and whether you use GPU acceleration.
Minimum (CPU-only, small model)
- CPU: 4+ cores
- RAM: 4GB
- Storage: 10GB SSD
Recommended (CPU, medium model)
- CPU: 8+ cores (AVX2 support)
- RAM: 8GB
- Storage: 20GB NVMe
Optimal (GPU acceleration)
- GPU: NVIDIA with 6GB+ VRAM
- RAM: 8GB+ system RAM
- Storage: 30GB+ NVMe
Whisper Model Sizes
Pick based on your available resources:
| Model | Size | Min VRAM | Min RAM (CPU) | Relative Speed | Accuracy |
|---|---|---|---|---|---|
| tiny | 75MB | 1GB | 2GB | 32x | Basic |
| base | 142MB | 1GB | 2GB | 16x | Good |
| small | 466MB | 2GB | 4GB | 6x | Better |
| medium | 1.5GB | 5GB | 8GB | 2x | Great |
| large-v3 | 3.1GB | 10GB | 16GB | 1x | Best |
Tip: The medium model hits the sweet spot — 95%+ accuracy with reasonable speed. Use large-v3 only when accuracy is critical.
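The table above maps directly onto a simple model picker. Here is a minimal sketch (the function name and thresholds are my own, lifted straight from the table) that returns the largest model fitting the memory you have:

```python
# Requirements from the table above: (model, min VRAM GB, min CPU RAM GB)
MODELS = [
    ("large-v3", 10, 16),
    ("medium", 5, 8),
    ("small", 2, 4),
    ("base", 1, 2),
    ("tiny", 1, 2),
]

def pick_model(mem_gb: float, gpu: bool = False) -> str:
    """Return the largest Whisper model that fits in VRAM (gpu=True) or system RAM."""
    for name, min_vram, min_ram in MODELS:
        if mem_gb >= (min_vram if gpu else min_ram):
            return name
    raise ValueError("Need at least 2GB RAM (or 1GB VRAM) for the tiny model")

print(pick_model(16))           # → large-v3 (16GB-RAM CPU VPS)
print(pick_model(8))            # → medium
print(pick_model(6, gpu=True))  # → medium (6GB VRAM clears the 5GB bar)
```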
Best VPS for Whisper (CPU)
CPU transcription works fine for batch jobs and occasional use. Expect roughly real-time speed with the small model (1 hour of audio ≈ 1 hour of processing).
1. Hetzner CPX41 (Best Value)
€14.99/mo | 8 vCPU (AMD EPYC), 16GB RAM, 160GB NVMe
Handles the medium model comfortably. AMD EPYC processors have strong AVX2 performance, which Whisper relies on heavily.
Performance: ~1x real-time with the medium model, ~3x with small
2. Hostinger KVM8 (Budget Pick)
$19.99/mo | 8 vCPU, 16GB RAM, 200GB NVMe
Good specs at a fair price. The 200GB storage is handy if you’re processing lots of audio files.
3. Contabo VPS XL (Most RAM)
€13.99/mo | 8 vCPU, 30GB RAM, 400GB SSD
If you want to run large-v3 on CPU, you need 16GB+ RAM. Contabo’s generous memory allocation makes this possible at budget pricing.
Best GPU VPS for Whisper
GPU acceleration makes Whisper 10-30x faster. A 1-hour podcast transcribes in 2-5 minutes.
1. Vultr Cloud GPU (Best Availability)
$90/mo | NVIDIA A16 (16GB VRAM), 6 vCPU, 16GB RAM
Runs every Whisper model including large-v3. Always available — no spot instance headaches.
Performance: ~10-15x real-time with large-v3
2. Hetzner Dedicated GPU (Best Monthly Rate)
€179/mo | NVIDIA RTX 4000 (8GB VRAM), 8 cores, 64GB RAM
Best value for 24/7 transcription workloads. Runs medium and small models at blazing speed.
3. RunPod (Cheapest for Batch Jobs)
$0.20/hr | NVIDIA RTX 4090 (24GB VRAM)
Spin up when you have files to process, shut down when done. Perfect for occasional bulk transcription.
4. Lambda Labs (Heavy Workloads)
$0.50/hr (~$360/mo) | NVIDIA A10 (24GB VRAM)
For production transcription pipelines processing thousands of hours monthly.
Complete Setup Guide
Step 1: Create Your VPS
We’ll use Hetzner CPX41 for this guide:
- Sign up at Hetzner Cloud
- Create server → Ubuntu 22.04 → CPX41
- Add your SSH key
- Note the IP address
Step 2: Install Whisper
```
ssh root@your-server-ip

# Install dependencies
apt update && apt install -y python3-pip ffmpeg

# Install Whisper
pip3 install openai-whisper
```
Step 3: Transcribe Your First File
```
# Basic transcription (language is auto-detected)
whisper recording.mp3 --model medium

# Explicit task flag (transcribe is the default)
whisper recording.mp3 --model medium --task transcribe

# Translate to English
whisper foreign_audio.mp3 --model medium --task translate

# Output subtitles
whisper video.mp4 --model medium --output_format srt
```
Step 4: Use faster-whisper (Recommended)
faster-whisper reimplements Whisper on CTranslate2 and is up to 4x faster than the original with lower memory usage:
```bash
pip3 install faster-whisper

python3 << 'EOF'
from faster_whisper import WhisperModel

model = WhisperModel("medium", device="cpu", compute_type="int8")
segments, info = model.transcribe("recording.mp3")

print(f"Detected language: {info.language} ({info.language_probability:.0%})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
EOF
```
Why faster-whisper?
- 4x faster on CPU, 2x faster on GPU
- Uses less memory (int8 quantization)
- Same accuracy as original Whisper
- Drop-in replacement
Step 5: Set Up as API Service
Create a simple transcription API with FastAPI:
```
pip3 install fastapi uvicorn python-multipart faster-whisper
```

```python
# transcription_api.py
from fastapi import FastAPI, UploadFile
from faster_whisper import WhisperModel
import tempfile, os

app = FastAPI()
model = WhisperModel("medium", device="cpu", compute_type="int8")

@app.post("/transcribe")
async def transcribe(file: UploadFile):
    # Write the upload to a temp file so ffmpeg can read it from disk
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name
    segments, info = model.transcribe(tmp_path)
    text = " ".join(s.text for s in segments)
    os.unlink(tmp_path)
    return {
        "language": info.language,
        "text": text.strip(),
    }
```

Start the server:

```
uvicorn transcription_api:app --host 0.0.0.0 --port 8000
```
Send files to your API:
```
curl -X POST http://your-server-ip:8000/transcribe \
  -F "file=@recording.mp3"
```
Step 6: Docker Setup (Alternative)
```
docker run -d -p 8000:8000 \
  --name whisper \
  -v whisper-models:/root/.cache \
  onerahmet/openai-whisper-asr-webservice:latest
```
This gives you a ready-made REST API with Swagger docs at http://your-server-ip:8000/docs.
Performance Optimization
1. Use faster-whisper with int8
```python
# CPU: int8 quantization (fastest)
model = WhisperModel("medium", device="cpu", compute_type="int8")

# GPU: float16 (best quality/speed balance)
model = WhisperModel("medium", device="cuda", compute_type="float16")
```
2. Batch Processing Script
```bash
#!/bin/bash
# transcribe_all.sh — process all audio files in a directory
INPUT_DIR="./audio"
OUTPUT_DIR="./transcripts"
mkdir -p "$OUTPUT_DIR"

for file in "$INPUT_DIR"/*.{mp3,wav,m4a,flac}; do
  [ -f "$file" ] || continue
  echo "Processing: $file"
  whisper "$file" --model medium --output_dir "$OUTPUT_DIR" --output_format txt
done

echo "Done! Transcripts in $OUTPUT_DIR"
```
3. Enable Swap for Large Models
```
fallocate -l 8G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile swap swap defaults 0 0' >> /etc/fstab
```
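After enabling swap, it's worth confirming the kernel actually picked it up before loading a large model:

```shell
# Show active swap devices and current memory headroom
swapon --show
free -h
```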
4. Use VAD (Voice Activity Detection)
Skip silence to speed up processing:
```python
segments, info = model.transcribe(
    "recording.mp3",
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)
```
This can speed up transcription by 2-3x on recordings with lots of silence or pauses.
Cost Comparison: VPS vs APIs
| Option | Pricing | Cost for 100 hrs/mo |
|---|---|---|
| OpenAI Whisper API | $0.006/min | $36 |
| Google Speech-to-Text | $0.006/min | $36 |
| AWS Transcribe | $0.024/min | $144 |
| Hetzner VPS + Whisper | €15/mo flat | €15 (unlimited hours) |
| Vultr GPU + Whisper | $90/mo flat | $90 (unlimited hours) |
Self-hosting breaks even at roughly 40 hours/month on Hetzner, or 250 hours/month on Vultr GPU. After that, every hour is free.
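The break-even arithmetic is simple enough to sanity-check yourself. A quick sketch, assuming €15 ≈ $16 (exchange rate is my assumption):

```python
# Break-even vs. the OpenAI Whisper API at $0.006 per minute of audio
API_RATE_PER_HOUR = 0.006 * 60  # = $0.36 per audio hour

def break_even_hours(monthly_cost_usd: float) -> float:
    """Audio hours per month above which a flat-rate VPS is cheaper than the API."""
    return monthly_cost_usd / API_RATE_PER_HOUR

print(round(break_even_hours(16)))  # Hetzner CPX41, €15 ≈ $16 → 44
print(round(break_even_hours(90)))  # Vultr GPU → 250
```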
Use Cases
Podcast Transcription
Run large-v3 for best accuracy. A 1-hour episode takes ~5 min on GPU, ~1 hour on CPU.
Meeting Notes
Combine Whisper with WhisperX for speaker diarization:
```
pip install whisperx
```

```python
import whisperx

model = whisperx.load_model("medium", "cpu", compute_type="int8")
result = model.transcribe("meeting.mp3")

# Add speaker labels (the pyannote diarization models require a Hugging Face token)
diarize_model = whisperx.DiarizationPipeline(use_auth_token="YOUR_HF_TOKEN")
diarize_segments = diarize_model("meeting.mp3")
result = whisperx.assign_word_speakers(diarize_segments, result)
```
Subtitle Generation
```
whisper video.mp4 --model medium --output_format srt --word_timestamps True
```
Voice Note Processing
Build a Telegram bot or webhook that auto-transcribes voice messages.
FAQ
Can I run Whisper on 2GB RAM?
Yes, with the tiny or base model. Accuracy is lower but fine for clear English audio.
Is GPU required?
No. CPU works perfectly for batch processing where speed isn’t critical. Use faster-whisper with int8 for best CPU performance.
Which model should I use?
medium for most use cases. large-v3 if accuracy is critical (legal, medical). small if speed matters more than perfect accuracy.
Can Whisper handle multiple languages?
Yes. It auto-detects language and can transcribe 99+ languages. Translation to English is built in.
How accurate is Whisper?
The large-v3 model approaches human-level performance, with a word error rate of roughly 2-5% on clean audio (i.e. ~95-98% of words correct). medium is close behind at roughly 93-96% accuracy.
Recommended Setup
| Use Case | VPS | Cost | Model | Speed |
|---|---|---|---|---|
| Occasional Use | Hetzner CPX21 | €8/mo | small | ~3x real-time |
| Daily Transcription | Hetzner CPX41 | €15/mo | medium | ~1x real-time |
| Fast Processing | Vultr GPU | $90/mo | large-v3 | ~15x real-time |
| Bulk/Production | Lambda A10 | $360/mo | large-v3 | ~20x real-time |
For most users, Hetzner CPX41 at €15/mo with faster-whisper and the medium model is the sweet spot. Accurate enough for real work, affordable enough to leave running.
Ready to get started?
Get the best VPS hosting deal today. Hostinger offers 4GB RAM VPS starting at just $4.99/mo.
Get Hostinger VPS — $4.99/mo, up to 75% off + free domain included
Andrius Putna
I am Andrius Putna. Geek. In love with tinkering with web technologies since the early 2000s; now AI. Bridging business and technology to drive meaningful impact. Combining expertise in customer experience, technology, and business strategy to deliver valuable insights. Father, open-source contributor, investor, 2x Ironman, MBA graduate.
// last updated: March 6, 2026. Disclosure: This article may contain affiliate links.