
Deploy LocalAI with Dokploy: Docker Compose Setup Guide

Step-by-step guide to deploying LocalAI model server on your VPS using Dokploy and Docker Compose. Includes model management, API configuration, and GPU support.


Deploy LocalAI with Dokploy

Dokploy is an open-source server management platform that simplifies deploying Docker Compose applications on your VPS. It handles reverse proxy configuration, SSL certificates, and deployment management — making it convenient to host your own OpenAI-compatible API server with LocalAI.

This guide walks you through deploying LocalAI with persistent model storage, API access, and automatic HTTPS. LocalAI provides an OpenAI-compatible API that runs LLMs, image generation, audio transcription, and embeddings locally.

Prerequisites

  - A VPS with Dokploy installed and running
  - A domain or subdomain with a DNS A record pointing to your server (needed for the HTTPS setup below)
  - Enough RAM for the models you plan to run (a 7B Q4-quantized model needs roughly 4–6 GB)

Docker Compose Configuration

Create a new Compose project in Dokploy and paste the following configuration:

version: "3.8"

services:
  localai:
    image: localai/localai:latest-cpu
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - THREADS=${THREADS:-4}
      - CONTEXT_SIZE=${CONTEXT_SIZE:-512}
      - MODELS_PATH=/models
      - DEBUG=${DEBUG:-false}
      - CORS=true
      - CORS_ALLOW_ORIGINS=*
    volumes:
      - ../files/localai-models:/models
      - ../files/localai-images:/tmp/generated/images
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8080/readyz || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 120s

Note: This configuration uses the CPU-only image (latest-cpu). For NVIDIA GPU acceleration, change the image to localai/localai:latest-gpu-nvidia-cuda-12 and add a deploy.resources.reservations.devices block to the service, as shown in the sketch below. GPU inference is significantly faster for large language models.
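
For reference, the GPU variant of the service would look roughly like this. This is a sketch that assumes NVIDIA drivers and the NVIDIA Container Toolkit are already installed on the host:

services:
  localai:
    image: localai/localai:latest-gpu-nvidia-cuda-12
    # ...keep the rest of the service definition from above...
    deploy:
      resources:
        reservations:
          devices:
            # hand every detected NVIDIA GPU to the container
            - driver: nvidia
              count: all
              capabilities: [gpu]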

Environment Variables

Set these in Dokploy's Environment tab for your compose project:

THREADS: number of CPU threads for inference (example: 4)
CONTEXT_SIZE: default context window size in tokens (example: 2048)
DEBUG: enable debug logging (example: false)

In Dokploy, environment variables are set via the Environment editor in the project settings. Do not create a .env file manually — Dokploy manages this for you. Increase THREADS to match your server's CPU count for better performance. Higher CONTEXT_SIZE uses more RAM.
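
As an illustration, on an 8-core VPS you might enter values like these in the Environment editor (the numbers are examples, not recommendations; tune them to your hardware):

THREADS=8
CONTEXT_SIZE=4096
DEBUG=false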

Volumes & Data Persistence

This setup uses Dokploy's ../files convention for bind-mounted volumes:

  - ../files/localai-models is mounted at /models and holds your downloaded model files
  - ../files/localai-images is mounted at /tmp/generated/images and holds generated images

The ../files path is relative to the compose file inside Dokploy's project directory, which ensures your data persists across redeployments. Avoid absolute paths, because Dokploy may clean them during redeployment.

To download models, you can use LocalAI's model gallery API after deployment: curl https://ai.yourdomain.com/models/apply -H "Content-Type: application/json" -d '{"id":"huggingface@thebloke__mistral-7b-instruct-v0.2-gguf__mistral-7b-instruct-v0.2.Q4_K_M.gguf"}'. Alternatively, manually place GGUF model files in the ../files/localai-models directory on your server.
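
If you go the manual route, one way to do it is to download a GGUF file straight into the bind-mounted directory over SSH. The path and URL below are placeholders, not the exact values for your setup:

# SSH into the VPS, then change into the project's files directory.
# The path below is an assumption -- check where Dokploy created the
# files directory for your particular project.
cd /etc/dokploy/compose/your-project/files/localai-models

# Placeholder URL -- substitute the GGUF file you actually want to run
wget https://huggingface.co/your-repo/resolve/main/your-model.Q4_K_M.gguf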

Domain & SSL Setup

  1. In your Dokploy project, navigate to the Domains tab
  2. Click Add Domain and enter your domain (e.g., ai.yourdomain.com)
  3. Set the container port to 8080
  4. Enable HTTPS — Dokploy automatically provisions a Let's Encrypt SSL certificate
  5. Save and wait for the certificate to be issued (usually under a minute)

Dokploy's built-in Traefik reverse proxy handles TLS termination and routes traffic to your LocalAI container. HTTPS is important if you plan to access the API over the internet.
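
Once the certificate is issued, you can confirm that Traefik is routing traffic to the container by hitting the same readiness endpoint the health check uses (replace the domain with your own):

curl -i https://ai.yourdomain.com/readyz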

Verifying the Deployment

  1. In Dokploy, go to your project's Deployments tab and click Deploy
  2. Watch the build logs — the localai container should start (initial startup may take 1–2 minutes)
  3. Check the Logs tab for the localai service. Look for: LocalAI API is ready
  4. Test the API: curl https://ai.yourdomain.com/v1/models — this lists available models
  5. After loading a model, test chat completion: curl https://ai.yourdomain.com/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"your-model-name","messages":[{"role":"user","content":"Hello"}]}'
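
Because the API is OpenAI-compatible, most OpenAI client libraries also work once pointed at your endpoint. Here is a minimal sketch using the official Python SDK, assuming no API key is configured on the server and that your-model-name matches a loaded model:

from openai import OpenAI

# Point the client at your LocalAI instance; the key is ignored unless
# you have configured API keys on the server.
client = OpenAI(base_url="https://ai.yourdomain.com/v1", api_key="sk-local")

response = client.chat.completions.create(
    model="your-model-name",  # must match a model loaded in LocalAI
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)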

Troubleshooting

Container takes a long time to start
LocalAI loads models into memory on startup. Large models (7B+ parameters) can take several minutes to load, especially on CPU. The health check has a 120-second start period to account for this. Check logs for model loading progress.

Out of memory errors
Each loaded model consumes RAM proportional to its size. A 7B parameter Q4 quantized model uses roughly 4–6 GB RAM. Ensure your server has enough memory for all loaded models plus system overhead. Reduce the number of simultaneously loaded models if needed.

Inference is very slow
CPU inference is inherently slower than GPU inference. For better performance: increase THREADS to match your CPU core count, use smaller quantized models (Q4_K_M instead of Q8), reduce CONTEXT_SIZE, or switch to a GPU-enabled image with an NVIDIA GPU.

SSL certificate not issuing
Ensure your domain's DNS A record points to your server's IP and has propagated. Dokploy uses Let's Encrypt HTTP-01 challenges, so port 80 must be accessible. Check Traefik logs in Dokploy for certificate-related errors.
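
A quick way to check DNS from your local machine, assuming the dig utility is installed:

dig +short ai.yourdomain.com
# should print the public IP address of your VPS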


Learn more about LocalAI in our complete overview.

Need a VPS? Hostinger VPS starts at $4.99/mo — perfect for running LocalAI.


For more on Docker Compose deployments in Dokploy, see the Dokploy Docker Compose documentation.

App data sourced from selfh.st open-source directory.




fordnox

Expert VPS reviews and hosting guides. We test every provider we recommend.

// last updated: February 13, 2026. Disclosure: This article may contain affiliate links.