
Deploy LocalAI with Dokploy: Docker Compose Setup Guide

Step-by-step guide to deploying LocalAI model server on your VPS using Dokploy and Docker Compose. Includes model management, API configuration, and GPU support.


Deploy LocalAI with Dokploy

Dokploy is an open-source server management platform that simplifies deploying Docker Compose applications on your VPS. It handles reverse proxy configuration, SSL certificates, and deployment management — making it convenient to host your own OpenAI-compatible API server with LocalAI.

This guide walks you through deploying LocalAI with persistent model storage, API access, and automatic HTTPS. LocalAI provides an OpenAI-compatible API that runs LLMs, image generation, audio transcription, and embeddings locally.

Prerequisites

  - A VPS with Dokploy installed and running
  - A domain or subdomain with a DNS A record pointing to your server (needed for the HTTPS setup below)
  - Enough RAM for the models you plan to run (a 7B Q4-quantized model needs roughly 4–6 GB)

Docker Compose Configuration

Create a new Compose project in Dokploy and paste the following configuration:

version: "3.8"

services:
  localai:
    image: localai/localai:latest-cpu
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - THREADS=${THREADS:-4}
      - CONTEXT_SIZE=${CONTEXT_SIZE:-512}
      - MODELS_PATH=/models
      - DEBUG=${DEBUG:-false}
      - CORS=true
      - CORS_ALLOW_ORIGINS=*
    volumes:
      - ../files/localai-models:/models
      - ../files/localai-images:/tmp/generated/images
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8080/readyz || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 120s

Note: This configuration uses the CPU-only image (latest-cpu). For NVIDIA GPU acceleration, change the image to localai/localai:latest-gpu-nvidia-cuda-12 and add a deploy.resources.reservations.devices block to the service, as shown in the sketch below. GPU inference is significantly faster for large language models.
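
For reference, the GPU variant of the service would look roughly like this. This is a sketch that assumes NVIDIA drivers and the NVIDIA Container Toolkit are already installed on the host:

services:
  localai:
    image: localai/localai:latest-gpu-nvidia-cuda-12
    # ...keep the rest of the service definition from above...
    deploy:
      resources:
        reservations:
          devices:
            # hand every detected NVIDIA GPU to the container
            - driver: nvidia
              count: all
              capabilities: [gpu]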

Environment Variables

Set these in Dokploy's Environment tab for your compose project:

THREADS: number of CPU threads for inference (example: 4)
CONTEXT_SIZE: default context window size in tokens (example: 2048)
DEBUG: enable debug logging (example: false)

In Dokploy, environment variables are set via the Environment editor in the project settings. Do not create a .env file manually — Dokploy manages this for you. Increase THREADS to match your server's CPU count for better performance. Higher CONTEXT_SIZE uses more RAM.
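
As an illustration, on an 8-core VPS you might enter values like these in the Environment editor (the numbers are examples, not recommendations; tune them to your hardware):

THREADS=8
CONTEXT_SIZE=4096
DEBUG=false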

Volumes & Data Persistence

This setup uses Dokploy's ../files convention for bind-mounted volumes:

  - ../files/localai-models is mounted at /models and holds your downloaded model files
  - ../files/localai-images is mounted at /tmp/generated/images and holds generated images

The ../files path is relative to the compose file inside Dokploy's project directory, which ensures your data persists across redeployments. Avoid absolute paths, because Dokploy may clean them during redeployment.

To download models, you can use LocalAI's model gallery API after deployment: curl https://ai.yourdomain.com/models/apply -H "Content-Type: application/json" -d '{"id":"huggingface@thebloke__mistral-7b-instruct-v0.2-gguf__mistral-7b-instruct-v0.2.Q4_K_M.gguf"}'. Alternatively, manually place GGUF model files in the ../files/localai-models directory on your server.
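
If you go the manual route, one way to do it is to download a GGUF file straight into the bind-mounted directory over SSH. The path and URL below are placeholders, not the exact values for your setup:

# SSH into the VPS, then change into the project's files directory.
# The path below is an assumption -- check where Dokploy created the
# files directory for your particular project.
cd /etc/dokploy/compose/your-project/files/localai-models

# Placeholder URL -- substitute the GGUF file you actually want to run
wget https://huggingface.co/your-repo/resolve/main/your-model.Q4_K_M.gguf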

Domain & SSL Setup

  1. In your Dokploy project, navigate to the Domains tab
  2. Click Add Domain and enter your domain (e.g., ai.yourdomain.com)
  3. Set the container port to 8080
  4. Enable HTTPS — Dokploy automatically provisions a Let's Encrypt SSL certificate
  5. Save and wait for the certificate to be issued (usually under a minute)

Dokploy's built-in Traefik reverse proxy handles TLS termination and routes traffic to your LocalAI container. HTTPS is important if you plan to access the API over the internet.
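
Once the certificate is issued, you can confirm that Traefik is routing traffic to the container by hitting the same readiness endpoint the health check uses (replace the domain with your own):

curl -i https://ai.yourdomain.com/readyz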

Verifying the Deployment

  1. In Dokploy, go to your project's Deployments tab and click Deploy
  2. Watch the build logs — the localai container should start (initial startup may take 1–2 minutes)
  3. Check the Logs tab for the localai service. Look for: LocalAI API is ready
  4. Test the API: curl https://ai.yourdomain.com/v1/models — this lists available models
  5. After loading a model, test chat completion: curl https://ai.yourdomain.com/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"your-model-name","messages":[{"role":"user","content":"Hello"}]}'
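
Because the API is OpenAI-compatible, most OpenAI client libraries also work once pointed at your endpoint. Here is a minimal sketch using the official Python SDK, assuming no API key is configured on the server and that your-model-name matches a loaded model:

from openai import OpenAI

# Point the client at your LocalAI instance; the key is ignored unless
# you have configured API keys on the server.
client = OpenAI(base_url="https://ai.yourdomain.com/v1", api_key="sk-local")

response = client.chat.completions.create(
    model="your-model-name",  # must match a model loaded in LocalAI
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)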

Troubleshooting

Container takes a long time to start
LocalAI loads models into memory on startup. Large models (7B+ parameters) can take several minutes to load, especially on CPU. The health check has a 120-second start period to account for this. Check logs for model loading progress.

Out of memory errors
Each loaded model consumes RAM proportional to its size. A 7B parameter Q4 quantized model uses roughly 4–6 GB RAM. Ensure your server has enough memory for all loaded models plus system overhead. Reduce the number of simultaneously loaded models if needed.

Inference is very slow
CPU inference is inherently slower than GPU inference. For better performance: increase THREADS to match your CPU core count, use smaller quantized models (Q4_K_M instead of Q8), reduce CONTEXT_SIZE, or switch to a GPU-enabled image with an NVIDIA GPU.

SSL certificate not issuing
Ensure your domain's DNS A record points to your server's IP and has propagated. Dokploy uses Let's Encrypt HTTP-01 challenges, so port 80 must be accessible. Check Traefik logs in Dokploy for certificate-related errors.
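
A quick way to check DNS from your local machine, assuming the dig utility is installed:

dig +short ai.yourdomain.com
# should print the public IP address of your VPS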


Learn more about LocalAI in our complete overview.

Need a VPS? Hostinger VPS starts at $4.99/mo — perfect for running LocalAI.


For more on Docker Compose deployments in Dokploy, see the Dokploy Docker Compose documentation.

App data sourced from selfh.st open-source directory.




fordnox

Expert VPS reviews and hosting guides. We test every provider we recommend.

// last updated: February 13, 2026. Disclosure: This article may contain affiliate links.