- Ollama exposes an OpenAI-compatible API at `http://localhost:11434/v1` — OpenClaw connects to it with zero additional tooling
- Pull models with `ollama pull [model-tag]` before starting OpenClaw — the gateway validates the connection on startup
- Default Ollama context window is 2048 tokens — too small for agent system prompts; always set `num_ctx: 8192` minimum
- GPU acceleration is auto-detected on NVIDIA and Apple Silicon — verify it's active before benchmarking performance
- Mistral 7B Instruct v0.3 is the recommended starting model for OpenClaw agent pipelines with Ollama
Ollama has become the standard local model server for good reason: one command installs it, one command pulls a model, and the API it exposes is identical to OpenAI's. OpenClaw connects to that API without any adapter or plugin. The setup is straightforward — the errors people hit are almost always context window misconfigurations or model tag mismatches, both of which this guide covers explicitly.
## Why Ollama Is the Best Local Server for OpenClaw
There are several local model servers available — LM Studio, llama.cpp direct, vLLM, text-generation-webui. Each has its place. Ollama wins for OpenClaw users specifically because of three things: automatic GPU detection, a clean OpenAI-compatible REST API, and a model library that makes pulling and managing models trivial.
LM Studio is excellent for GUI-driven model testing but adds friction for headless server deployments. llama.cpp direct gives you maximum control but requires manual compilation. Ollama sits in the middle: it's simple enough to set up in minutes but production-capable enough to run on a server without a GUI.
As of early 2025, Ollama supports NVIDIA CUDA, AMD ROCm, and Apple Metal acceleration — covering the GPU hardware most teams actually have. It runs as a background service, restarts on crash, and serves multiple model requests concurrently up to your hardware limits.
On macOS and Linux, Ollama installs as a background service that starts at login. On Windows, it runs in the system tray. This means your local model server is always available when OpenClaw starts — no manual launch step needed. Verify it's running with `curl http://localhost:11434/` before proceeding.
## Installing Ollama
Installation takes under two minutes on any supported platform.
```shell
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version

# Check the server is running
curl http://localhost:11434/
# Expected: "Ollama is running"
```
On Windows, download the installer from ollama.com and run it. The application starts automatically and appears in the system tray. On Linux servers without a GUI, the install script handles the systemd service configuration automatically — Ollama starts on boot and runs without user interaction.
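On systemd-based Linux servers, the service registered by the install script can be inspected with standard tooling. The service name `ollama` is what the install script sets up; the commands below assume systemd is present:

```shell
# Check the Ollama service status and recent logs
systemctl status ollama
journalctl -u ollama --since "10 min ago"

# Restart after driver or configuration changes
sudo systemctl restart ollama
```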
We'll get to model selection in a moment — but first you need to understand one critical point about the Ollama API endpoint format, because this is where most connection failures originate.
The Ollama API base URL is `http://localhost:11434/v1` — note the `/v1` suffix. This is the OpenAI-compatible endpoint. The root URL `http://localhost:11434/` is Ollama's native API and does not speak the OpenAI format. OpenClaw's config must point at the `/v1` path.
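To see the difference directly, compare the two endpoints with `curl`. This assumes Ollama is running locally with a model already pulled; the model tag is an example, so substitute whatever `ollama list` shows on your machine:

```shell
# Native API root: a plain-text health response, not OpenAI JSON
curl -s http://localhost:11434/

# OpenAI-compatible chat completion on the /v1 path
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistral:7b-instruct-v0.3-q4_K_M",
        "messages": [{"role": "user", "content": "Reply with only: READY"}]
      }'
```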
## Pulling the Right Model for OpenClaw
Models in Ollama are identified by name and tag. The tag determines the variant — quantization level, parameter count, and fine-tune type. For OpenClaw agents, you always want an instruct or chat variant, not a base model.
```shell
# Pull recommended models for OpenClaw agents
ollama pull mistral:7b-instruct-v0.3-q4_K_M
ollama pull llama3:8b-instruct
ollama pull llama3:70b-instruct  # requires 64GB+ RAM or GPU

# List pulled models and their exact tags
ollama list

# Test a model responds correctly
ollama run mistral:7b-instruct-v0.3-q4_K_M "Reply with only: READY"
```
Copy the exact tag from the `ollama list` output. The model name in your OpenClaw config must match this tag character-for-character. A mismatch causes a startup error that looks like a connection failure but is actually a 404.
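Since a character-for-character match matters, a small shell helper can verify the configured tag against `ollama list` output before startup. This is a sketch; `check_tag` is a hypothetical helper name, not an OpenClaw or Ollama command:

```shell
# check_tag: read `ollama list` output on stdin and succeed only if the
# exact tag passed as $1 appears in the first column (header row skipped).
check_tag() {
  awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit !found }'
}

# Usage sketch (assumes Ollama is installed and the model was pulled):
#   ollama list | check_tag "mistral:7b-instruct-v0.3-q4_K_M" \
#     || echo "tag missing: run ollama pull mistral:7b-instruct-v0.3-q4_K_M"
```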
## Configuring OpenClaw to Use Ollama
With Ollama running and a model pulled, the OpenClaw config change is a few lines.
```yaml
# openclaw.config.yaml
model:
  provider: openai-compatible
  base_url: http://localhost:11434/v1
  model: mistral:7b-instruct-v0.3-q4_K_M
  api_key: ollama
  context_window: 8192
  temperature: 0.15
  max_tokens: 2048
```
Or set via CLI:
```shell
openclaw config set model.provider=openai-compatible
openclaw config set model.base_url=http://localhost:11434/v1
openclaw config set model.model=mistral:7b-instruct-v0.3-q4_K_M
openclaw config set model.api_key=ollama
openclaw config set model.context_window=8192
```
Restart the OpenClaw gateway after making config changes. On startup, OpenClaw sends a test completion request to validate the connection; if it succeeds, you'll see `Model provider connected: openai-compatible` in the gateway logs. If the model specified in config hasn't been pulled in Ollama yet, validation fails and the gateway refuses to start, so always run `ollama pull [model]` before starting or restarting OpenClaw after a model change.
## Fixing the Context Window — The Step Everyone Skips
Ollama's default context window is 2048 tokens. OpenClaw agent system prompts — which include the agent's instructions, tool definitions, and conversation history — regularly exceed this limit. When truncation happens, the agent loses its tool definitions and starts producing plain text instead of structured tool calls.
Fix this with a custom Modelfile that overrides the default parameters:
```
# ~/.ollama/Modelfile.openclaw-mistral
FROM mistral:7b-instruct-v0.3-q4_K_M
PARAMETER num_ctx 8192
PARAMETER num_predict 2048
PARAMETER temperature 0.15
PARAMETER repeat_penalty 1.1
```

```shell
# Create the model variant from the Modelfile
ollama create openclaw-mistral -f ~/.ollama/Modelfile.openclaw-mistral

# Update your OpenClaw config to use this variant
openclaw config set model.model=openclaw-mistral
```
This approach locks in the context window at the Ollama level, so OpenClaw doesn't need to pass parameters on every request. It also lets you have a dedicated model configuration for agent workloads without affecting other Ollama users on the same machine.
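To confirm the variant actually inherited the overrides, `ollama show` can print the stored Modelfile and parameter block. The flags below are standard Ollama CLI flags, though the output shape may vary between versions:

```shell
# Print the Modelfile Ollama stored for the variant (FROM + PARAMETER lines)
ollama show --modelfile openclaw-mistral

# Print just the parameter block (num_ctx, temperature, etc.)
ollama show --parameters openclaw-mistral
```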
For comparison, here's what different context window sizes mean in practice for OpenClaw:
| num_ctx | Use Case | RAM Impact (7B model) |
|---|---|---|
| 2048 | Default — insufficient for agents | +0 MB |
| 4096 | Simple agents, minimal tools | +200 MB |
| 8192 | Recommended for most agents | +600 MB |
| 16384 | Document-heavy pipelines | +1.4 GB |
## Common Mistakes With OpenClaw + Ollama
- Wrong base_url path — using `http://localhost:11434/` instead of `http://localhost:11434/v1` causes a 404 on every request. Always include the `/v1` suffix.
- Model tag mismatch — the model name in OpenClaw config must exactly match the tag shown in `ollama list`. Even a minor difference causes a model-not-found error at startup.
- Not increasing num_ctx — the 2048 default silently truncates agent context. Tool definitions disappear. The agent starts producing broken responses. Set `num_ctx` to at least 8192.
- Running Ollama alongside other GPU workloads — VRAM contention causes inference failures mid-task. Dedicate the GPU to Ollama during agent workloads.
- Using a base model without instruct fine-tuning — base models don't follow the chat format. Always pull the `-instruct` or `-chat` tagged variant.
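The first two mistakes can be caught with a small preflight sketch run before starting the gateway. It assumes the default port and that `curl` is installed; the expected-substring check is a heuristic, not an official health check:

```shell
#!/bin/sh
# Preflight: is Ollama up, and does the OpenAI-compatible path answer?
if ! curl -sf http://localhost:11434/ >/dev/null; then
  echo "Ollama is not running on port 11434" >&2
  exit 1
fi

# /v1/models lists pulled models in OpenAI format; a JSON "data" array
# indicates the /v1 path is live.
if curl -sf http://localhost:11434/v1/models | grep -q '"data"'; then
  echo "OpenAI-compatible endpoint OK"
else
  echo "http://localhost:11434/v1 not answering as expected" >&2
  exit 1
fi
```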
## Frequently Asked Questions
### Does Ollama work with OpenClaw out of the box?

Yes. Ollama exposes an OpenAI-compatible API at `http://localhost:11434/v1`, which OpenClaw accepts natively. Set `provider` to `openai-compatible`, point `base_url` at the Ollama endpoint, and set `model` to your pulled model name. No plugins or adapters are required.
### What Ollama models work best with OpenClaw agents?

Mistral 7B Instruct, Llama 3 8B Instruct, and Llama 3 70B Instruct are the most reliable for agent pipelines. They handle multi-turn context and tool calling correctly. Pull them with `ollama pull mistral:7b-instruct` and `ollama pull llama3:8b-instruct`.
### Why does OpenClaw fail to connect to Ollama?

The most common causes are: Ollama isn't running when OpenClaw starts, the `base_url` has a typo, or the model name in config doesn't match the exact tag pulled in Ollama. Run `ollama list` to confirm the exact model tag and verify Ollama is serving on port 11434.
### How do I increase Ollama's context window for OpenClaw?

Ollama defaults to a 2048-token context window, which is too small for most agent system prompts. Set `num_ctx` in a custom Modelfile, or pass it via the `options` field of Ollama's native API. A minimum of 8192 is recommended for OpenClaw agents with tool definitions.
### Can Ollama use my GPU with OpenClaw?

Yes. Ollama auto-detects CUDA-capable NVIDIA GPUs and Apple Silicon Metal on macOS. Verify GPU usage with `ollama ps` while a model is loaded (the processor column shows GPU vs. CPU placement), or check the Ollama server logs for the number of layers offloaded to the GPU. If the GPU isn't detected, install current CUDA drivers and restart Ollama.
### Is there a way to run multiple models in Ollama for different OpenClaw agents?

A single Ollama instance loads models on demand, swapping them in and out of memory as requests arrive; recent versions can keep several loaded at once via the `OLLAMA_MAX_LOADED_MODELS` environment variable. For strict isolation, run separate Ollama instances on different ports and configure each OpenClaw agent to point at the appropriate port. Alternatively, use a larger model capable of handling all agent types.
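The separate-instance approach can be sketched with Ollama's `OLLAMA_HOST` environment variable, which binds a second server to another port. The port and model tag here are examples, and the `openclaw` command follows the config syntax used earlier:

```shell
# Start a second Ollama instance on port 11435
OLLAMA_HOST=127.0.0.1:11435 ollama serve &

# Pull a model through that instance
OLLAMA_HOST=127.0.0.1:11435 ollama pull llama3:8b-instruct

# Point a second OpenClaw agent at the new port
openclaw config set model.base_url=http://localhost:11435/v1
```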
Ollama plus OpenClaw is one of the cleanest local agent setups available right now. The install is fast, the API is standard, and the model library covers every use case from lightweight summarization to serious reasoning. Fix the context window, match your model tags exactly, and you'll have a working local pipeline before lunch.
Pull `mistral:7b-instruct-v0.3-q4_K_M`, apply the Modelfile config above, and run `openclaw gateway start`. That's the complete path from zero to running local agents — no account needed, no cloud billing, no data leaving your machine.
A. Larsen specializes in connecting OpenClaw to local and self-hosted model infrastructure, and has deployed Ollama-backed OpenClaw systems for teams in healthcare and legal settings where data residency requirements make cloud APIs non-viable.