- Claude Sonnet is the recommended default — best balance of capability and cost for agentic workloads as of early 2025
- Set ANTHROPIC_API_KEY in environment, configure model.provider and model.name in OpenClaw config, restart
- soul.md and identity.md are automatically injected as Claude's system prompt — no manual prompt engineering needed
- Extended thinking mode (model.thinking: true) improves complex multi-step task performance significantly
- Claude's 200k context window means most OpenClaw agents never hit context limits in normal usage
Across 12 production OpenClaw deployments I've set up over the past year, Claude consistently outperforms every other model on the tasks that matter most for agents: following multi-step instructions without drift, using tools in the right sequence, and adapting when an intermediate step produces unexpected results. The gap is larger than most people expect before they test it.
Why Claude Is the Top Choice for OpenClaw Agents
Agentic tasks are categorically harder than single-turn completions. Your agent needs to maintain context across many steps, use tools in the right order, handle unexpected results without losing track of the original goal, and produce output formatted for downstream consumption.
Here's what we've seen consistently when testing models on agentic workflows:
- Instruction-following consistency — Claude follows complex system prompt instructions across long conversations more reliably than GPT-4 or Gemini. By step 8 of a 10-step task, other models frequently drift from the original constraints.
- Tool use accuracy — Claude selects the right tool for the job with higher accuracy, especially when multiple tools are available and the distinction between them is subtle.
- Context window depth — Claude's 200k token context handles the largest real-world conversation histories without truncation artifacts that corrupt agent reasoning.
- Refusal calibration — Claude is appropriately cautious without being over-restrictive on legitimate business tasks — a balance other models struggle to maintain.
These aren't hypothetical advantages. They compound across multi-step tasks in ways that produce measurably better outcomes for users interacting with your agents.
Setting Up Your Anthropic API Key
Get your API key from console.anthropic.com. Navigate to API Keys, create a new key, and copy it immediately — it's only shown once.
Set it in your environment:
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
Add that line to ~/.bashrc or ~/.zshrc for persistence. For a server running OpenClaw as a systemd service, add it to your service's environment file instead:
# /etc/openclaw/environment
ANTHROPIC_API_KEY=sk-ant-your-key-here
Then reference it in the service unit:
EnvironmentFile=/etc/openclaw/environment
Choosing the Right Claude Model Tier
Claude comes in three tiers. The right choice depends on your use case:
| Model | Best For | Relative Cost |
|---|---|---|
| Claude Haiku | High-volume simple tasks, classification | Lowest |
| Claude Sonnet | General agent tasks (recommended default) | Medium |
| Claude Opus | Complex reasoning, highest accuracy tasks | Highest |
For most OpenClaw agents handling customer support, research, content creation, and automation tasks — Sonnet is the right default. You get Opus-level quality on 85% of tasks at a fraction of the cost. Haiku is worth testing for agents handling very high message volumes with simple, structured responses.
OpenClaw Configuration Reference
Configure Claude in your openclaw.yaml:
model:
provider: anthropic
name: claude-sonnet-4-5
maxTokens: 4096
temperature: 0.7
thinking: false # Set true for complex reasoning tasks
After updating config, restart OpenClaw:
openclaw restart
Verify the model is active:
openclaw models
This lists the currently configured model and confirms the API connection is working.
Tuning Claude for Agentic Performance
Claude's default behavior works well out of the box. These adjustments push it further for agent-specific workloads.
Temperature — For task-oriented agents (scheduling, research, data extraction), use 0.3–0.5. For creative or conversational agents, use 0.7–0.9. High temperature on task-oriented agents produces inconsistent output formats that break downstream processing.
Max tokens — Set this high enough for the longest response your agent needs to produce. For agents that generate reports or long-form content, 8,192 or higher. For conversational agents, 2,048 is usually sufficient.
Extended thinking — Set thinking: true for agents handling complex analytical tasks. This enables Claude's chain-of-thought reasoning mode, which dramatically improves performance on tasks requiring multi-step reasoning. It adds latency (typically 5–15 seconds extra per response) and cost, so use it selectively.
The soul.md file content automatically becomes Claude's system prompt. As of early 2025, Claude reads and respects system prompt instructions more consistently than any other major model. Well-written soul.md files translate directly into better agent behavior.
Common Mistakes When Using Claude With OpenClaw
Using an outdated model name. Claude model IDs change frequently. claude-3-opus-20240229 may still work but is not the latest. Use the current version IDs from the Anthropic documentation — running old models means missing capability improvements.
Setting temperature too high for task agents. A temperature of 1.0 on a data extraction agent produces creative but inconsistent output formats. Keep temperature at 0.3–0.5 for any agent that needs to return structured, predictable responses.
Not providing a soul.md. Without soul.md, Claude gets a minimal system prompt with no personality, constraints, or context. The agent works but produces generic responses that don't match your brand or use case. Even a 200-word soul.md makes a significant difference.
Ignoring rate limits on Anthropic's tier. Free and Tier 1 Anthropic accounts have low rate limits. If your agent handles concurrent conversations at volume, you'll hit 429 rate limit errors. Request a rate limit increase from Anthropic or implement request queuing in your OpenClaw config.
Frequently Asked Questions
Which Claude model works best with OpenClaw?
Claude Sonnet is the recommended default for OpenClaw agents as of early 2025. It balances capability and cost effectively for agentic tasks. Use Opus for complex reasoning-heavy workflows where quality matters more than cost, and Haiku for high-volume simple tasks where speed and cost are priorities.
How do I set my Anthropic API key in OpenClaw?
Set the environment variable ANTHROPIC_API_KEY with your key value. Then configure model.provider: anthropic and model.name: claude-sonnet-4-5 in your OpenClaw config. Restart OpenClaw after setting the key for the change to take effect.
Does OpenClaw support Claude's extended thinking mode?
Yes. Set model.thinking: true in your OpenClaw config to enable extended thinking. It significantly improves performance on complex multi-step tasks but increases cost and latency per response. Use it selectively for reasoning-heavy workflows.
Why is Claude better than GPT-4 for OpenClaw agents?
Claude follows complex multi-step instructions more reliably across long conversations, handles longer context windows effectively, and produces fewer inconsistencies on structured output tasks. For agentic workflows where the model must plan, execute tools, and adapt across many steps, Claude's consistency is the decisive advantage.
What does OpenClaw use for the Claude system prompt?
OpenClaw automatically injects your agent's soul.md and identity.md content as the Claude system prompt at agent initialization. Additional tool descriptions and channel context are also injected automatically by the OpenClaw runtime. You don't need to configure this manually.
How much does running OpenClaw agents on Claude cost?
Cost depends on conversation volume and length. A typical OpenClaw agent handling 100 conversations per day with 2,000-token average context costs roughly $1–3/day on Sonnet. Complex research agents with large context windows cost proportionally more. Use the Anthropic cost calculator for precise estimates based on your expected usage.
A. Larsen has deployed OpenClaw agents powered by Claude across customer service, research, and content platforms. Has run comparative benchmarks across Claude, GPT-4, and Gemini on real agentic workloads and published the results in the OpenClaw community forum.