Set MINIMAX_API_KEY in your environment before editing openclaw.yaml: the config reads env vars at startup, not at runtime.
Use provider: minimax in openclaw.yaml, not "minimax-ai" or "minimax-v2", which are invalid strings that produce cryptic errors.
Three agents, one provider, zero audio support: that was the situation before we added MiniMax to our OpenClaw stack. Within 20 minutes of configuring the minimax provider block, one agent was transcribing customer voice messages and routing them to the right workflow. MiniMax is not a replacement for Claude or GPT-4o. It fills a specific gap: multimodal inputs, especially audio, at a cost per token that makes large-scale processing practical.
Why MiniMax Deserves a Spot in Your Provider Stack
Most builders treat provider selection as binary — you pick one model and run everything through it. That is the mistake that makes agents expensive to operate and brittle to maintain. MiniMax solves a concrete set of problems that other providers handle poorly or not at all.
Audio input is the obvious one. As of early 2025, Claude and GPT-4o do not accept raw audio through their standard API endpoints — you need a transcription layer in front of them. MiniMax takes audio directly. Your agent can receive a voice note, process it, and respond without an intermediate service. That removes latency and a billing layer.
The second use case is Chinese-language content. MiniMax was built by a Chinese AI lab and its models perform significantly better on CJK-language tasks than Western-focused alternatives. If any part of your agent workflow touches Chinese, Japanese, or Korean content, this matters.
Getting MiniMax API Access
API access starts at api.minimax.chat. Create an account, verify your email, and navigate to the API Keys section in your dashboard. Generate a new key — copy it immediately because MiniMax shows the full key only once.
Set the environment variable before you touch your OpenClaw config:
# Linux / macOS
export MINIMAX_API_KEY="your-key-here"
# Windows PowerShell
$env:MINIMAX_API_KEY="your-key-here"
# Persist in .env (recommended for projects)
MINIMAX_API_KEY=your-key-here
OpenClaw reads environment variables at startup. If you set the variable after starting OpenClaw, you need to restart the process — it will not pick up changes dynamically.
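Because the key is read only once at startup, a small pre-launch check can catch a missing or blank variable before any agent sends a request. The following is a minimal Python sketch of that idea; `missing_env_vars` is a hypothetical helper, not part of OpenClaw.

```python
import os
from typing import List, Mapping, Sequence


def missing_env_vars(required: Sequence[str],
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping, so the result does not depend on your shell:
print(missing_env_vars(["MINIMAX_API_KEY"], env={"MINIMAX_API_KEY": ""}))
# prints ['MINIMAX_API_KEY']
```

Called with no `env` argument, the helper inspects the real process environment, so you could run it in the same shell session just before starting OpenClaw.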
OpenClaw YAML Configuration
The provider block in your openclaw.yaml file is the only place OpenClaw needs to know about MiniMax. Here is the complete config for both available models:
providers:
  minimax:
    api_key: "${MINIMAX_API_KEY}"
    default_model: abab6.5
    models:
      - id: abab6.5
        context_window: 245760
        supports_vision: true
        supports_audio: true
      - id: abab6.5s
        context_window: 245760
        supports_vision: true
        supports_audio: false
agents:
  audio-router:
    provider: minimax
    model: abab6.5
    description: "Processes voice input and routes to appropriate workflow"
  content-classifier:
    provider: minimax
    model: abab6.5s
    description: "Classifies text and image content by category"
The provider: minimax string is exact — not minimax-ai, not minimaxai, not minimax-v2. OpenClaw's provider registry matches on exact strings and returns a generic "provider not found" error for anything else. We've seen builders spend an hour debugging a typo here.
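To see why near-misses fail, it helps to picture what an exact-match lookup does. The Python sketch below is only an illustration of exact, case-sensitive matching; the `KNOWN_PROVIDERS` set and `resolve_provider` function are hypothetical and do not reflect OpenClaw's actual internals.

```python
# Hypothetical registry contents; OpenClaw's real registry is not shown in this article.
KNOWN_PROVIDERS = {"minimax"}


def resolve_provider(name: str) -> str:
    """Exact, case-sensitive lookup: no trimming, lowercasing, or alias handling."""
    if name not in KNOWN_PROVIDERS:
        raise KeyError(f"provider not found: {name!r}")
    return name


for candidate in ["minimax", "MiniMax", "minimax-ai", "minimax-v2"]:
    try:
        resolve_provider(candidate)
        print(f"{candidate}: ok")
    except KeyError:
        print(f"{candidate}: provider not found")
```

Only the first candidate resolves. Every variant, including the differently cased one, falls through to the same generic error, which is why the typo is hard to spot from the message alone.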
Verifying the Connection
Run openclaw doctor --provider minimax after configuration. A passing check shows the model list and confirms your API key is valid. If you see "authentication failed," the key is either wrong or the environment variable is not set in the current shell session.
Multimodal Input Support
This is where MiniMax genuinely separates itself. Here is what each model handles and how to pass each input type to your agent.
| Input Type | abab6.5 | abab6.5s | Notes |
|---|---|---|---|
| Text | ✓ | ✓ | Up to 245k context |
| Image (URL/base64) | ✓ | ✓ | JPEG, PNG, WebP |
| Audio (raw/URL) | ✓ | ✗ | MP3, WAV, M4A |
| Video | ✗ | ✗ | Not yet supported |
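The table can also be kept as a small lookup so that a workflow picks a model by input type instead of hard-coding one. This is a minimal Python sketch; the capability sets are copied from the table above, while the dictionary layout and the `models_supporting` name are illustrative assumptions.

```python
# Capabilities copied from the table above; the dict layout is an illustrative assumption.
MODEL_INPUTS = {
    "abab6.5": {"text", "image", "audio"},
    "abab6.5s": {"text", "image"},
}


def models_supporting(input_type: str) -> list:
    """Return the model ids whose capability set includes `input_type`."""
    return sorted(m for m, inputs in MODEL_INPUTS.items() if input_type in inputs)


print(models_supporting("audio"))  # ['abab6.5']
print(models_supporting("video"))  # []
```

An empty result is a useful signal to reject the input early rather than send it to a model that cannot handle it.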
Passing audio to an agent uses the input_mode field in your skill configuration. The agent spec looks like this:
skills:
  process-voice-message:
    input_mode: audio
    model: abab6.5
    prompt: |
      Transcribe the audio input accurately. Identify the primary intent.
      Classify into one of: [support-request, billing-query, product-feedback, other].
      Return structured JSON with fields: transcript, intent, confidence.
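Since the prompt asks for structured JSON, the downstream router should probably validate the reply before acting on it. Below is a minimal Python sketch of such a check. The function name and the sample reply are hypothetical; the field names and intent labels come from the prompt above. The type and range of `confidence` are not specified in the prompt, so the sketch does not check them.

```python
import json

# Intent labels taken from the skill prompt above.
ALLOWED_INTENTS = {"support-request", "billing-query", "product-feedback", "other"}
REQUIRED_FIELDS = {"transcript", "intent", "confidence"}


def parse_voice_result(raw: str) -> dict:
    """Parse the model's JSON reply and check the fields the prompt asks for."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["intent"] not in ALLOWED_INTENTS:
        raise ValueError(f"unexpected intent: {data['intent']!r}")
    return data


# Hypothetical reply, for illustration only.
reply = '{"transcript": "My invoice is wrong", "intent": "billing-query", "confidence": 0.9}'
print(parse_voice_result(reply)["intent"])  # billing-query
```

Failing loudly on a malformed reply keeps a bad transcription from being routed to the wrong workflow.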
Rate Limits and Scaling
MiniMax's standard tier enforces 60 requests per minute and 1 million tokens per day. Those numbers sound generous until you have three agents running concurrently. Sixty RPM across three agents means each agent gets 20 requests per minute — that's one request every 3 seconds, which constrains throughput on high-volume workflows.
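The per-agent arithmetic is easy to rerun for a different agent count. Here is a minimal Python sketch, assuming the limit is shared evenly across agents (real traffic is rarely that uniform); `per_agent_budget` is a hypothetical helper.

```python
def per_agent_budget(limit_rpm: int, agents: int) -> tuple:
    """Split a shared requests-per-minute limit evenly across agents.

    Returns (requests per minute per agent, seconds between requests).
    """
    rpm_each = limit_rpm / agents
    return rpm_each, 60 / rpm_each


print(per_agent_budget(60, 3))  # (20.0, 3.0)
print(per_agent_budget(60, 6))  # (10.0, 6.0)
```

Doubling the agent count to six halves each agent's budget to one request every 6 seconds.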
The fix is not begging for a rate limit increase (though you can request one). The fix is exponential backoff in your agent's retry config:
providers:
  minimax:
    api_key: "${MINIMAX_API_KEY}"
    retry:
      max_attempts: 4
      initial_delay_ms: 500
      backoff_multiplier: 2
      retry_on: [429, 503]
This tells OpenClaw to retry 429 (rate limit) and 503 (service unavailable) responses with delays that double each time: 500 ms, 1 s, 2 s, 4 s. Most transient rate-limit spikes resolve within this window.
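The delay schedule follows directly from initial_delay_ms and backoff_multiplier. The Python sketch below computes it. The function is hypothetical, and because this article does not say whether max_attempts counts the initial request, the number of retries is passed in explicitly as an assumption.

```python
def backoff_delays_ms(initial_delay_ms: int, multiplier: int, retries: int) -> list:
    """Delay before each retry: initial, initial*multiplier, initial*multiplier**2, ..."""
    return [initial_delay_ms * multiplier ** i for i in range(retries)]


delays = backoff_delays_ms(500, 2, 4)
print(delays)       # [500, 1000, 2000, 4000]
print(sum(delays))  # 7500
```

With four retries, the worst case adds 7.5 seconds of waiting before the request is abandoned, which is worth budgeting for in latency-sensitive workflows.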
MiniMax vs Other Providers for Specific Tasks
Use this as a routing framework, not a ranking. No single provider wins everything.
| Task Type | Best Provider | Why |
|---|---|---|
| Audio transcription | MiniMax abab6.5 | Native audio input, no pre-processing |
| CJK language tasks | MiniMax abab6.5 | Training data advantage in CJK |
| Complex English reasoning | Claude Sonnet | Stronger multi-step reasoning chain |
| High-speed text generation | abab6.5s / Groq | Lower latency at cost of depth |
| Code generation | Codex / GPT-4o | Specialized training on code corpora |
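One way to apply the table is a plain lookup from task type to provider. The following is a minimal Python sketch. The provider choices are copied from the table above, while the task keys, the `route` function, and the fallback default are illustrative assumptions rather than OpenClaw configuration.

```python
# Routing choices copied from the table above; task keys and the fallback are illustrative.
ROUTES = {
    "audio-transcription": "MiniMax abab6.5",
    "cjk-language": "MiniMax abab6.5",
    "english-reasoning": "Claude Sonnet",
    "fast-text": "abab6.5s / Groq",
    "code-generation": "Codex / GPT-4o",
}


def route(task_type: str, default: str = "Claude Sonnet") -> str:
    """Return the preferred provider for a task type, or `default` if unlisted."""
    return ROUTES.get(task_type, default)


print(route("audio-transcription"))  # MiniMax abab6.5
print(route("summarization"))        # Claude Sonnet
```

Keeping the mapping in one place makes it cheap to revisit as providers add capabilities.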
Common Mistakes That Break MiniMax Integration
We have seen the same four errors across multiple integration setups. Here they are so you can skip past them.
Mistake 1: Using abab6.5s for audio tasks. OpenClaw accepts a skill config that pairs audio input with abab6.5s and does not throw an error, but the response comes back empty. Always use abab6.5 when your agent workflow includes audio.
Mistake 2: Setting the provider string incorrectly. OpenClaw's registry matches on exact provider identifiers. "minimax-ai," "MiniMax," and "minimax_v2" all fail with the same generic provider-not-found error, which gives no hint about which part of the string is wrong. The correct string is minimax, lowercase.
Sound familiar? This one has burned experienced builders who assumed the registry was case-insensitive.
Mistake 3: Not handling 429 errors. Without retry logic, a rate limit hit during a long-running agent task causes the entire task to fail. Add the retry block to your provider config from day one.
Mistake 4: Starting OpenClaw before exporting the env var. The process reads environment variables once at startup. If you set MINIMAX_API_KEY after starting OpenClaw, your agents authenticate against an empty string and return 401 errors on every request.
Frequently Asked Questions
What is MiniMax in OpenClaw?
MiniMax is a multimodal AI provider you connect to OpenClaw via the minimax provider block in openclaw.yaml. It gives your agents access to the abab6.5 and abab6.5s models, which handle text, image, and audio inputs natively through a single API endpoint — no extra transcription service required.
How do I get a MiniMax API key?
Sign up at api.minimax.chat, navigate to API Keys in your account dashboard, and generate a new key. Copy it immediately — MiniMax only shows the full key once. Store it as MINIMAX_API_KEY in your environment before configuring OpenClaw.
Which MiniMax model should I use with OpenClaw?
Use abab6.5 for production tasks requiring high accuracy — it handles complex reasoning and multimodal inputs reliably. Use abab6.5s when speed matters more than depth. As of early 2025, abab6.5 performs closest to GPT-4o-level reasoning on structured tasks.
Does MiniMax support image and audio input in OpenClaw agents?
MiniMax natively supports text, image, and audio inputs through its API. OpenClaw passes multimodal content by setting the input_mode field in your skill config. Audio support requires abab6.5 — the abab6.5s model only handles text and images as of early 2025.
What are MiniMax rate limits for OpenClaw agents?
MiniMax's standard tier allows 60 requests per minute and 1 million tokens per day. Heavy concurrent agent tasks will hit this ceiling. Implement exponential backoff in your agent's retry config and consider request batching for high-throughput workflows to stay within limits.
How does MiniMax compare to Claude or GPT-4o for OpenClaw agents?
MiniMax excels at audio processing and Chinese-language tasks — areas where Claude and GPT-4o lag behind. For English reasoning and tool-use, Claude Sonnet still outperforms. Use MiniMax when your agents need native audio support or strong CJK language handling rather than as a general replacement.
T. Chen has integrated over a dozen AI providers into production OpenClaw deployments, with a focus on multimodal pipeline architecture. He has benchmarked MiniMax's audio models against Whisper-based stacks and documented the latency and cost differences firsthand. Based in Singapore, he primarily works on enterprise agent systems handling CJK and English content concurrently.
You now know how to get MiniMax connected, configured, and scaled inside OpenClaw. You have the exact YAML blocks, the model differences that matter, and the four mistakes that waste hours.
Add MiniMax to your provider stack and your agents gain native audio and strong CJK support in under 20 minutes.
Free to set up. No additional OpenClaw license required. Takes under 5 minutes once your API key is in hand.