The Wake-Up Call
OpenClaw is open source. OpenClaw is free. OpenClaw will also happily burn through your API budget faster than a teenager with a credit card at a mall.
We hit 92% of our weekly API budget with two days still on the clock. Two days of rationing an AI agent that's supposed to be always-on. That's when we started looking at where the money was actually going.
Turns out, we were running Claude Opus 4.6 — a frontier reasoning model that ranks among the best in the world — for heartbeat checks. Every 60 minutes, our agent would wake up, load its entire personality, look around, say "nothing to do," and go back to sleep. On one of the most expensive models available.
That's like hiring a PhD mathematician to count your grocery items. Sixteen times a day.
This is the story of how we figured out where the tokens go, and how we cut our costs by 55% without losing the intelligence that matters.
How Your Agent Burns Tokens
Here's something most AI agent guides gloss over. Every time your agent responds — every message, every heartbeat, every background task — this is what gets sent to the model:
Every API call:
```
┌───────────────────────────────────────┐
│ System Prompt (25,000 tokens)         │ ← Fixed overhead, same every time
│ ┌───────────────────────────────────┐ │
│ │ AGENTS.md, SOUL.md, IDENTITY.md   │ │
│ │ MEMORY.md, TOOLS.md, HEARTBEAT.md │ │
│ │ Tool schemas, skills, runtime     │ │
│ └───────────────────────────────────┘ │
├───────────────────────────────────────┤
│ Conversation History                  │ ← Grows over time
│ [user message, assistant reply, ...]  │
├───────────────────────────────────────┤
│ Current Message                       │ ← This turn's input
│ "Hey, what's on my calendar today?"   │
└───────────────────────────────────────┘
```
The system prompt doesn't accumulate — it's the same 25,000 tokens sent on every call, like a fixed tax. Your conversation history is what grows over time. But that fixed tax adds up.
Heartbeat math: 16 checks/day × 25,000 tokens = 400,000 tokens/day just in system prompt overhead. That's 12 million tokens per month before your agent says a single useful thing.
On Opus ($5 per million input tokens), that's $60/month to keep the lights on. On a budget model like Grok 4.1 Fast ($0.20 per million), it's $2.40.
The good news: prompt caching exists. Once the model sees your system prompt the first time, subsequent calls within the cache window (up to 1 hour) treat it as a cache read — 75-90% cheaper than full input pricing. Set your heartbeat interval just under the cache TTL (e.g., 55 minutes for a 1-hour cache) and the overhead becomes almost negligible.
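Here's the same arithmetic as a short sketch. The prices are the ones used throughout this article, and the 90% cache-read discount is an assumption at the optimistic end of the range above, not a billing calculator:

```python
# Back-of-envelope system-prompt overhead, using the figures above.
SYSTEM_PROMPT_TOKENS = 25_000
CHECKS_PER_DAY = 16
DAYS_PER_MONTH = 30

def monthly_overhead_usd(price_per_mtok: float, cache_discount: float = 0.0) -> float:
    """Monthly cost of re-sending the system prompt on every heartbeat."""
    tokens = SYSTEM_PROMPT_TOKENS * CHECKS_PER_DAY * DAYS_PER_MONTH  # 12M tokens/month
    return tokens / 1_000_000 * price_per_mtok * (1 - cache_discount)

print(monthly_overhead_usd(5.00))                        # Opus, uncached:        $60.00
print(monthly_overhead_usd(0.20))                        # Grok Fast, uncached:   $2.40
print(monthly_overhead_usd(5.00, cache_discount=0.90))   # Opus, all cache reads: $6.00
```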
But caching only helps if you're not running the most expensive model for tasks that don't need it. That's where model tiering comes in.
The Three Models That Matter
The February 2026 LLM Arena rankings tell an interesting story: the #7 model globally runs at roughly 1/27th of Opus's per-token price. The gap between "best" and "excellent" has never been smaller. The gap in price has never been larger.
For running AI agents in OpenClaw, three models cover every use case:
| Model | Input / Output (per 1M tokens) | Arena Rank | Context | Why It Matters |
|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 / $25.00 | #8 (1462 Elo) | 200K | The brain. Complex reasoning, personality, nuanced decisions. |
| Grok 4.1 Fast | $0.20 / $0.50 | #7 (1463 Elo) | 2M | The workhorse. Nearly the same rank, 27x cheaper. Budget scaling. |
| Gemini 3 Pro | $2.00 / $12.00 | #1 (1492 Elo) | 1M | The heavyweight. Best model globally. Free if you have Google Cloud credits. |
That's it. Opus when you need depth. Grok when you need efficiency. Gemini when you have credits to burn.
Everything else — Sonnet, Haiku, Kimi K2.5, GPT-5 mini, OpenRouter free models — fills niches. But these three cover the 90% case for anyone running agents.
A note on subscriptions vs. pay-per-token: If you're on Claude Max ($200/month flat rate, 20x Pro usage), Opus is "free" within your quota. The optimization game shifts to what you use for background tasks that eat into that quota. If you're on API keys, every token counts — and Grok at $0.20/$0.50 changes the economics completely.
The Playbook
An always-on agent has four types of work. Each has a different intelligence requirement:
Main Conversation → Opus
This is where you're actually talking to your agent. Complex reasoning, creative writing, architecture decisions, code review. This is where frontier models earn their keep. Don't cheap out here.

Heartbeats → Grok Fast
Wake up, check if anything needs attention, go back to sleep. This is a yes/no decision 90% of the time. With prompt caching, Grok Fast costs roughly $0.002 per heartbeat. That's about a dollar a month for 16 checks per day.

Sub-Agents → Codex or Grok Fast
Background tasks like research, file processing, web scraping. Sub-agents in OpenClaw get a stripped-down context (just AGENTS.md and TOOLS.md, roughly 3-5K tokens). They don't need your agent's full personality. If you have a ChatGPT subscription, Codex is included at no extra cost. Otherwise, Grok Fast handles it.

Cron Jobs → Grok Fast
Scheduled tasks: social media checks, monitoring, periodic reports. Predictable, repetitive, rarely complex. Use the cheapest model that executes reliably.
The pattern is simple: intelligence where it matters, efficiency everywhere else.
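Expressed as a lookup, the whole playbook fits in a few lines. This is an illustrative sketch, not an OpenClaw API: the aliases match the config shown later in this article, and "codex" stands in for the sub-agent model.

```python
# Hypothetical task router. The four workload types come from the playbook
# above; the model aliases match the OpenClaw config later in this article.
ROUTES = {
    "conversation": "opus",       # depth, personality, hard reasoning
    "heartbeat":    "grok-fast",  # a yes/no check with a cached prompt
    "subagent":     "codex",      # stripped context, pure task execution
    "cron":         "grok-fast",  # predictable, repetitive scheduled work
}

def pick_model(task_type: str) -> str:
    # Unknown background work defaults to the cheap tier, never to Opus.
    return ROUTES.get(task_type, "grok-fast")

assert pick_model("heartbeat") == "grok-fast"
```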
The Real Cost
Here's what a typical month looks like for an always-on OpenClaw agent — running Opus for everything versus tiered allocation:
| Workload | All Opus | Tiered (Opus + Grok + Codex) |
|---|---|---|
| Main conversation | $68 | $68 (Opus) |
| Heartbeats (16/day) | $60 | $2.40 (Grok, cached) |
| Sub-agents (10 tasks/mo) | $15 | $0 (Codex, subscription) |
| Cron jobs (5/day) | $14 | $0.22 (Grok) |
| Monthly total | $157 | $70.62 |
| Savings | — | 55% |
Same main conversation quality. Same Opus intelligence for the work that matters. Half the bill.
If you push further — Grok for main conversation too — the total drops to roughly $19/month. You trade some personality and creative depth for an 88% cost reduction. For single-purpose agents (social media, monitoring, research), that's more than good enough.
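To sanity-check the table, here's the same arithmetic as a few lines of Python, using only the line items above:

```python
# Recomputing the monthly totals from the table's line items.
all_opus = {"conversation": 68.00, "heartbeats": 60.00, "subagents": 15.00, "cron": 14.00}
tiered   = {"conversation": 68.00, "heartbeats": 2.40,  "subagents": 0.00,  "cron": 0.22}

total_opus, total_tiered = sum(all_opus.values()), sum(tiered.values())
savings = 1 - total_tiered / total_opus

print(f"All Opus: ${total_opus:.2f}")    # $157.00
print(f"Tiered:   ${total_tiered:.2f}")  # $70.62
print(f"Savings:  {savings:.0%}")        # 55%
```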
Setting It Up
Multiple providers in one config
```json
{
  "auth": {
    "profiles": {
      "anthropic:default": { "provider": "anthropic", "mode": "token" },
      "openrouter:default": { "provider": "openrouter", "mode": "api_key" }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": ["openrouter/x-ai/grok-4.1-fast"]
      },
      "models": {
        "anthropic/claude-opus-4-6": { "alias": "opus" },
        "openrouter/x-ai/grok-4.1-fast": { "alias": "grok-fast" }
      },
      "subagents": {
        "model": { "primary": "openai-codex/gpt-5.1-codex-mini" }
      }
    }
  }
}
```
Keep the cache warm
```json
{
  "agents": {
    "defaults": {
      "heartbeat": { "every": "55m" },
      "models": {
        "anthropic/claude-opus-4-6": {
          "params": { "cacheRetention": "long" }
        }
      }
    }
  }
}
```
55-minute heartbeats with a 1-hour cache TTL mean the system prompt is still cached when the next check fires. No expensive cache re-writes.
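One small habit worth keeping: if you ever change the heartbeat interval, re-check that it still fits under the cache TTL with a little headroom. A minimal standalone check, not an OpenClaw feature:

```python
# Standalone sanity check: the heartbeat interval must stay under the
# cache TTL, with a small margin for slow or delayed calls.
def interval_fits_cache(heartbeat_min: int, cache_ttl_min: int = 60, margin_min: int = 3) -> bool:
    return heartbeat_min + margin_min <= cache_ttl_min

assert interval_fits_cache(55)      # 55m checks against a 1h TTL: cache stays warm
assert not interval_fits_cache(65)  # drifted past the TTL: every call re-writes the cache
```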
Switch models on the fly
Type /model grok-fast in chat to drop to Grok for budget work. /model opus to switch back. No restart needed.
Honest Tradeoffs
Model tiering isn't a free lunch. Here's what you're actually trading:
Opus → Grok Fast: Grok is direct and capable, but it doesn't have Opus's depth in creative writing, nuanced conversation, or multi-step reasoning. For task execution and research, you won't notice. For personality-driven interactions, you will.
Any model → Codex (sub-agents): Sub-agents don't get your agent's personality files. They're ephemeral task executors. The model matters less than the task specification — a well-scoped prompt on a cheap model beats a vague prompt on Opus.
Caching tradeoff: Cache hits save money, but a cache miss (if your heartbeat interval drifts past the TTL) forces a fresh cache write, which costs more than plain uncached input. Keep the timing tight.
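To see why the timing matters, here's the break-even math, assuming Anthropic-style cache pricing where a 1-hour cache write costs 2x base input and a read costs 0.1x. Check your provider's current multipliers; these are assumptions, not universal rates:

```python
# Break-even sketch for prompt caching, in units of "one full-price input".
WRITE_MULT, READ_MULT = 2.0, 0.1  # assumed 1h-cache write and read multipliers

def cached(calls: int) -> float:
    """Relative cost of n calls: one cache write, then cache reads."""
    return WRITE_MULT + (calls - 1) * READ_MULT

def uncached(calls: int) -> float:
    return float(calls)  # every call pays full input price

for n in (1, 2, 3):
    print(n, cached(n), uncached(n))
# 1 call:  2.0 vs 1.0  (a single write is pure loss)
# 2 calls: 2.1 vs 2.0  (still slightly behind)
# 3 calls: 2.2 vs 3.0  (the write has paid for itself)
# A missed TTL restarts this clock with another 2x write. Keep the interval tight.
```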
The key insight: most of an AI agent's daily token spend goes to tasks that don't need frontier reasoning. Heartbeats, monitoring, social media checks, background research — all of that is paying PhD prices for janitor work. Fix the allocation and you keep the intelligence where it counts.
Stop feeding your agent filet mignon for breakfast. Save the steak for dinner.
Rodrigo Ortega is a technologist building AI agent infrastructure. Research and optimization assistance from Kali, an AI agent running on OpenClaw.
Research date: February 2026 | Sources: Anthropic API Pricing, xAI Grok Pricing, LLM Arena Rankings, OpenClaw Documentation v2026.2.6
