The Fable 5 Effort Parameter: A Practical Guide
Claude Fable 5 removed every dial developers used to control model behavior on older Claude models - no temperature, no top_p, no thinking budgets, no way to turn thinking off. In their place is a single parameter: effort, set inside output_config. If you ship anything on claude-fable-5, effort tuning is the highest-leverage configuration work you can do. Here is how it actually behaves.
What effort controls
Effort is not just a thinking-token budget. It moves three things at once:
- Thinking depth. Fable 5's adaptive thinking is always on; effort sets how much reasoning the model does per step before acting.
- Tool-call consolidation. At lower effort the model batches and consolidates tool calls; at higher effort it explores more - extra searches, extra file reads, extra verification passes.
- Verbosity. Lower effort produces terser confirmations and less preamble; higher effort narrates and double-checks more.
Because all three move together, effort changes the shape of an agentic trace, not just its length. Treat it as a per-route setting, not a global constant.
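A per-route setting can be as simple as a lookup table. The sketch below is illustrative only: the route names are hypothetical, and the level names are the ones this guide lists in the next section.

```python
from typing import Literal

Effort = Literal["low", "medium", "high", "xhigh", "max"]

# Hypothetical route names; substitute your own.
ROUTE_EFFORT: dict[str, Effort] = {
    "ticket-classifier": "low",
    "code-review": "high",
    "repo-migration": "xhigh",
}

# The guide describes "high" as the default level.
DEFAULT_EFFORT: Effort = "high"


def output_config_for(route: str) -> dict[str, str]:
    """Return the output_config dict for a route, falling back to the default."""
    return {"effort": ROUTE_EFFORT.get(route, DEFAULT_EFFORT)}
```

Keeping the mapping in one place makes it easy to change a single route's level after an eval run without touching call sites.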
The five levels
| Level | Use cases |
|---|---|
| low | Latency-sensitive routes and simple tasks: classification, extraction, short lookups, subagents doing scoped work |
| medium | Cost-sensitive workloads that still need real reasoning; high-volume pipelines where you've verified quality holds |
| high | The default. Most production work - coding, analysis, multi-step tasks. Start here |
| xhigh | The most capability-sensitive coding and agentic work: long autonomous runs, hard refactors, frontier problems |
| max | Correctness over cost: one-shot answers that must be right, research, verification passes, high-stakes output |
Anthropic's official guidance is to start at high and tune from there - and notably, that lower effort on Fable 5 often exceeds xhigh on prior models. If your Opus 4.8 deployment ran at xhigh, do not carry that setting over reflexively; Fable 5 at high (or even medium) may match or beat it while spending fewer tokens. Re-run your evals rather than copying the config.
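One way to turn "re-run your evals" into a decision is to score each level on the same eval set and keep the cheapest one that clears your quality bar. This is a minimal sketch, not an official procedure: the function name, the threshold, and the scores are all placeholders invented for illustration.

```python
from typing import Optional

# Levels ordered from cheapest to most expensive.
EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]


def lowest_passing_effort(scores: dict[str, float], threshold: float) -> Optional[str]:
    """Return the cheapest effort level whose eval score meets the threshold."""
    for level in EFFORT_ORDER:
        if level in scores and scores[level] >= threshold:
            return level
    return None  # no evaluated level met the bar


# Placeholder scores for illustration only; these are not measured results.
example_scores = {"low": 0.81, "medium": 0.90, "high": 0.93, "xhigh": 0.94}
print(lowest_passing_effort(example_scores, threshold=0.92))  # prints: high
```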
What each level costs
Simon Willison published the cleanest public datapoint so far: the same prompt run at every effort level ranged from 1,929 tokens ($0.10) at low to 14,430 tokens ($0.72) at max - a roughly 7x spread in both tokens and dollars on identical input. The intermediate levels scale between those endpoints. Two takeaways:
- The gap between adjacent levels is large enough that picking the right level per route matters more than almost any prompt optimization.
- At $50/MTok output, uncapped max-effort traffic adds up fast - Willison spent $110.42 in a day of experimentation.
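The quoted dollar figures can be checked with simple arithmetic. The sketch below assumes the token counts are output tokens billed at the $50/MTok rate mentioned above and ignores input-token cost, which may not match how the original measurement was billed.

```python
OUTPUT_PRICE_PER_MTOK = 50.0  # USD per million output tokens, as quoted above


def output_cost_usd(output_tokens: int, price_per_mtok: float = OUTPUT_PRICE_PER_MTOK) -> float:
    """Cost of a response's output tokens in USD."""
    return output_tokens * price_per_mtok / 1_000_000


low_cost = output_cost_usd(1_929)
max_cost = output_cost_usd(14_430)
print(f"low: ${low_cost:.2f}, max: ${max_cost:.2f}")  # low: $0.10, max: $0.72
print(f"token ratio: {14_430 / 1_929:.1f}x")  # token ratio: 7.5x
```

The same function gives a quick estimate for a route's daily spend: multiply the per-request cost at the chosen level by expected request volume.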
Setting effort in code
Effort lives in output_config, not at the top level. Python:
import anthropic

client = anthropic.Anthropic()

# Fast, cheap classification route
quick = client.messages.create(
    model="claude-fable-5",
    max_tokens=4000,
    output_config={"effort": "low"},
    messages=[{"role": "user", "content": "Classify this ticket: ..."}],
)

# Capability-sensitive agentic route - stream it
with client.messages.stream(
    model="claude-fable-5",
    max_tokens=64000,
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "Migrate this module to the new API..."}],
) as stream:
    final = stream.get_final_message()
TypeScript:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-fable-5",
  max_tokens: 16000,
  output_config: { effort: "high" }, // omitting effort also means "high"
  messages: [{ role: "user", content: "Review this diff for bugs..." }],
});
Effort and max_tokens: give it headroom
On Fable 5, max_tokens caps thinking plus the visible response - and thinking is always on. At xhigh or max, a tight max_tokens silently strangles the reasoning the higher effort level was supposed to buy you: the model burns its budget thinking and truncates mid-answer with stop_reason: "max_tokens".
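Truncation of this kind is easy to detect after the fact. The sketch below checks the stop_reason field named above and proposes a larger cap for a retry. The doubling policy and the helper names are assumptions for illustration, the 128,000 ceiling is the output limit this guide cites, and a SimpleNamespace stands in for a real API response.

```python
from types import SimpleNamespace


def hit_token_ceiling(message) -> bool:
    """True when the response stopped because it ran into max_tokens."""
    return getattr(message, "stop_reason", None) == "max_tokens"


def next_max_tokens(current: int, ceiling: int = 128_000) -> int:
    """Double the cap for a retry without exceeding the assumed output ceiling."""
    return min(current * 2, ceiling)


# Stand-in for an API response object; a real one comes from the SDK.
truncated = SimpleNamespace(stop_reason="max_tokens")
if hit_token_ceiling(truncated):
    print(next_max_tokens(16_000))  # prints: 32000
```

Logging how often a route trips this check is a cheap signal that its max_tokens is too tight for its effort level.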
For xhigh or max, set max_tokens to 64,000 or higher and use streaming (required above ~16K anyway to avoid HTTP timeouts). Fable 5 supports up to 128K output, so there is room.
Task budgets for agentic loops
Effort controls per-response depth. For cumulative spend across a multi-turn agentic loop, Fable 5 supports task budgets (beta, header task-budgets-2026-03-13): you declare a total token budget for the whole task, the model sees a running countdown, and it prioritizes and wraps up gracefully as the budget drains.
response = client.beta.messages.create(
    betas=["task-budgets-2026-03-13"],
    model="claude-fable-5",
    max_tokens=64000,
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 200000},
    },
    messages=[...],
)
Unlike max_tokens (a hard per-response ceiling the model never sees), a task budget is a suggestion the model is aware of and works within. The combination - effort for depth, task_budget for total spend - is the closest thing Fable 5 has to a cost-control system, and it is how you let a long run go deep without writing a blank check.
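Because a task budget is described here as a suggestion rather than a hard cap, some callers may want a client-side tally as a backstop. The class below is a hypothetical sketch, not part of any SDK, and it assumes that counting input plus output tokens per turn is the right measure; how the API itself counts against a task budget is not stated in this guide.

```python
from dataclasses import dataclass


@dataclass
class SpendTracker:
    """Hypothetical client-side tally of tokens used across an agentic loop."""

    limit: int
    used: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one turn's token usage to the running total."""
        self.used += input_tokens + output_tokens

    @property
    def remaining(self) -> int:
        """Tokens left before the local limit, never negative."""
        return max(self.limit - self.used, 0)

    @property
    def exhausted(self) -> bool:
        """True once the running total reaches or passes the limit."""
        return self.used >= self.limit
```

In a loop, you would call record with each response's usage.input_tokens and usage.output_tokens, and stop issuing new turns once exhausted is true.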
A tuning workflow that works
- Ship everything at high (the default) and collect per-route token metrics.
- Drop simple, high-volume routes to medium or low; verify quality on your evals before committing.
- Promote to xhigh only the routes where high measurably falls short - usually long agentic coding runs.
- Reserve max for cases where a wrong answer costs more than $0.72.
Measure cost per completed task, not per request - higher effort up front frequently reduces retries and total spend on hard work.
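Cost per completed task can be computed from per-request logs by summing all spend, retries and failures included, and dividing by the number of distinct tasks that finished. The record shape below is a hypothetical one for illustration; adapt the field names to whatever your logging already captures.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    task_id: str
    cost_usd: float
    completed: bool  # True if this request finished the task successfully


def cost_per_completed_task(records: list[RequestRecord]) -> float:
    """Total spend (including retries and failures) divided by tasks completed."""
    total = sum(r.cost_usd for r in records)
    completed = {r.task_id for r in records if r.completed}
    if not completed:
        raise ValueError("no completed tasks in records")
    return total / len(completed)
```

Under this metric, a route that needs three cheap attempts to finish one task can come out more expensive than a single request at a higher level, which is the comparison per-request cost hides.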