The Fable 5 Effort Parameter: A Practical Guide

Claude Fable 5 removed every dial developers used to control model behavior on older Claude models - no temperature, no top_p, no thinking budgets, no way to turn thinking off. In their place is a single parameter: effort, set inside output_config. If you ship anything on claude-fable-5, effort tuning is the highest-leverage configuration work you can do. Here is how it actually behaves.

What effort controls

Effort is not just a thinking-token budget. It moves three things at once:

  • Thinking depth. Fable 5's adaptive thinking is always on; effort sets how much reasoning the model does per step before acting.
  • Tool-call consolidation. At lower effort the model batches and consolidates tool calls; at higher effort it explores more - extra searches, extra file reads, extra verification passes.
  • Verbosity. Lower effort produces terser confirmations and less preamble; higher effort narrates and double-checks more.

Because all three move together, effort changes the shape of an agentic trace, not just its length. Treat it as a per-route setting, not a global constant.

The five levels

Level    Use cases
low      Latency-sensitive routes and simple tasks: classification, extraction, short lookups, subagents doing scoped work
medium   Cost-sensitive workloads that still need real reasoning; high-volume pipelines where you've verified quality holds
high     The default. Most production work - coding, analysis, multi-step tasks. Start here.
xhigh    The most capability-sensitive coding and agentic work: long autonomous runs, hard refactors, frontier problems
max      Correctness over cost: one-shot answers that must be right, research, verification passes, high-stakes output
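One way to treat effort as a per-route setting is a small lookup table that every call site consults. This is a minimal sketch, not an SDK feature. The route names and the helpers effort_for and output_config_for are hypothetical. The levels mirror the table above, with high as the fallback because it is the default.

```python
from typing import Dict

# The five effort levels described above, cheapest first.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")
DEFAULT_EFFORT = "high"  # the API default per this guide

# Hypothetical route names; replace with your application's own routes.
ROUTE_EFFORT: Dict[str, str] = {
    "ticket_classification": "low",   # simple, latency-sensitive
    "bulk_summarization": "medium",   # high volume, quality verified on evals
    "code_review": "high",            # typical production work
    "module_migration": "xhigh",      # long, capability-sensitive agentic run
    "release_verification": "max",    # correctness over cost
}


def effort_for(route: str) -> str:
    """Return the configured effort for a route, falling back to the default."""
    level = ROUTE_EFFORT.get(route, DEFAULT_EFFORT)
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level {level!r} for route {route!r}")
    return level


def output_config_for(route: str) -> Dict[str, str]:
    """Build the output_config value to pass to messages.create for a route."""
    return {"effort": effort_for(route)}
```

At a call site you would pass output_config=output_config_for("ticket_classification"). Because the mapping lives in one place, re-tuning a route is a one-line change.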

Anthropic's official guidance is to start at high and tune from there. Notably, it also says that lower effort on Fable 5 often outperforms xhigh on prior models. If your Opus 4.8 deployment ran at xhigh, do not carry that setting over reflexively; Fable 5 at high (or even medium) may match or beat it while spending fewer tokens. Re-run your evals rather than copying the config.

What each level costs

Simon Willison published the cleanest public data point so far: the same prompt run at every effort level ranged from 1,929 tokens ($0.10) at low to 14,430 tokens ($0.72) at max - a roughly 7x spread in both tokens and dollars on identical input. The intermediate levels fell between those endpoints. Two takeaways:

  • The gap between adjacent levels is large enough that picking the right level per route matters more than almost any prompt optimization.
  • At $50/MTok output, uncapped max-effort traffic adds up fast - Willison spent $110.42 in a day of experimentation.
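The quoted figures are consistent with plain output-token arithmetic at $50/MTok. The sketch below checks that, assuming the token counts are output tokens and ignoring input-token charges. That the dollar amounts line up suggests, but does not confirm, that assumption. output_cost_usd is a hypothetical helper, not part of any SDK.

```python
# Output price quoted in this guide: $50 per million output tokens.
OUTPUT_PRICE_PER_MTOK = 50.0


def output_cost_usd(output_tokens: int, price_per_mtok: float = OUTPUT_PRICE_PER_MTOK) -> float:
    """Dollar cost of output tokens only; input tokens are billed separately and ignored here."""
    return output_tokens * price_per_mtok / 1_000_000


# Willison's endpoints for the same prompt at low and max effort.
low_tokens, max_effort_tokens = 1_929, 14_430

low_cost = output_cost_usd(low_tokens)         # 0.09645
max_cost = output_cost_usd(max_effort_tokens)  # 0.7215
spread = max_effort_tokens / low_tokens        # roughly 7.5x

print(f"low: ${low_cost:.2f}  max: ${max_cost:.2f}  spread: {spread:.1f}x")
# prints: low: $0.10  max: $0.72  spread: 7.5x
```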

Setting effort in code

Effort lives in output_config, not at the top level. Python:

import anthropic

client = anthropic.Anthropic()

# Fast, cheap classification route
quick = client.messages.create(
    model="claude-fable-5",
    max_tokens=4000,
    output_config={"effort": "low"},
    messages=[{"role": "user", "content": "Classify this ticket: ..."}],
)

# Capability-sensitive agentic route - stream it
with client.messages.stream(
    model="claude-fable-5",
    max_tokens=64000,
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "Migrate this module to the new API..."}],
) as stream:
    final = stream.get_final_message()

TypeScript:

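import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: "claude-fable-5",
  max_tokens: 16000,
  output_config: { effort: "high" }, // omitting effort also means "high"
  messages: [{ role: "user", content: "Review this diff for bugs..." }],
});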

Effort and max_tokens: give it headroom

On Fable 5, max_tokens caps thinking plus the visible response - and thinking is always on. At xhigh or max, a tight max_tokens silently strangles the reasoning the higher effort level was supposed to buy you: the model burns its budget thinking and truncates mid-answer with stop_reason: "max_tokens".

Rule of thumb: at xhigh or max, set max_tokens to 64,000 or higher and use streaming (required above ~16K anyway to avoid HTTP timeouts). Fable 5 supports up to 128K output, so there is room.
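The rule of thumb can be encoded in a small settings helper so that xhigh and max routes never ship with a tight ceiling. This is a sketch under stated assumptions. The 64K floor, the ~16K streaming threshold, and the 128K cap come from this section. The per-level starting values for low, medium, and high, and the helper names request_settings and was_truncated, are hypothetical.

```python
from typing import Dict, Optional

# Limits taken from this section.
MODEL_MAX_OUTPUT = 128_000    # Fable 5's stated output ceiling
STREAMING_THRESHOLD = 16_000  # stream above roughly this many tokens
DEEP_EFFORT_FLOOR = 64_000    # minimum headroom at xhigh and max

# Hypothetical starting points for the other levels; tune from your own traces.
SUGGESTED_MAX_TOKENS = {
    "low": 4_000,
    "medium": 8_000,
    "high": 16_000,
    "xhigh": DEEP_EFFORT_FLOOR,
    "max": DEEP_EFFORT_FLOOR,
}


def request_settings(effort: str, requested: Optional[int] = None) -> Dict[str, object]:
    """Choose max_tokens and whether to stream for a given effort level."""
    max_tokens = requested if requested is not None else SUGGESTED_MAX_TOKENS[effort]
    if effort in ("xhigh", "max"):
        # A tight ceiling would starve the reasoning these levels are meant to buy.
        max_tokens = max(max_tokens, DEEP_EFFORT_FLOOR)
    max_tokens = min(max_tokens, MODEL_MAX_OUTPUT)
    return {"max_tokens": max_tokens, "stream": max_tokens > STREAMING_THRESHOLD}


def was_truncated(stop_reason: Optional[str]) -> bool:
    """True when the model hit max_tokens before finishing its answer."""
    return stop_reason == "max_tokens"
```

You would pass settings["max_tokens"] to the request, switch to client.messages.stream when settings["stream"] is true, and check was_truncated(message.stop_reason) on the final message. A truncated answer at xhigh or max is a signal to raise the ceiling rather than lower the effort.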

Task budgets for agentic loops

Effort controls per-response depth. For cumulative spend across a multi-turn agentic loop, Fable 5 supports task budgets (beta, header task-budgets-2026-03-13): you declare a total token budget for the whole task, the model sees a running countdown, and it prioritizes and wraps up gracefully as the budget drains.

with client.beta.messages.stream(
    betas=["task-budgets-2026-03-13"],
    model="claude-fable-5",
    max_tokens=64000,  # above ~16K, so stream it, per the rule of thumb above
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 200000},
    },
    messages=[...],
) as stream:
    response = stream.get_final_message()

Unlike max_tokens (a hard per-response ceiling the model never sees), a task budget is a suggestion the model is aware of and works within. The combination - effort for depth, task_budget for total spend - is the closest thing Fable 5 has to a cost-control system, and it is how you let a long run go deep without writing a blank check.
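Because a task budget is advisory, you may also want a hard stop on the client side. The sketch below shows one hypothetical way to do that; it is not part of the task-budgets beta. SpendTracker tallies output tokens per turn and stops the loop from starting new turns once a ceiling you choose is reached. Counting only output tokens is an assumption, so decide what your own limit should include. The per-turn numbers are simulated.

```python
from dataclasses import dataclass


@dataclass
class SpendTracker:
    """Hypothetical client-side hard stop for an agentic loop.

    Complements the server-side task_budget, which the model treats as advisory.
    Counts output tokens only; that choice is an assumption, adjust to taste.
    """
    hard_limit: int  # ceiling you choose for this loop
    used: int = 0

    def record(self, output_tokens: int) -> None:
        self.used += output_tokens

    @property
    def remaining(self) -> int:
        return max(self.hard_limit - self.used, 0)

    def should_continue(self) -> bool:
        return self.used < self.hard_limit


tracker = SpendTracker(hard_limit=250_000)
turns_run = 0
# Simulated per-turn output usage; in a real loop you would record
# response.usage.output_tokens after each call instead.
for turn_output in (60_000, 70_000, 80_000, 90_000, 50_000):
    if not tracker.should_continue():
        break  # stop starting new turns once the ceiling is reached
    tracker.record(turn_output)
    turns_run += 1

print(turns_run, tracker.used, tracker.remaining)
# prints: 4 300000 0
```

The check runs before each turn, so the last turn can overshoot. Here the loop spends 300,000 tokens against a 250,000 ceiling. If per-turn overshoot matters, max_tokens bounds it.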

A tuning workflow that works

  1. Ship everything at high (the default) and collect per-route token metrics.
  2. Drop simple, high-volume routes to medium or low; verify quality on your evals before committing.
  3. Promote only the routes where high measurably falls short to xhigh - usually long agentic coding runs.
  4. Reserve max for cases where a wrong answer costs more than the extra tokens max spends; in Willison's test that was $0.72 per prompt at max versus $0.10 at low.
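Steps 2 and 3 amount to picking the cheapest level that clears your quality bar on your own evals. Here is a minimal sketch, assuming you already have a pass rate per effort level for a route. pick_effort and the scores in the example calls are hypothetical, not measured results.

```python
# Effort levels from cheapest to most expensive.
EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]


def pick_effort(eval_scores: dict, quality_bar: float, default: str = "high") -> str:
    """Return the cheapest effort level whose eval pass rate meets quality_bar.

    eval_scores maps effort level -> pass rate on your own eval set.
    If no measured level clears the bar, return the best-scoring one
    (the cheaper level wins ties); with no data at all, return `default`.
    """
    unknown = set(eval_scores) - set(EFFORT_ORDER)
    if unknown:
        raise ValueError(f"unknown effort levels: {sorted(unknown)}")
    for level in EFFORT_ORDER:
        score = eval_scores.get(level)
        if score is not None and score >= quality_bar:
            return level
    if not eval_scores:
        return default
    return max(eval_scores, key=lambda lvl: (eval_scores[lvl], -EFFORT_ORDER.index(lvl)))


# Hypothetical pass rates for two routes.
print(pick_effort({"low": 0.81, "medium": 0.93, "high": 0.95}, quality_bar=0.90))  # medium
print(pick_effort({"high": 0.70, "xhigh": 0.88}, quality_bar=0.85))                # xhigh
```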

Measure cost per completed task, not per request - higher effort up front frequently reduces retries and total spend on hard work.
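Here is a small worked sketch of why per-task accounting can reverse a per-request comparison. RouteStats and every number in it are hypothetical, chosen only to illustrate the arithmetic: a setting that is cheaper per call but needs retries can cost more per finished task.

```python
from dataclasses import dataclass


@dataclass
class RouteStats:
    requests: int            # every API call, including retries
    cost_per_request: float  # average USD per call
    completed_tasks: int     # tasks that actually finished successfully

    @property
    def total_cost(self) -> float:
        return self.requests * self.cost_per_request

    @property
    def cost_per_completed_task(self) -> float:
        if self.completed_tasks == 0:
            return float("inf")  # spent money, finished nothing
        return self.total_cost / self.completed_tasks


# Hypothetical numbers for one hard route, not measurements.
at_high = RouteStats(requests=10, cost_per_request=0.20, completed_tasks=5)  # 2 attempts per task
at_xhigh = RouteStats(requests=6, cost_per_request=0.30, completed_tasks=6)  # 1 attempt per task

print(f"high: ${at_high.cost_per_completed_task:.2f}/task, "
      f"xhigh: ${at_xhigh.cost_per_completed_task:.2f}/task")
# prints: high: $0.40/task, xhigh: $0.30/task
```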

Related reading