Features

Adaptive thinking, always on - and the effort parameter that steers it

Claude Fable 5 is the first Claude model where adaptive thinking is always on. There is no thinking configuration to set, no token budget to tune, and no way to turn it off. The model decides - per request, per step - whether to reason and how deeply, and you steer the overall depth with a single knob: the effort parameter. It is the cleanest reasoning API Anthropic has shipped, and it makes a whole family of legacy parameters obsolete.

The only mode

On earlier models, thinking was something you opted into. Claude 4.6 deprecated fixed budget_tokens; Opus 4.7 and 4.8 removed them but still let you run with thinking off. Fable 5 closes the loop: adaptive thinking is the only mode. In practice that means:

  • No thinking config is needed. Omit the thinking parameter entirely - the model reasons adaptively by default.
  • thinking: {type: "disabled"} returns a 400. Unlike Opus 4.8, where an explicit disable is accepted, Fable 5 rejects it. There is no off switch.
  • budget_tokens returns a 400. Fixed thinking budgets are gone; depth is controlled by effort.
  • Sampling parameters are rejected. temperature, top_p, and top_k all return 400 - prompting is the steering mechanism.
  • Assistant prefills return a 400. Use structured outputs (output_config.format) or system-prompt instructions instead.
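
As a rough illustration of the list above, a migration shim might strip the rejected fields before a request is sent. This is a minimal sketch, not an official tool: the function name strip_legacy_params and the shape of the request dict are hypothetical, and it assumes the only thinking key worth keeping is display, as described later in this article.

```python
# Top-level sampling parameters the list above says Fable 5 rejects.
REJECTED_SAMPLING = ("temperature", "top_p", "top_k")


def strip_legacy_params(request: dict) -> tuple[dict, list[str]]:
    """Return a cleaned copy of `request` and a list of what was removed.

    Hypothetical helper: the input dict itself is not modified.
    """
    cleaned = {k: v for k, v in request.items() if k not in REJECTED_SAMPLING}
    removed = [k for k in REJECTED_SAMPLING if k in request]

    # Assumption: keep only thinking.display; drop type, budget_tokens, etc.
    thinking = cleaned.pop("thinking", None) or {}
    for key in thinking:
        if key != "display":
            removed.append(f"thinking.{key}")
    if "display" in thinking:
        cleaned["thinking"] = {"display": thinking["display"]}

    # A trailing assistant message is treated as a prefill and dropped.
    messages = list(cleaned.get("messages", []))
    if messages and messages[-1].get("role") == "assistant":
        messages.pop()
        removed.append("assistant prefill")
    if "messages" in cleaned:
        cleaned["messages"] = messages

    return cleaned, removed
```

Logging the removed list rather than dropping fields silently makes it easier to spot call sites that relied on a prefill or a fixed temperature and need a prompt or structured-output change instead.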

If you are migrating code that sets any of these, our migration guide walks through every breaking change file by file.

The effort parameter

Effort controls how much the model thinks and acts - reasoning depth, tool-call consolidation, preamble length, and verbosity all scale with it. Five levels are available, set inside output_config:

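import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    output_config={"effort": "high"},  # low | medium | high | xhigh | max
    messages=[{"role": "user", "content": "..."}],
)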

The default is high, and Anthropic's guidance - echoed by early testers - is to start there rather than reflexively reaching for xhigh. Fable 5's intelligence ceiling is high enough that its lower effort levels often outperform xhigh on prior models. Sweep medium, high, and xhigh against your own evals and pick a level per route; reserve max for extremely hard, latency-insensitive problems. Counterintuitively, higher effort up front can reduce total cost on agentic work, because better planning means fewer turns.
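
One way to turn an effort sweep into a per-route choice is to take the lowest level that clears your quality bar. The sketch below is illustrative only: pick_effort is a hypothetical helper, the scores are whatever your own eval harness produces, and it falls back to the default of high when nothing qualifies.

```python
# Effort levels in ascending order, as listed in the article.
EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]


def pick_effort(scores: dict[str, float], target: float, default: str = "high") -> str:
    """Return the lowest measured effort level whose eval score meets `target`.

    `scores` maps effort level -> score from your own evals (hypothetical data).
    Levels that were not measured are skipped.
    """
    for level in EFFORT_ORDER:
        if level in scores and scores[level] >= target:
            return level
    return default


# Made-up scores for one route, purely for illustration.
route_scores = {"medium": 0.81, "high": 0.90, "xhigh": 0.91}
chosen = pick_effort(route_scores, target=0.88)
```

This selects on answer quality alone. For agentic routes it may be worth scoring total cost per completed task instead, since a higher level that finishes in fewer turns can come out cheaper.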

What effort actually costs

Simon Willison published one of the first like-for-like comparisons, running the same prompt at every effort level on launch day. His datapoints anchor the range:

Effort level   Output tokens (same prompt)   Cost
low            1,929                         $0.10
max            14,430                        $0.72

That is roughly a 7x spread in tokens and cost between the floor and the ceiling for identical input - which is exactly the point. Effort is a real budget lever, not a cosmetic flag. For high-volume routes, dropping from high to medium or low on tasks that don't need deep reasoning is the single biggest cost optimization available on Fable 5. The full level-by-level analysis lives in our effort parameter guide.
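
The spread can be checked directly from the two datapoints in the table. The snippet uses only the figures quoted above; the variable names are illustrative.

```python
# Datapoints quoted in the table above (same prompt, low vs. max effort).
low_tokens, low_cost = 1_929, 0.10
max_tokens, max_cost = 14_430, 0.72

token_ratio = max_tokens / low_tokens  # about 7.5
cost_ratio = max_cost / low_cost       # about 7.2

print(f"tokens: {token_ratio:.1f}x, cost: {cost_ratio:.1f}x")
```

The cost ratio comes out slightly below the token ratio, which is what you would expect if the quoted cost also includes the fixed input tokens; that reading is an inference, not something the source states.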

Two details that catch people out

Thinking text is omitted by default. Thinking blocks still stream, but their text is empty unless you opt in with thinking: {"display": "summarized"}. If your product surfaces reasoning to users, the default looks like a long silent pause before output begins - set "summarized" to restore readable progress summaries. (This is the one legitimate use of the thinking parameter on Fable 5: configuring display, not toggling the mode.)
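
A small request builder can keep the display opt-in in one place. This is a sketch under the parameter shapes described in this article; build_request and its show_reasoning flag are hypothetical names, and the returned dict is meant to be passed as keyword arguments to client.messages.create.

```python
def build_request(prompt: str, *, effort: str = "high",
                  show_reasoning: bool = False, max_tokens: int = 16000) -> dict:
    """Build keyword arguments for a Fable 5 request (hypothetical helper)."""
    kwargs = {
        "model": "claude-fable-5",
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }
    # Only touch `thinking` to opt in to readable summaries; never to toggle the mode.
    if show_reasoning:
        kwargs["thinking"] = {"display": "summarized"}
    return kwargs


# Usage: a user-facing route that surfaces reasoning progress.
ui_request = build_request("Summarize this report.", show_reasoning=True)
```

Backend routes that never show reasoning can leave show_reasoning at its default, so the thinking parameter is omitted entirely.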

max_tokens caps thinking and response combined. The limit applies to the sum of reasoning tokens and visible output. A 16,000-token cap at max effort can be consumed mostly by thinking, truncating the answer mid-sentence. Give high-effort requests generous headroom - 64K with streaming is a sensible starting point for agentic work, against Fable 5's 128K output ceiling.
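
A truncated answer can be detected and retried with more headroom. The sketch below assumes Fable 5 reports truncation the way the Messages API does today, with stop_reason set to "max_tokens", and treats the 128K ceiling as 128,000 tokens; both are assumptions, and the helper names are hypothetical.

```python
from types import SimpleNamespace

OUTPUT_CEILING = 128_000  # assumed value for the article's "128K output ceiling"


def was_truncated(response) -> bool:
    """True if the response stopped because it hit the max_tokens cap."""
    return getattr(response, "stop_reason", None) == "max_tokens"


def next_max_tokens(current: int, ceiling: int = OUTPUT_CEILING) -> int:
    """Double the cap for a retry without exceeding the output ceiling."""
    return min(current * 2, ceiling)


# Stand-in for a real API response, for illustration only.
fake_response = SimpleNamespace(stop_reason="max_tokens")
retry_cap = next_max_tokens(16_000) if was_truncated(fake_response) else 16_000
```

Doubling is an arbitrary policy; starting from a generous cap such as 64K, as suggested above, avoids most retries in the first place.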

Quick reference: omit thinking entirely; set depth with output_config.effort; add thinking.display: "summarized" only if you show reasoning to users; size max_tokens for thinking + response together. Everything else - budgets, temperature, prefills - now returns a 400. See the official model documentation for the canonical parameter reference.

Why Anthropic made it mandatory

Removing the off switch looks opinionated, and it is. Anthropic's position is that the model is a better judge of when reasoning helps than the developer is at request-construction time - adaptive thinking on Opus 4.6 onward consistently outperformed manual budgets in internal evals, and disabled-thinking modes on 4.8 leaked reasoning into visible output anyway. Fable 5 simply finishes the consolidation: one mode, one knob. The result is fewer ways to misconfigure a request, and a model whose latency profile is genuinely workload-shaped - near-instant on trivial lookups, deliberate on hard problems - without the developer doing anything at all.

Related reading