Migrating to Claude Fable 5 from Opus 4.8 (or older)

Fable 5 keeps almost the same request surface as Opus 4.8. Anthropic's official migration guide says to apply the Opus 4.7 breaking changes first, then swap the model ID. If you are already on Opus 4.8, those breaking changes are done: budget_tokens, sampling parameters, and assistant prefills already return a 400 there. Fable 5 adds exactly one new hard break (you can no longer disable thinking) and several behavioral shifts worth re-tuning. Here is the full checklist.

The checklist

  1. Swap the model ID to claude-fable-5. On Bedrock and Vertex AI, use the provider-specific IDs in the table below.
  2. Remove thinking: {"type": "disabled"}. Adaptive thinking is always on and is the only mode; an explicit disabled returns a 400. Omit the thinking parameter entirely. Note the inverted default: on Opus 4.8, omitting thinking meant off; on Fable 5, omitting it runs with thinking.
  3. Remove budget_tokens. A 4.7-era leftover; any thinking: {"type": "enabled", "budget_tokens": N} returns a 400. Depth is controlled by effort now.
  4. Remove sampling parameters. A non-default temperature, top_p, or top_k returns a 400. Steer with prompting instead.
  5. Remove assistant prefills. A trailing role: "assistant" message returns a 400. Use structured outputs (output_config.format) for forced JSON, or a system-prompt instruction for tone and preamble control.
  6. Re-tune effort - downward. Start at high (the default). Official guidance is that lower effort levels on Fable 5 often outperform xhigh on prior models, so an Opus 4.8 config pinned at xhigh should be re-evaluated, not copied. See our effort guide.
  7. Set thinking.display if you surface reasoning. The default is "omitted" - thinking blocks stream with empty text. If your product shows reasoning to users, pass thinking: {"display": "summarized"} or the UI will show a long silent pause before output.
  8. Re-baseline max_tokens. It caps thinking plus response, and thinking is now always included. Calls tuned for no-thinking Opus 4.8 will truncate; give headroom (>=64K with streaming for xhigh/max routes) and re-measure compaction triggers.
  9. Adopt the fallbacks parameter (beta) for refusal resilience. Fable 5's classifiers return HTTP 200 with stop_reason: "refusal" and a stop_details.category (cyber, bio, reasoning_extraction). Opting into fallbacks auto-retries on Opus 4.8 - it triggers in under 5% of sessions, and the fallback credit refunds the cache-switch cost. SDK middleware exists for TypeScript, Python, Go, Java, and C#.
  10. Managed Agents: name swap only. Per the official guide: "If you use Claude Managed Agents, no changes beyond updating the model name are required." Update the agent's model field and you're done.
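
Steps 1 through 5 are mechanical enough to script. The sketch below is one way to do it, assuming your call sites build their arguments as a plain dict of keyword arguments with dict-shaped messages. migrate_request is a hypothetical helper, not part of any SDK, and it encodes only the rules listed above. Keeping nothing but the display key of a thinking config is our own simplification.

```python
import copy

SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def migrate_request(kwargs, model_id="claude-fable-5"):
    """Return a copy of legacy request kwargs with checklist steps 1-5 applied.

    Hypothetical helper: it only rewrites a dict and never calls the API.
    """
    out = copy.deepcopy(kwargs)

    # Step 1: swap the model ID.
    out["model"] = model_id

    # Steps 2-3: drop "disabled"/"enabled" types and budget_tokens.
    # Assumption: only a display preference is worth carrying over.
    thinking = out.pop("thinking", None)
    if isinstance(thinking, dict) and "display" in thinking:
        out["thinking"] = {"display": thinking["display"]}

    # Step 4: remove sampling parameters entirely.
    for name in SAMPLING_PARAMS:
        out.pop(name, None)

    # Step 5: strip a trailing assistant prefill. The caller still has to
    # replace what the prefill did (for example with structured outputs).
    messages = out.get("messages") or []
    if messages and messages[-1].get("role") == "assistant":
        out["messages"] = messages[:-1]

    return out


legacy = {
    "model": "claude-opus-4-8",
    "max_tokens": 8000,
    "thinking": {"type": "disabled"},
    "temperature": 0.3,
    "messages": [
        {"role": "user", "content": "Summarize this ticket as JSON."},
        {"role": "assistant", "content": "{"},
    ],
}
migrated = migrate_request(legacy)
```

The helper leaves max_tokens and effort alone on purpose: steps 6 through 8 need measurement on your own traffic, not a rewrite rule.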

Before and after

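A typical legacy call site, migrated (the sampling parameter and the prefill are older leftovers that, as noted above, Opus 4.8 already rejects):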

  response = client.messages.create(
-     model="claude-opus-4-8",
-     max_tokens=8000,
-     thinking={"type": "disabled"},  # 400 on Fable 5
-     temperature=0.3,  # 400 on Fable 5
+     model="claude-fable-5",
+     max_tokens=16000,  # headroom: cap now includes thinking
+     output_config={"effort": "high"},  # start high, tune downward
+     fallbacks={"models": ["claude-opus-4-8"]},  # beta: refusal resilience
      messages=[
          {"role": "user", "content": prompt},
-         {"role": "assistant", "content": "{"},  # prefill: 400 on Fable 5
      ],
+     # forced-JSON prefills become structured outputs:
+     # output_config={"effort": "high", "format": {"type": "json_schema", "schema": SCHEMA}},
  )

And the refusal handling you should add wherever responses are consumed:

import logging
log = logging.getLogger(__name__)

if response.stop_reason == "refusal":
    log.warning("classifier stop: %s", response.stop_details.category)
    # fall back, queue for review, or surface to the user
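
If different categories need different handling, one option is a small dispatch on stop_details.category. This is a sketch only: the three category strings come from the checklist above, while the action labels and the route_refusal helper are hypothetical placeholders for whatever your application actually does. The SimpleNamespace objects merely stand in for responses with the fields described here; they are not SDK types.

```python
from types import SimpleNamespace

# Category strings as listed in the checklist; the actions are made-up labels.
REFUSAL_ACTIONS = {
    "cyber": "queue_for_review",
    "bio": "queue_for_review",
    "reasoning_extraction": "surface_to_user",
}


def route_refusal(response, default="surface_to_user"):
    """Return an action label for a refusal, or None if it is not a refusal."""
    if getattr(response, "stop_reason", None) != "refusal":
        return None
    details = getattr(response, "stop_details", None)
    category = getattr(details, "category", None)
    # Unknown or missing categories fall through to the default action.
    return REFUSAL_ACTIONS.get(category, default)


# Stand-in responses for illustration.
refused = SimpleNamespace(
    stop_reason="refusal",
    stop_details=SimpleNamespace(category="cyber"),
)
completed = SimpleNamespace(stop_reason="end_turn", stop_details=None)
```

Returning None for non-refusals lets the caller write a single check before the normal response path.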

Model IDs by platform

Fable 5 launched GA on every major platform on June 9. The ID differs by provider:

Platform                              Model ID
Claude API / Claude Platform on AWS   claude-fable-5
Amazon Bedrock                        anthropic.claude-fable-5
Google Vertex AI                      claude-fable-5

Microsoft Foundry and GitHub Copilot expose Fable 5 through their own model pickers. One Bedrock-specific note: the minimum cacheable prompt prefix is 1,024 tokens there, versus 512 on the first-party API.
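
If one codebase targets several providers, it can help to keep these values in a single lookup rather than scattering string literals. A minimal sketch, assuming you identify platforms with your own short keys (claude_api, bedrock, and vertex are labels invented here); the IDs and cache minimums are the ones stated above, and both helper functions are hypothetical.

```python
# Model IDs copied from the table above; the dictionary keys are our own labels.
FABLE_5_MODEL_IDS = {
    "claude_api": "claude-fable-5",
    "bedrock": "anthropic.claude-fable-5",
    "vertex": "claude-fable-5",
}

# Minimum cacheable prompt prefix in tokens, from the Bedrock note above.
MIN_CACHEABLE_PREFIX_TOKENS = {
    "claude_api": 512,
    "bedrock": 1024,
}


def model_id_for(platform):
    """Look up the Fable 5 model ID, failing loudly for unlisted platforms."""
    if platform not in FABLE_5_MODEL_IDS:
        raise ValueError(f"no Fable 5 model ID recorded for {platform!r}")
    return FABLE_5_MODEL_IDS[platform]


def meets_cache_minimum(platform, prefix_tokens):
    """True if a prefix of this many tokens reaches the platform's minimum."""
    return prefix_tokens >= MIN_CACHEABLE_PREFIX_TOKENS[platform]
```

For example, an 800-token system prompt clears the first-party minimum but not the Bedrock one, so a prompt that caches on one platform may silently stop caching on the other.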

What doesn't change

Fable 5 uses the same tokenizer as Opus 4.8, so client-side token estimates carry over. Vision, tool use, the memory tool, compaction (beta), context editing (beta), the Batches API, structured outputs, and Managed Agents all work as before. Knowledge cutoff is January 2026. What you lose relative to older models: extended thinking with budgets, sampling parameters, prefills, and fast mode - none of which exist on Fable 5 at all.

Rollout advice

Migrate one route, run it for a day, and compare tokens per completed task - not per request - against Opus 4.8. Early adopters report Fable 5 finishing hard tasks in roughly half the tokens, which at 2x the price nets out near cost parity. That saving only shows up on work where Opus 4.8 was struggling, so routes where Opus 4.8 already succeeds cheaply are often best left on Opus 4.8. The two models are designed to coexist - the fallbacks parameter literally wires them together.
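
To make the comparison concrete, here is a rough sketch of the metric and of the parity arithmetic. It assumes you can log one (tokens_used, completed) pair per task attempt; that log shape and both function names are ours, not something either model's API provides.

```python
def tokens_per_completed_task(runs):
    """Total tokens spent divided by the number of completed tasks.

    runs: iterable of (tokens_used, completed) pairs, one per task attempt.
    Tokens burned on failed attempts still count toward the total, which is
    what separates this metric from a per-request average.
    """
    runs = list(runs)
    completed = sum(1 for _, ok in runs if ok)
    if completed == 0:
        return float("inf")  # nothing finished, so the cost per task is unbounded
    return sum(tokens for tokens, _ in runs) / completed


def relative_cost(token_ratio, price_ratio):
    """Cost of the new route relative to the old one (1.0 means parity)."""
    return token_ratio * price_ratio


# Half the tokens at twice the price works out to parity.
parity = relative_cost(token_ratio=0.5, price_ratio=2.0)  # 1.0

# Two successes and one failure: 6000 tokens over 2 completed tasks.
sample = tokens_per_completed_task([(1000, True), (3000, False), (2000, True)])  # 3000.0
```

Run the same calculation on both routes over the same day of traffic; a relative cost well above 1.0 suggests the route belongs to the group that is best left on Opus 4.8.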
