Migrating to Claude Fable 5 from Opus 4.8 (or older)
Fable 5 keeps almost the same request surface as Opus 4.8 - Anthropic's official migration guide says to apply the Opus 4.7 breaking changes first, then swap the model ID. If you are already on Opus 4.8, those breaking changes are done: budget_tokens, sampling parameters, and assistant prefills already return a 400 there. Fable 5 adds exactly one new hard break (you can no longer disable thinking) and several behavioral shifts worth re-tuning. Here is the full checklist.
The checklist
- Swap the model ID to `claude-fable-5`. On Bedrock and Vertex AI, use the provider-specific IDs in the table below.
- Remove `thinking: {"type": "disabled"}`. Adaptive thinking is always on and the only mode - an explicit `disabled` returns a 400. Omit the `thinking` parameter entirely. Note the inverted default: on Opus 4.8, omitting `thinking` meant off; on Fable 5, omitting it runs with thinking.
- Remove `budget_tokens`. A 4.7-era leftover; any `thinking: {"type": "enabled", "budget_tokens": N}` returns a 400. Depth is controlled by `effort` now.
- Remove sampling parameters. Non-default `temperature`, `top_p`, or `top_k` all return 400. Steer with prompting instead.
- Remove assistant prefills. A trailing `role: "assistant"` message returns 400. Use structured outputs (`output_config.format`) for forced JSON, or a system-prompt instruction for tone and preamble control.
- Re-tune effort - downward. Start at `high` (the default). Official guidance is that lower effort on Fable 5 often exceeds `xhigh` on prior models, so an Opus 4.8 config pinned at `xhigh` should be re-evaluated, not copied. See our effort guide.
- Set `thinking.display` if you surface reasoning. The default is `"omitted"` - thinking blocks stream with empty text. If your product shows reasoning to users, pass `thinking: {"display": "summarized"}` or the UI will show a long silent pause before output.
- Re-baseline `max_tokens`. It caps thinking plus response, and thinking is now always included. Calls tuned for no-thinking Opus 4.8 will truncate; give headroom (>=64K with streaming for `xhigh`/`max` routes) and re-measure compaction triggers.
- Adopt the `fallbacks` parameter (beta) for refusal resilience. Fable 5's classifiers return HTTP 200 with `stop_reason: "refusal"` and a `stop_details.category` (`cyber`, `bio`, `reasoning_extraction`). Opting into `fallbacks` auto-retries on Opus 4.8 - it triggers in under 5% of sessions, and the fallback credit refunds the cache-switch cost. SDK middleware exists for TypeScript, Python, Go, Java, and C#.
- Managed Agents: name swap only. Per the official guide: "If you use Claude Managed Agents, no changes beyond updating the model name are required." Update the agent's `model` field and you're done.
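The removals in the checklist are mechanical enough to apply to a request's keyword arguments before it is sent. Below is a minimal sketch of that idea, not an official migration tool: `migrate_kwargs` is a hypothetical helper, it edits a plain dict and never calls an SDK, and the parameter names and model ID come only from the checklist above.

```python
import copy

# Sampling parameters the checklist says to remove.
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def migrate_kwargs(legacy: dict, model_id: str = "claude-fable-5") -> dict:
    """Return a migrated copy of legacy request kwargs (hypothetical helper)."""
    kwargs = copy.deepcopy(legacy)  # never mutate the caller's dict
    kwargs["model"] = model_id

    # Drop the disabled form and the budget_tokens form of `thinking`,
    # but keep unrelated keys such as `display`.
    thinking = kwargs.pop("thinking", None)
    if isinstance(thinking, dict):
        if thinking.get("type") == "disabled" or "budget_tokens" in thinking:
            thinking.pop("type", None)
            thinking.pop("budget_tokens", None)
        if thinking:
            kwargs["thinking"] = thinking

    # Conservative choice: drop sampling parameters even at default values.
    for name in SAMPLING_PARAMS:
        kwargs.pop(name, None)

    # Drop a trailing assistant prefill. This changes output shape, so the
    # caller still needs structured outputs or a system-prompt instruction.
    messages = kwargs.get("messages", [])
    if messages and messages[-1].get("role") == "assistant":
        kwargs["messages"] = messages[:-1]

    return kwargs
```

The helper deliberately leaves `max_tokens` and effort alone, since the checklist treats those as values to re-measure per route rather than rewrite automatically.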
Before and after
A typical Opus 4.8 call site, migrated:
  response = client.messages.create(
-     model="claude-opus-4-8",
-     max_tokens=8000,
-     thinking={"type": "disabled"},  # 400 on Fable 5
-     temperature=0.3,  # 400 on Fable 5
+     model="claude-fable-5",
+     max_tokens=16000,  # headroom: cap now includes thinking
+     output_config={"effort": "high"},  # start high, tune downward
+     fallbacks={"models": ["claude-opus-4-8"]},  # beta: refusal resilience
      messages=[
          {"role": "user", "content": prompt},
-         {"role": "assistant", "content": "{"},  # prefill: 400 on Fable 5
      ],
+     # forced-JSON prefills become structured outputs:
+     # output_config={"effort": "high", "format": {"type": "json_schema", "schema": SCHEMA}},
  )
And the refusal handling you should add wherever responses are consumed:
import logging

log = logging.getLogger(__name__)

if response.stop_reason == "refusal":
    log.warning("classifier stop: %s", response.stop_details.category)
    # fall back, queue for review, or surface to the user
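If several call sites consume responses, it may help to centralize the refusal check in one function. The sketch below is one way to do that, under stated assumptions: `classify_stop` and its return strings are hypothetical, and the `SimpleNamespace` objects are stand-ins for an SDK response that is assumed to expose `stop_reason` and `stop_details.category` as described above.

```python
from types import SimpleNamespace

# Categories named in the checklist; anything else is treated as unknown.
KNOWN_CATEGORIES = {"cyber", "bio", "reasoning_extraction"}


def classify_stop(response) -> str:
    """Map a response-like object to a routing decision (hypothetical)."""
    if getattr(response, "stop_reason", None) != "refusal":
        return "deliver"
    details = getattr(response, "stop_details", None)
    category = getattr(details, "category", None)
    if category in KNOWN_CATEGORIES:
        return f"review:{category}"
    return "review:unknown"


# Stand-in objects for illustration only.
refused = SimpleNamespace(
    stop_reason="refusal",
    stop_details=SimpleNamespace(category="cyber"),
)
normal = SimpleNamespace(stop_reason="end_turn", stop_details=None)

print(classify_stop(refused))
print(classify_stop(normal))
```

Using `getattr` with defaults keeps the check from raising if a response arrives without `stop_details`, which seems a safer default for code that runs on every response.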
Model IDs by platform
Fable 5 launched GA on every major platform on June 9. The ID differs by provider:
| Platform | Model ID |
|---|---|
| Claude API / Claude Platform on AWS | claude-fable-5 |
| Amazon Bedrock | anthropic.claude-fable-5 |
| Google Vertex AI | claude-fable-5 |
Microsoft Foundry and GitHub Copilot expose Fable 5 through their own model pickers. One Bedrock-specific note: the minimum cacheable prompt prefix is 1,024 tokens there, versus 512 on the first-party API.
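For code that targets more than one provider, the table and the caching note can live in a small lookup. This is an illustrative sketch only: the platform keys (`claude_api`, `bedrock`, `vertex`) and both function names are assumptions made for the example, and the values are copied from the table and note above. Platforms with no stated minimum prefix return `None` rather than a guess.

```python
from typing import Optional

# Model IDs copied from the table above; the dict keys are hypothetical labels.
MODEL_IDS = {
    "claude_api": "claude-fable-5",
    "bedrock": "anthropic.claude-fable-5",
    "vertex": "claude-fable-5",
}

# Minimum cacheable prompt prefix, only where the text states one.
MIN_CACHEABLE_PREFIX_TOKENS = {
    "claude_api": 512,
    "bedrock": 1024,
}


def model_id_for(platform: str) -> str:
    """Return the model ID recorded for a platform label."""
    if platform not in MODEL_IDS:
        raise ValueError(f"no model ID recorded for platform {platform!r}")
    return MODEL_IDS[platform]


def prefix_is_cacheable(platform: str, prefix_tokens: int) -> Optional[bool]:
    """True/False where a minimum is known, None where it is not stated."""
    minimum = MIN_CACHEABLE_PREFIX_TOKENS.get(platform)
    if minimum is None:
        return None
    return prefix_tokens >= minimum
```

A 600-token prefix, for example, clears the first-party minimum but not the Bedrock one, so a cache hit rate measured on one platform should not be assumed on the other.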
What doesn't change
Fable 5 uses the same tokenizer as Opus 4.8, so client-side token estimates carry over. Vision, tool use, the memory tool, compaction (beta), context editing (beta), the Batches API, structured outputs, and Managed Agents all work as before. Knowledge cutoff is January 2026. What you lose relative to older models: extended thinking with budgets, sampling parameters, prefills, and fast mode - none of which exist on Fable 5 at all.
Rollout advice
Migrate one route, run it for a day, and compare tokens per completed task - not per request - against Opus 4.8. Early adopters report Fable 5 finishing hard tasks in roughly half the tokens, which at 2x the price nets out near cost parity; but that only shows up on work where Opus 4.8 was struggling. Routes where Opus 4.8 already succeeds cheaply are often best left on Opus 4.8. The two models are designed to coexist - the fallbacks parameter literally wires them together.
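The cost-parity arithmetic is easy to check per route. The sketch below uses made-up token counts and a relative price unit purely for illustration; only the "half the tokens" and "2x the price" ratios come from the text above, and `cost_per_completed_task` is a hypothetical helper name.

```python
def cost_per_completed_task(
    total_tokens: int, price_per_token: float, completed_tasks: int
) -> float:
    """Total spend divided by tasks that actually finished."""
    if completed_tasks <= 0:
        raise ValueError("completed_tasks must be positive")
    return total_tokens * price_per_token / completed_tasks


# Illustrative numbers only: a hard route where Fable 5 needs half the tokens.
opus_hard = cost_per_completed_task(1_000_000, 1.0, 10)
fable_hard = cost_per_completed_task(500_000, 2.0, 10)
hard_ratio = fable_hard / opus_hard  # half the tokens at 2x the price

# An easy route where token usage is assumed to be unchanged.
opus_easy = cost_per_completed_task(200_000, 1.0, 10)
fable_easy = cost_per_completed_task(200_000, 2.0, 10)
easy_ratio = fable_easy / opus_easy

print(hard_ratio, easy_ratio)
```

Under these assumed numbers the hard route lands at parity while the easy route doubles in cost, which is the reason to measure each route separately before moving it.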