Getting Started with the Claude Fable 5 API

Claude Fable 5 went GA on June 9, 2026, and it is the most capable model Anthropic has ever made generally available - a Mythos-class model made safe for general use. The model ID is claude-fable-5. It ships with a 1M-token context window (no long-context premium), 128K max output tokens, and a January 2026 knowledge cutoff. This guide gets you from zero to a working request, then covers what trips up most people migrating existing code: Fable 5 removes several parameters that worked fine on Opus 4.8.

Install the SDK

Fable 5 is served through the same Messages API as every other Claude model. Use the official SDKs:

# Python
pip install anthropic

# TypeScript / JavaScript
npm install @anthropic-ai/sdk

Set ANTHROPIC_API_KEY in your environment and you're ready.

Your first request

Two things to notice before the code. First, adaptive thinking is always on in Fable 5 - there is no switch or budget to set, and a request that omits thinking still runs with thinking. Second, sampling parameters are gone. Do not pass temperature, top_p, or top_k, and do not try to enable, disable, or budget thinking. The one knob you tune is effort, inside output_config.

Python:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    output_config={"effort": "high"},  # the default; shown for clarity
    messages=[
        {"role": "user", "content": "Explain idempotency keys in payment APIs."}
    ],
)

for block in response.content:
    if block.type == "text":
        print(block.text)

TypeScript:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-fable-5",
  max_tokens: 16000,
  output_config: { effort: "high" },
  messages: [
    { role: "user", content: "Explain idempotency keys in payment APIs." },
  ],
});

for (const block of response.content) {
  if (block.type === "text") console.log(block.text);
}

Note that max_tokens now caps thinking plus the visible response. Since thinking is always running, give it more headroom than you would have on a no-thinking Opus 4.8 call. Thinking text is omitted from responses by default (thinking.display defaults to "omitted"); if you surface reasoning to users, request display: "summarized".
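Because thinking and visible text share one budget, a response can run out of tokens before its visible answer is complete. Here is a minimal sketch of a check for that, assuming the response carries the Messages API stop_reason value "max_tokens". The helper names and the stand-in response object are hypothetical and exist only so the snippet runs without a network call:

```python
from types import SimpleNamespace


def visible_text(response):
    """Join the text blocks of a response, skipping any non-text blocks."""
    return "".join(b.text for b in response.content if b.type == "text")


def hit_token_cap(response):
    """True when generation stopped because max_tokens was exhausted."""
    return response.stop_reason == "max_tokens"


# Stand-in for an SDK response object, so the sketch runs offline.
truncated = SimpleNamespace(
    stop_reason="max_tokens",
    content=[SimpleNamespace(type="text", text="Idempotency keys let a cl")],
)

if hit_token_cap(truncated):
    shown = len(visible_text(truncated))
    print(f"Truncated after {shown} visible characters; raise max_tokens and retry.")
```

With a real client, you would pass the object returned by client.messages.create to the same two helpers.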

What will 400 your old code

If you point existing Opus 4.8 code at claude-fable-5, these are the requests that fail:

Parameter | Behavior on Fable 5
--- | ---
thinking: {"type": "disabled"} | 400 error - thinking cannot be turned off
thinking: {"type": "enabled", "budget_tokens": N} | 400 error - budget_tokens is removed; use effort
temperature, top_p, top_k (non-default) | 400 error - sampling parameters are removed
Assistant prefill (last message has role: "assistant") | 400 error - use structured outputs instead
Omitting thinking entirely | No error - but the request runs with thinking (on Opus 4.8, omitting meant off)

The last row is the silent one: an Opus 4.8 codebase that never set thinking will not error on Fable 5, but its latency and token profile will change because every call now reasons. Budget for it.
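One way to apply the table above is to clean request arguments in a single place before they reach the API. The sketch below is one possible approach, not an official migration tool: prepare_for_fable is a hypothetical helper, and it covers only the failure cases listed in the table. It drops sampling parameters even at default values, which is stricter than required but harmless.

```python
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def prepare_for_fable(kwargs):
    """Return (cleaned_kwargs, notes) for a request written for Opus 4.8.

    Hypothetical helper; the input dict is left unmodified.
    """
    cleaned = dict(kwargs)
    notes = []

    # Sampling parameters are removed on Fable 5.
    for name in SAMPLING_PARAMS:
        if name in cleaned:
            del cleaned[name]
            notes.append(f"dropped {name}")

    # Thinking can no longer be disabled or given a token budget.
    thinking = cleaned.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") in ("disabled", "enabled"):
        del cleaned["thinking"]
        notes.append("dropped thinking")

    # Assistant prefill cannot be fixed mechanically, so fail loudly.
    messages = cleaned.get("messages") or []
    if messages and messages[-1].get("role") == "assistant":
        raise ValueError("assistant prefill is rejected; use structured outputs instead")

    cleaned["model"] = "claude-fable-5"
    return cleaned, notes


old_request = {
    "model": "claude-opus-4-8",
    "max_tokens": 4096,
    "temperature": 0.2,
    "thinking": {"type": "disabled"},
    "messages": [{"role": "user", "content": "Summarize this diff."}],
}
new_request, notes = prepare_for_fable(old_request)
print(notes)
```

The helper cannot detect the silent case: a request that never set thinking passes through unchanged, so latency and token budgets still need a manual review.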

Stream anything big

Fable 5 supports up to 128K output tokens, and at higher effort the model can spend a lot of them thinking. For any request above roughly 16K max_tokens, use streaming so you don't hit HTTP timeouts:

with client.messages.stream(
    model="claude-fable-5",
    max_tokens=64000,
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "Refactor this module..."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()

Effort cheat-sheet

Effort controls how deeply Fable 5 thinks, how it consolidates tool calls, and how verbose it is. Anthropic's guidance: start at high (the default) - lower effort on Fable 5 often exceeds xhigh on prior models.

Level | Use for
--- | ---
low | Latency-sensitive routes, simple classification and extraction
medium | Cost-sensitive workloads that still need real reasoning
high | Default. Most production work - start here
xhigh | Capability-sensitive coding and agentic loops
max | Correctness-over-cost: research, hard migrations, one-shot answers that must be right

See our effort parameter guide for measured cost per level.
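If several routes share one client, it can help to keep the effort choice in a single lookup instead of scattering string literals. Here is a small sketch of that idea. The workload names are invented for illustration, and only the five effort levels come from the cheat-sheet:

```python
# Hypothetical workload labels mapped onto the five effort levels.
EFFORT_BY_WORKLOAD = {
    "classification": "low",
    "extraction": "low",
    "cost_sensitive": "medium",
    "production": "high",
    "agentic_coding": "xhigh",
    "research": "max",
}


def output_config_for(workload):
    """Build an output_config dict, falling back to the default effort ("high")."""
    return {"effort": EFFORT_BY_WORKLOAD.get(workload, "high")}


print(output_config_for("extraction"))
print(output_config_for("something_new"))
```

The returned dict can be passed as output_config in a request. Unknown workloads fall back to high, which matches the advice to start there.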

Handling refusals and fallbacks

Fable 5 ships with real-time safety classifiers (categories: cyber, bio, reasoning_extraction). A classifier-stopped response is not an HTTP error - it returns 200 with stop_reason: "refusal" and a stop_details.category:

response = client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    fallbacks={"models": ["claude-opus-4-8"]},  # opt-in beta
    messages=[{"role": "user", "content": prompt}],
)

if response.stop_reason == "refusal":
    print(response.stop_details.category)  # "cyber" | "bio" | "reasoning_extraction"
    print(response.stop_details.explanation)

The opt-in fallbacks parameter (beta) automatically retries a refused request on Opus 4.8; Anthropic says it triggers in under 5% of sessions, and the fallback credit refunds the cache-switch cost. The official SDKs for TypeScript, Python, Go, Java, and C# also ship refusal-fallback middleware if you'd rather handle it client-side.
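To make the client-side route concrete, here is a hand-rolled sketch of retrying a refused request on another model. It is not the SDK's refusal-fallback middleware, whose interface this guide does not show. create_with_fallback and fake_send are hypothetical names, and the sender is injected so the example runs offline:

```python
from types import SimpleNamespace


def create_with_fallback(send, fallback_model, **kwargs):
    """Call send(**kwargs); on a refusal, retry once with fallback_model.

    `send` is any callable shaped like client.messages.create.
    Returns (response, model_used).
    """
    response = send(**kwargs)
    if response.stop_reason != "refusal":
        return response, kwargs["model"]
    # Same request, different model. Parameter rules differ between models,
    # so review the retried kwargs for your own setup.
    retry_kwargs = {**kwargs, "model": fallback_model}
    return send(**retry_kwargs), fallback_model


def fake_send(**kwargs):
    """Offline stand-in: the primary model refuses, the fallback answers."""
    if kwargs["model"] == "claude-fable-5":
        return SimpleNamespace(
            stop_reason="refusal",
            stop_details=SimpleNamespace(category="cyber", explanation="(stub)"),
            content=[],
        )
    return SimpleNamespace(
        stop_reason="end_turn",
        content=[SimpleNamespace(type="text", text="ok")],
    )


response, model_used = create_with_fallback(
    fake_send,
    "claude-opus-4-8",
    model="claude-fable-5",
    max_tokens=16000,
    messages=[{"role": "user", "content": "hello"}],
)
print(model_used, response.stop_reason)
```

In real code you would pass client.messages.create as send. Returning the model name alongside the response makes it easy to log how often the fallback fires.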

Prompt caching

Caching works the same as on other Claude models, with one Fable-specific number: the minimum cacheable prefix is 512 tokens (1,024 on Amazon Bedrock). Shorter prefixes silently won't cache. Cache writes cost $12.50/MTok for the 5-minute TTL or $20/MTok for the 1-hour TTL; cache hits cost $1/MTok - a 90% discount on input.

system=[{
    "type": "text",
    "text": LARGE_SYSTEM_PROMPT,  # must be 512+ tokens to cache
    "cache_control": {"type": "ephemeral"},
}]
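The arithmetic behind those cache prices is easy to check. The sketch below uses only the figures quoted in this section. prefix_cost is a hypothetical helper, and it assumes one cache write followed by a hit on every later request, which only holds if each request arrives before the TTL lapses:

```python
INPUT_PER_MTOK = 10.00                              # uncached input, USD
CACHE_WRITE_PER_MTOK = {"5m": 12.50, "1h": 20.00}   # by TTL, USD
CACHE_HIT_PER_MTOK = 1.00                           # USD
MIN_CACHEABLE_TOKENS = 512                          # 1,024 on Amazon Bedrock


def prefix_cost(prefix_tokens, requests, ttl="5m"):
    """Return (cached_usd, uncached_usd) for sending one prefix `requests` times."""
    mtok = prefix_tokens / 1_000_000
    uncached = requests * mtok * INPUT_PER_MTOK
    if prefix_tokens < MIN_CACHEABLE_TOKENS or requests == 0:
        # Too short to cache (or nothing sent): every request pays full price.
        return uncached, uncached
    write = mtok * CACHE_WRITE_PER_MTOK[ttl]
    hits = (requests - 1) * mtok * CACHE_HIT_PER_MTOK
    return write + hits, uncached


cached, uncached = prefix_cost(100_000, 10)
print(f"cached ${cached:.2f} vs uncached ${uncached:.2f}")
```

For a 100,000-token prefix sent ten times, that comes to about $2.15 cached against $10.00 uncached. With the 5-minute TTL the write premium is recovered on the second request. With the 1-hour TTL it takes a third.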

Pricing snapshot

Fable 5 costs $10 per million input tokens and $50 per million output tokens - exactly 2x Opus 4.8 ($5/$25). The Batch API takes 50% off ($5/$25) for anything that can wait up to 24 hours. There is no long-context premium for the 1M window, and Fable 5 is a Covered Model: 30-day retention, no zero-data-retention option. Full breakdown in Fable 5 pricing explained.
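For quick budgeting, the list prices above reduce to a few lines. This is a rough estimator under stated assumptions: request_cost is a hypothetical helper, it ignores prompt caching, and it expects the input and output token counts as reported for the request:

```python
PRICE_PER_MTOK = {"input": 10.00, "output": 50.00}  # USD, from the figures above
BATCH_DISCOUNT = 0.50                               # Batch API takes 50% off


def request_cost(input_tokens, output_tokens, batch=False):
    """Estimate the USD cost of one request, ignoring prompt caching."""
    cost = (
        input_tokens / 1_000_000 * PRICE_PER_MTOK["input"]
        + output_tokens / 1_000_000 * PRICE_PER_MTOK["output"]
    )
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


print(f"${request_cost(200_000, 20_000):.2f}")
print(f"${request_cost(200_000, 20_000, batch=True):.2f}")
```

A request with 200,000 input tokens and 20,000 output tokens comes to about $3.00, or $1.50 through the Batch API.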

That's everything you need for day one: swap in the model ID, delete the dead parameters, start at effort: "high", stream big outputs, and handle stop_reason: "refusal". The rest is tuning.
