Claude Fable 5 vs Opus 4.8

Claude Fable 5 launched on June 9 at exactly twice the per-token price of Opus 4.8 - $10/$50 per million tokens against $5/$25. Anthropic calls it "a Mythos-class model made safe for general use," and the benchmark deltas are the largest between consecutive Anthropic flagships in years. The question for anyone running Opus 4.8 in production is not whether Fable 5 is better - it clearly is - but whether it is better per dollar for your workload. Here is the full comparison.

Specs and pricing, side by side

	Claude Fable 5	Claude Opus 4.8
Model ID	`claude-fable-5`	`claude-opus-4-8`
Context window	1M tokens (no long-context premium)	1M tokens
Max output	128K tokens	128K tokens
Knowledge cutoff	January 2026	Earlier cutoff; same tokenizer as Fable 5
Input / output price	$10 / $50 per MTok	$5 / $25 per MTok
Batch API (50% off)	$5 / $25 per MTok	$2.50 / $12.50 per MTok
Cache write (5-min / 1-hr)	$12.50 / $20 per MTok	$6.25 / $10 per MTok
Cache hit	$1 per MTok	$0.50 per MTok
Minimum cacheable prefix	512 tokens (1,024 on Bedrock)	Larger minimum
Thinking modes	Adaptive, always on - cannot be disabled (`disabled` -> 400)	Adaptive (opt-in) or off; off by default
Effort levels	low - medium - high (default) - xhigh - max	low - medium - high (default) - xhigh - max
Sampling params / prefills	Removed (400)	Removed (400)
Fast mode	Not available	Not available (Opus 4.6 Fast remains)
Safety system	Real-time classifiers + opt-in fallback to Opus 4.8	Standard safeguards

The API surfaces are nearly identical - Fable 5's only new breaking change over Opus 4.8 is that thinking can no longer be turned off. See the migration guide for the full checklist.
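The prices in the table translate into a per-request estimate with simple arithmetic. The sketch below is one way to do that. `request_cost` and its parameters are hypothetical names, not part of any SDK, and the only prices used are the ones listed above. The table does not say how the batch discount combines with cache hits, so the sketch refuses that combination rather than guessing.

```python
# USD per million tokens, copied from the table above.
PRICES = {
    "claude-fable-5": {"input": 10.00, "output": 50.00, "cache_hit": 1.00},
    "claude-opus-4-8": {"input": 5.00, "output": 25.00, "cache_hit": 0.50},
}
BATCH_DISCOUNT = 0.5  # from the "Batch API (50% off)" row


def request_cost(model, input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimate the USD cost of one request (hypothetical helper).

    cached_tokens is the portion of input_tokens served from the cache and
    billed at the cache-hit rate instead of the normal input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if batch and cached_tokens:
        # Assumption gap: the source does not describe batch + cache pricing.
        raise ValueError("this sketch does not model batch and cache together")
    p = PRICES[model]
    uncached = input_tokens - cached_tokens
    # tokens x (USD per MTok) gives millionths of a dollar
    micro_usd = (
        uncached * p["input"]
        + cached_tokens * p["cache_hit"]
        + output_tokens * p["output"]
    )
    cost = micro_usd / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost
```

For a request with 100,000 input tokens and 2,000 output tokens, this gives about $1.10 on Fable 5 and $0.55 on Opus 4.8. If 90,000 of those input tokens are cache hits, the Fable 5 figure drops to about $0.29.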

Benchmarks

Launch-day numbers, as published by Anthropic:

Benchmark	Fable 5	Opus 4.8	Delta
SWE-Bench Pro	80.3%	69.2%	+11.1
FrontierCode Diamond	29.3%	13.4%	+15.9
Terminal-Bench 2.1	88.0%	82.7%	+5.3
GDPval-AA (Elo)	1932	1890	+42
Humanity's Last Exam (tools)	64.5%	57.9%	+6.6
OSWorld-Verified	85.0%	83.4%	+1.6
Legal Agent	13.3%	10.4%	+2.9
HealthBench Professional	66.0%	56.9%	+9.1

The pattern: the gaps tend to be widest where the work is hardest. FrontierCode Diamond - problems chosen to be at the edge of what any model can do - more than doubles, from 13.4% to 29.3%. SWE-Bench Pro jumps eleven points. On easier, more saturated evaluations like OSWorld-Verified, the gap narrows to 1.6 points. Full analysis in our benchmarks breakdown.
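Absolute point gaps and relative gains tell slightly different stories, so it can help to compute both. The snippet below is a small sketch using only the percentage rows from the table above. GDPval-AA is left out because Elo is not a percentage, and the variable and function names are illustrative.

```python
# (Fable 5 %, Opus 4.8 %) per benchmark, copied from the table above.
SCORES = {
    "SWE-Bench Pro": (80.3, 69.2),
    "FrontierCode Diamond": (29.3, 13.4),
    "Terminal-Bench 2.1": (88.0, 82.7),
    "Humanity's Last Exam (tools)": (64.5, 57.9),
    "OSWorld-Verified": (85.0, 83.4),
    "Legal Agent": (13.3, 10.4),
    "HealthBench Professional": (66.0, 56.9),
}


def gaps(scores):
    """Return {name: (absolute_gap_in_points, fable_to_opus_ratio)}."""
    return {
        name: (round(fable - opus, 1), fable / opus)
        for name, (fable, opus) in scores.items()
    }


if __name__ == "__main__":
    # Print benchmarks from largest to smallest absolute gap.
    for name, (points, ratio) in sorted(gaps(SCORES).items(), key=lambda kv: -kv[1][0]):
        print(f"{name:30s} +{points:4.1f} pts   {ratio:.2f}x")
```

A low-scoring benchmark can show a small point gap and still a sizeable ratio, which is worth keeping in mind when reading the Delta column on its own.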

The cost math: 2x price ≠ 2x cost

Per-token, Fable 5 costs exactly double. Per-task, the early evidence is more interesting. A Canva engineer on Hacker News reported Fable 5 getting "better results with about half the tokens" on their internal workloads - and at half the tokens, 2x the per-token price works out to rough cost parity with Opus 4.8. The mechanism is plausible: Fable 5 consolidates tool calls more aggressively, takes fewer wrong turns in agentic loops, and needs fewer retry rounds, so a completed task consumes fewer total tokens even though each token costs more.
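The parity claim is just price ratio times token ratio. Here is a minimal sketch of that arithmetic. It assumes the mix of input and output tokens stays roughly the same between the two models, so a single 2x price ratio applies. The function names are illustrative.

```python
PRICE_RATIO = 2.0  # Fable 5 per-token price divided by Opus 4.8 per-token price


def relative_task_cost(token_ratio, price_ratio=PRICE_RATIO):
    """Per-task cost of Fable 5 relative to Opus 4.8.

    token_ratio is (Fable 5 tokens per completed task) divided by
    (Opus 4.8 tokens per completed task). A result of 1.0 means parity.
    """
    if token_ratio < 0:
        raise ValueError("token_ratio must be non-negative")
    return token_ratio * price_ratio


def break_even_token_ratio(price_ratio=PRICE_RATIO):
    """Token ratio below which Fable 5 is cheaper per completed task."""
    return 1.0 / price_ratio
```

With the Canva figure of about half the tokens, `relative_task_cost(0.5)` returns 1.0, which is parity. At a token ratio of 0.8 the task still costs 1.6x as much, so modest efficiency gains do not close the gap.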

The counterpoint comes from Simon Willison, who spent $110.42 in a single day of heavy experimentation and criticized the price alongside latency and guardrail false positives. His effort-level test on a single prompt ran from 1,929 tokens ($0.10) at low to 14,430 tokens ($0.72) at max - a roughly 7.5x spread on the same input. The lesson: Fable 5's cost is dominated by how you configure it, and an untuned deployment at high effort across all traffic will absolutely cost more than Opus 4.8.
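The dollar figures in that test can be sanity-checked against the price table. This sketch assumes the reported token counts are output tokens billed at Fable 5's $50 per MTok rate. The source does not say so explicitly, but the numbers are consistent with it.

```python
FABLE_OUTPUT_USD_PER_MTOK = 50.00  # from the pricing table


def output_cost(tokens, usd_per_mtok=FABLE_OUTPUT_USD_PER_MTOK):
    """USD cost of a given number of output tokens."""
    return tokens * usd_per_mtok / 1_000_000


low_tokens, max_tokens = 1_929, 14_430  # token counts reported for low and max effort

low_cost = output_cost(low_tokens)   # rounds to $0.10
max_cost = output_cost(max_tokens)   # rounds to $0.72
spread = max_tokens / low_tokens     # a little under 7.5
```

Because the price per token is fixed, the cost spread equals the token spread: the effort setting alone moved this prompt's bill by a factor of about 7.5.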

Both stories are true at once. Token efficiency gains show up on hard, multi-step tasks where Opus 4.8 would have flailed; they don't show up on simple completions, where you're just paying double for headroom you don't use.

When to choose each

Choose Opus 4.8 for cost-sensitive, high-volume workloads: summarization pipelines, classification at scale, chat products with tight unit economics, and anything where Opus 4.8 already succeeds reliably. It remains an excellent frontier model at half the price, and it is the designated fallback target when Fable 5's classifiers refuse.

Choose Fable 5 for the hardest long-horizon work: multi-hour agentic coding runs, large migrations (Stripe migrated a 50-million-line Ruby codebase in a day), deep research, legal and healthcare analysis, and any task where Opus 4.8's failure rate forces human intervention. When a failed task costs engineer time, the model that finishes the job wins regardless of token price. Replit's take - it "one-shots... understands what builders mean" - and Hebbia's 10-point jump past 90% on their core analytics benchmark are both this category.

A practical pattern: route by difficulty. Keep Opus 4.8 (or Sonnet 4.6 at $3/$15) on the high-volume easy path, and send the work that actually needs a frontier model to Fable 5 at effort: "high". Measure tokens per completed task, not tokens per request - that is the number the Canva report is really about.
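The routing pattern and the metric can both be sketched in a few lines. Everything below is illustrative: the difficulty score, the 0.7 threshold, and the function names are assumptions, and how to score difficulty is left to your own heuristics or classifier. Only the model IDs and the "high" effort level come from the tables above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Route:
    model: str
    effort: Optional[str]  # None means: leave the API default in place


def choose_route(difficulty: float, threshold: float = 0.7) -> Route:
    """Pick a model from a hypothetical difficulty score in [0, 1]."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    if difficulty >= threshold:
        return Route(model="claude-fable-5", effort="high")
    return Route(model="claude-opus-4-8", effort=None)


def tokens_per_completed_task(attempts: Iterable[Tuple[int, bool]]) -> float:
    """Total tokens across all attempts divided by the number of completed tasks.

    attempts is a sequence of (tokens_used, completed) pairs. Tokens spent on
    failed attempts still count, which is the point of the metric.
    """
    attempts = list(attempts)
    completed = sum(1 for _, ok in attempts if ok)
    if completed == 0:
        raise ValueError("no completed tasks to average over")
    return sum(tokens for tokens, _ in attempts) / completed
```

For example, three attempts of 1,000, 3,000, and 2,000 tokens where the second one failed give 6,000 tokens over 2 completed tasks, or 3,000 tokens per completed task. Comparing that number per model, multiplied by each model's price, is what tells you whether the routing threshold is set in the right place.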
