Claude Fable 5 vs Opus 4.8

Claude Fable 5 launched on June 9 at exactly twice the per-token price of Opus 4.8 - $10/$50 per million tokens against $5/$25. Anthropic calls it "a Mythos-class model made safe for general use," and the benchmark deltas are the largest between consecutive Anthropic flagships in years. The question for anyone running Opus 4.8 in production is not whether Fable 5 is better - it clearly is - but whether it is better per dollar for your workload. Here is the full comparison.

Specs and pricing, side by side

| Spec | Claude Fable 5 | Claude Opus 4.8 |
| --- | --- | --- |
| Model ID | claude-fable-5 | claude-opus-4-8 |
| Context window | 1M tokens (no long-context premium) | 1M tokens |
| Max output | 128K tokens | 128K tokens |
| Knowledge cutoff | January 2026 | Earlier cutoff; same tokenizer as Fable 5 |
| Input / output price | $10 / $50 per MTok | $5 / $25 per MTok |
| Batch API (50% off) | $5 / $25 per MTok | $2.50 / $12.50 per MTok |
| Cache write (5-min / 1-hr) | $12.50 / $20 per MTok | $6.25 / $10 per MTok |
| Cache hit | $1 per MTok | $0.50 per MTok |
| Minimum cacheable prefix | 512 tokens (1,024 on Bedrock) | Larger minimum |
| Thinking modes | Adaptive, always on - cannot be disabled (disabled -> 400) | Adaptive (opt-in) or off; off by default |
| Effort levels | low - medium - high (default) - xhigh - max | low - medium - high (default) - xhigh - max |
| Sampling params / prefills | Removed (400) | Removed (400) |
| Fast mode | Not available | Not available (Opus 4.6 Fast remains) |
| Safety system | Real-time classifiers + opt-in fallback to Opus 4.8 | Standard safeguards |

The API surfaces are nearly identical - Fable 5's only new breaking change over Opus 4.8 is that thinking can no longer be turned off. See the migration guide for the full checklist.
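The pricing rows in the table follow fixed ratios to each model's base price, which makes them easy to derive rather than hard-code. The sketch below is illustrative only: the multipliers are inferred from the table's figures, not taken from an official price sheet, and `price_sheet` is a hypothetical helper name.

```python
# Multipliers inferred from the comparison table (assumption, not an official spec).
BATCH_DISCOUNT = 0.5      # Batch API: 50% off input and output
CACHE_WRITE_5MIN = 1.25   # 5-minute cache write relative to base input price
CACHE_WRITE_1HR = 2.0     # 1-hour cache write relative to base input price
CACHE_HIT = 0.1           # cache hit relative to base input price


def price_sheet(input_per_mtok: float, output_per_mtok: float) -> dict:
    """Derive per-MTok prices (USD) from a model's base input/output price."""
    return {
        "input": input_per_mtok,
        "output": output_per_mtok,
        "batch_input": input_per_mtok * BATCH_DISCOUNT,
        "batch_output": output_per_mtok * BATCH_DISCOUNT,
        "cache_write_5min": input_per_mtok * CACHE_WRITE_5MIN,
        "cache_write_1hr": input_per_mtok * CACHE_WRITE_1HR,
        "cache_hit": input_per_mtok * CACHE_HIT,
    }


fable = price_sheet(10.0, 50.0)  # Claude Fable 5 base prices from the table
opus = price_sheet(5.0, 25.0)    # Claude Opus 4.8 base prices from the table

for row in fable:
    print(f"{row:18s} fable=${fable[row]:.2f}  opus=${opus[row]:.2f}")
```

Because every row scales with the base price, each Fable 5 figure is exactly twice its Opus 4.8 counterpart, batch and cache tiers included.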

Benchmarks

Launch-day numbers, as published by Anthropic:

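| Benchmark | Fable 5 | Opus 4.8 | Delta |
| --- | --- | --- | --- |
| SWE-Bench Pro | 80.3% | 69.2% | +11.1 |
| FrontierCode Diamond | 29.3% | 13.4% | +15.9 |
| Terminal-Bench 2.1 | 88.0% | 82.7% | +5.3 |
| GDPval-AA (Elo) | 1932 | 1890 | +42 |
| Humanity's Last Exam (tools) | 64.5% | 57.9% | +6.6 |
| OSWorld-Verified | 85.0% | 83.4% | +1.6 |
| Legal Agent | 13.3% | 10.4% | +2.9 |
| HealthBench Professional | 66.0% | 56.9% | +9.1 |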

The pattern is consistent: the gaps are widest exactly where the work is hardest. FrontierCode Diamond - problems chosen to be at the edge of what any model can do - more than doubles. SWE-Bench Pro jumps eleven points. On easier, more saturated evaluations like OSWorld, the gap narrows. Full analysis in our benchmarks breakdown.
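One way to make "widest where the work is hardest" concrete is to look at relative gain over the Opus 4.8 score instead of absolute points. The sketch below uses only the percentage rows from the benchmark table (GDPval-AA is left out because Elo is not a percentage); `relative_gain` is a hypothetical helper.

```python
# Percentage benchmarks as (Fable 5, Opus 4.8), copied from the table above.
scores = {
    "SWE-Bench Pro": (80.3, 69.2),
    "FrontierCode Diamond": (29.3, 13.4),
    "Terminal-Bench 2.1": (88.0, 82.7),
    "Humanity's Last Exam (tools)": (64.5, 57.9),
    "OSWorld-Verified": (85.0, 83.4),
    "Legal Agent": (13.3, 10.4),
    "HealthBench Professional": (66.0, 56.9),
}


def relative_gain(fable: float, opus: float) -> float:
    """Fractional improvement of Fable 5 over the Opus 4.8 score."""
    return (fable - opus) / opus


# Rank benchmarks from largest to smallest relative improvement.
ranked = sorted(scores, key=lambda name: relative_gain(*scores[name]), reverse=True)

for name in ranked:
    fable, opus = scores[name]
    print(f"{name:30s} +{fable - opus:4.1f} pts  {relative_gain(fable, opus):+.0%}")
```

Ranked this way, FrontierCode Diamond sits at the top and OSWorld-Verified at the bottom, matching the reading above.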

The cost math: 2x price ≠ 2x cost

Per-token, Fable 5 costs exactly double. Per-task, the early evidence is more interesting. A Canva engineer on Hacker News reported Fable 5 getting "better results with about half the tokens" on their internal workloads - and at half the tokens, 2x the per-token price works out to rough cost parity with Opus 4.8. The mechanism is plausible: Fable 5 consolidates tool calls more aggressively, takes fewer wrong turns in agentic loops, and needs fewer retry rounds, so a completed task consumes fewer total tokens even though each token costs more.
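The parity claim is simple arithmetic, sketched below. The token counts are invented for illustration (they are not from the Canva report); only the per-MTok prices come from the comparison table, and `task_cost` is a hypothetical helper.

```python
def task_cost(input_tokens: int, output_tokens: int,
              input_price: float, output_price: float) -> float:
    """Cost in USD of one task, given token counts and per-MTok prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical agentic task: Opus 4.8 burns 200K input and 40K output tokens.
opus_cost = task_cost(200_000, 40_000, input_price=5.0, output_price=25.0)

# Same task on Fable 5, assuming it needs half the tokens at double the price.
fable_cost = task_cost(100_000, 20_000, input_price=10.0, output_price=50.0)

print(f"Opus 4.8: ${opus_cost:.2f}  Fable 5: ${fable_cost:.2f}")
```

Under these assumptions both tasks cost the same. The break-even point is a 50% token reduction: anything less and Fable 5 costs more per task, anything more and it costs less.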

The counterpoint comes from Simon Willison, who spent $110.42 in a single day of heavy experimentation and criticized the price alongside latency and guardrail false positives. His effort-level test on a single prompt ran from 1,929 tokens ($0.10) at low to 14,430 tokens ($0.72) at max - a 7x spread on the same input. The lesson: Fable 5's cost is dominated by how you configure it, and an untuned deployment at high effort across all traffic will absolutely cost more than Opus 4.8.
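The dollar figures quoted for the effort-level test can be sanity-checked against the list price. This sketch assumes the reported token counts are output tokens billed at Fable 5's $50 per MTok rate, which is consistent with the quoted amounts but is not stated in the source.

```python
OUTPUT_PRICE_PER_MTOK = 50.0  # Fable 5 output price in USD, from the comparison table


def output_cost(tokens: int) -> float:
    """USD cost of `tokens` output tokens at the Fable 5 list price."""
    return tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000


low_tokens, max_tokens = 1_929, 14_430  # token counts reported at effort low and max

print(f"low: ${output_cost(low_tokens):.2f}")
print(f"max: ${output_cost(max_tokens):.2f}")
print(f"token ratio: {max_tokens / low_tokens:.1f}x")
```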

Both stories are true at once. Token efficiency gains show up on hard, multi-step tasks where Opus 4.8 would have flailed; they don't show up on simple completions, where you're just paying double for headroom you don't use.

When to choose each

Choose Opus 4.8 for cost-sensitive, high-volume workloads: summarization pipelines, classification at scale, chat products with tight unit economics, and anything where Opus 4.8 already succeeds reliably. It remains an excellent frontier model at half the price, and it is the designated fallback target when Fable 5's classifiers refuse.

Choose Fable 5 for the hardest long-horizon work: multi-hour agentic coding runs, large migrations (Stripe migrated a 50-million-line Ruby codebase in a day), deep research, legal and healthcare analysis, and any task where Opus 4.8's failure rate forces human intervention. When a failed task costs engineer time, the model that finishes the job wins regardless of token price. Replit's take - it "one-shots... understands what builders mean" - and Hebbia's 10-point jump past 90% on their core analytics benchmark both fall into this category.

A practical pattern: route by difficulty. Keep Opus 4.8 (or Sonnet 4.6 at $3/$15) on the high-volume easy path, and send the work that actually needs a frontier model to Fable 5 at effort: "high". Measure tokens per completed task, not tokens per request - that is the number the Canva report is really about.
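A minimal sketch of that routing pattern and the metric it should be judged by. The difficulty score, the 0.7 threshold, and the `TaskRecord` shape are all hypothetical; only the model IDs come from the comparison table. No API calls are made here, since how difficulty is estimated and how requests are sent depends on your stack.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskRecord:
    model: str          # model ID that handled the attempt
    total_tokens: int   # input + output tokens consumed by the attempt
    completed: bool     # True if the task finished without human intervention


def choose_model(difficulty: float, threshold: float = 0.7) -> str:
    """Route hard tasks to Fable 5 and everything else to Opus 4.8.

    `difficulty` is a hypothetical score in [0, 1] from your own heuristic.
    """
    return "claude-fable-5" if difficulty >= threshold else "claude-opus-4-8"


def tokens_per_completed_task(records: List[TaskRecord], model: str) -> Optional[float]:
    """All tokens spent on a model (failed attempts included) per completed task."""
    runs = [r for r in records if r.model == model]
    completed = sum(1 for r in runs if r.completed)
    if completed == 0:
        return None  # nothing finished, so the metric is undefined
    return sum(r.total_tokens for r in runs) / completed


records = [
    TaskRecord("claude-opus-4-8", 90_000, False),  # failed attempt still costs tokens
    TaskRecord("claude-opus-4-8", 110_000, True),
    TaskRecord("claude-fable-5", 100_000, True),
]

print(choose_model(0.9), choose_model(0.2))
print(tokens_per_completed_task(records, "claude-opus-4-8"))
print(tokens_per_completed_task(records, "claude-fable-5"))
```

Counting the tokens from failed attempts in the numerator is the design choice that matters: a model that needs two tries to finish is charged for both, which a per-request average would hide.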
