Claude Fable 5 vs Opus 4.8
Claude Fable 5 launched on June 9 at exactly twice the per-token price of Opus 4.8 - $10/$50 per million tokens against $5/$25. Anthropic calls it "a Mythos-class model made safe for general use," and the benchmark deltas are the largest between consecutive Anthropic flagships in years. The question for anyone running Opus 4.8 in production is not whether Fable 5 is better - it clearly is - but whether it is better per dollar for your workload. Here is the full comparison.
Specs and pricing, side by side
| | Claude Fable 5 | Claude Opus 4.8 |
|---|---|---|
| Model ID | claude-fable-5 | claude-opus-4-8 |
| Context window | 1M tokens (no long-context premium) | 1M tokens |
| Max output | 128K tokens | 128K tokens |
| Knowledge cutoff | January 2026 | Earlier cutoff; same tokenizer as Fable 5 |
| Input / output price | $10 / $50 per MTok | $5 / $25 per MTok |
| Batch API (50% off) | $5 / $25 per MTok | $2.50 / $12.50 per MTok |
| Cache write (5-min / 1-hr) | $12.50 / $20 per MTok | $6.25 / $10 per MTok |
| Cache hit | $1 per MTok | $0.50 per MTok |
| Minimum cacheable prefix | 512 tokens (1,024 on Bedrock) | Larger minimum |
| Thinking modes | Adaptive, always on - cannot be disabled (disabled -> 400) | Adaptive (opt-in) or off; off by default |
| Effort levels | low - medium - high (default) - xhigh - max | low - medium - high (default) - xhigh - max |
| Sampling params / prefills | Removed (400) | Removed (400) |
| Fast mode | Not available | Not available (Opus 4.6 Fast remains) |
| Safety system | Real-time classifiers + opt-in fallback to Opus 4.8 | Standard safeguards |
The API surfaces are nearly identical - Fable 5's only new breaking change over Opus 4.8 is that thinking can no longer be turned off. See the migration guide for the full checklist.
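The cache rows in the table imply a simple break-even rule, which the sketch below works through for Fable 5's 5-minute cache prices. The function names are illustrative, and the model assumes every read after the first lands inside the cache lifetime, which real traffic may not guarantee.

```python
# Fable 5 prices from the table above, in dollars per million tokens.
INPUT_PRICE = 10.0
CACHE_WRITE_5MIN = 12.5
CACHE_HIT = 1.0

def uncached_prefix_cost(prefix_tokens, reads, input_price=INPUT_PRICE):
    """Cost of sending the same prefix `reads` times with no caching."""
    return reads * prefix_tokens * input_price / 1_000_000

def cached_prefix_cost(prefix_tokens, reads,
                       write_price=CACHE_WRITE_5MIN, hit_price=CACHE_HIT):
    """Cost of one cache write followed by (reads - 1) cache hits.

    Assumes every later read arrives before the cache entry expires.
    """
    write = prefix_tokens * write_price / 1_000_000
    hits = (reads - 1) * prefix_tokens * hit_price / 1_000_000
    return write + hits

for reads in (1, 2, 10):
    plain = uncached_prefix_cost(1_000_000, reads)
    cached = cached_prefix_cost(1_000_000, reads)
    print(f"{reads:>2} reads: uncached ${plain:.2f}, cached ${cached:.2f}")
```

Under these assumptions a prefix that is read only once costs more cached than uncached, and caching pays for itself from the second read onward.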
Benchmarks
Launch-day numbers, as published by Anthropic:
| Benchmark | Fable 5 | Opus 4.8 | Difference |
|---|---|---|---|
| SWE-Bench Pro | 80.3% | 69.2% | +11.1 |
| FrontierCode Diamond | 29.3% | 13.4% | +15.9 |
| Terminal-Bench 2.1 | 88.0% | 82.7% | +5.3 |
| GDPval-AA (Elo) | 1932 | 1890 | +42 |
| Humanity's Last Exam (tools) | 64.5% | 57.9% | +6.6 |
| OSWorld-Verified | 85.0% | 83.4% | +1.6 |
| Legal Agent | 13.3% | 10.4% | +2.9 |
| HealthBench Professional | 66.0% | 56.9% | +9.1 |
The pattern is broadly consistent: the gaps tend to be widest where the work is hardest. FrontierCode Diamond - problems chosen to be at the edge of what any model can do - more than doubles, from 13.4% to 29.3%. SWE-Bench Pro jumps eleven points. On easier, more saturated evaluations like OSWorld-Verified, the gap narrows to 1.6 points. Full analysis in our benchmarks breakdown.
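To see the same pattern in relative rather than absolute terms, here is a minimal sketch that ranks the percentage-scored benchmarks by the ratio of Fable 5's score to Opus 4.8's. The scores are copied from the table above; GDPval-AA is left out because an Elo rating is not a percentage, and the helper names are illustrative.

```python
# (Fable 5, Opus 4.8) scores in percent, copied from the launch-day table.
SCORES = {
    "SWE-Bench Pro": (80.3, 69.2),
    "FrontierCode Diamond": (29.3, 13.4),
    "Terminal-Bench 2.1": (88.0, 82.7),
    "Humanity's Last Exam (tools)": (64.5, 57.9),
    "OSWorld-Verified": (85.0, 83.4),
    "Legal Agent": (13.3, 10.4),
    "HealthBench Professional": (66.0, 56.9),
}

def gains(fable, opus):
    """Return (absolute gain in points, Fable 5 score / Opus 4.8 score)."""
    return round(fable - opus, 1), round(fable / opus, 2)

def ranked_by_ratio(scores):
    """Sort benchmarks from largest to smallest relative improvement."""
    rows = [(name, *gains(f, o)) for name, (f, o) in scores.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)

for name, points, ratio in ranked_by_ratio(SCORES):
    print(f"{name:<30} +{points:>4} pts  {ratio:.2f}x")
```

Ranked this way, FrontierCode Diamond leads and OSWorld-Verified comes last, which matches the reading above. Legal Agent also ranks high in relative terms despite its small absolute gain, because both models score low on it.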
The cost math: 2x price ≠ 2x cost
Per-token, Fable 5 costs exactly double. Per-task, the early evidence is more interesting. A Canva engineer on Hacker News reported Fable 5 getting "better results with about half the tokens" on their internal workloads - and at half the tokens, twice the per-token price works out to roughly the same cost per task as Opus 4.8. The mechanism is plausible: Fable 5 consolidates tool calls more aggressively, takes fewer wrong turns in agentic loops, and needs fewer retry rounds, so a completed task consumes fewer total tokens even though each token costs more.
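The parity claim is plain arithmetic, sketched below with list prices from the specs table. The token counts are hypothetical, chosen only to illustrate the "half the tokens" case; they are not measurements from Canva or anyone else.

```python
def task_cost(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one task; prices are in dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical agentic task on Opus 4.8: 200K input and 40K output tokens.
opus_cost = task_cost(200_000, 40_000, input_price=5, output_price=25)

# The same task on Fable 5, assuming it really does need half the tokens.
fable_cost = task_cost(100_000, 20_000, input_price=10, output_price=50)

print(f"Opus 4.8: ${opus_cost:.2f}  Fable 5: ${fable_cost:.2f}")
```

Both come to $2.00 in this example. Since Fable 5's prices are exactly double on both input and output, the break-even point is a token ratio of 0.5: it is cheaper per task only when it uses less than half the tokens Opus 4.8 would.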
The counterpoint comes from Simon Willison, who spent $110.42 in a single day of heavy experimentation and criticized the price alongside latency and guardrail false positives. His effort-level test on a single prompt ran from 1,929 tokens ($0.10) at low to 14,430 tokens ($0.72) at max - a spread of more than 7x on the same input. The lesson: Fable 5's cost is dominated by how you configure it, and an untuned deployment at high effort across all traffic will absolutely cost more than Opus 4.8.
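The dollar figures in that test can be checked against Fable 5's list price. This sketch assumes the reported counts are output tokens billed at $50 per MTok; the source does not say so, but the assumption reproduces both quoted amounts.

```python
OUTPUT_PRICE_PER_MTOK = 50.0  # Fable 5 output price from the specs table

def output_cost(tokens, price_per_mtok=OUTPUT_PRICE_PER_MTOK):
    """Dollar cost of `tokens` output tokens (assumes output-rate billing)."""
    return tokens * price_per_mtok / 1_000_000

low_tokens, max_tokens = 1_929, 14_430  # token counts reported for low and max

print(f"low: ${output_cost(low_tokens):.2f}")       # low: $0.10
print(f"max: ${output_cost(max_tokens):.2f}")       # max: $0.72
print(f"spread: {max_tokens / low_tokens:.1f}x")    # spread: 7.5x
```

The token spread is closer to 7.5x; the rounded dollar amounts give about 7x.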
Both stories are true at once. Token efficiency gains show up on hard, multi-step tasks where Opus 4.8 would have flailed; they don't show up on simple completions, where you're just paying double for headroom you don't use.
When to choose each
Choose Opus 4.8 for cost-sensitive, high-volume workloads: summarization pipelines, classification at scale, chat products with tight unit economics, and anything where Opus 4.8 already succeeds reliably. It remains an excellent frontier model at half the price, and it is the designated fallback target when Fable 5's classifiers refuse.
Choose Fable 5 for the hardest long-horizon work: multi-hour agentic coding runs, large migrations (Stripe migrated a 50-million-line Ruby codebase in a day), deep research, legal and healthcare analysis, and any task where Opus 4.8's failure rate forces human intervention. When a failed task costs engineer time, the model that finishes the job wins regardless of token price. Replit's take - it "one-shots... understands what builders mean" - and Hebbia's 10-point jump past 90% on their core analytics benchmark are both this category.
A practical pattern: route by difficulty. Keep Opus 4.8 (or Sonnet 4.6 at $3/$15) on the high-volume easy path, and send the work that actually needs a frontier model to Fable 5 at effort: "high". Measure tokens per completed task, not tokens per request - that is the number the Canva report is really about.
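A minimal sketch of that routing pattern and the metric that goes with it. The model IDs come from the specs table; the difficulty score, the 0.7 threshold, and the `TaskRecord` fields are hypothetical placeholders for whatever signal and logging your system already has.

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    model: str
    total_tokens: int  # all tokens spent on the task, including retries
    completed: bool    # did the task finish without human intervention?

def pick_model(difficulty, threshold=0.7):
    """Route hard tasks to Fable 5, everything else to Opus 4.8.

    `difficulty` is a hypothetical score in [0, 1] from your own heuristics.
    """
    return "claude-fable-5" if difficulty >= threshold else "claude-opus-4-8"

def tokens_per_completed_task(records, model):
    """Total tokens spent on `model`, divided by the tasks it completed.

    Tokens burned on failed tasks still count, so failures raise the figure.
    Returns None when the model completed nothing.
    """
    rows = [r for r in records if r.model == model]
    done = sum(1 for r in rows if r.completed)
    if done == 0:
        return None
    return sum(r.total_tokens for r in rows) / done

# Made-up log entries, for illustration only.
log = [
    TaskRecord("claude-opus-4-8", 90_000, True),
    TaskRecord("claude-opus-4-8", 150_000, False),
    TaskRecord("claude-fable-5", 60_000, True),
    TaskRecord("claude-fable-5", 50_000, True),
]
for model in ("claude-opus-4-8", "claude-fable-5"):
    print(model, tokens_per_completed_task(log, model))
```

Multiplying each figure by the model's per-token price gives cost per completed task, which is the number to compare when deciding where the routing threshold should sit.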