Claude Fable 5 Cost Optimization: The 10-80-10 Routing Guide

Fable 5 is the best AI model I have ever used.

It is also insanely expensive.

In my first few hours testing it, I nearly burned through my entire usage limit.

And I was not even doing anything crazy.

Fable costs roughly three times as much as Opus 4.8 per token (about $15 vs $5 per million input tokens in the cost table further down).

And because it is so smart, it actually overthinks — running loops and burning tokens in ways no previous Claude model has.

Most people are using it completely wrong.

Here is the exact system I built to cut my Fable token costs by 50%+.

No new tools. No less output. Just smarter routing.

The mistake everyone makes on day one

You open Claude Code.

Fable is now the default model.

You start chatting.

You ask it to fix a typo. You ask it to format some JSON. You ask it to rename a variable.

Fable thinks for 12 seconds, burns 8,000 tokens of reasoning, returns the answer.

Cost: $0.60 for a task Haiku would have nailed for $0.02.

You are paying surgeon rates for small talk.

Fable is an architect.

Not a roommate.

The moment you internalise that, everything changes.
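To make the typo example concrete, here is a minimal sketch of the arithmetic. It assumes the approximate output prices from the cost comparison table later in this article, and the 4,000-token Haiku figure is back-calculated from the $0.02 number rather than measured.

```python
def token_cost(tokens: int, usd_per_million: float) -> float:
    """Dollar cost of `tokens` at a price quoted per million tokens."""
    return tokens * usd_per_million / 1_000_000


# Assumed output prices, taken from this article's own comparison table.
FABLE_OUTPUT_USD_PER_M = 75.0
HAIKU_OUTPUT_USD_PER_M = 5.0

# 8,000 reasoning tokens on Fable for a trivial edit.
fable_cost = token_cost(8_000, FABLE_OUTPUT_USD_PER_M)

# Assumption: Haiku finishes the same edit in about 4,000 output tokens.
haiku_cost = token_cost(4_000, HAIKU_OUTPUT_USD_PER_M)

print(f"Fable: ${fable_cost:.2f}")
print(f"Haiku: ${haiku_cost:.2f}")
print(f"Fable costs {fable_cost / haiku_cost:.0f}x more for this task")
```

The gap comes from two multipliers stacking: a higher price per token and more tokens spent thinking.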

The 10-80-10 System (The exact framework Anthropic engineers use themselves)

Every Fable project has three phases.

Most people run all three on Fable.

The smart move is running only two of them on Fable.

The first 10% — Planning

This is where Fable earns its price tag.

Before any project starts, use Fable to define:

→ The structure and approach

→ Success criteria

→ Constraints and edge cases

→ What could go wrong

Think of building a house.

The most expensive mistake is giving builders a bad blueprint.

Get the architecture right first.

Fable is extraordinary at this.

The middle 80% — Execution

This is where most tokens get burned.

The back and forth. The iteration. The implementation loops. The grunt work of actually completing tasks.

Fable does not need to be here.

Switch to Opus 4.8 for standard work. Use Haiku for light tasks. Use Codex or GPT-5.5 for mechanical execution.

You get Fable-quality architecture without paying Fable prices for every token of execution.

The final 10% — Review

Bring Fable back in.

Have it review the output against the original plan:

→ Does the result match the architecture?

→ Are there gaps or edge cases missed?

→ Does anything need fixing before it ships?

Because Fable reviews a finished output rather than generating from scratch, it uses a fraction of the tokens it would have burned doing the whole task.
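A rough way to sanity-check the savings is to compute a blended price per token. The sketch below assumes tokens split exactly 10/80/10 across the three phases and uses the approximate output prices from the cost table in this article. Real projects will not split this cleanly, so treat the result as an estimate.

```python
from typing import List, Tuple


def blended_rate(shares_and_prices: List[Tuple[float, float]]) -> float:
    """Weighted average price from (share_of_tokens, usd_per_million) pairs."""
    total_share = sum(share for share, _ in shares_and_prices)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1")
    return sum(share * price for share, price in shares_and_prices)


def saving(routed: float, baseline: float) -> float:
    """Fraction of the baseline price that routing removes."""
    return 1 - routed / baseline


# Assumed output prices in $/M, taken from this article's cost table.
FABLE, OPUS, SONNET = 75.0, 25.0, 15.0

all_fable = blended_rate([(1.0, FABLE)])
routed_opus = blended_rate([(0.1, FABLE), (0.8, OPUS), (0.1, FABLE)])
routed_sonnet = blended_rate([(0.1, FABLE), (0.8, SONNET), (0.1, FABLE)])

print(f"Opus for the middle 80%:   {saving(routed_opus, all_fable):.0%} cheaper")
print(f"Sonnet for the middle 80%: {saving(routed_sonnet, all_fable):.0%} cheaper")
```

Under these assumptions the blended rate drops from $75 to $35 per million with Opus in the middle, and to $27 with Sonnet, which is where a "50%+" figure can come from.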


The CLAUDE.md routing table (One file that controls everything)

This is the biggest unlock.

Keep a single routing table in your CLAUDE.md.

Let Fable act as the orchestrator that reads it and dispatches work to the right model automatically.

Here is the exact routing table I use:

markdown

## Model Routing Table

### Fable 5 (orchestrator only)
Use for: planning, architecture, reviewing final output
Never use for: mechanical tasks, bulk generation, boilerplate
Effort level: high (never xhigh; it is a furnace with worse outputs)

### Opus 4.8 (deep reasoning executor)
Use for: complex debugging, multi-step reasoning, anything
         that needs real thinking but is not architecture
Cost tier: standard

### Sonnet 5 (mechanical work executor)
Use for: code generation, refactoring, standard feature work
Cost tier: cheap

### Codex / GPT-5.5 (peer executor)
Use for: implementation tasks, UI/UX verification,
         well-spec'd execution work
Note: Fable can learn to steer Codex; teach it how once
Cost tier: often free on Codex plan

### Haiku (bulk executor)
Use for: formatting, lint, simple edits, boilerplate,
         rename refactors, test scaffolding
Never spawn further subagents from Haiku
Cost tier: cheapest

### Kimi / GLM-5.2 (long context executor)
Use for: reading huge files, long-horizon repo analysis
         so Fable never spends tokens on it
Cost tier: very cheap

### DeepSeek / Qwen (dirt cheap grunt work)
Use for: boilerplate, test writing, data cleaning,
         translations, first-draft docs, bulk generation
Cost tier: near-free

Fable never touches the cheap work directly.

It plans, delegates to the right tier, then checks results against the plan.

The expensive brain only spends tokens deciding.

That one file is why my bill went DOWN while my output went UP.
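If you want the same routing logic outside CLAUDE.md, say in a script that picks a model before it sends a request, a lookup table is enough. This is a minimal sketch: the task labels, model labels, and the default tier are all hypothetical names chosen for illustration, not identifiers any tool requires.

```python
# Hypothetical mapping distilled from the routing table above.
ROUTES = {
    "planning": "fable-5",
    "architecture": "fable-5",
    "final-review": "fable-5",
    "complex-debugging": "opus-4.8",
    "code-generation": "sonnet-5",
    "refactoring": "sonnet-5",
    "formatting": "haiku",
    "rename": "haiku",
    "huge-file-read": "kimi",
    "bulk-generation": "deepseek",
}

# Assumption: work that matches no rule goes to the mid tier, never to Fable.
DEFAULT_MODEL = "sonnet-5"


def route(task_type: str) -> str:
    """Return the model label for a task type, falling back to the mid tier."""
    return ROUTES.get(task_type.strip().lower(), DEFAULT_MODEL)


for task in ("Planning", "formatting", "write release notes"):
    print(f"{task!r} -> {route(task)}")
```

Defaulting unknown work to a cheaper tier is a deliberate choice here: the expensive model should be something you opt into, not something you fall into.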

The exact CLAUDE.md setup that saved 70% of my tokens

Here is the full orchestration section I dropped into my CLAUDE.md:

markdown

## Orchestration Workflow

You (Fable) are the orchestrator. Plan, decompose, synthesize.
Do NOT execute mechanical tasks yourself.

### Delegation rules:
- Reasoning-heavy phases → deep-reasoner (Opus 4.8)
- Mechanical work → fast-worker (Sonnet/Haiku)
- Codebase analysis / huge files → Kimi (long context)
- Boilerplate / bulk → DeepSeek or Qwen
- Peer review from different perspective → Codex

### Codex is a peer, not a reviewer:
Treat Codex as a cracked Sr. engineer from a different
perspective. For high-stakes decisions: task Opus + Codex
on the same problem in parallel, synthesize the best of
both without showing either the other's answer.

### Context discipline:
Keep your own context lean.
Never re-read files you have already processed.
Summarise tool outputs before feeding back to context.
Ask models to return concise conclusions you can act on.

### Effort levels:
- Planning and architecture: high effort
- Review passes: medium effort
- Never use xhigh/max by default: costs more, often worse

Now prompt Fable like a tech lead:

markdown

Goal: [what you want]
Context: [files, constraints, what you are scared of]

You are the lead.
Delegate reasoning to deep-reasoner (Opus).
Delegate grunt work to fast-worker (Sonnet/Haiku).
Use Codex for fresh-perspective problems.

Show me your plan first, then execute.

That is it.

Fable plans. Everything else executes. Bill stays flat.
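The "Codex is a peer" rule above boils down to one pattern: send the same problem to two workers at once, and never let either see the other's answer. Here is a minimal sketch of that pattern. The two worker functions are stand-ins. In a real setup each would call whatever client you use for that model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Worker = Callable[[str], str]


def ask_peers(problem: str, peers: Dict[str, Worker]) -> Dict[str, str]:
    """Send the same problem to every peer in parallel.

    Each worker receives only the problem text, so no peer can be
    anchored by another peer's answer.
    """
    if not peers:
        return {}
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        futures = {name: pool.submit(worker, problem) for name, worker in peers.items()}
        return {name: future.result() for name, future in futures.items()}


# Stand-in workers (hypothetical); replace with real model calls.
def fake_opus(problem: str) -> str:
    return f"opus view on: {problem}"


def fake_codex(problem: str) -> str:
    return f"codex view on: {problem}"


answers = ask_peers(
    "choose a session storage design",
    {"opus": fake_opus, "codex": fake_codex},
)
for name, answer in answers.items():
    print(name, "->", answer)
```

The orchestrator then reads both answers and writes the synthesis. That last step is the part worth paying Fable for.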

Installing the Codex plugin — the multiplier move

This is the setup most people skip.

Codex + Fable together is 10x better than Fable alone.

Fable does the architecture. Codex executes at GPT-5.5 quality. You barely touch your Claude limits.

Setup in under 5 minutes:

Step 1: Install Codex CLI on your machine

bash

npm install -g @openai/codex

Step 2: Add the plugin inside Claude Code

text

/plugin marketplace add openai/codex-plugin-cc
/plugin install codex@openai-codex
/codex:setup

Step 3: Create two subagents in Claude Code

text

/agents

→ deep-reasoner
  Model: Opus 4.8
  Instructions: "Use for reasoning-heavy phases, architecture,
  debugging complex issues. Think thoroughly, return a concise
  conclusion the orchestrator can act on."

→ fast-worker
  Model: Sonnet 5
  Instructions: "Use for mechanical tasks, boilerplate, tests,
  formatting, simple edits. Execute efficiently."

Step 4: Ask Fable to write a SKILL.md for Codex

text

Write a SKILL.md that teaches Codex exactly how to:
- Read and execute implementation plans
- Run tests and report results back to you
- Handle the specific file structure in this project

Make it so Codex needs zero hand-holding on mechanical work.

Fable writes the skill once.

Codex reads it on every future run.

Your Codex output immediately gets 10x better.

The /loop command (The most powerful way to use Fable right now)

Most people still prompt the old way.

You prompt → Fable responds → you review → you reprompt → repeat.

In that model, you are the loop.

You manually verify every step, every correction, every follow-up.

Loops remove you as the bottleneck.

How it works:

You give Fable a goal upfront.

It spins up subagents to work toward that goal.

The agents prompt themselves and report back when done.

The two commands:

text

/goal: define the task and end state

Structure:
/goal [task] until [measurable end state] without [constraints]

Example:
/goal refactor the auth module until all 47 tests pass
without touching the payment service or database schema

text

/loop: run a prompt on a schedule automatically

Structure:
/loop [your prompt] --interval 30m --expires 8h

Example:
/loop run security check on all API endpoints
--interval 24h --expires 7d

The combination:

text

/goal rebuild the dashboard component until Lighthouse
score exceeds 90 on mobile without breaking existing tests

/loop run /goal above --interval 6h --expires 48h

Fable designs the loop. Cheaper models run the 80% execution inside the loop. Fable only comes back in when the loop closes or hits a blocker.

You wake up to a finished task.
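The "[task] until [measurable end state] without [constraints]" shape maps onto a plain control loop. The sketch below is only an illustration of that shape, not how Claude Code implements these commands. The function name, the toy state, and the attempt limit standing in for an expiry are all assumptions.

```python
from typing import Callable


def run_goal(attempt: Callable[[], None],
             end_state_reached: Callable[[], bool],
             constraints_hold: Callable[[], bool],
             max_attempts: int) -> str:
    """Repeat `attempt` until the end state is reached, a constraint
    breaks, or the attempt budget runs out."""
    for _ in range(max_attempts):
        if end_state_reached():
            return "done"
        attempt()
        if not constraints_hold():
            return "blocked"  # hand back to the orchestrator
    return "done" if end_state_reached() else "expired"


# Toy state standing in for a real repo and test suite.
state = {"failing_tests": 3, "touched_payment_service": False}


def fix_one_test() -> None:
    state["failing_tests"] -= 1


result = run_goal(
    attempt=fix_one_test,
    end_state_reached=lambda: state["failing_tests"] == 0,
    constraints_hold=lambda: not state["touched_payment_service"],
    max_attempts=10,
)
print(result, "with", state["failing_tests"], "failing tests")
```

The three exits matter: "done" needs no attention, while "blocked" and "expired" are the only cases where the expensive model needs to come back in.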


7 prompts to run in Fable right now

These are the highest-leverage things to do with Fable before using it for anything else.

Not vibe coding.

Not shipping features.

Sharpening every system you already have.

1. Find what is actually worth running on Fable

text

You are Fable 5, the most capable model available.

Look through my projects, docs, and memory.

List the top 5 tasks genuinely worth running on you.

Rank them with a one-line reason each.

Do not do the work yet.

2. Redesign how you build before you build anything

text

I want you to completely audit and redesign my coding workflow.

Here is how I currently work: [describe your process]

My goal is: [what you are trying to ship]

Review, audit, sharpen, and improve my system.

Do not write code. Redesign the factory before we run it.

3. Plan the big project — do not build it yet

text

I want to plan: [describe the project]

Do not build yet.

Lay out the full plan: phases, key decisions, risks,
and open questions.

Flag anything that could sink it.

Make the plan clear enough that Sonnet or Codex could
execute it step by step without asking me questions.

4. Find everything wrong before you ship

text

I am about to ship this project.

Find everything wrong with it first.

Read the whole codebase.

Hunt for real bugs, broken edge cases, and anything that
will break in front of a user.

List each issue with: how to reproduce it and the fix.

Hold a high bar. Be ruthless.

5. Rebuild your CLAUDE.md from scratch

text

Read my current CLAUDE.md.

It was written for older models and is bloated.

Shorter, cleaner instructions perform better and cost less
in Fable.

Rewrite it:
- Remove instructions Fable no longer needs
- Tighten every workflow
- Add the model routing table from our conversation
- Keep each section under 5 lines where possible

Fable will figure out the rest on its own.

6. Get business advice from everything it knows about you

text

You are my business advisor.

Read my plan doc, connected tools, and memory.

Write a one-page assessment of my business and:
- The top 3 things to focus on for the next 3 months
- What to drop and why
- One thing I am probably blind to

7. Security sweep on autopilot

text

/loop run a security check on all my API endpoints.

Look for: exposed keys, missing auth, rate limit gaps,
injection vectors, and anything a malicious user could abuse.

Report only real issues with severity ratings.

--interval 24h --expires 7d

Effort levels — the most misunderstood setting

Most people default to max or xhigh.

This is wrong.

Here is what actually happens at each level:

→ Low: Fast, cheap, surprisingly capable for simple tasks. Many people report amazing outputs here.

→ Medium: The sweet spot. Fable on medium beats Opus on extra high. Default to this.

→ High: Real reasoning power. Use it for hard debugging, multi-file refactors, and architecture decisions.

→ xhigh / max: Token furnace. Often produces WORSE outputs than High. Reserve for the genuinely hardest problems you have ever faced.

The rule: start Medium. Move to High only if quality is actually falling short. Never default to max.
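That rule is an escalation ladder, and it can be sketched in a few lines. The effort labels, the stand-in model call, and the quality check below are all placeholders. How you decide an output is "good enough" is the real work and depends on your project.

```python
from typing import Callable, Sequence, Tuple


def escalate_effort(run: Callable[[str], str],
                    good_enough: Callable[[str], bool],
                    ladder: Sequence[str] = ("medium", "high")) -> Tuple[str, str]:
    """Try each effort level in order and stop at the first acceptable output."""
    if not ladder:
        raise ValueError("ladder must contain at least one effort level")
    level, output = ladder[0], ""
    for level in ladder:
        output = run(level)
        if good_enough(output):
            break
    return level, output


def fake_run(level: str) -> str:
    # Stand-in for a real model call; pretends medium falls short.
    return "solid answer" if level == "high" else "thin answer"


level, output = escalate_effort(fake_run, lambda text: text.startswith("solid"))
print(level, "->", output)
```

Note that the default ladder stops at high. Anything above that stays a manual decision.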

One more setting that kills budgets:

Extended Thinking — leave it off by default.

Turn it on only for problems that genuinely require it. Leaving it on permanently is like leaving your engine running in the driveway.

The /handoff trick — fix context window bloat

Long sessions are a silent killer.

Every turn re-sends the full conversation history.

A 200k-token session becomes the most expensive thing on your bill.

The fix: open new chats frequently.

But you don't want to lose context when you do.

Use the /handoff skill:

text

Give me a prompt I can use to restart this session in
a new chat without losing any of our context.

Include:
- What we decided
- What we built
- What the next steps are
- Any important constraints I must remember

Make it under 500 tokens so the new session starts lean.

Copy that output.

Open new chat.

Paste it.

Continue exactly where you left off at a fraction of the context cost.

Fresh session every 30–60 minutes = massive token savings.
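Here is why this works, as a back-of-the-envelope sketch. It assumes every turn re-sends the whole history, each turn adds about 2,000 tokens, and you reset every 25 turns with a 500-token handoff. It ignores prompt caching, which reduces the real bill, so read the numbers as a shape rather than a quote.

```python
def input_tokens_sent(turns: int, tokens_per_turn: int, carried_context: int = 0) -> int:
    """Total input tokens over a session where turn k re-sends everything so far."""
    return sum(carried_context + tokens_per_turn * k for k in range(1, turns + 1))


TURNS = 100
TOKENS_PER_TURN = 2_000   # assumed average growth per turn
HANDOFF_TOKENS = 500      # the handoff budget from the prompt above
TURNS_PER_SESSION = 25    # assumed reset cadence

one_long_session = input_tokens_sent(TURNS, TOKENS_PER_TURN)

# Conservative: every session, including the first, carries a handoff.
sessions = TURNS // TURNS_PER_SESSION
with_handoffs = sessions * input_tokens_sent(
    TURNS_PER_SESSION, TOKENS_PER_TURN, HANDOFF_TOKENS
)

print(f"one long session: {one_long_session:,} input tokens")
print(f"four sessions:    {with_handoffs:,} input tokens")
print(f"reduction:        {1 - with_handoffs / one_long_session:.0%}")
```

Input cost grows with the square of session length, so cutting a session into four pieces removes roughly three quarters of the input tokens under these assumptions.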

The 4 expensive mistakes to avoid

Mistake 1: Fable is now the default.

When you open Claude Code, it automatically uses Fable.

Check the model selector before every session.

This single habit has caught me accidentally running Fable on normal chats more times than I want to admit.

Mistake 2: No spending cap.

On July 7th, Fable moves off standard subscriptions.

Add a credit card and set a hard monthly cap immediately.

Settings → Usage → Adjust Limit.

Fable burns tokens fast on autonomous runs and long sessions.

Without a hard cap, one overnight agent run can rack up a bill before you wake up.

Someone already got billed $960 on a single prompt.

Set the cap tonight.
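If you drive agents from your own scripts, a client-side guard is a cheap second line of defence. This is a minimal sketch with hypothetical class names and the approximate Fable prices from the cost table in this article. It only tracks what your script reports, so it complements the account-level cap rather than replacing it.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the next charge would cross the hard cap."""


class BudgetGuard:
    """Tracks estimated spend and refuses work once a cap would be crossed."""

    def __init__(self, cap_usd: float) -> None:
        if cap_usd <= 0:
            raise ValueError("cap_usd must be positive")
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               input_usd_per_m: float, output_usd_per_m: float) -> float:
        """Record one call's estimated cost, or raise if it would exceed the cap."""
        cost = (input_tokens * input_usd_per_m
                + output_tokens * output_usd_per_m) / 1_000_000
        if self.spent_usd + cost > self.cap_usd:
            raise BudgetExceeded(
                f"${self.spent_usd + cost:.2f} would exceed cap ${self.cap_usd:.2f}"
            )
        self.spent_usd += cost
        return cost


# Assumed Fable prices in $/M, taken from this article's cost table.
FABLE_INPUT, FABLE_OUTPUT = 15.0, 75.0

guard = BudgetGuard(cap_usd=5.00)
first = guard.charge(100_000, 20_000, FABLE_INPUT, FABLE_OUTPUT)
print(f"first call: ${first:.2f}")

try:
    guard.charge(100_000, 20_000, FABLE_INPUT, FABLE_OUTPUT)
except BudgetExceeded as stop:
    print("stopping run:", stop)
```

The guard checks before it records the charge, so a rejected call never counts against the budget.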

Mistake 3: Asking it to explain its reasoning.

That one request can trip classifiers, and your work quietly gets handled by a weaker model while you think you are still on Fable.

Skip the "why" requests. Judge output quality, not process.

Mistake 4: Drip-feeding it tiny prompts.

Fable holds hours of context in its head.

Give it the whole messy thing at once.

Full context. The constraints. What you are actually scared of.

I gave Fable a refactor I had been dreading for weeks in a single brief. It came back done.

Drip-feeding it wastes the one thing it is best at.

The model cost comparison — know what you are actually paying

Before you route anything, know the price of each model:

Model                 Input ($/M)   Output ($/M)   Best for
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Fable 5               ~$15          ~$75           Planning, review
Opus 4.8              ~$5           ~$25           Deep reasoning
Sonnet 5              ~$3           ~$15           Standard execution
Kimi K2.7             ~$0.95        ~$4.00         Bulk coding, long context
GLM-5.2               ~$1.40        ~$4.40         Repo-scale work
DeepSeek v4           ~$0.28        ~$1.10         Dirt cheap grunt work
Haiku 4.5             ~$1           ~$5            Cleanup, formatting
Local (Qwen/Llama)    $0            $0             Autocomplete, boilerplate
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The gap between Fable and DeepSeek: roughly 54x on input and 68x on output.

The same 30-step refactor agent:

→ All Fable: ~$25 per run

→ Fable plan + Kimi execution: ~$1.40 per run

Same shipped code. Same tests passing.

Routing is not about being cheap.

It is about being precise.
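The price gaps are easy to check yourself. This sketch hard-codes the approximate prices from the table above and computes the ratio between any two models. The per-run dollar figures are not reproduced here because they depend on token counts the table does not give.

```python
from typing import Dict, Tuple

# (input, output) in $/M, copied from the approximate prices in the table above.
PRICES: Dict[str, Tuple[float, float]] = {
    "Fable 5": (15.0, 75.0),
    "Opus 4.8": (5.0, 25.0),
    "Sonnet 5": (3.0, 15.0),
    "Kimi K2.7": (0.95, 4.00),
    "GLM-5.2": (1.40, 4.40),
    "DeepSeek v4": (0.28, 1.10),
    "Haiku 4.5": (1.0, 5.0),
}


def price_ratio(expensive: str, cheap: str) -> Tuple[float, float]:
    """How many times pricier `expensive` is than `cheap`, as (input, output)."""
    exp_in, exp_out = PRICES[expensive]
    cheap_in, cheap_out = PRICES[cheap]
    return exp_in / cheap_in, exp_out / cheap_out


for other in ("Opus 4.8", "Sonnet 5", "DeepSeek v4"):
    in_ratio, out_ratio = price_ratio("Fable 5", other)
    print(f"Fable 5 vs {other}: {in_ratio:.1f}x input, {out_ratio:.1f}x output")
```

Swap in current prices before relying on any of these ratios. The table values are approximate and will drift.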

The full system — what it looks like running

Before this system:

→ Everything routes to Fable

→ Bill spikes every time you ship

→ Usage limits hit mid-session

→ You ration prompts like a budget traveler

After this system:

→ Fable handles the 10% that actually needs it

→ Cheap models handle the 80% of execution

→ /loop runs overnight without touching your limit

→ You ship more, spend less, never hit a rate limit

The three-line summary:

Fable plans. Others execute. Fable reviews.

That single rule drops your bill by 50%+ before you change anything else.

The rest is optimization.

Run this tonight

Drop this into Fable right now:

Read my current CLAUDE.md and all my active projects.

Your job: set up the 10-80-10 routing system for my workflow.

Create:

- Updated CLAUDE.md with the full model routing table
- A list of my current active tasks ranked by which model should handle each one
- Three /goal prompts I can run tonight on cheaper models based on what you see in my projects

Do not execute anything. Just plan and route.

Fable does the planning.

You wake up with a complete routing system.

And a bill that does not give you a heart attack.

If this saved you money:

→ Repost so other builders stop burning their budgets

→ Follow @sairahul1 for more systems like this

→ Bookmark this — the CLAUDE.md configs work, paste them tonight

Subscribe to theaibuilders.co for more articles like this.

I write about AI, building products, and systems that work while you sleep.

Resources:

→ Entelligence token router: entelligence.ai/blogs/entelligence-token-router

→ Three-model workflow that cut bills 80%: entelligence.ai/blogs/our-three-model-coding-workflow-that-cut-our-ai-bill-80

→ Claude Code: claude.ai/code

→ Codex CLI: npmjs.com/package/@openai/codex
