Fable 5 is the best AI model I have ever used.
It is also insanely expensive.
In my first few hours testing it, I nearly burned through my entire usage limit.
And I was not even doing anything crazy.
Fable costs roughly three times as much as Opus 4.8 per token (about $15 in / $75 out per million tokens versus $5 / $25).
And because it is so smart, it actually overthinks — running loops and burning tokens in ways no previous Claude model has.
Most people are using it completely wrong.
Here is the exact system I built to cut my Fable token costs by 50%+.
No new tools. No less output. Just smarter routing.
The mistake everyone makes on day one
You open Claude Code.
Fable is now the default model.
You start chatting.
You ask it to fix a typo. You ask it to format some JSON. You ask it to rename a variable.
Fable thinks for 12 seconds, burns 8,000 tokens of reasoning, returns the answer.
Cost: $0.60 for a task Haiku would have nailed for $0.02.
You are paying surgeon rates for small talk.
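The arithmetic behind that $0.60 is easy to check. Here is a minimal sketch, assuming the approximate output prices from the comparison table later in this article (about $75 per million tokens for Fable, $5 for Haiku) and assuming Haiku would need around 4,000 output tokens for the same request. The model keys and the `output_cost` helper are made-up names for illustration.

```python
# Approximate output prices in $ per 1M tokens, as quoted in this article.
# Illustrative figures only, not official pricing.
OUTPUT_PRICE_PER_M = {"fable-5": 75.0, "haiku-4.5": 5.0}


def output_cost(model: str, output_tokens: int) -> float:
    """Return the dollar cost of `output_tokens` generated by `model`."""
    return output_tokens * OUTPUT_PRICE_PER_M[model] / 1_000_000


# 8,000 reasoning tokens on Fable, as in the typo-fix example above.
fable_cost = output_cost("fable-5", 8_000)

# Assumption: Haiku handles the same request in about 4,000 tokens.
haiku_cost = output_cost("haiku-4.5", 4_000)

print(f"Fable: ${fable_cost:.2f}, Haiku: ${haiku_cost:.2f}")
```

Under those assumptions the same trivial task costs about 30 times more on Fable.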
Fable is an architect.
Not a roommate.
The moment you internalise that, everything changes.

The 10-80-10 System (The exact framework Anthropic engineers use themselves)
Every Fable project has three phases.
Most people run all three on Fable.
The smart move is running only two of them on Fable.
The first 10% — Planning
This is where Fable earns its price tag.
Before any project starts, use Fable to define:
→ The structure and approach
→ Success criteria
→ Constraints and edge cases
→ What could go wrong
Think of building a house.
The most expensive mistake is giving builders a bad blueprint.
Get the architecture right first.
Fable is extraordinary at this.
The middle 80% — Execution
This is where most tokens get burned.
The back and forth. The iteration. The implementation loops. The grunt work of actually completing tasks.
Fable does not need to be here.
Switch to Opus 4.8 for standard work. Use Haiku for light tasks. Use Codex or GPT-5.5 for mechanical execution.
You get Fable-quality architecture without paying Fable prices for every token of execution.
The final 10% — Review
Bring Fable back in.
Have it review the output against the original plan:
→ Does the result match the architecture?
→ Are there gaps or edge cases missed?
→ Does anything need fixing before it ships?
Because Fable reviews a finished output rather than generating from scratch, it uses a fraction of the tokens it would have burned doing the whole task.
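To see why moving only the middle phase matters, here is a rough sketch of the 10-80-10 split as arithmetic. It assumes tokens divide 10/80/10 across the three phases, that execution goes to Sonnet, and it uses the approximate output prices from this article's price table. The `project_cost` helper and the model keys are hypothetical names.

```python
# Approximate output prices in $ per 1M tokens, from this article's table.
PRICE = {"fable": 75.0, "sonnet": 15.0}

# Assumption: token volume splits 10/80/10 across the three phases.
PHASE_SHARE = {"plan": 0.10, "execute": 0.80, "review": 0.10}


def project_cost(total_tokens: int, model_for_phase: dict[str, str]) -> float:
    """Sum the cost of each phase given which model runs it."""
    cost = 0.0
    for phase, share in PHASE_SHARE.items():
        price = PRICE[model_for_phase[phase]]
        cost += total_tokens * share * price / 1_000_000
    return cost


all_fable = project_cost(
    1_000_000, {"plan": "fable", "execute": "fable", "review": "fable"}
)
routed = project_cost(
    1_000_000, {"plan": "fable", "execute": "sonnet", "review": "fable"}
)
saving = 1 - routed / all_fable

print(f"all Fable: ${all_fable:.2f}, routed: ${routed:.2f}, saving: {saving:.0%}")
```

With these numbers, one million output tokens cost about $75 on Fable alone and about $27 when routed, a saving of roughly 64%. Real projects will not split this cleanly, so treat it as an order-of-magnitude check.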

The CLAUDE.md routing table (One file that controls everything)
This is the biggest unlock.
Keep a single routing table in your CLAUDE.md.
Let Fable act as the orchestrator that reads it and dispatches work to the right model automatically.
Here is the exact routing table I use:
    ## Model Routing Table

    ### Fable 5 (orchestrator only)
    Use for: planning, architecture, reviewing final output
    Never use for: mechanical tasks, bulk generation, boilerplate
    Effort level: high (never xhigh; it is a furnace with worse outputs)

    ### Opus 4.8 (deep reasoning executor)
    Use for: complex debugging, multi-step reasoning, anything
      that needs real thinking but is not architecture
    Cost tier: standard

    ### Sonnet 5 (mechanical work executor)
    Use for: code generation, refactoring, standard feature work
    Cost tier: cheap

    ### Codex / GPT-5.5 (peer executor)
    Use for: implementation tasks, UI/UX verification,
      well-spec'd execution work
    Note: Fable can learn to steer Codex; teach it how once
    Cost tier: often free on Codex plan

    ### Haiku (bulk executor)
    Use for: formatting, lint, simple edits, boilerplate,
      rename refactors, test scaffolding
    Never spawn further subagents from Haiku
    Cost tier: cheapest

    ### Kimi / GLM-5.2 (long context executor)
    Use for: reading huge files, long-horizon repo analysis
      so Fable never spends tokens on it
    Cost tier: very cheap

    ### DeepSeek / Qwen (dirt cheap grunt work)
    Use for: boilerplate, test writing, data cleaning,
      translations, first-draft docs, bulk generation
    Cost tier: near-free
Fable never touches the cheap work directly.
It plans, delegates to the right tier, then checks results against the plan.
The expensive brain only spends tokens deciding.
That one file is why my bill went DOWN while my output went UP.
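If it helps to see the routing table as data, the same idea can be sketched as a plain lookup. The task-category keys and the `route` helper are hypothetical names, and sending unknown task types back to the orchestrator is my assumption, not something Claude Code does for you.

```python
# Task categories mapped to the tiers from the routing table above.
# Keys are hypothetical labels chosen for this illustration.
ROUTES = {
    "planning": "Fable 5",
    "architecture": "Fable 5",
    "final_review": "Fable 5",
    "complex_debugging": "Opus 4.8",
    "code_generation": "Sonnet 5",
    "refactoring": "Sonnet 5",
    "formatting": "Haiku",
    "rename": "Haiku",
    "huge_file_analysis": "Kimi / GLM-5.2",
    "bulk_generation": "DeepSeek / Qwen",
}

# Assumption: anything unclassified goes back to the orchestrator to decide.
FALLBACK = "Fable 5"


def route(task_type: str) -> str:
    """Return the model tier for a task category."""
    return ROUTES.get(task_type, FALLBACK)


for task in ("formatting", "complex_debugging", "planning"):
    print(f"{task} -> {route(task)}")
```

The point of keeping it this small is that the decision stays cheap: the orchestrator only has to classify the task, not do it.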

The exact CLAUDE.md setup that saved 70% of my tokens
Here is the full orchestration section I dropped into my CLAUDE.md:
    ## Orchestration Workflow

    You (Fable) are the orchestrator. Plan, decompose, synthesize.
    Do NOT execute mechanical tasks yourself.

    ### Delegation rules:
    - Reasoning-heavy phases → deep-reasoner (Opus 4.8)
    - Mechanical work → fast-worker (Sonnet/Haiku)
    - Codebase analysis / huge files → Kimi (long context)
    - Boilerplate / bulk → DeepSeek or Qwen
    - Peer review from different perspective → Codex

    ### Codex is a peer, not a reviewer:
    Treat Codex as a cracked Sr. engineer from a different
    perspective. For high-stakes decisions: task Opus + Codex
    on the same problem in parallel, synthesize the best of
    both without showing either the other's answer.

    ### Context discipline:
    Keep your own context lean.
    Never re-read files you have already processed.
    Summarise tool outputs before feeding back to context.
    Ask models to return concise conclusions you can act on.

    ### Effort levels:
    - Planning and architecture: high effort
    - Review passes: medium effort
    - Never use xhigh/max by default: costs more, often worse
Now prompt Fable like a tech lead:
    Goal: [what you want]
    Context: [files, constraints, what you are scared of]

    You are the lead.
    Delegate reasoning to deep-reasoner (Opus).
    Delegate grunt work to fast-worker (Sonnet/Haiku).
    Use Codex for fresh-perspective problems.

    Show me your plan first, then execute.
That is it.
Fable plans. Everything else executes. Bill stays flat.
Installing the Codex plugin — the multiplier move
This is the setup most people skip.
Codex + Fable together is 10x better than Fable alone.
Fable does the architecture. Codex executes at GPT-5.5 quality. You barely touch your Claude limits.
Setup in under 5 minutes:
Step 1: Install Codex CLI on your machine
    npm install -g @openai/codex
Step 2: Add the plugin inside Claude Code
    /plugin marketplace add openai/codex-plugin-cc
    /plugin install codex@openai-codex
    /codex:setup
Step 3: Create two subagents in Claude Code
    /agents

    → deep-reasoner
      Model: Opus 4.8
      Instructions: "Use for reasoning-heavy phases, architecture,
      debugging complex issues. Think thoroughly, return a concise
      conclusion the orchestrator can act on."

    → fast-worker
      Model: Sonnet 5
      Instructions: "Use for mechanical tasks, boilerplate, tests,
      formatting, simple edits. Execute efficiently."
Step 4: Ask Fable to write a SKILL.md for Codex
    Write a SKILL.md that teaches Codex exactly how to:
    - Read and execute implementation plans
    - Run tests and report results back to you
    - Handle the specific file structure in this project

    Make it so Codex needs zero hand-holding on mechanical work.
Fable writes the skill once.
Codex reads it on every future run.
Your Codex output immediately gets 10x better.

The /loop command (The most powerful way to use Fable right now)
Most people still prompt the old way.
You prompt → Fable responds → you review → you reprompt → repeat.
In that model, you are the loop.
You manually verify every step, every correction, every follow-up.
Loops remove you as the bottleneck.
How it works:
You give Fable a goal upfront.
It spins up subagents to work toward that goal.
The agents prompt themselves and report back when done.
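Conceptually, a goal loop is just "keep working until a measurable check passes or the budget runs out". The sketch below is not how Claude Code implements it. It is a toy illustration, with `run_until`, `goal_met`, and `step` as hypothetical names, and a simulated count of failing tests standing in for real agent work.

```python
from typing import Callable


def run_until(
    goal_met: Callable[[], bool],
    step: Callable[[], None],
    max_steps: int,
) -> bool:
    """Call step() until goal_met() is true or max_steps is used up.

    Returns True if the goal was reached, False if the step budget ran out.
    """
    for _ in range(max_steps):
        if goal_met():
            return True
        step()
    return goal_met()


# Simulated state: pretend three tests are failing at the start.
failing_tests = [3]


def fix_one_test() -> None:
    """Stand-in for one unit of agent work that fixes a single test."""
    failing_tests[0] -= 1


done = run_until(lambda: failing_tests[0] == 0, fix_one_test, max_steps=10)
print(f"goal reached: {done}, failing tests left: {failing_tests[0]}")
```

The two things that make this safe to leave unattended are the measurable end state and the hard `max_steps` ceiling. Without either one, the loop has no reason to stop.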
The two commands:
    /goal: define the task and end state

    Structure:
    /goal [task] until [measurable end state] without [constraints]

    Example:
    /goal refactor the auth module until all 47 tests pass
    without touching the payment service or database schema
    /loop: run a prompt on a schedule automatically

    Structure:
    /loop [your prompt] --interval 30m --expires 8h

    Example:
    /loop run security check on all API endpoints
    --interval 24h --expires 7d
The combination:
    /goal rebuild the dashboard component until Lighthouse
    score exceeds 90 on mobile without breaking existing tests

    /loop run /goal above --interval 6h --expires 48h
Fable designs the loop. Cheaper models run the 80% execution inside the loop. Fable only comes back in when the loop closes or hits a blocker.
You wake up to a finished task.
7 prompts to run in Fable right now
These are the highest-leverage things to do with Fable before using it for anything else.
Not vibe coding.
Not shipping features.
Sharpening every system you already have.
1. Find what is actually worth running on Fable
    You are Fable 5, the most capable model available.

    Look through my projects, docs, and memory.

    List the top 5 tasks genuinely worth running on you.

    Rank them with a one-line reason each.

    Do not do the work yet.
2. Redesign how you build before you build anything
    I want you to completely audit and redesign my coding workflow.

    Here is how I currently work: [describe your process]

    My goal is: [what you are trying to ship]

    Review, audit, sharpen, and improve my system.

    Do not write code. Redesign the factory before we run it.
3. Plan the big project — do not build it yet
    I want to plan: [describe the project]

    Do not build yet.

    Lay out the full plan: phases, key decisions, risks,
    and open questions.

    Flag anything that could sink it.

    Make the plan clear enough that Sonnet or Codex could
    execute it step by step without asking me questions.
4. Find everything wrong before you ship
    I am about to ship this project.

    Find everything wrong with it first.

    Read the whole codebase.

    Hunt for real bugs, broken edge cases, and anything that
    will break in front of a user.

    List each issue with: how to reproduce it and the fix.

    Hold a high bar. Be ruthless.
5. Rebuild your CLAUDE.md from scratch
    Read my current CLAUDE.md.

    It was written for older models and is bloated.

    Shorter, cleaner instructions perform better and cost less
    in Fable.

    Rewrite it:
    - Remove instructions Fable no longer needs
    - Tighten every workflow
    - Add the model routing table from our conversation
    - Keep each section under 5 lines where possible

    Fable will figure out the rest on its own.
6. Get business advice from everything it knows about you
    You are my business advisor.

    Read my plan doc, connected tools, and memory.

    Write a one-page assessment of my business and:
    - The top 3 things to focus on for the next 3 months
    - What to drop and why
    - One thing I am probably blind to
7. Security sweep on autopilot
    /loop run a security check on all my API endpoints.

    Look for: exposed keys, missing auth, rate limit gaps,
    injection vectors, and anything a malicious user could abuse.

    Report only real issues with severity ratings.

    --interval 24h --expires 7d

Effort levels — the most misunderstood setting
Most people default to max or xhigh.
This is wrong.
Here is what actually happens at each level:
→ Low: Fast, cheap, surprisingly capable for simple tasks. Many people report amazing outputs here.
→ Medium: The sweet spot. Fable on medium beats Opus on extra high. Default to this.
→ High: Real reasoning power. Use it for hard debugging, multi-file refactors, and architecture decisions.
→ xhigh / max: Token furnace. Often produces WORSE outputs than High. Reserve for the genuinely hardest problems you have ever faced.
The rule: start Medium. Move to High only if quality is actually falling short. Never default to max.
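That rule can be written down as a tiny escalation policy. This is only a sketch of the habit, not a Claude Code setting: `next_effort`, the `quality_ok` flag, and the `allow_xhigh` switch are hypothetical names, and how you judge quality is up to you.

```python
# Effort levels in ascending order, as described above.
EFFORT_LADDER = ["low", "medium", "high", "xhigh"]
DEFAULT_EFFORT = "medium"


def next_effort(current: str, quality_ok: bool, allow_xhigh: bool = False) -> str:
    """Pick the effort level for the next attempt.

    Stay at the current level while quality is acceptable. Otherwise step
    up one level, never past "high" unless xhigh is explicitly allowed.
    """
    if quality_ok:
        return current
    ceiling = "xhigh" if allow_xhigh else "high"
    current_index = EFFORT_LADDER.index(current)
    ceiling_index = EFFORT_LADDER.index(ceiling)
    return EFFORT_LADDER[min(current_index + 1, ceiling_index)]


print(next_effort(DEFAULT_EFFORT, quality_ok=True))
print(next_effort(DEFAULT_EFFORT, quality_ok=False))
print(next_effort("high", quality_ok=False))
```

Going past High has to be an explicit opt-in, which matches the advice to reserve the top levels for the hardest problems.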
One more setting that kills budgets:
Extended Thinking — leave it off by default.
Turn it on only for problems that genuinely require it. Leaving it on permanently is like leaving your engine running in the driveway.
The /handoff trick — fix context window bloat
Long sessions are a silent killer.
Every turn re-sends the full conversation history.
A 200k-token session becomes the most expensive thing on your bill.
The fix: open new chats frequently.
But you don't want to lose context when you do.
Use the /handoff skill:
    Give me a prompt I can use to restart this session in
    a new chat without losing any of our context.

    Include:
    - What we decided
    - What we built
    - What the next steps are
    - Any important constraints I must remember

    Make it under 500 tokens so the new session starts lean.
Copy that output.
Open new chat.
Paste it.
Continue exactly where you left off at a fraction of the context cost.
Fresh session every 30–60 minutes = massive token savings.
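Here is a back-of-the-envelope model of why this works. It assumes every turn adds about 5,000 tokens to the history, that the whole history is re-sent as input on each turn, and that a handoff summary costs 500 tokens. It ignores prompt caching, output pricing, and the cost of writing the handoff itself, so the exact ratio is only indicative. `cumulative_input_tokens` is a hypothetical helper.

```python
def cumulative_input_tokens(
    turns: int, tokens_per_turn: int, start_tokens: int = 0
) -> int:
    """Total input tokens sent when each turn re-sends the full history."""
    total = 0
    history = start_tokens
    for _ in range(turns):
        history += tokens_per_turn  # this turn's new content joins the history
        total += history            # the whole history goes out as input
    return total


# One long session: 40 turns, ending with a 200k-token history.
long_session = cumulative_input_tokens(40, 5_000)

# Same 40 turns split into four sessions of 10 turns each.
# Sessions two to four start from a 500-token handoff summary.
split_sessions = cumulative_input_tokens(10, 5_000) + 3 * cumulative_input_tokens(
    10, 5_000, start_tokens=500
)

print(f"one long session: {long_session:,} input tokens")
print(f"four short sessions: {split_sessions:,} input tokens")
```

Under these assumptions the single long session sends 4,100,000 input tokens and the split version sends 1,115,000, a bit under a quarter of the volume. The saving comes from the fact that re-sent history grows with the square of the turn count, and a reset starts that curve over.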

The 4 expensive mistakes to avoid
Mistake 1: Forgetting that Fable is now the default.
When you open Claude Code, it automatically uses Fable.
Check the model selector before every session.
This single habit has caught me accidentally running Fable on normal chats more times than I want to admit.
Mistake 2: No spending cap.
On July 7th, Fable moves off standard subscriptions.
Add a credit card and set a hard monthly cap immediately.
Settings → Usage → Adjust Limit.
Fable burns tokens fast on autonomous runs and long sessions.
Without a hard cap, one overnight agent run can rack up a bill before you wake up.
Someone already got billed $960 on a single prompt.
Set the cap tonight.
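If you drive agents from your own scripts, a local guard is a cheap second line of defense. The sketch below is a hypothetical helper (`BudgetGuard` and `BudgetExceeded` are made-up names) that you would call with your own cost estimate after each step. It complements the account-level cap and does not replace it.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the cap."""


class BudgetGuard:
    """Track estimated spend and refuse charges beyond a hard cap."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record a cost, or raise before the total would pass the cap."""
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"charge of ${cost_usd:.2f} would exceed the "
                f"${self.cap_usd:.2f} cap (spent ${self.spent_usd:.2f})"
            )
        self.spent_usd += cost_usd


guard = BudgetGuard(cap_usd=1.00)
guard.charge(0.60)  # first step fits within the cap

try:
    guard.charge(0.60)  # second step would take the total to $1.20
except BudgetExceeded as err:
    print(f"stopped: {err}")
```

The guard checks before recording the charge, so a rejected step never counts toward the total and the run stops at the last step that fit.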
Mistake 3: Asking it to explain its reasoning.
That one request can trip classifiers, and your work then quietly gets handled by a weaker model while you think you are still on Fable.
Skip the "why" requests. Judge output quality, not process.
Mistake 4: Drip-feeding it tiny prompts.
Fable holds hours of context in its head.
Give it the whole messy thing at once.
Full context. The constraints. What you are actually scared of.
I gave Fable a refactor I had been dreading for weeks in a single brief. It came back done.
Drip-feeding it wastes the one thing it is best at.

The model cost comparison — know what you are actually paying
Before you route anything, know the price of each model:
Model               Input ($/M tokens)   Output ($/M tokens)   Best for
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Fable 5             ~$15                 ~$75                  Planning, review
Opus 4.8            ~$5                  ~$25                  Deep reasoning
Sonnet 5            ~$3                  ~$15                  Standard execution
Kimi K2.7           ~$0.95               ~$4.00                Bulk coding, long context
GLM-5.2             ~$1.40               ~$4.40                Repo-scale work
DeepSeek v4         ~$0.28               ~$1.10                Dirt cheap grunt work
Haiku 4.5           ~$1                  ~$5                   Cleanup, formatting
Local (Qwen/Llama)  $0                   $0                    Autocomplete, boilerplate
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
The gap between Fable and DeepSeek: roughly 54x on input and 68x on output.
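Those ratios fall straight out of the table. Here is a quick check, using the approximate prices listed above; `price_ratio` is a hypothetical helper name.

```python
# (input, output) prices in $ per 1M tokens, copied from the table above.
PRICES = {
    "Fable 5": (15.0, 75.0),
    "DeepSeek v4": (0.28, 1.10),
}


def price_ratio(expensive: str, cheap: str) -> tuple[float, float]:
    """Return (input ratio, output ratio) between two models."""
    expensive_in, expensive_out = PRICES[expensive]
    cheap_in, cheap_out = PRICES[cheap]
    return expensive_in / cheap_in, expensive_out / cheap_out


input_ratio, output_ratio = price_ratio("Fable 5", "DeepSeek v4")
print(f"{input_ratio:.1f}x on input, {output_ratio:.1f}x on output")
```

This prints 53.6x on input and 68.2x on output. Since the table prices are themselves approximate, only the order of magnitude is worth reading into.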
The same 30-step refactor agent:
→ All Fable: ~$25 per run
→ Fable plan + Kimi execution: ~$1.40 per run
→ Same shipped code. Same tests passing.
Routing is not about being cheap.
It is about being precise.

The full system — what it looks like running
Before this system:
→ Everything routes to Fable
→ Bill spikes every time you ship
→ Usage limits hit mid-session
→ You ration prompts like a budget traveler
After this system:
→ Fable handles the 10% that actually needs it
→ Cheap models handle the 80% of execution
→ /loop runs overnight without touching your limit
→ You ship more, spend less, and rarely hit a rate limit
The three-sentence summary:
Fable plans. Others execute. Fable reviews.
That single rule drops your bill by 50%+ before you change anything else.
The rest is optimization.
Run this tonight
Drop this into Fable right now:
    Read my current CLAUDE.md and all my active projects.

    Your job: set up the 10-80-10 routing system for my workflow.

    Create:
    - Updated CLAUDE.md with the full model routing table
    - A list of my current active tasks ranked by which model should handle each one
    - Three /goal prompts I can run tonight on cheaper models based on what you see in my projects

    Do not execute anything. Just plan and route.
Fable does the planning.
You wake up with a complete routing system.
And a bill that does not give you a heart attack.
If this saved you money:
→ Repost so other builders stop burning their budgets
→ Follow @sairahul1 for more systems like this
→ Bookmark this — the CLAUDE.md configs work, paste them tonight
Subscribe to theaibuilders.co for more articles like this.
I write about AI, building products, and systems that work while you sleep.
Resources:
→ Entelligence token router: entelligence.ai/blogs/entelligence-token-router
→ Three-model workflow that cut bills 80%: entelligence.ai/blogs/our-three-model-coding-workflow-that-cut-our-ai-bill-80
→ Claude Code: claude.ai/code
→ Codex CLI: npmjs.com/package/@openai/codex





