I'm going to show you step-by-step how to consistently get the best outputs with Claude Code, Codex, OpenCode or whatever harness you're using...
because a harness is just the app you type your agent instructions into, the cockpit you sit in
the instinct is to chase a better model to drop inside it (USING FABLE 5 LOL)
the real edge you can get is somewhere else, in the fact that you stopped at one model at all
and the timing makes that mistake more expensive than it used to be... because the single best model has become a moving target you can't count on:
- Fable 5 is coming back for like a week and then it will be too expensive to afford
- Mythos 5 is locked to a short list of government-vetted companies
- GPT-5.6 Sol went out to about 20 approved firms and nobody else
so betting your whole operation on whichever model is "best" this month keeps losing the moment it gets gated
so the move is a merge of frontier LLMs, a handful of models carrying the work together instead of one model carrying all of it
that's what separates average outputs from the best ones, and it's the first thing i set up on every project now
if you want to learn how to get the most out of these tools and make real money with them, that's what the real time AI ops community is built for: weeklyaiops.com

the slow, miserable back-and-forth
watch how a normal project goes, the bottleneck is sitting right in the middle of it:
you open one harness and hand it the task
then you take its plan, paste it into a second model, and ask "is this right"
the second one finds the holes, so you carry those notes back to the first
you spend the afternoon as a messenger, running context between two models that can't talk to each other
let's be honest, that back-and-forth is a pain in the ass...
i know this loop too well, you have Claude check Codex's plan, then carry the holes back the other way, and the day disappears into ferrying notes between two models

why a merge of AIs beats your favorite model
there's a reason the second model keeps catching things the first one missed...
when the model that reviews the work is the same model that made it, it shares the same blind spots, because it fails in the same places
so asking one model to check itself gets you a confident "looks good to me" on the exact bug it would have written
a council fixes that structurally, not with a better prompt - because different models cover each other's weak spots
and the proof for this is backed by real papers:
a research team at Sakana built a coordinator small enough to run on a laptop, a model that never answers your question at all
it reads the question
decides which big model should handle each piece
and hands the work out
they pointed it at GPT-5, Gemini and Claude, and the routed team beat each of the three running alone...
then they tried swapping a top-tier model in to do the coordinating instead, and it did worse lol
so the conductor doesn't have to be the strongest model you've got, it just needs to read the problem and route each part to whoever's best for it
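here's a minimal sketch of that routing idea, and to be clear it's a toy, not how the Sakana coordinator works internally. the model names, the keyword table and the stub clients are all made up for illustration; a real coordinator would be a small trained model making the call, not a keyword match

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a real model client: takes a prompt, returns text.
Model = Callable[[str], str]


def make_stub(name: str) -> Model:
    """Build a fake model that just labels the prompt with its own name."""
    return lambda prompt: f"[{name}] {prompt}"


# Made-up model names, purely for illustration.
MODELS: Dict[str, Model] = {
    "planner-model": make_stub("planner-model"),
    "coder-model": make_stub("coder-model"),
    "math-model": make_stub("math-model"),
}

# Toy routing table: keywords -> model name. This is a naive substring match,
# so "sum" would also hit "summary". Good enough for a sketch only.
ROUTES = [
    (("code", "bug", "function"), "coder-model"),
    (("sum", "estimate", "numbers"), "math-model"),
]
DEFAULT_MODEL = "planner-model"


def route(piece: str) -> str:
    """Pick a model name for one piece of the task. Never answers the piece itself."""
    lowered = piece.lower()
    for keywords, model_name in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return model_name
    return DEFAULT_MODEL


def coordinate(pieces: List[str]) -> List[str]:
    """Hand each piece to whichever model the router picked and collect the answers."""
    return [MODELS[route(piece)](piece) for piece in pieces]
```

the part worth noticing is that `route` only decides who works, it never produces an answer itself, which is the same split the research describes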

the three roles, and the one that gets skipped
the coordinator gives each model one job at a time:
thinker: break the task down, build the plan, poke holes in it
worker: do the work, the draft, the code, the numbers, whatever the deliverable is
verifier: judge the result and say "ship it" or "fix this"
the verifier gets skipped, and it's the one that matters most
the work isn't finished when a model produces something... it's finished when a verifier signs off on it
that single rule, a clear "done" condition, is the whole difference between a loop that ends and one that spins in circles forever
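the three roles and the "done" rule can be sketched like this. the function names and the `max_rounds` cap are my own assumptions, and the thinker, worker and verifier are plain callables you'd swap for real model calls

```python
from typing import Callable, Tuple


def run_council(
    task: str,
    thinker: Callable[[str], str],
    worker: Callable[[str, str], str],
    verifier: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Run thinker -> worker -> verifier until the verifier approves.

    Returns (result, approved, rounds_used). The max_rounds cap is an added
    safety net (an assumption, not from the text) so the loop always ends.
    """
    plan = thinker(task)
    feedback = ""
    result = ""
    for round_number in range(1, max_rounds + 1):
        # The worker sees the plan plus whatever the verifier complained about.
        result = worker(plan, feedback)
        approved, feedback = verifier(task, result)
        if approved:
            # "Done" means the verifier signed off, not that output exists.
            return result, True, round_number
    return result, False, max_rounds


# Stub roles so the sketch runs without any model behind it.
def demo_thinker(task: str) -> str:
    return f"plan for: {task}"


def demo_worker(plan: str, feedback: str) -> str:
    return "draft v2" if feedback else "draft v1"


def demo_verifier(task: str, result: str) -> Tuple[bool, str]:
    if result == "draft v2":
        return True, ""
    return False, "tighten the intro"
```

with the stubs above, `run_council("landing page", demo_thinker, demo_worker, demo_verifier)` gets rejected once, fixes the draft, and returns on round 2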

hand the back-and-forth to a council
now adding a coordinator might sound like one more layer to manage
but it's actually the reverse, the layer you take out is the one that was you
so the move is simple, you delegate the coordination itself: you stop being the messenger and let a coordinator run the thinker, worker and verifier loop for you
i use Fugu for this now (sakana.ai/fugu), and to be clear, this isn't sponsored, i just love it
it's the productized version of that research, you point your harness at it the same way you'd point at any other model, and it runs the council behind a single request
i'll be honest about what it is, because this is important
it's days old, it takes its time because it's genuinely running a council instead of faking one, and it's at its strongest as the verifier, the one that roasts the work before you ship
$20 gets you in, with a free second month if you start before the end of july
i don't trust it blindly, i trust the council pattern, and this is the easiest way i've found to run that pattern without babysitting it
now let's get into the step-by-step workflow to consistently get the best outputs with ANY agent harness:

run the council at the start and the finish
i bring the council in at two moments on every project, the very start and the delivery
picture a real job, launching a newsletter, building a lead list, shipping a landing page
1. interview first - before any work happens, have your agent interview you in depth (grill-me from matt is a skill that makes the agent question you about everything), the depth of that interview sets the ceiling for everything after it
2. convene the council - to build the plan, the coordinator handles the multi-model part so you're not pasting between tools anymore
3. define the loops - set the goal and the stop condition up front so it can run without you watching every step
4. delegate by role - send the right task to the right place and lean on subagents (a subagent is just a helper agent the main one hands work to)
5. roast at delivery - when the work looks done, send it back through the council to tear it apart before it goes out
same loop, two ends, planning at the start and roasting at the finish - that's the whole engine, and it deeply changed how fast i can ship something i trust

the council is the engine, the setup keeps it sharp
an engine is only as good as the machine around it...
so here are the six moves that keep each agent in the council producing its best, none of them complicated
1. build your own skills, don't download a library
a skill is a saved set of instructions your agent can reuse, like a recipe it follows (nothing more than a markdown file with examples...)
the temptation is to grab a big library of skills someone else published and just run it
i'd pass on that, roughly a third of the public ones carry a security flaw... if not worse lol
and even the clean ones are someone else's context, not yours, a skill only helps once you've personally hit the wall it solves
before that, it's just noise filling up the agent's memory
so this is my call: build the few you actually need, pulled from your own failed runs, that's where the real edge lives
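one way to find "the few you actually need" is to tag each failed run and only write a skill once the same failure keeps coming back. this is a rough sketch under my own assumptions: the tag names and the `min_hits` threshold are hypothetical, pick whatever fits your logs

```python
from collections import Counter
from typing import Iterable, List


def skill_candidates(failure_tags: Iterable[str], min_hits: int = 3) -> List[str]:
    """Return failure tags seen at least min_hits times, most frequent first.

    Tags are normalized (trimmed, lowercased) so "CSV-encoding" and
    "csv-encoding" count as the same wall.
    """
    counts = Counter(tag.strip().lower() for tag in failure_tags if tag.strip())
    return [tag for tag, hits in counts.most_common() if hits >= min_hits]
```

anything that shows up once is noise, anything that clears the threshold is a wall you've personally hit, and that's the one worth a skill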
2. default to small CLIs, not MCP servers
two quick words first, an MCP (Model Context Protocol) server is a way to bolt extra tools onto your agent, and a CLI is a small command you run in the text window
the reflex is to bolt on an MCP for everything, and each one loads its full instruction sheet into the agent's memory before you've even asked your first question
those tool descriptions can eat over 100,000 words of memory up front, crowding out the real work
so for everyday tasks, reach for a small CLI instead, it's lighter, the agent already knows how to run a command, and its output lands in a file instead of clogging the memory
a tool called printing-press (printingpress.dev) will write one of these for any service from a single prompt... abuse it
keep MCPs for the jobs that genuinely need them, shared logins, lots of users, a live connection that has to stay open
the rule is CLI-first, not CLI-only
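to make "small CLI" concrete, here's a minimal sketch of the shape i mean: the full result goes to a file and the agent only sees a one-line summary. the lead-list filter, the flag names and the default output file are all hypothetical, and this is not what printing-press generates

```python
import argparse
import csv
from typing import List, Optional


def filter_rows(in_path: str, out_path: str, column: str, value: str) -> int:
    """Copy rows where row[column] == value into out_path. Returns the row count."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        rows = [row for row in reader if row.get(column) == value]
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def main(argv: Optional[List[str]] = None) -> int:
    """Entry point. Call it from your own script or packaging entry point."""
    parser = argparse.ArgumentParser(description="Filter a CSV into a file.")
    parser.add_argument("input", help="path to the source CSV")
    parser.add_argument("--column", required=True, help="column to match on")
    parser.add_argument("--equals", required=True, help="value the column must equal")
    parser.add_argument("--out", default="filtered.csv", help="where the full result goes")
    args = parser.parse_args(argv)

    count = filter_rows(args.input, args.out, args.column, args.equals)
    # Only this one line reaches the agent's context; the data stays on disk.
    print(f"{count} rows written to {args.out}")
    return 0
```

the design choice is the last two lines of `main`: the bulky output never passes through the agent's memory, it just gets told where to look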
3. keep the instructions file light
your agent reads an instructions file before every single task, usually AGENTS.md or CLAUDE.md
the urge is to stuff it with everything you can think of, and the agent ends up following less, not more
a model reliably follows somewhere around 150 to 200 instructions, then it starts dropping them on the floor
keep yours under 100 lines - it just works better like that
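if you want that budget enforced instead of remembered, a tiny check does it. a sketch with two assumptions of mine: blank lines don't count against the budget, and "under 100" means strictly fewer than 100

```python
from pathlib import Path
from typing import Tuple

MAX_LINES = 100  # the budget from the text; adjust to taste


def count_instruction_lines(text: str) -> int:
    """Count non-blank lines (ignoring blank lines is an assumption)."""
    return sum(1 for line in text.splitlines() if line.strip())


def check_instructions(path: str, max_lines: int = MAX_LINES) -> Tuple[bool, int]:
    """Return (within_budget, line_count) for an instructions file."""
    count = count_instruction_lines(Path(path).read_text(encoding="utf-8"))
    return count < max_lines, count
```

point it at your AGENTS.md or CLAUDE.md and run it whenever you touch the file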
4. clean the context, keep the memory in files
the context window is your agent's short-term memory, and it fills up and gets dumber as it does
Claude Opus 4.8 is a great example... this is an exceptional model, but as soon as you reach 300-400k tokens, you should 100% clear the context - DON'T USE COMPACTING
so clear it often, and keep the memory that matters in files instead
a learnings.md the agent reads at the start of a session and updates at the end, every time, even when it thinks nothing changed
have it update those notes after each commit, a commit being a saved checkpoint of the work
the durable rules live in the instructions file, the things it learns along the way live in learnings, and if you're editing your rules file after every run, you've put the wrong things in it
i built myself a skill called /before-clear, it simply creates a checkpoint in the project, a temporary file that the agent reads first... it holds a very simple todo, a brief summary of what happened in the last session, and the next most urgent task
i rarely lose any important context from one session to another
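for the curious, the file side of this can be sketched in a few lines. this is not the actual /before-clear skill, just my guess at the shape: the `CHECKPOINT.md` filename and the section labels are hypothetical, only `learnings.md` comes from the text above

```python
from datetime import date
from pathlib import Path
from typing import List


def write_checkpoint(
    project_dir: str,
    summary: str,
    todo: List[str],
    next_task: str,
    filename: str = "CHECKPOINT.md",  # hypothetical name
) -> Path:
    """Overwrite the checkpoint the agent should read first after a clear."""
    lines = [f"checkpoint {date.today().isoformat()}", "", "summary:", summary, "", "todo:"]
    lines += [f"- {item}" for item in todo]
    lines += ["", "next most urgent task:", next_task, ""]
    path = Path(project_dir) / filename
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def append_learning(project_dir: str, note: str, filename: str = "learnings.md") -> Path:
    """Append one dated line to the long-lived learnings file."""
    path = Path(project_dir) / filename
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"- {date.today().isoformat()}: {note}\n")
    return path
```

the checkpoint gets overwritten every time because it's temporary, while learnings only ever grows, which matches the split between "what's next" and "what we learned"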
5. write few rules, each one plain
a short list of clear rules is MUCH better than a long pile of clever ones
once a rules file gets long, the rules start competing with each other, and the agent has to guess which one wins
write each rule the way you'd tell a new hire, one line, one meaning, no room to interpret
when the rule is clear the agent follows it, when it's vague the agent improvises, and improvising is exactly where your outputs drift sideways
6. delegate the heavy work to subagents
push the messy, expensive work down to subagents, especially anything that touches the browser
driving the browser from the main agent can burn through A LOT of context just on screenshots
hand that same job to a subagent and it comes back with "done, here's the summary" in a line or two
the main agent stays the conductor, the helpers do the reaching, and your main memory stays clean
it's the council idea again, one level down, you coordinate the work instead of doing all of it yourself
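here's a small sketch of why that keeps the main memory clean. everything in it is a stand-in: the fake browser subagent, the "last transcript entry is the summary" convention and the 200-character cap are assumptions of mine, not how any particular harness does it

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MainAgent:
    """The conductor: keeps only short summaries in its own context."""

    context: List[str] = field(default_factory=list)

    def delegate(
        self,
        task: str,
        subagent: Callable[[str], List[str]],
        summary_chars: int = 200,
    ) -> str:
        # The subagent's bulky transcript lives and dies inside this call.
        transcript = subagent(task)
        # Assumed convention: the last transcript entry is the summary line.
        summary = transcript[-1][:summary_chars] if transcript else "no output"
        self.context.append(f"{task}: {summary}")
        return summary


def browser_subagent(task: str) -> List[str]:
    """Fake subagent: pretends every step produced a heavy screenshot payload."""
    steps = [f"screenshot {i}: " + "x" * 5000 for i in range(10)]
    steps.append("done, found 3 pricing tiers on the page")
    return steps
```

the fake subagent burns through about 50,000 characters of "screenshots", and the main agent's context grows by one short line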

the whole operating system, in one block
the best model keeps getting gated, so one model can't be your foundation
hand the coordination to a council: thinker plans, worker builds, verifier signs off
the verifier is the stop condition, the work is done when it says so, not before
run the council twice, at the start to plan and at delivery to roast
i use Fugu to run it, $20 to try, not sponsored, honest about being early
then keep each agent sharp:
build your own skills, don't download a library
CLI-first, MCPs only when you truly need them
instructions file under 100 lines
clear context often, keep memory in files
few rules, each one plain
delegate the heavy work to subagents
everything else on top of that is pure optimization that you probably won't even notice unless you're a software engineer
this is how you build yourself an operating system that performs, with or without Fable 5...
anyway, i'm building the best AI community in the world, how about you join me?





