How to Build a Frontier Agent OS with Multi-Model Councils

I'm going to show you step-by-step how to consistently get the best outputs with Claude Code, Codex, OpenCode or whatever harness you're using...

because a harness is just the app you type your agent instructions into, the cockpit you sit in

the instinct is to chase a better model to drop inside it (USING FABLE 5 LOL)

but the real edge is somewhere else: not stopping at one model at all

and the timing makes that mistake more expensive than it used to be... because the single best model has become a moving target you can't count on:

Fable 5 is coming back for like a week and then it'll be too expensive to use

Mythos 5 is locked to a short list of government-vetted companies

GPT-5.6 Sol went out to about 20 approved firms and nobody else

so betting your whole operation on whichever model is "best" this month falls apart the moment that model gets gated

so the move is a merge of frontier LLMs, a handful of models carrying the work together instead of one model carrying all of it

that's what separates average outputs from the best ones, and it's the first thing i set up on every project now

if you want to learn how to get the most out of these tools and make real money with them, that's what the real-time AI ops community is built for: weeklyaiops.com

the slow, miserable back-and-forth

watch how a normal project goes, the bottleneck is sitting right in the middle of it:

you open one harness and hand it the task

then you take its plan, paste it into a second model, and ask "is this right"

the second one finds the holes, so you carry those notes back to the first

you spend the afternoon as a messenger, running context between two models that can't talk to each other

let's be honest, that back-and-forth is a pain in the ass...

i know this loop too well, you have Claude check Codex's plan, then carry the holes back the other way, and the day disappears into ferrying notes between two models

why a merge of AIs beats your favorite model

there's a reason the second model keeps catching things the first one missed...

when the model reviewing the work is the same one that made it, it shares every blind spot, because it fails in the same places

so asking one model to check itself gets you a confident "looks good to me" on the exact bug it would have written

a council fixes that structurally, not with a better prompt - because different models cover each other's weak spots

and this is backed by real research:

a research team at Sakana built a coordinator small enough to run on a laptop, a model that never answers your question at all

it reads the question

decides which big model should handle each piece

and hands the work out

they pointed it at GPT-5, Gemini and Claude, and the team it coordinated beat each of the three on its own...

then they tried swapping a top-tier model in to do the coordinating instead, and it did worse lol

so the conductor doesn't have to be the strongest model you've got, it just needs to read the problem and route each part to whoever's best at it
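to make the routing idea concrete, here's a minimal Python sketch of a coordinator that never answers anything itself: it tags each subtask and hands it to whichever model is assumed strongest for that kind of work. the model names, the STRENGTHS scores and the keyword matching are all hypothetical placeholders, and the coordinator described above is itself a model, not a keyword table.

```python
# Hypothetical routing table: which placeholder model we ASSUME is strongest
# for each kind of subtask. Swap in your own names and scores.
STRENGTHS = {
    "model-a": {"code": 0.9, "math": 0.6, "writing": 0.5},
    "model-b": {"code": 0.6, "math": 0.9, "writing": 0.6},
    "model-c": {"code": 0.5, "math": 0.5, "writing": 0.9},
}

# Crude keyword classifier standing in for the coordinator's judgment.
KEYWORDS = {
    "code": ("function", "bug", "refactor", "api"),
    "math": ("prove", "calculate", "estimate", "equation"),
    "writing": ("draft", "email", "headline", "summary"),
}


def classify(subtask: str) -> str:
    """Return the skill area whose keywords best match the subtask."""
    text = subtask.lower()
    scores = {area: sum(word in text for word in words) for area, words in KEYWORDS.items()}
    return max(scores, key=scores.get)


def route(subtasks: list[str]) -> dict[str, str]:
    """Map each subtask to the model assumed strongest in its area.

    The coordinator only reads and assigns; it never answers a subtask itself.
    """
    plan = {}
    for subtask in subtasks:
        area = classify(subtask)
        plan[subtask] = max(STRENGTHS, key=lambda m: STRENGTHS[m][area])
    return plan


tasks = [
    "Refactor the signup function and fix the bug",
    "Estimate the monthly cost per user",
    "Draft the launch email",
]
for task, model in route(tasks).items():
    print(f"{model}: {task}")
```

in a real setup, classify() is the part you'd hand to the small coordinator model, and route() would send each subtask to that model's API instead of just returning a name.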

the three roles, and the one that gets skipped

the coordinator gives each model one job at a time:

thinker: break the task down, build the plan, poke holes in it

worker: do the work, the draft, the code, the numbers, whatever the deliverable is

verifier: judge the result and say "ship it" or "fix this"

the verifier gets skipped, and it's the one that matters most

the work isn't finished when a model produces something... it's finished when a verifier signs off on it

that single rule, a clear "done" condition, is the whole difference between a loop that ends and one that spins in circles forever
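here's that loop and its stop condition as a minimal Python sketch. think, work and verify are stand-ins for calls to three different models (hypothetical stubs below, no real API), and max_rounds is an added safety cap, an assumption rather than part of the pattern as described, so a loop whose verifier never signs off still ends loudly instead of spinning.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    ship: bool       # True = "ship it", False = "fix this"
    notes: str = ""  # what to fix when ship is False


def run_council(
    task: str,
    think: Callable[[str, str], str],
    work: Callable[[str], str],
    verify: Callable[[str, str], Verdict],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Run thinker -> worker -> verifier until the verifier signs off.

    The verifier's approval is the only "done" condition; max_rounds is an
    assumed safety cap so a loop that never converges still stops.
    Returns (deliverable, rounds_used).
    """
    notes = ""
    for round_no in range(1, max_rounds + 1):
        plan = think(task, notes)      # thinker: plan, folding in last round's notes
        draft = work(plan)             # worker: produce the deliverable
        verdict = verify(task, draft)  # verifier: ship it or fix this
        if verdict.ship:
            return draft, round_no
        notes = verdict.notes          # the criticism feeds the next plan
    raise RuntimeError(f"verifier never signed off after {max_rounds} rounds")


# Hypothetical stubs standing in for three different models.
def stub_think(task: str, notes: str) -> str:
    return f"{task} ({notes})" if notes else task


def stub_work(plan: str) -> str:
    return plan.upper()


def stub_verify(task: str, draft: str) -> Verdict:
    if "TESTS" in draft:
        return Verdict(ship=True)
    return Verdict(ship=False, notes="add tests")


result, rounds = run_council("ship the login page", stub_think, stub_work, stub_verify)
print(rounds, result)  # 2 SHIP THE LOGIN PAGE (ADD TESTS)
```

the first draft gets a "fix this", the notes flow back into the thinker, and the second draft passes, which is exactly the "done means signed off" rule in code.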

hand the back-and-forth to a council

now adding a coordinator might sound like one more layer to manage

but it's actually the reverse, the layer you take out is the one that was you

so the move is simple, you delegate the coordination itself: you stop being the messenger and let a coordinator run the thinker, worker and verifier loop for you

i use Fugu for this now (sakana.ai/fugu), and to be clear, this isn't sponsored, i just love it

it's the productized version of that research, you point your harness at it the same way you'd point at any other model, and it runs the council behind a single request

i'll be honest about what it is, because this is important

it's days old, it takes its time because it's genuinely running a council instead of faking one, and it's at its strongest as the verifier, the one that roasts the work before you ship

$20 gets you in, with a free second month if you start before the end of July

i don't trust it blindly, i trust the council pattern, and this is the easiest way i've found to run that pattern without babysitting it

now let's get into the step-by-step workflow to consistently get the best outputs with ANY agent harness:

run the council at the start and the finish

i bring the council in at two moments on every project, the very start and the delivery

picture a real job: launching a newsletter, building a lead list, shipping a landing page

1. interview first - before any work happens, have your agent interview you in depth (grill-me from Matt is a skill that makes the agent question you about everything), because the depth of that interview sets the ceiling for everything after it

2. convene the council - let it build the plan, and the coordinator handles the multi-model part so you're not pasting between tools anymore

3. define the loops - set the goal and the stop condition up front so it can run without you watching every step

4. delegate by role - send the right task to the right place and lean on subagents (a subagent is just a helper agent the main one hands work to)

5. roast at delivery - when the work looks done, send it back through the council to tear it apart before it goes out

same loop, two ends, planning at the start and roasting at the finish - that's the whole engine, and it deeply changed how fast i can ship something i trust
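one way to hold yourself to steps 1, 3 and 4 before walking away from the agent is a pre-flight record it won't start without. this is a hypothetical Python sketch, not a feature of any harness: the field names are invented for illustration, and the council checkpoints (steps 2 and 5) are only marked here, not run.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectPlan:
    """Hypothetical pre-flight record, filled in before any unattended work runs."""
    interview_notes: list[str]                                  # step 1: answers from the interview
    goal: str                                                   # step 3: what the work is for
    stop_condition: str                                         # step 3: when the loop may stop
    assignments: dict[str, str] = field(default_factory=dict)  # step 4: task -> role or subagent
    # steps 2 and 5: the council runs at both ends of the project
    council_checkpoints: tuple[str, str] = ("plan", "roast")

    def missing(self) -> list[str]:
        """Return what still blocks an unattended run (an empty list means ready)."""
        gaps = []
        if not self.interview_notes:
            gaps.append("interview")
        if not self.goal.strip():
            gaps.append("goal")
        if not self.stop_condition.strip():
            gaps.append("stop condition")
        if not self.assignments:
            gaps.append("assignments")
        return gaps


plan = ProjectPlan(
    interview_notes=["audience: indie founders", "tone: blunt"],
    goal="landing page live with a signup form",
    stop_condition="verifier approves the copy and the form submits in a test run",
    assignments={"write copy": "worker", "build page": "subagent", "review": "verifier"},
)
print(plan.missing())  # []
```

if missing() returns anything, that's the part to fill in before you let the loop run without you.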

the council is the engine, the setup keeps it sharp

an engine is only as good as the machine around it...

so here are the six moves that keep each agent in the council producing its best, none of them complicated

1. build your own skills, don't download a library

a skill is a saved set of instructions your agent can reuse, like a recipe it follows (nothing more than a markdown file with examples...)

the temptation is to grab a big library of skills someone else published and just run it

i'd pass on that, roughly a third of the public ones carry a security flaw... if not worse lol

and even the clean ones are someone else's context, not yours: a skill only helps once you've personally hit the wall it solves

before that, it's just noise filling up the agent's memory

so this is my call: build the few you actually need, pulled from your own failed runs, that's where the real edge lives

2. default to small CLIs, not MCP servers

two quick definitions first: an MCP (Model Context Protocol) server is a way to bolt extra tools onto your agent, and a CLI is a small command-line tool you run in the terminal

the reflex is to bolt on an MCP for everything, and each one loads its full instruction sheet into the agent's memory before you've even asked your first question

those tool descriptions can eat over 100,000 tokens of memory up front, crowding out the real work

so for everyday tasks, reach for a small CLI instead, it's lighter, the agent already knows how to run a command, and its output lands in a file instead of clogging the memory

a tool called printing-press (printingpress.dev) will write one of these for any service from a single prompt... abuse it

keep MCPs for the jobs that genuinely need them: shared logins, lots of users, a live connection that has to stay open

the rule is CLI-first, not CLI-only
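printing-press generates these for you, but purely to show the shape of the pattern, here's a hand-written Python sketch of a tiny CLI whose full output goes to a file while only one summary line gets printed for the agent to read. the filtering task, file names and flags are hypothetical, and a local JSON file stands in for a real service call.

```python
import argparse
import json
import tempfile
from pathlib import Path
from typing import Optional


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(
        description="Filter a JSON list of records; the full result goes to a file."
    )
    parser.add_argument("--input", required=True, help="JSON file holding a list of objects")
    parser.add_argument("--out", required=True, help="file where the full result is written")
    parser.add_argument("--field", help="keep only records that contain this field")
    args = parser.parse_args(argv)

    records = json.loads(Path(args.input).read_text(encoding="utf-8"))
    if args.field:
        records = [r for r in records if args.field in r]

    out = Path(args.out)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(records, indent=2), encoding="utf-8")

    # This one line is all that lands in the agent's context;
    # the agent opens the file only if it needs the details.
    print(f"{len(records)} record(s) written to {out}")
    return 0


# Demo with a throwaway input file standing in for data pulled from a service.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "leads.json"
    src.write_text(json.dumps([{"name": "a", "email": "a@x.io"}, {"name": "b"}]), encoding="utf-8")
    dest = Path(tmp) / "out" / "leads_with_email.json"
    main(["--input", str(src), "--out", str(dest), "--field", "email"])
    kept = json.loads(dest.read_text(encoding="utf-8"))
```

for real use you'd add an `if __name__ == "__main__":` entry point and let the agent call it from the terminal; the design choice that matters is that stdout stays one line long no matter how big the data gets.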

3. keep the instructions file light

your agent reads an instructions file before every single task, usually AGENTS.md or CLAUDE.md

the urge is to stuff it with everything you can think of, and the agent ends up following less, not more

a model reliably follows somewhere around 150 to 200 instructions, then it starts dropping them on the floor

keep yours under 100 lines - it just works better like that
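if you want that budget enforced instead of remembered, a tiny check like this Python sketch works. the 100-line limit and the two file names come from the text above; the function name and the idea of running it in CI are assumptions.

```python
import tempfile
from pathlib import Path

MAX_LINES = 100                        # the budget suggested above
CANDIDATES = ("AGENTS.md", "CLAUDE.md")


def oversized_instruction_files(root: str = ".") -> dict[str, int]:
    """Return {filename: line_count} for each instructions file over MAX_LINES."""
    too_long = {}
    for name in CANDIDATES:
        path = Path(root) / name
        if path.is_file():
            count = len(path.read_text(encoding="utf-8").splitlines())
            if count > MAX_LINES:
                too_long[name] = count
    return too_long


# Demo in a throwaway directory with one bloated file and one lean one.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "AGENTS.md").write_text("- rule\n" * 120, encoding="utf-8")
    (Path(tmp) / "CLAUDE.md").write_text("- rule\n" * 40, encoding="utf-8")
    report = oversized_instruction_files(tmp)

print(report)  # {'AGENTS.md': 120}
```

you could run it from a git pre-commit hook or a CI step and fail whenever the returned dict isn't empty.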

4. clean the context, keep the memory in files

the context window is your agent's short-term memory, and it fills up and gets dumber as it does

Claude Opus 4.8 is a great example... this is an exceptional model, but as soon as you reach 300-400k tokens, you should 100% clear the context - DON'T USE COMPACTING

so clear it often, and keep the memory that matters in files instead

keep a learnings.md that the agent reads at the start of a session and updates at the end, every time, even when it thinks nothing changed

have it update those notes after each commit, a commit being a saved checkpoint of the work

the durable rules live in the instructions file, the things it learns along the way live in learnings, and if you're editing your rules file after every run, you've put the wrong things in it
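the read-at-start, append-after-commit habit is simple enough to sketch in Python. the file location and function names are hypothetical; in practice the agent would call something like this itself, since only it knows what it learned.

```python
import tempfile
from datetime import date
from pathlib import Path

LEARNINGS = Path("learnings.md")  # assumed location at the project root


def load_learnings(path: Path = LEARNINGS) -> str:
    """Session start: everything the agent has learned so far (empty if none yet)."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def record_learning(note: str, commit: str, path: Path = LEARNINGS) -> None:
    """After each commit: append one dated line, even when there's nothing new."""
    text = note.strip() or "nothing new"
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {date.today().isoformat()} [{commit}] {text}\n")


# Demo in a throwaway directory; the notes and commit ids are made up.
with tempfile.TemporaryDirectory() as tmp:
    notes_file = Path(tmp) / "learnings.md"
    record_learning("run the test suite against the staging database", "a1b2c3d", notes_file)
    record_learning("", "e4f5a6b", notes_file)
    lines = load_learnings(notes_file).splitlines()

print(lines)
```

appending instead of rewriting keeps the history cheap, and the "nothing new" line still proves the agent actually did the update.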

i built myself a skill called /before-clear: it creates a checkpoint in the project, a temporary file the agent reads first... it holds a very simple todo list, a brief summary of what happened in the last session, and the most urgent next task

i rarely lose any important context from one session to another
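/before-clear itself is a skill (instructions, not code) and its exact contents aren't shown here, so this is only a hypothetical Python sketch of the same idea: a helper that writes the checkpoint once the session crosses a clear-the-context threshold. the file name, section headings and the 300,000-token cutoff (the low end of the range mentioned earlier) are assumptions.

```python
import tempfile
from pathlib import Path

CHECKPOINT = Path(".checkpoint.md")  # hypothetical name for the file the agent reads first
CLEAR_AT_TOKENS = 300_000            # low end of the 300-400k range mentioned above


def should_clear(tokens_used: int, limit: int = CLEAR_AT_TOKENS) -> bool:
    """True once the session has hit the clear-the-context threshold."""
    return tokens_used >= limit


def write_checkpoint(summary: str, todo: list[str], next_task: str,
                     path: Path = CHECKPOINT) -> None:
    """Overwrite the checkpoint with a last-session summary, a todo list and the next task."""
    lines = ["# Checkpoint", "", "## Last session", summary.strip(), "", "## Todo"]
    lines += [f"- [ ] {item}" for item in todo]
    lines += ["", "## Next (most urgent)", next_task.strip(), ""]
    path.write_text("\n".join(lines), encoding="utf-8")


# Demo: made-up session state, written just before clearing the context.
with tempfile.TemporaryDirectory() as tmp:
    cp = Path(tmp) / ".checkpoint.md"
    if should_clear(320_000):
        write_checkpoint(
            "Built the pricing section; tests pass.",
            ["wire up the signup form", "add the OG image"],
            "wire up the signup form",
            cp,
        )
    saved = cp.read_text(encoding="utf-8")

print(saved)
```

the checkpoint is overwritten rather than appended, which keeps it short; the long-lived lessons belong in learnings.md instead.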

5. write few rules, each one plain

a short list of clear rules is MUCH better than a long pile of clever ones

once a rules file gets long, the rules start competing with each other, and the agent has to guess which one wins

write each rule the way you'd tell a new hire, one line, one meaning, no room to interpret

when the rule is clear the agent follows it, when it's vague the agent improvises, and improvising is exactly where your outputs drift sideways

6. delegate the heavy work to subagents

push the messy, expensive work down to subagents, especially anything that touches the browser

driving the browser from the main agent can burn through A LOT of context just on screenshots

hand that same job to a subagent and it comes back with "done, here's the summary" in a line or two

the main agent stays the conductor, the helpers do the reaching, and your main memory stays clean

it's the council idea again, one level down, you coordinate the work instead of doing all of it yourself
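to show why the main context stays clean, here's a toy Python sketch: each "subagent" piles up its own fake screenshots, and the conductor only ever stores the one-line summary that comes back. threads are just a stand-in (real harnesses spawn subagents their own way), and browser_subagent and conductor are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def browser_subagent(task: str) -> str:
    """Hypothetical subagent: does the noisy work in its own context, returns one line."""
    own_context = []
    for step in range(50):
        # stand-in for screenshots, DOM dumps and retries piling up
        own_context.append(f"screenshot {step} for {task!r}")
    # own_context is discarded when this returns; only the summary travels back
    return f"done: {task} ({len(own_context)} steps)"


def conductor(tasks: list[str]) -> list[str]:
    """Main agent: fans tasks out and keeps only the subagents' summaries."""
    main_context: list[str] = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() yields results in the same order as tasks
        main_context.extend(pool.map(browser_subagent, tasks))
    return main_context


summaries = conductor(["check the signup flow", "grab competitor pricing"])
print(summaries)
```

a hundred fake screenshots get produced, and the main agent ends up holding two short lines.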

the whole operating system, in one block

the best model keeps getting gated, so one model can't be your foundation

hand the coordination to a council: thinker plans, worker builds, verifier signs off

the verifier is the stop condition, the work is done when it says so, not before

run the council twice, at the start to plan and at delivery to roast

i use Fugu to run it, $20 to try, not sponsored, honest about being early

then keep each agent sharp:

build your own skills, don't download a library

CLI-first, MCPs only when you truly need them

instructions file under 100 lines

clear context often, keep memory in files

few rules, each one plain

delegate the heavy work to subagents

everything else on top of that is pure optimization that you probably won't even notice unless you're a software engineer

this is how you build yourself an operating system that performs, with or without Fable 5...

anyway, i'm building the best AI community in the world, why not join me?

weeklyaiops.com

