Anthropic pays their engineers $80,000 a month. Those engineers now merge 8x more code per day than they did a year ago. Not because the model got smarter. Because they stopped prompting Claude and started building systems that prompt it for them.
Boris Cherny, head of Claude Code at Anthropic: "I don't prompt Claude anymore. I have loops running that prompt Claude and figure out what to do. My job is to write loops."
Most developers read that and have no idea what it means in practice. By the end of this article you will - and you'll have everything you need to build your first one this weekend.
What a loop actually is
A prompt is a single instruction. A loop is a goal the AI keeps working toward until it gets there.
The difference is who does the driving. With a prompt you push Claude through every step manually. With a loop you define the goal once and the system runs the full cycle on its own - finding the work, executing it, checking the result and deciding what comes next. All without you in the chair.
```
Prompt | you type, Claude answers, you type again
Loop   | system finds work, Claude executes,
       | system checks result, repeats until done
```
Every real loop has five stages:
```
Discover | find what needs doing
Plan     | decide how to do it
Execute  | do the work
Verify   | check against the goal
Iterate  | not done? feed result back in and go again
```
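The five stages map onto a small control loop. The Python below is a minimal sketch, not how Claude Code runs loops internally. `run_loop` is a hypothetical name, and the `discover`/`plan`/`execute`/`verify` callables are hypothetical stand-ins that you would back with real tools or agent calls.

```python
from typing import Callable, Optional

def run_loop(
    discover: Callable[[], Optional[str]],       # hypothetical: next task, or None when nothing is left
    plan: Callable[[str, Optional[str]], str],   # hypothetical: (task, last failed result) -> approach
    execute: Callable[[str], str],               # hypothetical: does the work, returns a result
    verify: Callable[[str], bool],               # hypothetical: objective check of the result
    max_iterations: int = 10,
) -> list[str]:
    """Run Discover -> Plan -> Execute -> Verify -> Iterate; return the tasks that passed."""
    completed: list[str] = []
    feedback: Optional[str] = None
    for _ in range(max_iterations):              # hard cap so the loop always terminates
        task = discover()                        # 1. Discover (must stop returning finished tasks)
        if task is None:                         # no work left: the success exit
            break
        approach = plan(task, feedback)          # 2. Plan, informed by the last failure
        result = execute(approach)               # 3. Execute
        if verify(result):                       # 4. Verify
            completed.append(task)
            feedback = None
        else:
            feedback = result                    # 5. Iterate: feed the failed result back in
    return completed
```

Keeping each stage as a separate callable makes it easy to swap any one of them, for example replacing `verify` with a stricter check, without touching the others.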
Three ingredients make those stages do real work: verification, state and a stop condition.
Verify is the heart. Without a real check on the result you don't have a loop - you have the agent agreeing with itself on repeat. The check has to be objective. A test that passes or fails. A build that compiles or doesn't. A linter that returns zero or non-zero. Not a second agent asked to "review." Two optimists agreeing is not verification.
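An objective gate can be as small as "did the command exit with status 0". Here is a minimal Python sketch, assuming your tests, build, linter or type checker can be run as a command. `objective_gate` is a hypothetical helper name.

```python
import subprocess
import sys

def objective_gate(command: list[str], timeout_s: float = 600) -> bool:
    """Pass only if the command exits with status 0; anything else fails the gate."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False  # a hung or missing checker is a failed gate, never a pass
    return result.returncode == 0

if __name__ == "__main__":
    # Example: gate on a test suite (assumes pytest is installed; prints False otherwise).
    print(objective_gate([sys.executable, "-m", "pytest", "test/auth"]))
```

The same helper covers a build (`["npm", "run", "build"]`) or a type check. The agent's own opinion of its work never enters the decision.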
State is what makes the loop learn. Each pass the AI has to remember what it already tried or it repeats the same mistake forever. A real loop keeps a record outside the conversation - what is done, what failed, what is next. The agent forgets between sessions. The file doesn't.
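The record only has to answer three questions: what is done, what failed and what is next. The article keeps it as a markdown file. This minimal Python sketch swaps in JSON so a script can read it without parsing. `LoopState` and its `done`, `failed` and `next_up` fields are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path

@dataclass
class LoopState:
    done: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)  # task -> why it failed
    next_up: list[str] = field(default_factory=list)

    @classmethod
    def load(cls, path: Path) -> "LoopState":
        """Read state from disk; a missing file means this is the first run."""
        if not path.exists():
            return cls()
        return cls(**json.loads(path.read_text(encoding="utf-8")))

    def save(self, path: Path) -> None:
        """Persist state so the next session starts where this one stopped."""
        path.write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")

    def already_tried(self, task: str) -> bool:
        """True if an earlier pass finished or failed this task, so it isn't blindly repeated."""
        return task in self.done or task in self.failed
```

Because the file lives outside the conversation, it survives any session reset. The agent forgets, but `LoopState.load` does not.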
A stop condition is what keeps it sane. A loop with no exit runs until it succeeds, breaks or drains your budget. Every serious loop has two ways to stop - success, and a hard limit. Without this you've built a machine that bills you in silence.
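The two exits can be written down explicitly, so every stop has a named reason. This is a minimal sketch: `StopPolicy` is a hypothetical name, and the default limits of 20 passes and $5 are illustrative, not recommendations from the source.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StopPolicy:
    max_iterations: int = 20     # illustrative hard limit on passes
    max_cost_usd: float = 5.0    # illustrative hard limit on spend

    def reason_to_stop(self, goal_met: bool, iteration: int, spent_usd: float) -> Optional[str]:
        """Return why the loop should stop, or None to keep going."""
        if goal_met:
            return "success"
        if iteration >= self.max_iterations:
            return "iteration limit"
        if spent_usd >= self.max_cost_usd:
            return "budget limit"
        return None
```

Success is checked first, so a run that meets the goal on its last allowed pass is still reported as a success. The two limit exits are where you would alert a human instead of billing in silence.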
The 4-condition test before you build anything
A loop earns its cost only when all four of these are true. Miss one and keep it as a manual prompt.
```
Condition 1 | the task repeats at least weekly
            | less than weekly - setup cost never pays back

Condition 2 | something can automatically reject bad output
            | a test, build, linter, type check
            | no automated gate - you're back reviewing every diff

Condition 3 | the agent can do the work end to end
            | not hand half of it back to you mid-task

Condition 4 | done is objective, not a judgment call
            | if quality is a matter of taste, a human still wins
```
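The four conditions work as a checklist where any single miss is a veto. This minimal sketch encodes them so the reasons for a "no" are explicit. `LoopCandidate` and `should_build_loop` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class LoopCandidate:
    runs_per_week: float        # Condition 1: how often the task recurs
    has_automated_gate: bool    # Condition 2: a test, build, linter or type check can reject output
    agent_can_finish: bool      # Condition 3: no mid-task hand-back to a human
    done_is_objective: bool     # Condition 4: success is not a matter of taste

def should_build_loop(c: LoopCandidate) -> tuple[bool, list[str]]:
    """All four conditions must hold; otherwise keep it as a manual prompt."""
    misses = []
    if c.runs_per_week < 1:
        misses.append("repeats less than weekly: setup cost never pays back")
    if not c.has_automated_gate:
        misses.append("no automated gate: you're back reviewing every diff")
    if not c.agent_can_finish:
        misses.append("agent hands half the work back mid-task")
    if not c.done_is_objective:
        misses.append("done is a judgment call: a human still wins")
    return (not misses, misses)
```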
Good first loops:
```
CI failure triage   | nightly, classify failures, draft fixes
Dependency bumps    | weekly, scan updates, open PRs
Lint-and-fix passes | on every PR, apply style fixes automatically
Issue-to-PR drafts  | on codebases with strong test coverage
```
Bad first loops - keep a human in the chair:
```
Architecture rewrites | judgment call, loop will drift
Auth or payments code | irreversible mistakes too expensive
Production deploys    | needs human approval gate
Vague product work    | done is not objective
```
The five building blocks
Every working loop is assembled from five parts. Claude Code ships all five.
1 - The automation (the heartbeat)
This is what makes a loop an actual loop and not a one-off run. A trigger fires on a schedule or an event, hands the task to Claude and brings the findings back without you having to go check.
```
/loop | re-runs a prompt on a cadence
/goal | keeps going until a condition you wrote is true
hooks | fire commands at points in the agent lifecycle
cron  | runs after you close the laptop via GitHub Actions
```
The most important of the four is /goal. A separate small model checks whether the condition is met, so the agent that wrote the code is not the one grading it.
```
/loop 30m /goal "All tests in test/auth pass and lint is clean."
Scan src/auth for failures, draft fixes, open PR when goal holds.
```
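For readers who want the mechanics, here is a plain-Python analogue of the command above. It is not how Claude Code implements `/loop` or `/goal`, which the source doesn't describe. `loop_until_goal`, `work` and `goal_holds` are hypothetical names. The sketch shows the split: `work` scans and drafts, and a separate `goal_holds` check decides when to stop.

```python
import time
from typing import Callable

def loop_until_goal(
    work: Callable[[], None],        # one pass: scan, draft fixes (the maker)
    goal_holds: Callable[[], bool],  # independent, objective check (the grader)
    interval_s: float,               # cadence, e.g. 30 * 60 for "30m"
    max_runs: int,                   # hard limit so a goal that never holds can't run forever
) -> bool:
    """Re-run `work` on a cadence until `goal_holds()` is True; return whether it ever was."""
    for run in range(max_runs):
        work()
        if goal_holds():
            return True
        if run < max_runs - 1:
            time.sleep(interval_s)   # wait for the next tick
    return False
```

In practice, `goal_holds` would run the auth tests and the linter and return True only when both exit with status 0.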
2 - Skills (write project knowledge once)
A skill is how you stop re-explaining the same project context every session. A SKILL.md file holds the instructions, conventions and rules the loop reads on every run.
Without skills, the loop re-derives your entire project from zero every cycle. With skills, intent compounds: the conventions, the build steps and the things that went wrong before are written once and read forever.
```
.claude/skills/
  ci-triage/SKILL.md - classification rules, fix patterns, escalation criteria
  frontend/SKILL.md  - Tailwind, Next.js, SEO, accessibility rules
  security/SKILL.md  - auth patterns, input validation, never-touch list
  deps/SKILL.md      - update rules, compatibility checks, rollback patterns
```
The automation calls the skill by name. The recurring job stays maintainable instead of rotting inside a schedule nobody updates.
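Claude Code handles skill loading itself, so no code is needed for this. The sketch below only makes the idea concrete for a home-grown runner: the automation refers to a skill by name, and the project knowledge lives in one file. It assumes the `.claude/skills/<name>/SKILL.md` layout. `load_skill` and `build_prompt` are hypothetical helpers.

```python
from pathlib import Path

SKILLS_ROOT = Path(".claude/skills")  # assumed project-level skills directory

def load_skill(name: str, root: Path = SKILLS_ROOT) -> str:
    """Read the SKILL.md for a named skill; fail loudly instead of running without context."""
    skill_file = root / name / "SKILL.md"
    if not skill_file.is_file():
        raise FileNotFoundError(f"no skill named {name!r} at {skill_file}")
    return skill_file.read_text(encoding="utf-8")

def build_prompt(skill_name: str, task: str, root: Path = SKILLS_ROOT) -> str:
    """Prepend the skill's standing instructions to this run's task."""
    return f"{load_skill(skill_name, root)}\n\n## Task\n{task}"
```

If the scheduled job only stores a skill name and a one-line task, updating the conventions means editing one file, not hunting through schedules.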
3 - Sub-agents (keep the maker away from the checker)
The single most useful structural thing in any loop. Split the agent that writes from the agent that checks.
The model that wrote the code is too generous grading its own work. A second agent with different instructions - and sometimes a stronger model - catches what the first one talked itself into.
```
Maker agent   | fast cheap model - explores, implements, drafts
Checker agent | strong model, high effort - reviews, verifies, rejects
```
```
.claude/agents/
  explorer.md    | read-only, fast model, finds what needs doing
  implementer.md | writes the fix, runs the tests
  reviewer.md    | security audit, code quality, objective gate
```
Three specialists, each with its own instructions and model. The loop runs while you're not watching, and the verifier is the only reason you can actually walk away.
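The maker/checker split can be expressed as a short control loop. The maker drafts, the checker approves or rejects with a reason, and that reason goes back to the maker. This is a minimal sketch. `maker_checker` and the two callables are hypothetical stand-ins for calls to two differently configured agents, such as the implementer and reviewer above.

```python
from typing import Callable, Optional

Maker = Callable[[str, Optional[str]], str]   # (task, reviewer feedback) -> draft
Checker = Callable[[str], tuple[bool, str]]   # draft -> (approved?, reason)

def maker_checker(task: str, make: Maker, check: Checker, max_rounds: int = 3) -> Optional[str]:
    """Return an approved draft, or None so the caller can escalate to a human."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        draft = make(task, feedback)
        approved, reason = check(draft)   # the checker sees only the draft, not the maker's reasoning
        if approved:
            return draft
        feedback = reason                 # the rejection reason goes back to the maker
    return None
```

Returning `None` instead of the last draft is deliberate. A draft the checker never approved should go to a human, not to a PR.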
4 - Connectors (so it acts, not suggests)
The difference between a loop that says "here is the fix" and a loop that opens the PR, links the ticket and pings Slack once CI is green.
```
GitHub | read repos, create branches, open PRs, react to webhooks
Linear | update tickets, link PRs, close items when verification passes
Slack  | post triage results, ping humans on escalations
Sentry | investigate live alerts, draft fixes for high-frequency ones
```
Without connectors the loop describes what it would do. With connectors it does it.
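Under the hood, "opens the PR" comes down to an API call. In Claude Code, connectors like this are typically provided by MCP servers. This Python sketch skips that layer and builds the request for GitHub's REST endpoint `POST /repos/{owner}/{repo}/pulls` directly with the standard library. `open_pr_request` is a hypothetical helper, and the token is assumed to have permission to open pull requests.

```python
import json
import urllib.request

def open_pr_request(owner: str, repo: str, head: str, base: str,
                    title: str, body: str, token: str) -> urllib.request.Request:
    """Build (but don't send) a GitHub REST call that opens a pull request."""
    payload = {"title": title, "head": head, "base": base, "body": body}
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )
```

Sending it takes one more line, `urllib.request.urlopen(req)`, which returns the created pull request as JSON. A loop should only reach this call after its gate has passed.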
5 - The state file (the agent forgets. The file doesn't.)
The piece that sounds too dumb to matter and is the spine of every working loop. A markdown file outside the conversation that records what is done, what failed and what is next.
```
# Loop state - ci-triage

## Last run
2026-06-09 03:30 UTC - 7 failures classified, 3 fixes drafted, 4 escalated

## In progress
- claude/fix-auth-token-refresh - tests passing locally, awaiting CI

## Completed today
- claude/bump-axios-1.7.4 - merged, CI green

## Escalated to humans
- src/billing/refund.ts - tests failing 3 ways, root cause unclear

## Lessons learned
- 2026-06-08: PowerShell hits TLS issue on Windows runner. Use bash.
- 2026-06-07: tests/e2e/checkout requires Stripe webhook secret. Skip if missing.
```
Tomorrow's run reads this file and resumes. Without it every run restarts from zero.
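Reading the file back is plain text processing. Here is a minimal sketch, assuming the `## Section` / `- item` layout of the example above. `parse_state` is a hypothetical helper. A real loop would hand these sections to the agent as context at the start of the run.

```python
def parse_state(markdown: str) -> dict[str, list[str]]:
    """Split a state file into {section heading: [entries]} so the next run can resume."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line:
            sections[current].append(line.removeprefix("- "))  # drop the bullet marker
    return sections
```

With the sections split out, the next run can skip anything listed under "Escalated to humans" and re-check what is "In progress" before starting new work.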
What a real loop looks like
CI failure triage - one of the best first loops because verification is automatic and the stakes of bad output are low.
```
Trigger    | nightly at 3am or on CI failure event
Skill      | ci-triage
State read | STATE.md - what failed yesterday, what was fixed
```
The loop runs:
```
Step 1 | reads CI failures from GitHub via connector
Step 2 | classifies each: env / flake / bug / dependency / infra
Step 3 | for each fixable failure - opens worktree, sends implementer agent
Step 4 | reviewer agent checks fix against tests and security rules
Step 5 | gate passes - opens PR and links Linear ticket
Step 6 | gate fails - escalates to human triage inbox
Step 7 | updates STATE.md
Step 8 | posts overnight summary to Slack
```
In the morning you wake up to a Slack message, a triage inbox with only the hard problems and a set of PRs that already passed automated review. You didn't prompt a single step.
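Step 2 is a good place for a cheap, deterministic pre-pass before any model reads the logs. The keyword rules and routing table below are invented for illustration. The article does not say how failures are classified or which categories count as fixable. `classify_failure` and `route_failure` are hypothetical names.

```python
# Hypothetical keyword rules, checked in order; anything unmatched is treated as a bug.
RULES: list[tuple[str, tuple[str, ...]]] = [
    ("env", ("secret", "environment variable", "permission denied")),
    ("infra", ("no space left on device", "out of memory", "runner lost")),
    ("dependency", ("cannot find module", "modulenotfounderror", "version conflict")),
    ("flake", ("timed out", "connection reset")),
]

# Hypothetical routing: which path from Steps 3-6 each category takes.
ROUTES = {
    "bug": "implementer",         # fixable: send the implementer agent
    "dependency": "implementer",
    "flake": "retry",             # re-run before spending tokens on a fix
    "env": "escalate",            # needs secrets or config a human controls
    "infra": "escalate",
}

def classify_failure(log: str) -> str:
    """Return the first category whose keywords appear in the log, else 'bug'."""
    text = log.lower()
    for category, needles in RULES:
        if any(needle in text for needle in needles):
            return category
    return "bug"

def route_failure(log: str) -> str:
    """Map a failure log to implementer, retry or escalate."""
    return ROUTES[classify_failure(log)]
```

Rules like these are fast and free, so the model only sees the failures they can't place. Anything they misroute still hits the reviewer gate in Step 4.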
The loop that compounds across the whole business
The most powerful version is not one loop. It's multiple loops writing to a shared artifact system so they learn from each other.
At SuperDesign, loops cover support, SEO, product growth and ads. Each has its own trigger and workflow, but they all write to the same shared signal store.
The support loop notices five users asking how to export something. It creates a signal:
```
kind: signal
title: Export is too hard to find
frequency: 5
category: friction
tags: [feedback, pricing, conversion]
```
The SEO loop notices a page with strong traffic but poor conversion. It creates another signal. The product growth loop reads both signals alongside analytics and identifies that the export friction is bigger than the raw data suggested - because now two independent sources confirm the same problem. The ads loop finds a keyword gap and feeds it directly into the SEO loop.
```
Support loop | finds friction, writes signal
SEO loop     | finds content gap, writes signal
Product loop | reads both signals, identifies priority
Ads loop     | finds keyword gap, feeds SEO loop
```
None of these loops are isolated automations. They operate from a shared knowledge base of what the business is learning. Each loop makes every other loop smarter. That's compounding.
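One way to turn "two independent sources confirm the same problem" into a ranking rule is to group signals by topic and sort by how many different loops reported each topic before looking at raw frequency. This is a hypothetical sketch, since the article doesn't describe SuperDesign's store. The `topic` field is an added assumption that lets signals with different titles be matched; the example YAML has no such field.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Signal:
    source: str     # which loop wrote it: "support", "seo", "ads", ...
    topic: str      # hypothetical normalized key shared across loops, e.g. "export"
    title: str
    frequency: int

def prioritize(signals: list[Signal]) -> list[tuple[str, int, int]]:
    """Rank topics by number of independent sources, then by total frequency."""
    groups: dict[str, list[Signal]] = defaultdict(list)
    for s in signals:
        groups[s.topic].append(s)
    ranked = [
        (topic, len({s.source for s in group}), sum(s.frequency for s in group))
        for topic, group in groups.items()
    ]
    return sorted(ranked, key=lambda r: (r[1], r[2]), reverse=True)
```

Counting distinct sources, not raw mentions, is what lets a topic reported by two loops outrank a louder one reported by only one.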
The failures nobody warns you about
The Ralph Wiggum loop. An agent that is supposed to signal completion only when it is finished signals it early. Either the loop exits on a half-done job, or it keeps cycling and billing while producing nothing. Fix: a hard gate that can objectively fail the work, not a soft "looks good to me."
Goal drift over long sessions. Constraints disappear by turn 47 as the context summarizes itself. Fix: a standing AGENTS.md that the agent rereads at the start of every run.
Comprehension debt. The faster the loop ships code you didn't write, the larger the gap between what exists and what you understand. The bill that hurts is not the token bill. It's the day you debug a system nobody on the team has read. Fix: read every diff the loop opens.
Cognitive surrender. The pull to stop forming an opinion and accept whatever the loop returns. Designing the loop is the cure when you do it with judgment and the accelerant when you do it to avoid thinking. Same action, opposite result.
Token costs that compound. Every loop iteration re-reads the full context, and the pile grows each pass. Track cost per accepted change. Below a 50% acceptance rate, the loop costs more than it saves.
The build order that actually works
Everyone who ships loops that survive in production does it the same way:
```
Step 1 | get ONE manual run reliable first
Step 2 | turn it into a skill - save the instructions in SKILL.md
Step 3 | wrap the skill in a loop - add the gate and stop condition
Step 4 | THEN put it on a schedule
```
Scheduling something you haven't made reliable by hand is exactly how loops blow up while you sleep. Prove it once. Harden it. Then automate it.
The metric that matters is not tokens spent or tasks attempted. It's cost per accepted change. If the loop gives you ten results and you reject six you're doing the review work it was supposed to save.
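Both numbers take a few lines to compute from your run logs. This is a minimal sketch: `loop_economics` is a hypothetical helper, and the $12 spend in the example below is made up. The 50% break-even is the article's rule of thumb.

```python
import math

BREAK_EVEN_RATE = 0.5  # the article's rule of thumb: below this, the loop costs more than it saves

def loop_economics(total_cost_usd: float, attempted: int, accepted: int) -> tuple[float, float]:
    """Return (acceptance rate, cost per accepted change)."""
    if attempted <= 0 or not 0 <= accepted <= attempted:
        raise ValueError("need attempted > 0 and 0 <= accepted <= attempted")
    rate = accepted / attempted
    cost_per_accepted = total_cost_usd / accepted if accepted else math.inf
    return rate, cost_per_accepted
```

With the article's example of ten results and six rejected, `loop_economics(12.0, 10, 4)` returns `(0.4, 3.0)`. That is an acceptance rate under the break-even line, at $3.00 per accepted change.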
The shift that already happened
In 2024 the leverage was at the prompt. Better prompt, better output, better developer. In 2026 the leverage is one floor above - the system that decides what Claude works on, when, with what gate and what state survives between runs.
Anthropic engineers merge 8x more code per day not because they found a better way to ask Claude questions. Because they stopped asking questions and started building systems that ask Claude questions on their behalf - continuously, overnight, while they sleep.
The loop doesn't make the work easier. It moves where the work happens. From typing prompts to designing the system that types prompts. From holding the tool to building the factory.
Most developers will keep prompting by hand and wonder why the gap keeps widening. A few will spend one weekend building their first loop - one automation, one skill, one state file, one gate - and never go back.





