Anthropic Engineers Use AI Loops to 8x Their Daily Code Output

@0xCortexl
English · 03/07/2026

TL;DR

Anthropic engineers now merge 8x more code per day after replacing manual prompts with automated loops. This article explains the five-stage loop architecture: Discover, Plan, Execute, Verify, and Iterate.

Anthropic pays their engineers $80,000 a month. Those engineers now merge 8x more code per day than they did a year ago. Not because the model got smarter. Because they stopped prompting Claude and started building systems that prompt it for them.

Boris Cherny, head of Claude Code at Anthropic: "I don't prompt Claude anymore. I have loops running that prompt Claude and figure out what to do. My job is to write loops."

Most developers read that and have no idea what it means in practice. By the end of this article you will - and you'll have everything you need to build your first one this weekend.

What a loop actually is

A prompt is a single instruction. A loop is a goal the AI keeps working toward until it gets there.

The difference is who does the driving. With a prompt you push Claude through every step manually. With a loop you define the goal once and the system runs the full cycle on its own - finding the work, executing it, checking the result and deciding what comes next. All without you in the chair.

text
Prompt | you type, Claude answers, you type again
Loop   | system finds work, Claude executes,
       | system checks result, repeats until done

Every real loop has five stages:

text
Discover | find what needs doing
Plan     | decide how to do it
Execute  | do the work
Verify   | check against the goal
Iterate  | not done? feed result back in and go again

Three things do all the real work: the Verify stage, the state the loop keeps between passes, and a stop condition.

Verify is the heart. Without a real check on the result you don't have a loop - you have the agent agreeing with itself on repeat. The check has to be objective. A test that passes or fails. A build that compiles or doesn't. A linter that returns zero or non-zero. Not a second agent asked to "review." Two optimists agreeing is not verification.

State is what makes the loop learn. Each pass the AI has to remember what it already tried or it repeats the same mistake forever. A real loop keeps a record outside the conversation - what is done, what failed, what is next. The agent forgets between sessions. The file doesn't.

A stop condition is what keeps it sane. A loop with no exit runs until it succeeds, breaks or drains your budget. Every serious loop has two ways to stop - success, and a hard limit. Without this you've built a machine that bills you in silence.
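The three pieces above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Claude Code implements loops: `gate_passes` and `run_loop` are hypothetical names, the `tried` list stands in for a state file kept outside the conversation, and `execute` stands in for whatever call hands work to the agent.

```python
import subprocess
from typing import Callable


def gate_passes(command: list[str], timeout_s: int = 600) -> bool:
    """Objective check: True only if the command exits with status 0."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing tool counts as a failed gate, never as a pass.
        return False
    return result.returncode == 0


def run_loop(
    execute: Callable[[list[str]], str],
    verify: Callable[[str], bool],
    max_iterations: int = 5,
) -> tuple[bool, list[str]]:
    """Run execute/verify until the gate passes or the hard limit is hit."""
    tried: list[str] = []  # state: every attempt made so far
    for _ in range(max_iterations):  # stop condition 2: hard limit
        attempt = execute(list(tried))  # the agent sees what was already tried
        tried.append(attempt)
        if verify(attempt):  # stop condition 1: success
            return True, tried
    return False, tried
```

In a real loop `verify` would wrap something like `gate_passes(["pytest", "-q"])`, so the pass or fail decision comes from the test runner's exit code rather than from the agent's own opinion of its work.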

The 4-condition test before you build anything

A loop earns its cost only when all four of these are true. Miss one and keep it as a manual prompt.

text
Condition 1 | the task repeats at least weekly
            | less than weekly - setup cost never pays back

Condition 2 | something can automatically reject bad output
            | a test, build, linter, type check
            | no automated gate - you're back reviewing every diff

Condition 3 | the agent can do the work end to end
            | not hand half of it back to you mid-task

Condition 4 | done is objective, not a judgment call
            | if quality is a matter of taste, a human still wins
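The four conditions can be written down as a checklist you run before building anything. The sketch below is illustrative only: the `Task` fields and function names are assumptions chosen to mirror the table, and the "at least weekly" threshold is encoded as one run per week.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    runs_per_week: float
    has_automated_gate: bool
    agent_can_finish_alone: bool
    done_is_objective: bool


def failed_conditions(task: Task) -> list[str]:
    """Return the conditions a task misses; an empty list means build the loop."""
    failures: list[str] = []
    if task.runs_per_week < 1:
        failures.append("repeats less than weekly")
    if not task.has_automated_gate:
        failures.append("nothing can automatically reject bad output")
    if not task.agent_can_finish_alone:
        failures.append("agent cannot do the work end to end")
    if not task.done_is_objective:
        failures.append("done is a judgment call")
    return failures


def worth_looping(task: Task) -> bool:
    """A loop earns its cost only when all four conditions hold."""
    return not failed_conditions(task)
```

Returning the list of misses, rather than a bare boolean, makes it obvious which condition to fix before a task graduates from manual prompt to loop.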

Good first loops:

text
CI failure triage   | nightly, classify failures, draft fixes
Dependency bumps    | weekly, scan updates, open PRs
Lint-and-fix passes | on every PR, apply style fixes automatically
Issue-to-PR drafts  | on codebases with strong test coverage

Bad first loops - keep a human in the chair:

text
Architecture rewrites | judgment call, loop will drift
Auth or payments code | irreversible mistakes too expensive
Production deploys    | needs human approval gate
Vague product work    | done is not objective

The five building blocks

Every working loop is assembled from five parts. Claude Code ships all five.

1 - The automation (the heartbeat)

This is what makes a loop an actual loop and not just a run you did once. A trigger fires on a schedule or event, hands the task to Claude and brings findings back without you going around checking.

text
/loop | re-runs a prompt on a cadence
/goal | keeps going until a condition you wrote is true
hooks | fire commands at points in the agent lifecycle
cron  | runs after you close the laptop via GitHub Actions

The most important is /goal. A separate small model checks whether the condition is met - so the agent that wrote the code is not the one grading it.

bash
/loop 30m /goal "All tests in test/auth pass and lint is clean."
Scan src/auth for failures, draft fixes, open PR when goal holds.

2 - Skills (write project knowledge once)

A skill is how you stop re-explaining the same project context every session. A SKILL.md file holds the instructions, conventions and rules the loop reads on every run.

Without skills the loop re-derives your entire project from zero every cycle. With skills intent compounds - the conventions, the build steps, the things that went wrong before - written once, read forever.

text
skills/
  ci-triage.md - classification rules, fix patterns, escalation criteria
  frontend.md  - Tailwind, Next.js, SEO, accessibility rules
  security.md  - auth patterns, input validation, never-touch list
  deps.md      - update rules, compatibility checks, rollback patterns

The automation calls the skill by name. The recurring job stays maintainable instead of rotting inside a schedule nobody updates.
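"Calls the skill by name" can be pictured as a lookup that prepends the skill file to the task before anything is sent to the agent. The sketch below assumes a flat `skills/` directory of markdown files like the listing above. `build_prompt` is a hypothetical helper for illustration, not a Claude Code API.

```python
from pathlib import Path


def build_prompt(skills_dir: Path, skill_name: str, task: str) -> str:
    """Prepend the named skill's instructions to the task for this run."""
    skill_path = skills_dir / f"{skill_name}.md"
    if not skill_path.is_file():
        # Fail loudly: a loop that silently runs without its rules will drift.
        raise FileNotFoundError(f"no skill named {skill_name!r} in {skills_dir}")
    instructions = skill_path.read_text(encoding="utf-8").strip()
    return f"{instructions}\n\n## Task\n{task}"
```

Because the skill is read from disk on every run, editing `ci-triage.md` changes the next run's behaviour without touching the schedule that triggers it.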

3 - Sub-agents (keep the maker away from the checker)

The single most useful structural thing in any loop. Split the agent that writes from the agent that checks.

The model that wrote the code is too generous grading its own work. A second agent with different instructions - and sometimes a stronger model - catches what the first one talked itself into.

text
Maker agent   | fast cheap model - explores, implements, drafts
Checker agent | strong model, high effort - reviews, verifies, rejects

text
.claude/agents/
  explorer.md    | read-only, fast model, finds what needs doing
  implementer.md | writes the fix, runs the tests
  reviewer.md    | security audit, code quality, objective gate

One model. Three specialists. The loop runs while you're not watching and the verifier is the only reason you can actually walk away.
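The maker and checker split can be sketched as two callables that never share a verdict: the maker only produces drafts, and only the checker can approve one. Everything here is an assumption for illustration. `make_and_check`, the `Maker` and `Checker` aliases, and the idea of passing objections back as a list are not taken from Claude Code.

```python
from typing import Callable, Optional

Maker = Callable[[str, list[str]], str]  # (task, objections so far) -> draft
Checker = Callable[[str], list[str]]     # draft -> objections, empty if approved


def make_and_check(
    task: str,
    maker: Maker,
    checker: Checker,
    max_rounds: int = 3,
) -> tuple[Optional[str], list[str]]:
    """Return (approved draft, []) or (None, remaining objections)."""
    objections: list[str] = []
    for _ in range(max_rounds):
        draft = maker(task, objections)  # maker sees what was rejected last time
        objections = checker(draft)      # checker alone decides pass or fail
        if not objections:
            return draft, []
    return None, objections              # hard limit reached: escalate
```

The maker's own confidence never appears in the return value, which is the point: approval exists only when the checker returns no objections.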

4 - Connectors (so it acts, not suggests)

The difference between a loop that says "here is the fix" and a loop that opens the PR, links the ticket and pings Slack once CI is green.

text
GitHub | read repos, create branches, open PRs, react to webhooks
Linear | update tickets, link PRs, close items when verification passes
Slack  | post triage results, ping humans on escalations
Sentry | investigate live alerts, draft fixes for high-frequency ones

Without connectors the loop describes what it would do. With connectors it does it.
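One way to picture the difference is a thin interface that the loop calls once the gate is green. The sketch below uses an in-memory stand-in so it runs without credentials. `Connector`, `RecordingConnector`, `publish_fix` and the action names are all hypothetical and do not correspond to the real GitHub, Linear or Slack APIs.

```python
from typing import Protocol


class Connector(Protocol):
    def send(self, action: str, payload: dict[str, str]) -> None: ...


class RecordingConnector:
    """In-memory stand-in that records calls instead of hitting a real service."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, str]]] = []

    def send(self, action: str, payload: dict[str, str]) -> None:
        self.calls.append((action, payload))


def publish_fix(
    github: Connector,
    linear: Connector,
    slack: Connector,
    branch: str,
    ticket_id: str,
    ci_green: bool,
) -> bool:
    """Act on a verified fix; on a red build, only ping a human."""
    if not ci_green:
        slack.send("post_message", {"text": f"{branch}: CI is red, needs a human"})
        return False
    github.send("open_pr", {"branch": branch})
    linear.send("link_pr", {"ticket": ticket_id, "branch": branch})
    slack.send("post_message", {"text": f"{branch}: PR opened for {ticket_id}"})
    return True
```

Swapping `RecordingConnector` for a real client is where the loop stops describing and starts doing, so the `ci_green` check in front of those calls is worth keeping strict.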

5 - The state file (the agent forgets. The file doesn't.)

The piece that sounds too dumb to matter and is the spine of every working loop. A markdown file outside the conversation that records what is done, what failed and what is next.

text
# Loop state - ci-triage

## Last run
2026-06-09 03:30 UTC - 7 failures classified, 3 fixes drafted, 4 escalated

## In progress
- claude/fix-auth-token-refresh - tests passing locally, awaiting CI

## Completed today
- claude/bump-axios-1.7.4 - merged, CI green

## Escalated to humans
- src/billing/refund.ts - tests failing 3 ways, root cause unclear

## Lessons learned
- 2026-06-08: PowerShell hits TLS issue on Windows runner. Use bash.
- 2026-06-07: tests/e2e/checkout requires Stripe webhook secret. Skip if missing.

Tomorrow's run reads this file and resumes. Without it every run restarts from zero.
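Reading and updating a file like this takes very little code. The sketch below is a simplified illustration: the function names are assumptions, and it only tracks bulleted entries under `##` headings, so a free-text line such as the one under "Last run" would need extra handling.

```python
from pathlib import Path


def read_state(path: Path) -> dict[str, list[str]]:
    """Parse '## Heading' sections and their '- ' bullets from the state file."""
    sections: dict[str, list[str]] = {}
    if not path.exists():
        return sections  # first run: no memory yet
    current = None
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections.setdefault(current, [])
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


def write_state(path: Path, title: str, sections: dict[str, list[str]]) -> None:
    """Write the sections back out as markdown under a single title."""
    lines = [f"# {title}", ""]
    for heading, entries in sections.items():
        lines.append(f"## {heading}")
        lines.extend(f"- {entry}" for entry in entries)
        lines.append("")
    path.write_text("\n".join(lines), encoding="utf-8")


def record(path: Path, title: str, heading: str, entry: str) -> None:
    """Append one entry under a heading, creating the file or heading if needed."""
    sections = read_state(path)
    sections.setdefault(heading, []).append(entry)
    write_state(path, title, sections)
```

A run would call `read_state` first, pass the "Lessons learned" entries into the prompt, and call `record` for each outcome before it exits.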

What a real loop looks like

CI failure triage - one of the best first loops because verification is automatic and the stakes of bad output are low.

text
Trigger    | nightly at 3am or on CI failure event
Skill      | ci-triage.md
State read | STATE.md - what failed yesterday, what was fixed

The loop runs:

text
Step 1 | reads CI failures from GitHub via connector
Step 2 | classifies each: env / flake / bug / dependency / infra
Step 3 | for each fixable failure - opens worktree, sends implementer agent
Step 4 | reviewer agent checks fix against tests and security rules
Step 5 | gate passes - opens PR and links Linear ticket
Step 6 | gate fails - escalates to human triage inbox
Step 7 | updates STATE.md
Step 8 | posts overnight summary to Slack

In the morning you wake up to a Slack message, a triage inbox with only the hard problems and a set of PRs that already passed automated review. You didn't prompt a single step.
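Steps 2 and 3 of the run above amount to a classify-then-route decision. In the article's setup the agent does the classifying. The sketch below swaps in a deterministic keyword matcher purely to show the shape of the routing. The rule strings and the choice of which categories count as fixable are assumptions, not part of any real triage skill.

```python
# Ordered rules: the first category with a matching phrase wins.
RULES: list[tuple[str, tuple[str, ...]]] = [
    ("dependency", ("modulenotfounderror", "could not resolve", "version conflict")),
    ("env", ("missing environment variable", "secret not set")),
    ("infra", ("runner lost", "no space left on device")),
    ("flake", ("timed out", "connection reset")),
]

# Assumption: these categories are safe to hand to the implementer agent.
FIXABLE = {"bug", "dependency", "flake"}


def classify(log: str) -> str:
    """Map a CI failure log to env / flake / bug / dependency / infra."""
    text = log.lower()
    for category, phrases in RULES:
        if any(phrase in text for phrase in phrases):
            return category
    return "bug"  # nothing environmental matched, so treat it as a code bug


def route(log: str) -> str:
    """Send fixable failures to the implementer, everything else to a human."""
    return "implementer" if classify(log) in FIXABLE else "escalate"
```

Keeping the routing table separate from the classifier means the escalation policy can be tightened, for example by removing "flake" from `FIXABLE`, without retraining or re-prompting anything.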

The loop that compounds across the whole business

The most powerful version is not one loop. It's multiple loops writing to a shared artifact system so they learn from each other.

At SuperDesign their loops cover support, SEO, product growth and ads. Each has its own trigger and workflow. But they all write to the same shared signal store.

The support loop notices five users asking how to export something. It creates a signal:

markdown
kind: signal
title: Export is too hard to find
frequency: 5
category: friction
tags: [feedback, pricing, conversion]

The SEO loop notices a page with strong traffic but poor conversion. Creates another signal. The product growth loop reads both signals alongside analytics and identifies that the export friction is bigger than the raw data suggested - because now two independent sources confirm the same problem. The ads loop finds a keyword gap and feeds it directly into the SEO loop.

text
Support loop | finds friction, writes signal
SEO loop     | finds content gap, writes signal
Product loop | reads both signals, identifies priority
Ads loop     | finds keyword gap, feeds SEO loop

None of these loops are isolated automations. They operate from a shared knowledge base of what the business is learning. Each loop makes every other loop smarter. That's compounding.
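The "two independent sources confirm the same problem" step can be sketched as a grouping over the shared store. This is an illustration under assumptions, not SuperDesign's implementation: the `Signal` fields `source` and `topic` are hypothetical additions to the signal format shown earlier, and a problem counts as confirmed once two different loops have reported it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    source: str     # which loop wrote the signal, e.g. "support"
    topic: str      # shared key that lets loops refer to the same problem
    title: str
    frequency: int


def confirmed_topics(
    signals: list[Signal], min_sources: int = 2
) -> list[tuple[str, int]]:
    """Return (topic, total frequency) for topics seen by enough distinct loops."""
    sources: dict[str, set[str]] = defaultdict(set)
    totals: dict[str, int] = defaultdict(int)
    for signal in signals:
        sources[signal.topic].add(signal.source)
        totals[signal.topic] += signal.frequency
    confirmed = [
        (topic, totals[topic])
        for topic in sources
        if len(sources[topic]) >= min_sources
    ]
    # Highest total frequency first; topic name breaks ties deterministically.
    return sorted(confirmed, key=lambda item: (-item[1], item[0]))
```

Counting distinct sources rather than raw frequency is what lets a quiet signal from one loop gain weight when a second loop independently reports the same topic.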

The failures nobody warns you about

The Ralph Wiggum loop. An agent meant to signal completion only when finished signals it early. The loop either exits on a half-done job or keeps running and billing while producing nothing. Fix: a hard gate that can objectively fail the work, not a soft "looks good to me."

Goal drift over long sessions. Constraints disappear by turn 47 as the context summarizes itself. Fix: a standing AGENTS.md that the agent rereads at the start of every run.

Comprehension debt. The faster the loop ships code you didn't write, the larger the gap between what exists and what you understand. The bill that hurts is not the token bill. It's the day you debug a system nobody on the team has read. Fix: read every diff the loop opens.

Cognitive surrender. The pull to stop forming an opinion and accept whatever the loop returns. Designing the loop is the cure when you do it with judgment and the accelerant when you do it to avoid thinking. Same action, opposite result.

Token costs that compound. Every loop iteration re-reads the full context and the pile grows each pass. Track cost per accepted change. Below 50% acceptance rate the loop costs more than it saves.

The build order that actually works

Everyone who ships loops that survive in production does it the same way:

text
Step 1 | get ONE manual run reliable first
Step 2 | turn it into a skill - save the instructions in SKILL.md
Step 3 | wrap the skill in a loop - add the gate and stop condition
Step 4 | THEN put it on a schedule

Scheduling something you haven't made reliable by hand is exactly how loops blow up while you sleep. Prove it once. Harden it. Then automate it.

The metric that matters is not tokens spent or tasks attempted. It's cost per accepted change. If the loop gives you ten results and you reject six you're doing the review work it was supposed to save.
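Both numbers are simple to compute once you log what each run proposed, what you accepted and what it cost. The helper names and the dollar figure in the usage note are hypothetical. The 50% threshold is the one stated earlier in the article.

```python
def acceptance_rate(accepted: int, proposed: int) -> float:
    """Share of the loop's proposed changes that a human actually accepted."""
    if proposed == 0:
        return 0.0
    return accepted / proposed


def cost_per_accepted_change(total_cost: float, accepted: int) -> float:
    """Total spend divided by accepted changes; infinite if nothing landed."""
    if accepted == 0:
        return float("inf")
    return total_cost / accepted


def loop_pays_for_itself(accepted: int, proposed: int, threshold: float = 0.5) -> bool:
    """Apply the article's rule of thumb: below 50% acceptance, the loop loses."""
    return acceptance_rate(accepted, proposed) >= threshold
```

For the example above, ten results with six rejected gives `acceptance_rate(4, 10)` of 0.4, which falls below the threshold. If that run had cost a hypothetical 12.00 in tokens, each accepted change would have cost 3.00.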

The shift that already happened

In 2024 the leverage was at the prompt. Better prompt, better output, better developer. In 2026 the leverage is one floor above - the system that decides what Claude works on, when, with what gate and what state survives between runs.

Anthropic engineers merge 8x more code per day not because they found a better way to ask Claude questions. Because they stopped asking questions and started building systems that ask Claude questions on their behalf - continuously, overnight, while they sleep.

The loop doesn't make the work easier. It moves where the work happens. From typing prompts to designing the system that types prompts. From holding the tool to building the factory.

Most developers will keep prompting by hand and wonder why the gap keeps widening. A few will spend one weekend building their first loop - one automation, one skill, one state file, one gate - and never go back.
