I Accidentally Spent 1.5 Million Yen on Claude: Essential Settings to Prevent an AI Billing Disaster


@gogo_tanaka
May 10, 2026 · Translated from Japanese


TL;DR

A developer shares a cautionary tale of losing 1.5 million yen due to an infinite loop between Claude Code Review and AI agents, providing a checklist of guardrails to avoid massive automated billing spikes.

I'm writing this up to help prevent similar accidents.

I'd be happy if everyone could read this so that our 1.5 million yen feels like it meant something!!! 🥺

Do this for now

  • Set Claude Team organization monthly usage limits (obviously)
  • Set per-service limits for Claude Code Review
  • Change Claude Code Review triggers from "per push" to "once"
  • If you can stream credit card transaction logs to a specific channel, keep a casual eye on them (this is how I noticed this time; a minimal alert sketch follows this list)
  • Thoroughly implement guardrails and harnesses
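
For the transaction-log monitoring above, even a tiny polling script is enough. Here is a minimal sketch, assuming a Slack incoming webhook; get_current_spend_usd is a hypothetical stub you would wire to whatever billing feed you actually have (card transaction stream, console export, etc.):

```python
# Minimal spend-alert sketch (illustration only, not from the original post).
# Assumptions: SLACK_WEBHOOK_URL is a Slack incoming webhook you created, and
# get_current_spend_usd() is a hypothetical stub to replace with your real
# billing feed (card transaction stream, console export, etc.).
import os
import time

import requests

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]
ALERT_DELTA_USD = 100.0       # alert if spend grows this much between checks
CHECK_INTERVAL_SEC = 15 * 60  # poll every 15 minutes


def get_current_spend_usd() -> float:
    """Hypothetical stub: return cumulative spend from your billing source."""
    return 0.0  # replace with a real lookup


def post_to_slack(text: str) -> None:
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)


def main() -> None:
    last = get_current_spend_usd()
    while True:
        time.sleep(CHECK_INTERVAL_SEC)
        current = get_current_spend_usd()
        if current - last >= ALERT_DELTA_USD:
            post_to_slack(f":rotating_light: Spend jumped "
                          f"${current - last:,.2f} since the last check "
                          f"(total ${current:,.2f}).")
        last = current


if __name__ == "__main__":
    main()
```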

Claude Team Organization Monthly Usage Limit Setting

From https://claude.ai/admin-settings/usage

[Screenshot: organization monthly usage limit setting]

In Enterprise, you can set even more granular limits than just monthly, so it's worth utilizing those.

Claude Code Review per-service limit

From https://claude.ai/admin-settings/usage

[Screenshot: Claude Code Review per-service limit setting]

Change Claude Code Review trigger from "per push" to "once"

From https://claude.ai/admin-settings/usage

[Screenshot: Claude Code Review trigger setting]

What happened

On a peaceful Saturday evening, a sense of unease ran through the organization.

[Screenshots: the notifications that surfaced the spike]

1.5 million yen was being drained into Claude Code Review. 😇

Why it happened

In short, the following loop was occurring:

1. Claude Code Review runs
2. Review comments are added
3. AI agent (Codex/Claude, etc.) determines whether a fix is necessary
4. AI agent fixes and commits/pushes if needed
5. Claude Code Review runs again, triggered by the push
6. The rebase/force push propagates to subsequent Stacked PRs
7. Claude Code Review runs on the subsequent PRs as well
8. Repeat to infinity ♾️
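
To make the "infinity" concrete, here is a toy model of the loop (my own illustration, not code from the incident). Nothing inside the loop ever terminates it: a deep review almost always produces at least one suggestion, and the agent was instructed to address suggestions and push, so the only brake is an external guard or limit.

```python
# Toy model of the review/fix feedback loop (illustration only).
def review_fix_loop(max_cycles: int) -> int:
    reviews = 0
    push_pending = True            # the initial push to the PR
    while push_pending:
        if reviews >= max_cycles:  # the ONLY brake is an external guard
            break
        reviews += 1               # the push triggers Claude Code Review
        # the review posts comments -> the agent judges a fix necessary
        # -> the agent commits and pushes -> back to the top of the loop
        push_pending = True
    return reviews

print(review_fix_loop(max_cycles=100))  # stops at the guard, never "done"
```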

In the repository we are developing, we have introduced Claude Code Review, which automatically reviews GitHub PRs. This time, we were also using AI agents to handle a relatively large change by splitting it into multiple PRs stacked linearly from upstream to downstream (a series of such PRs is called a Stacked PR). On top of that, Claude Code Review is billed pay-as-you-go based on token usage.

Stacked PR

feat/branch-1 (PR 1)
 └─ feat/branch-2 (PR 2)
     └─ feat/branch-3 (PR 3)
         └─ …
             └─ feat/branch-N (PR N)

Looking back, the average cost of these reviews was $25.81 per review 😱

[Screenshot: Anthropic Console showing an average of $25.81 per review]
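
At that rate, the road to a five-figure bill is short. A back-of-the-envelope check, using the $25.81 average from the Console and the $10,000 organization limit discussed below (the stack depth of 4 is a hypothetical example):

```python
# Back-of-the-envelope: how many reviews burn through the org limit?
COST_PER_REVIEW_USD = 25.81  # average shown in the Anthropic Console
ORG_LIMIT_USD = 10_000       # the organization limit in this incident
STACK_DEPTH = 4              # hypothetical number of stacked PRs

reviews_to_limit = ORG_LIMIT_USD / COST_PER_REVIEW_USD
print(f"~{reviews_to_limit:.0f} reviews reach the limit")      # ~387
print(f"= ~{reviews_to_limit / STACK_DEPTH:.0f} push rounds "  # ~97
      f"if each round re-reviews all {STACK_DEPTH} stacked PRs")
```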

The official Anthropic blog does explain that Code Review aims for deep reviews and costs more than lightweight options like the Claude Code GitHub Action, but I never imagined it would be this much...

This time, to make the large-scale change, we created multiple Stacked PRs while running multiple AI agents locally. Claude Code Review runs when these PRs are created. Normally, a human would check the review content and decide whether to address it, but this time, on the assumption that a human would do the final check, we delegated the primary response to the AI agent, including the judgment of whether to address each review.

Deep dive into the problem

1. Proceeding with complex features using multiple Stacked PRs

This task involved relatively large changes. We split it into multiple PRs because putting everything into one PR would have made review difficult, and because of release-order considerations. Splitting the PRs wasn't the mistake; the problem was that they were linear Stacked PRs.

In Stacked PRs, if you fix an upstream PR, the downstream PRs must incorporate those changes. In other words, pushing to an upstream PR causes rebases/pushes to propagate downstream.

This structure was incompatible with the Claude Code Review setting that triggers on every push.
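
The amplification is easy to quantify. Assuming a push to PR k rebases and re-pushes PRs k through N, and the per-push trigger fires on each of those pushes, one pass of fixes over the stack yields N + (N-1) + ... + 1 = N(N+1)/2 reviews:

```python
# Review fan-out for one pass of fixes over a stack of N PRs.
# Assumption: pushing PR k rebases and re-pushes PRs k..N, and the
# "review on every push" trigger fires for each of those pushes.
def reviews_for_one_pass(n_stacked_prs: int) -> int:
    total = 0
    for k in range(1, n_stacked_prs + 1):
        total += n_stacked_prs - k + 1  # fixing PR k re-reviews PRs k..N
    return total                        # equals N * (N + 1) // 2

for n in (1, 3, 5, 8):
    print(n, "stacked PRs ->", reviews_for_one_pass(n), "reviews per pass")
# 1 -> 1, 3 -> 6, 5 -> 15, 8 -> 36
```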

2. Leaving the review response entirely to the AI

Since reviews ran on every push and review comments were increasing, we assigned the following tasks to the AI agent:

  • Check unresolved review comments
  • Decide whether to address or skip them
  • If addressing, fix and pass local tests
  • Commit/push
  • Reply to review comments and resolve them
  • Monitor for additional reviews for a while after pushing

The original goal was to have the AI's responses to suggestions finished by the time I did the final operation check and review.

If we had ensured that a human made the final decision on the response policy for review comments, we likely could have prevented this.
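
One lightweight way to do that is an explicit gate between "agent drafted a fix" and "fix gets pushed". A minimal sketch of the idea; apply_fix and push are hypothetical stand-ins for whatever your agent harness actually does:

```python
# Human-in-the-loop gate: the agent may draft fixes locally, but nothing
# is pushed (and so no review is re-triggered) without a yes from a human.
def apply_fix(comment: str) -> None:
    print(f"  (agent drafts a local fix for: {comment})")  # hypothetical

def push() -> None:
    print("  (commit pushed -- this is what re-triggers the review)")

def handle_review_comments(comments: list[str]) -> None:
    for comment in comments:
        apply_fix(comment)
    answer = input(f"Push {len(comments)} fix(es)? [y/N] ").strip().lower()
    if answer == "y":  # a human, not the agent, makes the final call
        push()
    else:
        print("Fixes kept local: no push, no new review, no new spend.")

handle_review_comments(["rename variable", "add missing test"])
```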

3. It kept running even after work hours

I kept running and monitoring the above process even after work hours, but I should have stopped it when finishing work at the very latest. This is entirely a point for reflection.

Because I used the local AI agent on a subscription basis, I had little sense that API costs were mounting at that very moment. Meanwhile, Claude Code Review running on the GitHub side was consuming the Anthropic organization's usage. The Anthropic Console showed an average cost of $25.81/review for the target repository. Underestimating this cost was another point of reflection.

In short, I let pay-as-you-go AI run for a long time while there was a gap between the cost I perceived locally and the cost actually being billed.
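
A related guard is to refuse to start a new cycle outside working hours, so an unattended loop cannot run overnight. A minimal sketch; the 9:00-19:00 window and the timezone are placeholders:

```python
# Working-hours guard: unattended automation stops itself after hours.
from datetime import datetime
from zoneinfo import ZoneInfo

WORK_START_HOUR = 9   # placeholder: set your own window
WORK_END_HOUR = 19
TZ = ZoneInfo("Asia/Tokyo")

def within_working_hours(now: datetime | None = None) -> bool:
    now = now or datetime.now(TZ)
    return WORK_START_HOUR <= now.hour < WORK_END_HOUR

def maybe_start_next_cycle() -> None:
    if not within_working_hours():
        print("After hours: refusing to start another review/fix cycle.")
        return
    print("Within working hours: cycle may proceed (with other guards).")

maybe_start_next_cycle()
```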

What went wrong

1. Taking the "trigger on every push" setting for expensive reviews too lightly

This time, the Anthropic Console was configured to run a review on every push. Reviewing on every push is convenient, but suggestions then arrive with every change, so for expensive reviews this trigger should be chosen carefully.

2. Misjudging the compatibility between Stacked PRs and automated reviews

Stacked PRs are an effective way to split work into reviewable units. However, fixing an upstream PR requires rebasing the downstream PRs, and pushing a downstream PR triggered a review there as well. What would have been one review for one PR propagated to N PRs in the stack, and that many reviews were executed.

3. Delegating judgment, fixing, and pushing to the AI

Using AI to organize review comments or make local fixes is very convenient. However, this time we gave it too much authority. The loop of reading a review comment, addressing it, pushing, and monitoring again should have run with explicit human confirmation.

4. Making the organization limit the last line of defense

As a result, usage neared the organization limit, and that is where we noticed the anomaly. Having a limit at all was good. However, $10,000 was too high for a last line of defense. Also, because additional usage was enabled and usage is reflected with a delay, the cumulative monthly organization cost passed $10,000 in almost a single day. We needed guardrails that would stop things much earlier.
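
The lesson generalizes: don't rely on one big ceiling, layer smaller tripwires in front of it. A sketch of tiered thresholds; the tier amounts are placeholders, and $10,000 is the ceiling from this incident:

```python
# Layered guardrails: several tripwires fire long before the hard ceiling.
TIERS = [
    (500,    "notify",    "post a heads-up to the team channel"),
    (2_000,  "alert",     "page the on-call and pause new agent cycles"),
    (5_000,  "halt",      "disable the review trigger entirely"),
    (10_000, "hard stop", "organization limit (should never be reached)"),
]

def check_spend(spend_usd: float) -> None:
    for threshold, level, action in TIERS:
        if spend_usd >= threshold:
            print(f"${threshold:>6,}: {level} -> {action}")

check_spend(2_300.0)
# $   500: notify -> post a heads-up to the team channel
# $ 2,000: alert -> page the on-call and pause new agent cycles
```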

Summary

I melted 1.5 million yen in one day with Claude Code Review. (I am currently submitting a refund request.) The cause: with Claude Code Review set to trigger on every push, the combination of AI-agent fixes/pushes and the rebase chain of Stacked PRs created a loop of reviews and fixes.

This time, we took the convenience of AI-driven development too far and neglected safety and cost guardrails. Until now, vendors were in a phase of just getting us to use AI, so agents were offered relatively cheaply; now that the benefits are clear, I think we are entering a phase where they will charge properly for them.

We are now reinventing our development organization for the AI era: https://supateam.com/ We will make sure to put this experience to use.
