0. TL;DR
This article stems from six months of deep usage of Claude Code and lessons learned from spending $40/month across two accounts. I hope to provide some valuable input for everyone.
Initially, I used it as a ChatBot, but soon realized things were going wrong: context became messy, tools increased but effectiveness decreased, and rules were ignored despite being longer. After researching Claude Code itself, I realized it wasn't a Prompt issue, but a system design issue.
I want to discuss: how Claude Code works under the hood, why context gets messy and how to govern it, how to design Skills and Hooks, the correct use of Subagents, the architectural impact of Prompt Caching, and how to write a truly useful CLAUDE.md.
The most direct way to understand it is to break Claude Code into six layers:

Strengthening only one layer causes imbalance. If CLAUDE.md is too long, it pollutes the context; too many tools cause confusion; too many subagents lead to state drift; skipping verification makes it impossible to find where things failed.
1. How it works under the hood

The core of Claude Code is not "answering," but a repeating agentic loop:
1Collect Context → Take Action → Verify Result → [Finish or Loop]2 ↑ ↓3 CLAUDE.md Hooks / Permissions / Sandbox4 Skills Tools / MCP5 Memory
I realized bottlenecks are rarely due to the model not being smart enough, but rather giving it the wrong context or having no way to judge if the output is correct or to roll back.
Five layers to focus on:

Looking at these layers makes troubleshooting easier. Unstable results? Check context loading order. Out of control automation? Check the control layer. Declining quality in long sessions? Intermediate products polluted the context; starting a new session is better than tweaking prompts.
2. Conceptual Boundaries: MCP / Plugin / Tools / Skills / Hooks / Subagents

Simple rule: Use Tools/MCP for new actions, Skills for workflows, Subagents for isolated environments, Hooks for mandatory constraints/auditing, and Plugins for cross-project distribution.
3. Context Engineering: The Most Important System Constraint
Many treat context as a "capacity issue," but the bottleneck is usually noise. Useful information gets buried in irrelevant content.
Real Context Cost Composition

Claude Code's 200K context isn't fully available:
1200K Total Context2├── Fixed Overhead (~15-20K)3│ ├── System instructions: ~2K4│ ├── Skill descriptors: ~1-5K5│ ├── MCP Server tool definitions: ~10-20K ← The biggest hidden killer6│ └── LSP state: ~2-5K7│8├── Semi-fixed (~5-10K)9│ ├── CLAUDE.md: ~2-5K10│ └── Memory: ~1-2K11│12└── Dynamic Available (~160-180K)13 ├── Chat history14 ├── File content15 └── Tool results

A typical MCP Server (like GitHub) contains 20-30 tool definitions, each ~200 tokens, totaling 4,000-6,000 tokens. Connecting 5 servers consumes 25,000 tokens (12.5%). This is critical when reading large amounts of code.
Recommended Context Layering
1Always Resident → CLAUDE.md: Project contract / build commands / prohibitions2Path-based → rules: Language / directory / file-type specific rules3On-demand → Skills: Workflows / domain knowledge4Isolated → Subagents: Large-scale exploration / parallel research5Outside Context → Hooks: Deterministic scripts / auditing / blocking
Don't load things you only use occasionally.
Context Best Practices
- Keep CLAUDE.md short, hard, and executable. Anthropic's own is ~2.5K tokens.
- Move large reference docs to Skills' supporting files.
- Use .claude/rules/ for path/language rules.
- Use /context to monitor consumption.

- Use /clear for task switching and /compact for new phases.
- Write Compact Instructions in CLAUDE.md to control what is preserved.
Tool Output Noise: Another Hidden Killer
Dynamic tool output (like cargo test or git log) can easily fill the context. Claude doesn't need to see everything.
RTK (Rust Token Killer) is a good approach: it filters command output before it reaches Claude. For example, it can condense thousands of lines of test output into a single success message.
Compression Traps
Default compression might delete architecture decisions and constraints.

Solution: Specify Compact Instructions in CLAUDE.md to prioritize architecture decisions, modified files, verification status, and TODOs.
Another proactive solution: Have Claude write a HANDOFF.md before starting a new session to explain progress and dead ends.
Engineering Value of Plan Mode

Plan Mode separates exploration from execution.

For complex refactoring, this is better than rushing to code. Advanced tip: Use one Claude to write the plan and another as a "Senior Engineer" to review it.
4. Skills Design: Workflows Loaded On-Demand
Skills are on-demand knowledge and workflows.
What makes a good Skill
- Description should say "when to use me," not "what I do."
- Have complete steps, inputs, outputs, and stop conditions.
- Keep the main body for navigation and core constraints; move details to supporting files.
- Set disable-model-invocation: true for skills with side effects.
Progressive Disclosure
Claude Code emphasizes "progressive disclosure": first provide indices and navigation, then pull details as needed.
Three Typical Skill Types
- Checklist (Quality Gate): e.g., release-check.
- Workflow (Standardized Ops): e.g., config-migration with rollback.
- Domain Expert (Decision Framework): e.g., runtime-diagnosis.
Keep descriptors short to save context space.
5. Tool Design: Helping Claude Choose Correctly
Tools for agents should focus on ease of correct use rather than feature completeness.
Good vs. Bad Tools

Design principles: Use prefixes (github_pr_*), support concise formats, provide helpful error messages, and avoid exposing too many fragmented tools.
Evolution of Internal Tools

The evolution of the "AskUserQuestion" tool shows that a dedicated tool is more stable than markdown formatting or exit parameters.


Todo tools became a "shackle" as models got stronger. Search tools evolved from RAG to Grep for better flexibility and "progressive disclosure."
6. Hooks: Mandatory Logic Before/After Operations
Hooks reclaim deterministic control over processes like formatting, file protection, and notifications.

Suitable for Hooks
Blocking protected files, auto-formatting after edits, injecting dynamic context (Git branch), and notifications.
Early Error Detection

7. Subagents: Independent Claude Instances
Subagents provide isolation. Tasks like scanning repos or running tests produce massive output that shouldn't clutter the main thread.
Explicit Constraints
Limit tools, choose the right model (Haiku for exploration, Opus for review), and set maxTurns.
8. Prompt Caching: The Core of Claude Code Architecture
Claude Code is built around Prompt Caching. High hit rates save money and increase rate limits.
Prompt Layout for Caching

Order matters for prefix matching: System Prompt → Tool Definitions → Chat History → User Input.
Don't Switch Models Mid-Session
Switching models breaks the cache. Use Subagents for handoffs instead.
Compaction Implementation

Compaction uses a fork to summarize history at 1/10th the cost due to cache hits.
9. Verification Loops: No Verifier, No Engineering Agent
"Claude says it's done" is useless without verification. Define verification explicitly in Prompt, Skill, and CLAUDE.md.
10. High-Frequency Commands
Commands like /context, /clear, /compact, and /memory help manage context actively.
Governance and Parallelism

Useful hidden commands: /simplify (code review), /rewind (checkpointing), /btw (side questions), /insight (analyze session for CLAUDE.md updates).
11. How to Write a Good CLAUDE.md
It's a contract, not a knowledge base.

Include build/test commands, architecture boundaries, coding conventions, safety rails, and Compact Instructions. Ask Claude to update CLAUDE.md after correcting its mistakes.
12. Recent Experiences
Lessons from building Kaku (Rust + Lua): environment transparency is vital (use a 'doctor' command), and Hooks are great for multi-language projects.
13. Anti-patterns

14. Health Check
Use npx skills add tw93/claude-health to check your configuration.
15. Conclusion

The focus shifts from "how to use features" to "how to let the agent run under constraints." If you can't define "done," the task isn't ready for an agent.





