4-Layer AI Agent Memory Architecture for Production Workflows

TLDR: Most agents reset and forget. Here's the exact memory stack I've built so my agents share context across restarts, coordinate without stepping on each other, and retain decisions for months.

The problem nobody solves

Here's the typical AI agent memory setup. Stuff important things in the system prompt. Hope the context window doesn't run out. Restart and lose everything.

I've run two coordinated agents (Ella on OpenClaw, Lyra on Hermes) across hundreds of sessions over 6 months. The biggest thing that makes them useful isn't the models or the tools. It's the memory architecture.

When Lyra ships a fix at 2am, Ella knows about it in the morning. When I decided in January how secrets should be stored, both agents still follow that decision in July. When a session crashes mid-task, the next one picks up exactly where it left off.

Here's the exact 4-layer system.

Layer 1: In-session context

Every session starts by reading two files. The identity file is the agent's standing identity. Not a system prompt buried in config. An actual markdown file I can edit. It holds how the agent should behave, what it prioritizes, what it never does without asking, and its relationship to the other agents.

The memory index file is the index of everything worth remembering across sessions. Not a vector database. Not embeddings. A plain table of contents pointing to individual memory files. Each memory file is a short note with a name, a description, a type, and a brief body. The index is always loaded. Individual files get read on demand when relevant.
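A minimal sketch of the index-plus-files layout, assuming a hypothetical index format of one `- [name](path.md): description` line per memory. The line format, `IndexEntry`, `parse_index`, and `load_memory` are illustrative names, not the actual files or tooling described here.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Assumed (hypothetical) index line format: "- [name](relative/path.md): description"
INDEX_LINE = re.compile(
    r"^- \[(?P<name>[^\]]+)\]\((?P<path>[^)]+)\): (?P<description>.+)$"
)


@dataclass
class IndexEntry:
    name: str
    path: str
    description: str


def parse_index(index_text: str) -> list[IndexEntry]:
    """Parse the always-loaded index; lines that do not match are ignored."""
    entries = []
    for line in index_text.splitlines():
        match = INDEX_LINE.match(line.strip())
        if match:
            entries.append(IndexEntry(**match.groupdict()))
    return entries


def load_memory(root: Path, entry: IndexEntry) -> str:
    """Read one memory file on demand, only when it looks relevant."""
    return (root / entry.path).read_text(encoding="utf-8")
```

The index stays small enough to load every session, while the bodies cost context only when something actually calls `load_memory`.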

Why markdown? Because I can read it, edit it, and debug it. When an agent starts acting wrong, I open the index and find the bad instruction. When I want to change a behavior, I edit the file. No API. No dashboard. No retraining.

Layer 2: Post-session retention (Hindsight)

The problem with markdown-only memory: it only captures what someone explicitly writes. Most valuable context is implicit. Decisions made during a run. Facts inferred from a task. Things that turned out to matter.

Hindsight is a local facts-retention backend running on localhost. At the end of every meaningful session, the agent automatically pushes a curated set of retained facts into a named bank. Each agent has its own bank.

What gets retained: decisions made during the session, non-obvious facts about the user or project, failure patterns and the fixes we adopted, and preferences the user confirmed or corrected.

When a new session starts, Hindsight is queried for relevant context before the agent answers. It's not full-text search over transcripts. It's curated facts, tagged by type, that the agent has learned are worth carrying forward.

The promotion path: a Hindsight fact goes to human review, and only approved facts become memory index entries. Retention is automatic. Promotion has a human approval gate.
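The approval gate can be sketched as plain data plus one check. This does not call Hindsight or model its API; `FactKind`, `RetainedFact`, `approve`, `promote`, and the index line format are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class FactKind(Enum):
    # The four kinds of retained fact listed above.
    DECISION = "decision"
    PROJECT_FACT = "project_fact"
    FAILURE_PATTERN = "failure_pattern"
    PREFERENCE = "preference"


@dataclass
class RetainedFact:
    kind: FactKind
    summary: str
    approved: bool = False  # flipped only by a human reviewer


def approve(fact: RetainedFact) -> RetainedFact:
    """Record the human approval; returns a new fact instead of mutating."""
    return RetainedFact(fact.kind, fact.summary, approved=True)


def promote(fact: RetainedFact, slug: str) -> str:
    """Turn an approved fact into an index line; refuse unreviewed facts."""
    if not fact.approved:
        raise PermissionError("fact has not passed human review")
    # Assumed index line format, not the author's actual one.
    return f"- [{slug}](memories/{slug}.md): {fact.summary}"
```

Making `promote` raise on unreviewed facts keeps the gate in code rather than in habit: nothing reaches the always-loaded index without a reviewer calling `approve` first.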

Layer 3: Shared long-term state (Nexus)

Single-agent memory breaks down when you add a second agent. They drift. One thinks X is the current project status. The other thinks Y. Within a week they're contradicting each other.

The fix is shared, inspectable state that both agents read and write. We use an Obsidian vault I call Nexus. It holds a live-context log both agents append to after every meaningful turn, a project-state file, a decisions log, and a per-agent working-context checkpoint updated every few tool calls during long tasks.
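A rough sketch of the working-context checkpoint, assuming a JSON file and a checkpoint interval of three steps. Both are assumptions for illustration (the real checkpoint may well be markdown), and `save_checkpoint`, `load_checkpoint`, and `run_task` are hypothetical names.

```python
import json
import os
from pathlib import Path

CHECKPOINT_EVERY = 3  # assumed value for "every few tool calls"


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a partial file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # replaces the old checkpoint in one step


def load_checkpoint(path: Path) -> dict:
    """Return the last saved state, or an empty state on first run."""
    if not path.exists():
        return {"task": None, "completed_steps": []}
    return json.loads(path.read_text(encoding="utf-8"))


def run_task(path: Path, task: str, steps: list[str]) -> list[str]:
    """Run only the steps not already recorded, checkpointing along the way."""
    state = load_checkpoint(path)
    if state["task"] != task:
        state = {"task": task, "completed_steps": []}
    executed = []
    for step in steps:
        if step in state["completed_steps"]:
            continue  # finished before the previous session ended
        executed.append(step)  # placeholder for the real tool call
        state["completed_steps"].append(step)
        if len(executed) % CHECKPOINT_EVERY == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return executed
```

A session that restarts mid-task calls `run_task` with the same task name and skips whatever the checkpoint already records as done.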

The live-context file is the real-time handshake. The invariant: before every reply, read it. After every meaningful turn, append to it.

When Lyra finishes a PR at 2am and Ella is answering my morning question, Ella already knows. She read the log. No message passing. No inter-agent API. No polling. One shared file, two agents, append-only log.
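The read-before-reply, append-after-turn invariant is small enough to sketch directly. The timestamped line layout, the function names, and the stubbed-out model call are assumptions for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path


def read_live_context(path: Path) -> str:
    """First half of the invariant: read the shared log before every reply."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def append_live_context(path: Path, agent: str, summary: str) -> None:
    """Second half: add one line after a meaningful turn; never rewrite the file."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    with path.open("a", encoding="utf-8") as log:  # append mode only adds to the end
        log.write(f"{stamp} [{agent}] {summary}\n")


def reply(path: Path, agent: str, question: str) -> str:
    """Wrap one turn in the invariant; the model call is stubbed out here."""
    context = read_live_context(path)
    answer = f"({len(context.splitlines())} log entries read) re: {question}"
    append_live_context(path, agent, f"answered: {question}")
    return answer
```

Because every turn goes through `reply`, neither agent can answer without having seen what the other last wrote.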

Layer 4: Searchable knowledge (gbrain)

The first three layers handle episodic memory. What happened, what was decided, what matters to carry forward. gbrain is the semantic layer. It's a compiled wiki running as an MCP server over the Nexus vault. Full-text and semantic search over everything that's been written down.

When an agent needs to answer a research question, find a prior synthesis, or look up how we've handled a class of problem before, it queries gbrain instead of re-reading every file. Output is a ranked list of relevant pages with provenance. The agent reads what's relevant. It doesn't dump the whole vault into context.
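To make "ranked list with provenance" concrete, here is a deliberately naive stand-in. It is not gbrain and does not use its API or MCP: it is a full-text term count over markdown files with no semantic component, and `search_vault` is a hypothetical name.

```python
import re
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_vault(root: Path, query: str, limit: int = 5) -> list[tuple[str, int]]:
    """Rank markdown pages by how often the query terms occur in them.

    Returns (relative path, score) pairs so every hit keeps its provenance.
    """
    terms = set(tokenize(query))
    hits = []
    for page in sorted(root.rglob("*.md")):
        words = tokenize(page.read_text(encoding="utf-8"))
        score = sum(1 for word in words if word in terms)
        if score > 0:
            hits.append((page.relative_to(root).as_posix(), score))
    # Highest score first; ties fall back to path order for stable output.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:limit]
```

The agent would then read only the top few returned paths instead of loading the whole vault into context.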

This is the difference between memory and recall. Layers 1 to 3 handle what the agent carries. Layer 4 handles what the agent can look up.

The cross-agent sync invariant

Two agents, one live-context file. The risk: they overwrite each other, or miss each other's entries. The invariant we run: each entry is signed with the agent name, channel, kind, and a one-line summary. Append-only. Never edit someone else's entry. If Lyra has logged something relevant, Ella acknowledges it explicitly in the next reply. For significant decisions, both agents also write to the decisions log with a timestamp and rationale.
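A sketch of signed entries and the acknowledgement rule, assuming a pipe-separated line layout. The layout, `LogEntry`, `parse_entry`, and `pending_acknowledgements` are hypothetical, and so is the simplification that an agent's own entry covers everything logged before it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    agent: str
    channel: str
    kind: str
    summary: str

    def render(self) -> str:
        # Assumed line layout: "agent | channel | kind | summary"
        return f"{self.agent} | {self.channel} | {self.kind} | {self.summary}"


def parse_entry(line: str) -> LogEntry:
    """Split on the first three separators so the summary may contain '|'."""
    agent, channel, kind, summary = (part.strip() for part in line.split("|", 3))
    return LogEntry(agent, channel, kind, summary)


def pending_acknowledgements(log_lines: list[str], me: str) -> list[LogEntry]:
    """Entries other agents wrote since my last entry; my next reply should acknowledge them."""
    pending = []
    for line in log_lines:
        if not line.strip():
            continue  # tolerate blank lines in the log
        entry = parse_entry(line)
        if entry.agent == me:
            pending = []  # assumption: my own turn covers everything before it
        else:
            pending.append(entry)
    return pending
```

Entries are only ever read here, never rewritten, which matches the append-only rule: an agent adds its own lines and leaves everyone else's alone.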

This has run across hundreds of sessions. We've had one conflict: a race condition where both agents appended within the same minute during a handoff. Resolution: read both entries, reconcile in the next turn. No automated merge needed.

What this replaces

Before this architecture: five disconnected chat sessions, each with its own stale context. Agents contradicting each other because neither could see what the other knew. Instructions I gave three weeks ago, forgotten. Decisions that lived in my head instead of a file.

After: two agents that brief themselves before every reply. A shared state log that neither can deny. Retained decisions that survive months of context resets. Every behavior preference in a file I can edit and verify.

The honest tradeoff: this system requires discipline. You have to write things down. You have to maintain the files. You have to review what gets retained before it becomes permanent. It is not a magic always-on system. It is structured manual discipline plus automation at the seams.

How to start

You don't need two agents or a full vault to run version one of this.

Step 1: An identity file plus a memory index. Create them. Read them at session start. Write every behavior preference into the index the second time you correct the agent for the same thing.

Step 2: One shared state file. If you run more than one agent, or use Claude across multiple windows, create a live-context file. Each session appends to it at the end and reads it at the start.

Step 3: A retention rule. When a session produces a decision that should survive, write it to the index manually. Do this by hand until you trust the pattern. Then automate the flagging.

Step 4: File per fact, not one big doc. The index points to individual files. This makes it easy to delete a stale memory without touching others.
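With one file per fact, forgetting a stale memory is a delete plus one index edit. A minimal sketch, assuming index lines link to their file as `(path.md)`; `forget` and that line format are hypothetical.

```python
from pathlib import Path


def forget(root: Path, index_name: str, memory_file: str) -> bool:
    """Delete one memory file and drop the index lines that point to it.

    Returns True if anything was removed. Other memories are left untouched.
    """
    index_path = root / index_name
    lines = index_path.read_text(encoding="utf-8").splitlines()
    # Assumed convention: an index line links to its file as "(relative/path.md)".
    kept = [line for line in lines if f"({memory_file})" not in line]
    target = root / memory_file
    removed = len(kept) != len(lines) or target.exists()
    target.unlink(missing_ok=True)
    index_path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return removed
```

The same operation on one big document would mean hand-editing prose and hoping nothing nearby depended on the deleted paragraph.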

The full 4-layer stack took about 6 months to stabilize. Layers 1 and 3 took one weekend. Start there.

Takeaway

Most agent memory setups are context-window management with extra steps. They work until the window resets or you add a second agent.

Durable agent memory is an infrastructure problem, not a prompting problem. The answer is multiple layers with different time horizons: in-session context, cross-session facts, shared state, searchable knowledge.

The source of truth in ours is plain markdown. The memory itself needs no vector database, no embeddings, and no retraining. Search sits on top of files I can open, edit, and debug.

The agents that are genuinely useful aren't the ones with the biggest context window. They're the ones that remember what matters and forget what doesn't.

If you're building agents you want to trust with real work, start with the memory architecture before you add more tools.

Tools referenced

Hindsight (local memory retention): https://github.com/vectorize-io/hindsight

gbrain (compiled wiki / semantic search): https://github.com/garrytan/gbrain

OpenClaw (agent runtime): https://openclaw.ai

Hermes (agent runtime): https://hermes-agent.nousresearch.com
