My Agent Remembers Better Than I Do (Part 2)
Two weeks ago, I wrote about teaching my OpenClaw agent to never repeat a mistake using error logs, QMD search, and heartbeat maintenance. It worked.
Then I migrated to a fresh machine and watched everything fall apart.
The data survived; the files were backed up. What broke was subtler. The agent woke up on the new instance with all its memories intact but no idea how to use them well. Retrieval was noisy, the daily logs were bloated with duplicates, there was no safety net before context compaction, and after just 18 days the workspace had already grown into an unstructured mess.
So I rebuilt the memory system from scratch. The files were fine; what I changed was the architecture around them.
This is what I learned.
Architecture Overview
This is the full picture of what the memory system looks like now:
The Problem With "Just Write It Down"
Part 1 was built on a simple principle: files are your persistence layer. Write everything down, search it later.
That principle is correct, but it doesn't scale without structure. After two weeks of aggressive memory writes, a few things had gone wrong.
Daily logs hit 400+ lines. My checkpoint cron fired every 15 minutes and appended whatever seemed important. The catch was that it couldn't see what the previous checkpoint had already written, so the same information appeared 5-6 times per day. Feb 17's log had the same health data repeated six times with identical numbers.
Search returned noise. When everything is saved, everything matches. A query for "API deployment" returned 12 results across 8 files, half of them stale daily log entries saying the same thing. The results were relevant enough; the problem was that they all repeated each other.
Context compaction killed memory. OpenClaw compresses old context when the window fills up, which is fine for conversation flow. But any unsaved context, the stuff the agent "knew" but hadn't written to disk yet, vanished. One compaction at the wrong time and the agent forgot what it was doing mid-task.
The foundation from Part 1 was right. I just needed three more things on top of it: smarter retrieval, compaction safety, and workspace governance.
Hybrid Search with MMR and Temporal Decay
OpenClaw's native memory_search supports more than basic vector search. After digging through the config schema, I found three features that changed retrieval quality overnight.
Hybrid Search (BM25 + Vector)
Pure vector search finds conceptually similar content, and pure keyword search (BM25) finds exact matches. Neither is sufficient on its own, so hybrid search combines the two.
```json
{
  "memory": {
    "search": {
      "hybrid": {
        "enabled": true,
        "vectorWeight": 0.7,
        "textWeight": 0.3
      }
    }
  }
}
```

70% vector, 30% BM25. The vector component catches conceptual matches ("data pipeline architecture" finds results about "ingestion flow"). The BM25 component catches exact terms (like a specific deployment ID that vectors would miss). Together, recall went up noticeably.
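To make the 70/30 split concrete, here is a minimal Python sketch of weighted score fusion. It is not OpenClaw's implementation. It assumes three things:

- Raw scores from each retriever are min-max normalized to [0, 1] before blending.
- A document that one retriever did not return scores 0 on that side.
- The function names are hypothetical.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw scores to [0, 1]; a constant list maps everything to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(vector: dict[str, float], bm25: dict[str, float],
                  vector_weight: float = 0.7, text_weight: float = 0.3) -> dict[str, float]:
    """Weighted sum of normalized scores; a doc missing from one retriever scores 0 there."""
    v, t = min_max_normalize(vector), min_max_normalize(bm25)
    return {doc: vector_weight * v.get(doc, 0.0) + text_weight * t.get(doc, 0.0)
            for doc in set(v) | set(t)}


# Hypothetical raw scores: cosine similarities on the vector side,
# unbounded BM25 scores on the keyword side (only the exact ID matches strongly).
vector = {"ingestion-flow.md": 0.82, "2026-02-10.md": 0.55, "deploy-notes.md": 0.40}
bm25 = {"deploy-notes.md": 7.1, "2026-02-10.md": 2.0}

ranked = sorted(hybrid_scores(vector, bm25).items(), key=lambda kv: kv[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```

Normalization is the important design choice here. BM25 scores are unbounded while cosine similarities are not, so adding them raw would let BM25 dominate. Min-max is one common option; reciprocal rank fusion is another, and it ignores raw scores entirely.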
MMR (Maximal Marginal Relevance)
This was the real win. Standard search returns the most relevant results, but "most relevant" often means "most similar to each other" — you get five results that all say the same thing from different daily logs.
MMR re-ranks results to balance relevance against diversity. Each subsequent result is penalized if it's too similar to results already selected.
```json
{
  "memory": {
    "search": {
      "mmr": {
        "enabled": true,
        "lambda": 0.7,
        "candidateMultiplier": 4
      }
    }
  }
}
```

Lambda 0.7 means 70% relevance, 30% diversity. The candidateMultiplier fetches 4x more candidates than needed, and MMR then selects the final set from that pool. For the same query, instead of five near-identical daily log entries, I now get one daily log entry, one thematic file, one project doc, and one error log entry: much broader coverage.
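The re-ranking described above is the classic greedy MMR procedure. The sketch below illustrates the technique; it is not OpenClaw's code. It makes these assumptions:

- Similarity between results is cosine similarity of their embeddings.
- `lam` weights relevance, matching the 70/30 reading of lambda 0.7.
- The tiny 2-D embeddings and file names are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(candidates: dict[str, tuple[float, list[float]]], k: int, lam: float = 0.7) -> list[str]:
    """Greedy MMR over {doc: (relevance, embedding)}.

    Each round picks the doc maximizing
        lam * relevance - (1 - lam) * (max cosine similarity to already-selected docs).
    """
    selected: list[str] = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(doc: str) -> float:
            relevance, emb = candidates[doc]
            redundancy = max((cosine(emb, candidates[s][1]) for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected


# Toy candidate pool (in practice roughly k * candidateMultiplier results).
candidates = {
    "2026-02-17.md": (0.95, [1.0, 0.0]),
    "2026-02-16.md": (0.93, [0.99, 0.05]),  # near-duplicate of the entry above
    "projects.md": (0.80, [0.2, 1.0]),
    "error-log.md": (0.60, [0.7, 0.7]),
}
print(mmr(candidates, k=2))           # diversity-aware selection
print(mmr(candidates, k=2, lam=1.0))  # lam=1.0 degenerates to plain relevance ranking
```

With `lam=0.7`, the second pick skips the near-duplicate daily log in favour of `projects.md`. With `lam=1.0`, the two daily logs win.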
Temporal Decay
Recent memories should rank higher than old ones. A lesson learned yesterday is more likely to be relevant than one from three weeks ago.
```json
{
  "memory": {
    "search": {
      "temporalDecay": {
        "enabled": true,
        "halfLifeDays": 30,
        "reference": "now"
      }
    }
  }
}
```

A 30-day half-life means a document from 30 days ago gets its score halved, and one from 60 days ago gets quartered. This surfaces fresh context without me having to manually prune old results, and combined with MMR it gives the agent results that are recent, relevant, and varied.
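The halving arithmetic above is exponential decay: the multiplier is 0.5^(age / halfLifeDays). Here is a minimal sketch under two assumptions: the multiplier scales the relevance score directly, and age is measured from "now". Where exactly OpenClaw applies it in its ranking pipeline is not shown here.

```python
from datetime import datetime, timedelta


def decay_multiplier(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 today, 0.5 after one half-life, 0.25 after two."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)  # clamp future timestamps to age 0


def decayed_score(score: float, modified: datetime, now: datetime,
                  half_life_days: float = 30.0) -> float:
    age_days = (now - modified).total_seconds() / 86400
    return score * decay_multiplier(age_days, half_life_days)


now = datetime(2026, 2, 20)
for days in (0, 1, 30, 60):
    print(days, round(decayed_score(0.8, now - timedelta(days=days), now), 3))
```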
The combined config:
```json
{
  "memory": {
    "search": {
      "provider": "local",
      "sources": ["memory", "sessions"],
      "hybrid": {
        "enabled": true,
        "vectorWeight": 0.7,
        "textWeight": 0.3
      },
      "mmr": {
        "enabled": true,
        "lambda": 0.7,
        "candidateMultiplier": 4
      },
      "temporalDecay": {
        "enabled": true,
        "halfLifeDays": 30
      },
      "maxResults": 8,
      "minScore": 0.3
    }
  }
}
```

It's all local. The embedding model (embeddinggemma-300m, 313MB GGUF) runs on Apple Silicon's Metal GPU, so there's no API cost, and searches complete in under 2 seconds.
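Because these settings interact, it can help to sanity-check a config before applying it. The sketch below parses the combined block with Python's `json` module. The function name `check_search_config` is hypothetical, and so are the rules it enforces; they are assumptions, not documented OpenClaw validation:

- The hybrid weights sum to 1.
- Lambda lies in [0, 1].
- The half-life is positive.
- The MMR candidate pool is `maxResults × candidateMultiplier`.

```python
import json

# The combined search config from above, inlined so the sketch is self-contained.
CONFIG = """
{
  "memory": {
    "search": {
      "provider": "local",
      "sources": ["memory", "sessions"],
      "hybrid": {"enabled": true, "vectorWeight": 0.7, "textWeight": 0.3},
      "mmr": {"enabled": true, "lambda": 0.7, "candidateMultiplier": 4},
      "temporalDecay": {"enabled": true, "halfLifeDays": 30},
      "maxResults": 8,
      "minScore": 0.3
    }
  }
}
"""


def check_search_config(raw: str) -> dict:
    """Hypothetical sanity check; the rules below are assumptions, not OpenClaw validation."""
    search = json.loads(raw)["memory"]["search"]
    hybrid, mmr, decay = search["hybrid"], search["mmr"], search["temporalDecay"]
    if abs(hybrid["vectorWeight"] + hybrid["textWeight"] - 1.0) > 1e-9:
        raise ValueError("hybrid weights should sum to 1")
    if not 0.0 <= mmr["lambda"] <= 1.0:
        raise ValueError("mmr.lambda should be between 0 and 1")
    if decay["halfLifeDays"] <= 0:
        raise ValueError("temporalDecay.halfLifeDays must be positive")
    return {
        # Assumes "4x more candidates than needed" means maxResults * candidateMultiplier.
        "candidate_pool": search["maxResults"] * mmr["candidateMultiplier"],
        "diversity_share": round(1.0 - mmr["lambda"], 6),
    }


print(check_search_config(CONFIG))
```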
Pre-Compaction Memory Flush
This is the feature I wish I'd had from day one.
OpenClaw compacts context when the conversation gets long, which is necessary, but anything the agent "knows" from the conversation that it hasn't written to disk yet gets lost in the process. You can be mid-discussion about a complex architecture decision, compaction fires, and suddenly the agent has no idea what you were talking about.
The fix is a memory flush: a silent agent turn that fires before compaction, giving the agent a chance to write unsaved context to disk.
```json
{
  "compaction": {
    "memoryFlush": {
      "enabled": true,
      "softThresholdTokens": 10000
    }
  }
}
```

When the context gets within 10,000 tokens of the compaction threshold, the agent gets a silent turn: "Write anything important to memory files now." It flushes decisions, facts, corrections, whatever it's been holding in context, and then compaction happens safely.
10K tokens is a conservative setting; for models with larger output windows you could set it higher, giving the flush turn more room to write. I'd rather flush too early than lose context. The net effect is that compaction stops being a data loss event and becomes a checkpoint instead.
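The trigger can be pictured as a small guard. It fires one silent flush when the headroom before compaction drops to the soft threshold, then re-arms only after compaction has run. This is a hedged illustration, not OpenClaw internals. `FlushGuard`, the `compaction_at` token count and the once-per-cycle rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class FlushGuard:
    """Hypothetical model of a pre-compaction memory flush trigger."""
    compaction_at: int               # token count at which compaction fires (assumed)
    soft_threshold: int = 10_000     # mirrors softThresholdTokens
    flushed_this_cycle: bool = False

    def should_flush(self, context_tokens: int) -> bool:
        headroom = self.compaction_at - context_tokens
        if headroom <= self.soft_threshold and not self.flushed_this_cycle:
            self.flushed_this_cycle = True   # only one silent flush turn per cycle
            return True
        return False

    def on_compaction(self) -> None:
        self.flushed_this_cycle = False      # the next cycle may flush again


guard = FlushGuard(compaction_at=180_000)
for tokens in (150_000, 171_000, 175_000):
    if guard.should_flush(tokens):
        print(f"{tokens}: silent turn -> 'Write anything important to memory files now.'")
```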
The 5-Layer Persistence Stack
After rebuilding, I realized that memory persistence isn't one thing but a stack, where each layer catches what the layer above misses.
```
Layer 1: Immediate Writes (behavioral)
  └── Agent writes to files mid-conversation as things happen
Layer 2: Memory Checkpoint (every 15 min)
  └── System event in main session: "Write anything unsaved NOW"
Layer 3: Heartbeat (every 30 min)
  └── Isolated session: email, calendar, health checks
Layer 4: QMD Re-index (every hour)
  └── Isolated session: update search index across all collections
Layer 5: Daily Maintenance (2-3 AM)
  └── Isolated sessions: distill daily logs, self-review, archive old files
```

Layer 1 is behavioral. The agent's instructions tell it to write to files immediately when something meaningful happens, rather than waiting or keeping a "mental note." This is the first line of defense, and it catches roughly 80% of what matters.
Layer 2 is the safety net. Every 15 minutes, a system event fires in the main session: "If you've had meaningful exchanges since last write, capture them NOW." This catches the stuff the agent forgot to write immediately. It runs in the main session because it needs to see the conversation context.
Layer 3 is background work. Every 30 minutes, an isolated session checks email, calendar, and health data, and writes its findings to memory files. Isolated here means it doesn't block the main conversation, though it does announce anything urgent like an important email or an upcoming meeting.
Layer 4 keeps search fresh. Every hour, QMD re-indexes all collections, so new daily log entries, updated project docs, and fresh memory files all become searchable within the hour.
Layer 5 is garbage collection. At 2 AM, a cron distills the last 3 days of daily logs into thematic memory files, deduplicates entries, archives old content, and enforces size limits. At 3 AM, a self-review checks for stale tasks, stuck subagents, and growing error logs.
The key design constraint is that Layer 2 is the only scheduled job that touches the main session. Everything else runs in isolated sessions, so the agent stays responsive to you while maintenance happens in the background.
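The schedule can also be written down as data, which makes the "only one job touches the main session" constraint easy to check. The post doesn't show OpenClaw's cron configuration format, so this Python sketch models it directly. The job names and the `due_jobs` helper are hypothetical. Layer 1 is omitted because it is behavioral, not scheduled.

```python
from datetime import datetime
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    session: str            # "main" or "isolated"
    every_minutes: int = 0  # interval job when > 0
    at_hour: int = -1       # daily job at HH:00 when >= 0


JOBS = [
    Job("memory-checkpoint", "main", every_minutes=15),  # Layer 2
    Job("heartbeat", "isolated", every_minutes=30),      # Layer 3
    Job("qmd-reindex", "isolated", every_minutes=60),    # Layer 4
    Job("daily-distill", "isolated", at_hour=2),         # Layer 5
    Job("self-review", "isolated", at_hour=3),           # Layer 5
]


def due_jobs(now: datetime) -> list[Job]:
    """Jobs that would fire at this minute (cron-style, minute resolution)."""
    minute_of_day = now.hour * 60 + now.minute
    due = []
    for job in JOBS:
        if job.every_minutes and minute_of_day % job.every_minutes == 0:
            due.append(job)
        elif job.at_hour >= 0 and now.hour == job.at_hour and now.minute == 0:
            due.append(job)
    return due


for job in due_jobs(datetime(2026, 2, 20, 2, 0)):
    print(job.session.ljust(8), job.name)
```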
Workspace Governance
This is the part most memory guides skip entirely. They tell you to write everything down but never address where the files actually go.
After 18 days, my workspace looked like this:
```
memory/
├── 2026-01-15.md through 2026-02-04.md
├── references/      ← 20+ saved articles
├── recipes/         ← cooking experiments
├── active-tasks.md
├── error-log.md
├── projects.md
└── ... 10 more thematic files
```

Saved articles and cooking experiments aren't memory, but they were sitting in memory/ because that's where the agent defaulted to putting things. Without explicit rules, everything ends up in one directory.
So I wrote WORKSPACE.md, a single file that every agent, cron, and subagent reads before writing anything.
```
workspace/
├── memory/       → operational state (daily logs, thematic files)
├── research/     → all research outputs
│   ├── topics/
│   ├── comparisons/
│   └── deep-dives/
├── drafts/       → work-in-progress writing
│   ├── work/
│   ├── blog/
│   └── personal/
├── references/   → saved knowledge (articles, patterns)
├── archive/      → cold storage (old logs, completed research)
└── scripts/      → utility scripts
```

The rules are simple:
- Naming: lowercase-hyphen. Date prefix for temporal files. No spaces, no underscores.
- Archival: Daily logs older than 90 days move to archive/memory/. Research idle for 30+ days moves to archive/research/. Archive, never delete.
- Size limits: Daily logs max 500 lines. Research max 1000 lines. Memory thematic files max 200 lines.
- One concern per file. Don't dump unrelated content in one file.
- QMD collections align with directories. Every top-level directory is a searchable collection.
The daily maintenance cron at 2 AM enforces all of this automatically. It deduplicates daily logs, archives old files, regenerates indexes, and flags anything over the size limits.
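Because these rules are mechanical, they are easy to lint. Below is a minimal sketch of the checks a maintenance job might run against each file. It is not the author's cron, which is an agent session following WORKSPACE.md. It assumes each file has already been classified by kind, with its line count and relevant date known. For daily logs, pass the log's own date as `last_touched`. The `kind` labels and function names are hypothetical.

```python
import re
from datetime import date

# Hypothetical encoding of the WORKSPACE.md rules; the "kind" labels are illustrative.
NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*\.md$")  # lowercase-hyphen, no spaces/underscores
MAX_LINES = {"daily-log": 500, "research": 1000, "memory-thematic": 200}


def needs_archive(kind: str, idle_days: int) -> bool:
    if kind == "daily-log":
        return idle_days > 90   # "older than 90 days" -> archive/memory/
    if kind == "research":
        return idle_days >= 30  # "idle for 30+ days" -> archive/research/
    return False


def check_file(name: str, kind: str, line_count: int,
               last_touched: date, today: date) -> list[str]:
    """Return human-readable violations; an empty list means the file is compliant."""
    problems = []
    if not NAME_RE.match(name):
        problems.append(f"{name}: not lowercase-hyphen")
    limit = MAX_LINES.get(kind)
    if limit is not None and line_count > limit:
        problems.append(f"{name}: {line_count} lines exceeds limit of {limit}")
    idle = (today - last_touched).days
    if needs_archive(kind, idle):
        problems.append(f"{name}: idle {idle} days, move to archive (never delete)")
    return problems


today = date(2026, 2, 20)
print(check_file("2026-02-17.md", "daily-log", 120, date(2026, 2, 17), today))
print(check_file("API_notes.md", "memory-thematic", 250, date(2026, 2, 1), today))
```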
None of this is glamorous, but it's the difference between a system that works for 2 weeks and one that's still usable after 2 years.
What I Learned
1. Context is not memory. A 1M token context window feels infinite, but it isn't, and it vanishes on crashes, restarts, and compaction. The files are what persist, so the habit that matters is writing things down before you reason about them.
2. Retrieval quality matters more than retrieval quantity. Saving everything is easy; finding the right thing later is the hard part. Hybrid search, MMR, and temporal decay are what turned noisy results into useful ones.
3. Memory systems need maintenance schedules. Databases need vacuum and compaction, and agent memory is no different — it needs deduplication, archival, and distillation. If you don't automate that, it won't happen.
4. Workspace structure is a feature. Without explicit governance, agents default to dumping everything in one place. Writing the rules down and enforcing them with crons is what keeps the workspace from drifting into entropy over months of use.
5. Compaction is the enemy — or more precisely, unprotected compaction is. A memory flush before compaction turns what would be a data loss event into a checkpoint.
6. Layer your defenses. No single mechanism catches everything. Immediate writes catch roughly 80%, checkpoints catch another 15%, and daily maintenance picks up the rest. The point is to stack them so nothing falls through.
The Numbers
After 19 days of running this system:
| Metric | Value |
|---|---|
| Memory files | 30+ |
| Reference docs | 20+ |
| QMD vectors | ~2,000 |
| Search collections | 6 |
| Daily log entries | ~150/day (before dedup) |
| Cron jobs | 6 (5 maintenance + 1 data sync) |
| Compaction data loss events | 0 (flush catches everything) |
| API cost for search | $0 (all local, Metal GPU) |
Getting Started
If you already did Part 1 (error log + QMD + heartbeat), here's what to add:
1. Enable hybrid search + MMR + temporal decay (2 minutes)
Patch your OpenClaw config:
```json
{
  "memory": {
    "search": {
      "hybrid": { "enabled": true, "vectorWeight": 0.7, "textWeight": 0.3 },
      "mmr": { "enabled": true, "lambda": 0.7, "candidateMultiplier": 4 },
      "temporalDecay": { "enabled": true, "halfLifeDays": 30 }
    }
  }
}
```

2. Enable memory flush (1 minute)
```json
{
  "compaction": {
    "memoryFlush": { "enabled": true, "softThresholdTokens": 10000 }
  }
}
```

3. Create WORKSPACE.md (10 minutes)
Define your directory structure, naming conventions, archival rules, and size limits, and make every agent and cron read it before writing files. The exact structure depends on your use case, but the principle holds either way: without explicit rules, the workspace drifts into entropy.
4. Set up the daily maintenance cron (5 minutes)
An isolated cron at 2 AM that deduplicates daily logs, archives old files, distills learnings, and enforces workspace rules. This is your garbage collector.
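The deduplication part of that job can start with a cheap mechanical pass before any LLM distillation. The sketch below shows such a pass for the "same health data six times" problem. It assumes daily log lines look like `- HH:MM text` and treats two lines as duplicates when they match after dropping the bullet and timestamp, collapsing whitespace and lowercasing. The log format, the sample entries and the function names are hypothetical. Exact-match dedup misses paraphrased repeats, which still need the distillation step.

```python
import re

# Assumed line shape: optional bullet, optional HH:MM (or [HH:MM]) stamp, then the entry text.
PREFIX = re.compile(r"^\s*[-*]?\s*(\[?\d{1,2}:\d{2}\]?)?\s*")


def normalize(line: str) -> str:
    """Drop bullet and leading timestamp, collapse whitespace, lowercase."""
    body = PREFIX.sub("", line, count=1)
    return " ".join(body.split()).lower()


def dedupe_log(lines: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized entry; blank lines are preserved."""
    seen: set[str] = set()
    kept = []
    for line in lines:
        key = normalize(line)
        if not key:
            kept.append(line)
            continue
        if key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return kept


log = [
    "## Health",
    "- 14:15 Sleep 7h20m, HRV 48",
    "- 14:30 Sleep 7h20m, HRV 48",
    "- 14:45  sleep 7h20m,   HRV 48",
    "- 14:45 Deployed API to staging",
    "",
    "## Health",
]
print("\n".join(dedupe_log(log)))
```

Repeated headings are dropped too, since they normalize to the same key. That matches the "same section appended every checkpoint" failure mode but may not suit every log layout.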
What's Next
The system is stable now, but there are a few things I still want to explore:
- Cross-session memory sharing. Right now, isolated cron sessions can't see the main session's context. Memory flush helps, but there's still a gap.
- Memory scoring. Not all memories are equal. A user correction should rank higher than a routine observation. Weighted memory types could improve retrieval further.
- Automatic memory graph. Connections between memories (this error led to this lesson, which changed this project decision) would make the agent's reasoning more transparent.
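The memory-scoring idea could start as simple per-type multipliers applied to retrieval scores before the final cut. This is only a sketch of that idea. The type labels, the weights and the `weighted` helper are invented for illustration and would need tuning against real queries.

```python
# Hypothetical per-type weights: corrections outrank decisions, which outrank observations.
TYPE_WEIGHTS = {"correction": 1.5, "decision": 1.2, "observation": 1.0}


def weighted(results: list[tuple[str, str, float]]) -> list[tuple[str, float]]:
    """Re-rank (doc, memory_type, retrieval_score) tuples by type-weighted score."""
    rescored = [(doc, score * TYPE_WEIGHTS.get(kind, 1.0)) for doc, kind, score in results]
    return sorted(rescored, key=lambda r: r[1], reverse=True)


results = [
    ("2026-02-18.md#standup", "observation", 0.60),
    ("error-log.md#timezone-fix", "correction", 0.50),
]
print(weighted(results))
```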
For now, the stack handles everything I throw at it. 19 days in, the agent knows my projects, my preferences, my health data, my team, and my coding style better than any tool I've used — not because it's any smarter than the rest, but because it actually holds on to what it learns. That, more than anything else, is what makes it useful.
Built on OpenClaw. Search powered by QMD and OpenClaw's native hybrid retrieval. Everything runs locally on Apple Silicon.
