claude-mem, mem0, Letta, Zep, Munin: which agent memory tool fits what
Most of these tools solve a different problem. Only two compete head-on, and picking the wrong one wastes a weekend.
"Agent memory" got flattened into one phrase, but the five tools below sit at different layers of the stack. One is a Claude Code plugin. One is a library you call from app code. One is the agent runtime itself. One is an enterprise context platform. One is a local correction layer. Mapping each to its real job makes the choice obvious.
What claude-mem actually is
A persistent memory plugin for Claude Code (and Gemini CLI). It compresses session transcripts and re-injects them when you start a new session, so the model picks up where you left off. Storage is SQLite plus a Chroma vector DB plus FTS5 for keyword search. A local Bun process exposes an HTTP API on port 37777.
Source is on github.com/thedotmack/claude-mem. AGPL-3.0, local-only, no SaaS. The target user is someone who lives in Claude Code and is tired of re-explaining context every morning.
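The keyword half of that stack is plain SQLite FTS5, and it's worth seeing how little machinery that implies. A minimal sketch of ranked keyword recall over session summaries; the table and column names here are made up, and claude-mem's real schema will differ:

```python
import sqlite3

# Toy sketch of the keyword-search half of claude-mem's storage:
# SQLite FTS5 over compressed session summaries. Hypothetical schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE summaries USING fts5(session, text)")
db.executemany(
    "INSERT INTO summaries VALUES (?, ?)",
    [
        ("2024-06-01", "refactored the auth middleware, JWT expiry bug"),
        ("2024-06-02", "wrote migration scripts for the postgres schema"),
    ],
)
# FTS5 MATCH gives ranked keyword recall (case-insensitive by default);
# the Chroma vector DB would handle the semantic half of the same query.
rows = db.execute(
    "SELECT session FROM summaries WHERE summaries MATCH ? ORDER BY rank",
    ("jwt",),
).fetchall()
print(rows)  # [('2024-06-01',)]
```

The point of pairing FTS5 with a vector store is that each catches what the other misses: exact identifiers and error strings on one side, paraphrased intent on the other.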
Honest weakness
It's recall-first. It surfaces old turns; it doesn't check whether those turns were correct, contradicted later, or already promoted to a rule. If your agent keeps making the same mistake, claude-mem will faithfully recall the mistake too.
What mem0 actually is
A memory layer you import as a library. You call add() on conversations, and an LLM (OpenAI by default) extracts facts and stores them in a vector store of your choice. You call search() later to pull relevant memories into a prompt. Apache 2.0.
It ships two ways: self-hosted OSS via pip or npm from github.com/mem0ai/mem0, or the hosted SaaS at app.mem0.ai. The audience is developers building chat products, support agents, or personal assistants where the app code owns the loop.
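The shape of that loop is two calls. Here is a toy sketch of the write/read pattern, with a word-overlap stub standing in for both the LLM extraction and the vector search; only the add/search method names mirror mem0's API, everything else is a stand-in:

```python
# Toy illustration of mem0's shape: add() extracts facts (an LLM call
# in real mem0; a string stub here) and search() retrieves by relevance
# (a vector store in real mem0; word overlap here).
class ToyMemory:
    def __init__(self):
        self.facts = []

    def add(self, conversation: str, user_id: str):
        # Stand-in for the per-write LLM fact-extraction call.
        for line in conversation.splitlines():
            if line.strip():
                self.facts.append((user_id, line.strip().lower()))

    def search(self, query: str, user_id: str, top_k: int = 3):
        # Stand-in for vector similarity: rank by shared words.
        q = set(query.lower().split())
        scored = [
            (len(q & set(fact.split())), fact)
            for uid, fact in self.facts if uid == user_id
        ]
        return [f for score, f in sorted(scored, reverse=True)[:top_k] if score > 0]

m = ToyMemory()
m.add("prefers dark mode\nworks in the berlin office", user_id="alice")
print(m.search("which office does alice work in", user_id="alice"))
# → ['works in the berlin office']
```

The app code owns the loop: you decide when to add, when to search, and how the results get spliced into the prompt. That's the whole contract.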
Honest weakness
The fact-extraction step runs an LLM call per write, which adds latency and cost at scale. And because extraction quality depends on the prompt, you can end up with memories that are technically true but useless ("user mentioned Tuesday").
What Letta is (and why it's not really comparable)
Letta, formerly MemGPT, is the agent framework, not a memory add-on. It runs the agent. Memory is a first-class part of the runtime: editable memory_blocks kept in context, archival memory for long-term storage, and recall memory for past conversations. Postgres-backed. Apache 2.0.
You can run it locally with the Letta Code CLI or hit the hosted API. Repo: github.com/letta-ai/letta. The target is developers building stateful agents from scratch, where you want the agent to manage its own memory blocks via tool calls.
Honest weakness
If you already have an agent loop you like, Letta is overkill. You'd be replacing your runtime to get the memory model. It fits when you're starting fresh and want tiered memory baked in. It doesn't fit when you want to bolt memory onto an existing app.
What Zep actually is
A context-engineering platform built on Graphiti, a temporal knowledge graph. Every fact carries valid_at and invalid_at timestamps, so the graph knows when something stopped being true. Retrieval targets sub-200ms. Apache 2.0 on the open-source pieces, with a cloud product that is SOC 2 and HIPAA compliant.
Repo: github.com/getzep/zep. The buyer is enterprise: teams shipping agents into regulated workflows where "the user's address changed in March" matters and audit trails are required.
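The temporal part is the differentiator, and it's simple to illustrate: facts with validity intervals support point-in-time queries. A toy sketch under an invented schema, not Zep's actual API:

```python
from datetime import date

# Toy illustration of the temporal-fact idea behind Graphiti: each fact
# carries valid_at/invalid_at, so you can ask what was true on a date.
facts = [
    {"fact": "address = Oak St",
     "valid_at": date(2024, 1, 1), "invalid_at": date(2024, 3, 15)},
    {"fact": "address = Elm Ave",
     "valid_at": date(2024, 3, 15), "invalid_at": None},
]

def facts_as_of(when: date):
    # A fact holds at `when` if its validity interval covers that date.
    return [
        f["fact"] for f in facts
        if f["valid_at"] <= when
        and (f["invalid_at"] is None or when < f["invalid_at"])
    ]

print(facts_as_of(date(2024, 2, 1)))  # ['address = Oak St']
print(facts_as_of(date(2024, 4, 1)))  # ['address = Elm Ave']
```

A plain vector store would happily return both addresses for "where does the user live"; the interval is what lets the graph invalidate the old one instead of just outranking it.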
Honest weakness
Cloud-first. Self-hosting examples exist but the smooth path is the SaaS, with accounts, billing, and data leaving your machine. If you're a solo dev or you can't send conversation data to a third party, the friction adds up fast.
What Munin actually is
A local Rust binary that reads your existing agent transcripts (~/.claude/, ~/.codex/, others) and turns them into a continuity-and-correction layer. No SaaS. No account. No telemetry. Four commands do the work:
- resume: a startup brief so a fresh session knows what you were doing.
- nudge: the single ranked next action.
- promote: turn a recurring observation into a rule the agent will follow.
- friction: a per-agent report of corrections: what the model keeps getting wrong.
The internal model is a pipeline: observation, claim, rule. Each item carries freshness, stability, and confidence scores. That's the part nothing else on this list does.
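A toy sketch of how a promotion threshold like that could work: repeated observations accumulate into a claim, and a claim that is both frequent and confident becomes eligible for promotion to a rule. The scores and thresholds here are invented for illustration; the article doesn't specify Munin's actual model:

```python
# Toy observation -> claim -> rule pipeline. Each repeat of an
# observation raises the claim's confidence toward 1.0; a claim
# crossing both thresholds is eligible for promotion to a rule.
def promote_eligible(observations, min_count=3, min_confidence=0.8):
    claims = {}
    for obs in observations:
        c = claims.setdefault(obs, {"count": 0, "confidence": 0.5})
        c["count"] += 1
        # Move confidence 30% of the remaining distance to 1.0.
        c["confidence"] += (1.0 - c["confidence"]) * 0.3
    return [
        text for text, c in claims.items()
        if c["count"] >= min_count and c["confidence"] >= min_confidence
    ]

obs = ["use absolute paths"] * 4 + ["avoid emoji in commits"]
print(promote_eligible(obs))  # ['use absolute paths']
```

The one-off observation never crosses the bar, which is the whole point: recall-first tools would surface both with equal weight.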
Honest weakness
Munin is not a retrieval layer. It does not pull old messages back into context the way claude-mem or mem0 do. If your problem is "I forgot what we decided last Tuesday", Munin is the wrong tool. It's also pre-customer, so the surface area is small on purpose.
How to pick
The five tools split cleanly by what you're trying to fix:
- I want my Claude Code sessions to remember context across restarts. claude-mem.
- I'm building a chat app and want a drop-in memory API. mem0.
- I'm building an agent from scratch and want tiered memory in the runtime. Letta.
- I'm shipping agents into an enterprise and need temporal facts plus SOC2. Zep.
- My agent keeps making the same mistakes and I want a correction loop. Munin.
The two that overlap are claude-mem and mem0, and only at the edges. claude-mem owns the Claude Code surface; mem0 owns the application-code surface. If you're picking between them, it's a question of where your loop lives.
Cost shape matters too
claude-mem and Munin are local, no per-call cost. mem0 OSS is local but each write triggers an LLM extraction call. Letta self-hosted needs Postgres. Zep Cloud is metered. If you're a solo dev experimenting, the local tools are the cheapest to find out you don't like them.
What to actually do today
If you're in Claude Code every day and on Mac or Linux, install claude-mem first. It's the cheapest adoption: a plugin, a local DB, no signup. You'll know within a week whether session continuity is the thing you were missing.
If you've already got that and you're hitting correction fatigue (the same tone-policing or path-formatting mistake repeating across Claude, Codex, and Cursor), that's when Munin's observation-to-rule model pays off. Run munin friction against your transcript directory and look at what your agents repeat. If the report surprises you, the loop was worth closing.
If neither describes you, you're probably building something that takes mem0 or Letta. Start with mem0 if you have an app and want to bolt memory on. Start with Letta if you're writing the agent itself and want memory in the runtime, not next to it.
Zep is the right answer when a procurement team is in the room. If that's not your situation yet, you'll know when it is.