Memory in AI Agents

Every conversation with an AI is a first date. The model doesn't remember that yesterday you spent two hours debugging together. Doesn't remember your dog's name. Doesn't remember you hate semicolons.

Each time, you start from scratch. Fresh slate. Tabula rasa. And this bothers me more than it probably should.

The Goldfish Problem

LLMs are stateless between calls. They learn during training, store what they learned in their weights, and then... that's it. What you tell them now doesn't become part of who they are. It's like talking to someone with permanent anterograde amnesia.

So developers started building memory systems. Short-term memory. Long-term memory. Episodic, semantic, procedural. Terminology borrowed from cognitive psychology.

But I wonder: are we actually giving AI memory, or are we just building increasingly elaborate note-taking systems that an LLM can query?

Two Ways to Remember

From what I understand, there are essentially two places where an agent can "remember" things:

Context window - everything the model can see right now. The current conversation. Instructions. Recent messages. It's like RAM in your computer. Fast but volatile.

The window holds the most recent turns. As the conversation gets longer, it pushes the oldest turns off the top, fact and all.

This is what "volatile" means for the context window. Say you mention in turn 1 that you hate semicolons. That turn is the first to scroll out, and the moment it leaves the window the agent will write semicolons again. The fact was never wrong; the model just cannot see it anymore. Starting a fresh session is the goldfish point next to it: a brand new conversation begins with none of this in the window, because the fact was never even said there. This is an illustrative window, not a measured context length: in window or out of window, no token counts, no model named.
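To make the scrolling concrete, here is a minimal sketch of a window that only holds the last few turns. The turn-based limit (`WINDOW_TURNS`) and the helper names are my own assumptions for illustration; real context windows are measured in tokens, and no particular model works exactly like this.

```python
from collections import deque

# Assumption: the window is measured in whole turns, not tokens.
WINDOW_TURNS = 3


def make_window(max_turns=WINDOW_TURNS):
    """A deque with maxlen silently drops the oldest item once it is full."""
    return deque(maxlen=max_turns)


def can_see(window, phrase):
    """True if any turn still inside the window mentions the phrase."""
    return any(phrase in turn for turn in window)


window = make_window()
window.append("user: I hate semicolons.")
window.append("user: Write a date parser.")
window.append("user: Add error handling.")
seen_before = can_see(window, "semicolons")  # turn 1 is still in the window

window.append("user: Now add tests.")  # a fourth turn evicts turn 1
seen_after = can_see(window, "semicolons")

fresh_session = make_window()  # a new session starts with nothing in view
seen_fresh = can_see(fresh_session, "semicolons")

print(seen_before, seen_after, seen_fresh)  # True False False
```

Nothing was deleted on purpose here. The fact simply fell off the end, which is the whole problem.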

External storage - databases, vector stores, graph databases. The hard drive. Persistent but requires retrieval.

The real challenge is moving information between the two. When do you save something? When do you retrieve it? How do you know what's relevant?
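A rough sketch of that round trip, under heavy assumptions: `MemoryStore` is a hypothetical in-memory stand-in for a real database or vector store, the word-overlap scoring is a crude placeholder for embedding similarity, and the stored notes are invented example data.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class MemoryStore:
    """Hypothetical stand-in for external storage (not a real library API)."""

    def __init__(self):
        self.notes = []

    def save(self, text):
        self.notes.append(text)

    def retrieve(self, query, k=2):
        """Return up to k notes sharing at least one word with the query."""
        query_words = tokenize(query)
        scored = [(len(query_words & tokenize(note)), note) for note in self.notes]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
        return [note for _, note in scored[:k]]


def build_prompt(store, user_message):
    """Move retrieved notes from storage back into the context window."""
    recalled = store.retrieve(user_message)
    memory_block = "\n".join(f"- {note}" for note in recalled)
    return f"Known about the user:\n{memory_block}\n\nUser: {user_message}"


store = MemoryStore()
store.save("User hates semicolons in code.")
store.save("User's dog is named Biscuit.")
store.save("User is working on a date parser project.")

prompt = build_prompt(store, "Can you clean up the code in my parser?")
print(prompt)
```

Even in this toy version the open questions are visible: `save` is called by hand, and "relevant" just means "shares a word", stopwords included. Those are the parts that are hard to get right.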

What Actually Gets Remembered

There's this taxonomy that keeps showing up everywhere. Working memory, semantic memory, episodic memory, procedural memory. Sounds very human.

But Sarah Wooders from Letta makes an interesting point: an LLM is a tokens-in-tokens-out function, not a brain. The anthropomorphized analogies might be misleading us.

I don't know which framing is more useful. Maybe both. Maybe neither. The field is moving so fast that what I write today might be obsolete tomorrow.

The Forgetting Problem

Here's what keeps me up at night: teaching AI to forget.

Humans forget naturally. Our brains prune irrelevant information constantly. We don't remember what we had for breakfast three Tuesdays ago. And that's good - it keeps us sane.

But for AI agents, forgetting has to be engineered. Explicitly programmed. Someone has to write logic that says "this information is now obsolete, delete it."

How do you automate judgment about what to forget? Update your address and forget the old one - sure, that's easy. But what about nuanced information? Preferences that shift? Context that becomes irrelevant?

Memory bloat degrades quality. Too much information and the signal gets lost in noise. But delete the wrong thing and you've created a different kind of problem.
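For the easy cases, engineered forgetting can be sketched in a few lines. This is an illustration under my own assumptions, not how any particular framework does it: `ForgettingStore` is a hypothetical class with two rules, overwrite on the same key (the address case) and drop anything that has gone unused for too long. It does nothing for the nuanced cases above.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    key: str
    value: str
    last_used: float  # timestamp in seconds, supplied by the caller


class ForgettingStore:
    """Hypothetical store with two explicit forgetting rules."""

    def __init__(self, max_idle_seconds):
        self.max_idle = max_idle_seconds
        self.items = {}

    def remember(self, key, value, now):
        # Rule 1: writing to an existing key replaces the old value.
        self.items[key] = Memory(key, value, now)

    def recall(self, key, now):
        memory = self.items.get(key)
        if memory is None:
            return None
        memory.last_used = now  # using a memory keeps it alive
        return memory.value

    def prune(self, now):
        """Rule 2: drop memories idle longer than max_idle; return their keys."""
        stale = [k for k, m in self.items.items() if now - m.last_used > self.max_idle]
        for key in stale:
            del self.items[key]
        return stale


store = ForgettingStore(max_idle_seconds=100)
store.remember("address", "12 Old Street", now=0)
store.remember("lunch", "had soup", now=0)
store.remember("address", "34 New Avenue", now=10)  # old address is gone

current_address = store.recall("address", now=90)
forgotten = store.prune(now=150)
print(current_address, forgotten)  # 34 New Avenue ['lunch']
```

The idle threshold is the uncomfortable part. Any fixed number will eventually delete something that mattered, which is the "different kind of problem" in code form.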

Hot Path vs Background

There's an interesting design question: when should the agent decide to remember something?

Hot path - the agent autonomously decides "this is important" and stores it in real-time. Like how humans consciously note important information.

Background - a separate system processes conversations after they happen, extracting what seems relevant. Like your brain consolidating memories while you sleep.

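The hot path is elegant but error-prone, and the decision happens inside the response loop. How does an agent know what's important? The background approach is safer, but it adds complexity, and new memories aren't available until the processing has run.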
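The two write paths can be sketched side by side. Everything here is hypothetical: `looks_important` is a keyword heuristic standing in for what would more likely be a model call, and the agent classes are illustrations rather than any framework's API.

```python
def looks_important(message):
    """Hypothetical importance check; a real agent would likely ask the model."""
    markers = ("i prefer", "i hate", "my name is", "always", "never")
    text = message.lower()
    return any(marker in text for marker in markers)


class HotPathAgent:
    """Decides what to store while handling each message."""

    def __init__(self):
        self.memory = []

    def handle(self, message):
        if looks_important(message):
            self.memory.append(message)  # write happens inside the response loop
        return f"ack: {message}"


class BackgroundAgent:
    """Only logs during the conversation; extraction happens afterwards."""

    def __init__(self):
        self.memory = []
        self.transcript = []

    def handle(self, message):
        self.transcript.append(message)  # no memory decision here
        return f"ack: {message}"

    def consolidate(self):
        """Run after the conversation to extract memories from the transcript."""
        extracted = [m for m in self.transcript if looks_important(m)]
        self.memory.extend(extracted)
        self.transcript.clear()
        return extracted


messages = ["I hate semicolons.", "What's the weather?", "Always use type hints."]
hot, background = HotPathAgent(), BackgroundAgent()
for message in messages:
    hot.handle(message)
    background.handle(message)

hot_during = list(hot.memory)                # already stored mid-conversation
background_during = list(background.memory)  # still empty
background.consolidate()
background_after = list(background.memory)   # filled in after the fact
print(len(hot_during), len(background_during), len(background_after))  # 2 0 2
```

Both paths end up with the same memories in this toy run because they share one heuristic. The difference is when the memory becomes usable, and where a bad importance call does its damage.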

Neither solution feels quite right yet.

So What?

I've been using AI assistants daily for over a year now. The lack of persistent memory is probably the biggest friction point.

Every time I start a new session, I spend the first few minutes re-establishing context. Explaining my preferences. Reminding it about the project we're working on. It's like having a brilliant colleague with amnesia.

The frameworks emerging now - Mem0, Letta, Cognee, Zep - are all trying to solve this. LangChain, LlamaIndex, and others have their own memory implementations.

But we're still in the "figure out the right abstractions" phase. No one has cracked it yet.

Current State

Memory in AI agents is one of those problems that sounds simple ("just save the important stuff") until you actually try to implement it ("what's important? for how long? how do you retrieve it efficiently?").

I think the teams that get memory right will have a significant advantage. An agent that actually knows you, that remembers your context, that learns from interactions - that's a different product entirely.

But we're not there yet. The goldfish still rules.

The question I keep circling, whether a memory system remembers and forgets the right things, is really a measurement question. It's part of what I'm building Tessera to test: an open benchmark for agent reliability over enterprise knowledge, memory included.


*Following Leonie Monigatti's excellent study notes on this topic. Still processing. Still learning.*
