When the blood test came back with one value slightly above the threshold, my AI agent didn't open a chart, didn't send me to an article, didn't run statistical inference. It connected that value to the note I had written the night before: "didn't stick to the nutrition plan, heavy dinner at the restaurant." And it suggested I shouldn't worry before a second check.
That connection (clinical result ↔ dietary deviation) wasn't in any knowledge graph. It was in a corpus: the medical report on one side, the note on the other. The agent found them at the right moment because it had access to both stores and ran retrieval-plus-synthesis when it mattered.
For months I've been trying to build a personal memory layer that works across domains: work, family, health, projects, people. Horizontally, that is, not vertically, the way Karpathy's pattern works when applied to a single domain. The question driving me is simple: which architecture holds up at scale without producing confusion, false positives, or inappropriate disclosures?
I don't have an answer yet. But I do have a point of view that's settling into place, and I want to put on paper where I am right now.
A Catalog, Not a Graph
The first mistake I made was thinking I needed a knowledge graph. Typed edges between concepts, ontologies, pre-built semantic relationships. It doesn't scale. Every new node requires explicit decisions, every new source forces an explosion of probabilistic links, and at some point the noise buries the signal.
What's convincing me is something humbler, and it's what my personal agent (I use Hermes from Nous Research) is already doing in my daily life. When I upload a medical report, it classifies it and files it into a precise structure:
Family / Health / <person> / <specialist>.
Next time, it already knows where it goes. More importantly, it knows it exists. The structure isn't a graph of concepts, it's a map of where things are. An index, not a graph.
The personal memory layer isn't a knowledge graph. It's a library catalog: a map of where things are, not of what is connected to what. The librarian (the agent) knows where to look and goes to read at the moment. It makes the semantic connections when they're needed, not before.
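A minimal sketch of the catalog idea, to make the contrast with a graph concrete. Everything here is an assumption for illustration: the names `CatalogEntry`, `Catalog`, `file`, and `locate` are hypothetical and are not how Hermes stores anything. The point is only that the structure records where a document lives and stores no edges between documents.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    """One shelf location: where a document lives, not what it means."""
    path: tuple          # e.g. ("Family", "Health", "<person>", "<specialist>")
    filename: str
    added: date


@dataclass
class Catalog:
    """Hypothetical index of locations. It holds no links between documents."""
    entries: list = field(default_factory=list)

    def file(self, path, filename, added):
        """Record that a document now lives at the given path."""
        entry = CatalogEntry(tuple(path), filename, added)
        self.entries.append(entry)
        return entry

    def locate(self, *prefix):
        """Return every entry filed under the given path prefix."""
        n = len(prefix)
        return [e for e in self.entries if e.path[:n] == prefix]
```

Under these assumptions, the agent would call something like `catalog.locate("Family", "Health")` to learn which files exist, then read them and make the semantic connection at query time.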
The System That Evolves On Its Own
There's another episode that convinces me even more than the single blood-test connection. When I started uploading my family's medical reports, I only asked the agent to "organize them by person." I didn't write custom skills, I gave it no schemas. What it did was write itself a skill for handling this kind of data. Today it consistently maintains the structure by person, specialist, and date. Without me telling it how.
What convinces me isn't that the agent finds things. It's that it learns to find them better over time, without me telling it how.
The Same Pattern, Enterprise Edition
The same model is emerging in the enterprise projects I'm discussing. An Italian asset management firm I've been in conversation with these past weeks is articulating an "industrial" version of the same thesis: voice capture from advisors as the primary source, automatic enrichment from public sources via cron jobs, a knowledge base of capabilities, a client-facing AI assistant. It's the same pattern as me logging dietary deviations, except that instead of writing, they speak. Voice is the most natural input for people who don't write.
The Blind Spot They All Share
And the same blind spot. When the corpus grows large enough, all these systems do the same thing: they write down, into files of varying interest, whatever crosses their mind in the moment. They're verbose. They memorize too much. And then they forget the obvious.
Today's personal memory systems are all hippocampi without a hippocampus: they write a lot and forget little, while the human brain does exactly the opposite.
What's missing from every current approach (including Karpathy's, including mine) is a mechanism for reinforcement and forgetting of connections. Not of information: of connections. A connection X ↔ Y that gets used and proves useful should strengthen. One that never gets used, or proves wrong, should fade. Today no system does this: everything that was connected once stays connected forever.
Both notes stay on the shelf. Only the link between them fades.
If I had to bet on how this gets built, I'd lean toward two families of mechanisms. Success-based: the connection that led to a useful or user-validated output gets stronger, the one that produced noise or a correction gets weaker. Time-decay, Ebbinghaus-style: connections that aren't reactivated decay naturally, unless frequency of use brings them back up. Probably an explicit combination of both. I don't know yet. But I suspect anyone who wants to build personal memory that holds up over time will have to come back to these two ingredients.
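One way the two ingredients could be combined, as a sketch and not a tested design. The exponential half-life form is a common simplification of an Ebbinghaus-style forgetting curve. The constants `HALF_LIFE_DAYS` and `LEARNING_RATE`, the starting weight, and the names `Link`, `decayed_strength`, and `reinforce` are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass

HALF_LIFE_DAYS = 30.0  # assumption: an unused link loses half its weight in 30 days
LEARNING_RATE = 0.3    # assumption: how far a single outcome moves the weight


@dataclass
class Link:
    """A connection between two notes. The notes themselves are never deleted."""
    a: str
    b: str
    strength: float = 0.5       # assumed starting weight in [0, 1]
    last_used_day: float = 0.0  # day index of the last reactivation


def decayed_strength(link: Link, today: float) -> float:
    """Time decay: halve the weight for every half-life since the last use."""
    elapsed = max(0.0, today - link.last_used_day)
    return link.strength * 0.5 ** (elapsed / HALF_LIFE_DAYS)


def reinforce(link: Link, today: float, useful: bool) -> None:
    """Success-based update, applied on top of the decayed weight.

    A useful (or user-validated) outcome pulls the weight toward 1,
    a noisy or corrected outcome pulls it toward 0.
    """
    current = decayed_strength(link, today)
    target = 1.0 if useful else 0.0
    link.strength = current + LEARNING_RATE * (target - current)
    link.last_used_day = today
```

With these numbers, a link left untouched for 30 days drops from 0.5 to 0.25, and a single useful reactivation at that point lifts it to about 0.475. Retrieval could then ignore links whose decayed weight falls below some threshold, while both notes remain in the corpus.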
On the Failures of the Past
Even the "easy" case (a vertical, single-purpose, customer-care agent) isn't trivial. In June 2024 McDonald's shut down the AI drive-thru it had developed with IBM after months of viral order mistakes. In February 2024 a Canadian tribunal held Air Canada liable for promises its chatbot made to a passenger. These are the pattern, not the exception.
When an agent speaks on behalf of a brand, every answer is a contractual promise. Current systems don't know that, and it shows.
Personal memory is, in some ways, simpler: one user, one domain, one confidentiality constraint. But even there I'm not optimistic it "stands up" without incidents. It's a system you work with, not a system you launch.
Where I'm Looking
The five hypotheses I have under evaluation: hierarchical indexes (I'm already experimenting with a map.md file in my personal vault); distillation files that condense discussions into key points; vertical branches on tags that narrow the scope at query time; embeddings as a second layer of semantic retrieval; periodic compaction that rewrites accumulated summaries. None of the five is the solution. Probably the answer is a combination, and probably it's specific to the unit of memory. Which is another open problem: not one memory shared across all agents, sure; not one per single agent; maybe something around the person, but not flat on the person either.
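Two of the hypotheses above, tag scoping at query time and a second semantic layer, can be sketched together as a two-stage retrieval. This is an illustration under stated assumptions: `Note`, `retrieve`, and `overlap_score` are hypothetical names, and the word-overlap score is only a toy stand-in for real embedding similarity, which would be passed in through the `score` parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    title: str
    tags: frozenset
    text: str


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for embedding similarity: Jaccard overlap of lowercase words."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(notes, query, scope_tags, top_k=3, score=overlap_score):
    """Stage 1: keep notes carrying every scope tag. Stage 2: rank the survivors."""
    scope = frozenset(scope_tags)
    in_scope = [n for n in notes if scope <= n.tags]
    ranked = sorted(in_scope, key=lambda n: score(query, n.text), reverse=True)
    return ranked[:top_k]
```

The design choice worth noticing is the order: the tag filter runs first, so a semantically similar note from the wrong branch (a work note that happens to mention "blood") never reaches the ranking step.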
Then there's the federation problem: how do you pass relevant information from one memory to another, from one agent to another, when the future is agent-to-agent? All of that work is still to be done.
I'm not publishing a solution. I'm publishing an exploration. The difference, for whoever reads this, matters: what you find here may change as the problem becomes clearer.
If you have insights, corrections, or failures you think are worth sharing, write to me.
Reinforcement, forgetting, "an answer is a contractual promise": these are all things you can measure, and almost nobody does. That's the work I'm turning into Tessera: an open, MCP-native benchmark for accuracy, provenance, and correct refusal over fragmented enterprise knowledge.
*Still running the vault. Still verbose, still a little amnesic. This post is a photograph of where I am today, not a declaration of a solution.*