The Librarian, Not the Knowledge Graph

When the blood test came back with one value slightly above the threshold, my AI agent didn't open a chart, didn't send me to an article, didn't run statistical inference. It connected that value to the note I had written the night before: "didn't stick to the nutrition plan, heavy dinner at the restaurant." And it suggested I shouldn't worry before a second check.

That connection (clinical result ↔ dietary deviation) wasn't in any knowledge graph. It was in a corpus: the medical report on one side, the note on the other. The agent found them at the right moment because it had access to both stores and ran retrieval-plus-synthesis when it mattered.

For months I've been trying to build a personal memory layer that works across domains: work, family, health, projects, people. Not vertically, the way Karpathy's pattern works when applied to a single domain. The question driving me is simple: which architecture holds up at scale without producing confusion, false positives, or inappropriate revelations?

I don't have an answer yet. But I do have a point of view that's settling into place, and I want to put on paper where I am right now.

A Catalog, Not a Graph

The first mistake I made was thinking I needed a knowledge graph. Typed edges between concepts, ontologies, pre-built semantic relationships. It doesn't scale. Every new node requires explicit decisions, every new source forces an explosion of probabilistic links, and at some point the noise buries the signal.

What's convincing me is something humbler, and it's what my personal agent (I use Hermes from Nous Research) is already doing in my daily life. When I upload a medical report, it classifies it and files it into a precise structure: Family / Health / <person> / <specialist>. Next time, it already knows where it goes. More importantly: it knows it exists. The structure isn't a graph of concepts, it's a map of where things are. An index, not a graph.

The personal memory layer isn't a knowledge graph. It's a library catalog: a map of where things are, not of what is connected to what. The librarian (the agent) knows where to look and goes to read at the moment. It makes the semantic connections when they're needed, not before.

Figure: two panels. On one side, a knowledge graph as a dense tangle of nodes linked to everything; on the other, a library catalog as a cabinet of drawers, with a calm librarian who knows where to look.
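One way to picture the catalog idea in code. This is only a sketch under assumptions: the `Catalog` class, its `file` and `locate` methods, and the file names are hypothetical and are not how Hermes stores anything. The point it illustrates is that the structure records where a document lives and holds no links between concepts.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Catalog:
    """Maps shelf locations to document names; stores no concept-to-concept links."""
    shelves: dict[PurePosixPath, list[str]] = field(default_factory=dict)

    def file(self, document: str, *location: str) -> PurePosixPath:
        """Record that `document` lives at the given location."""
        shelf = PurePosixPath(*location)
        self.shelves.setdefault(shelf, []).append(document)
        return shelf

    def locate(self, *prefix: str) -> list[str]:
        """Return every document filed at or below the given prefix."""
        root = PurePosixPath(*prefix)
        return sorted(
            doc
            for shelf, docs in self.shelves.items()
            if shelf == root or root in shelf.parents
            for doc in docs
        )


# Hypothetical documents, filed the way the text describes:
# Family / Health / <person> / <specialist>.
catalog = Catalog()
catalog.file("blood-test.pdf", "Family", "Health", "me", "general-practitioner")
catalog.file("eye-exam.pdf", "Family", "Health", "me", "ophthalmologist")
catalog.file("dinner-note.md", "Personal", "Notes", "nutrition")

health_docs = catalog.locate("Family", "Health", "me")
```

The catalog answers "where is it, and does it exist?" and nothing more. Whether the blood test relates to the dinner note is left to the agent, which reads both at query time.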

The System That Evolves On Its Own

There's another episode that convinces me even more than the single blood-test connection. When I started uploading my family's medical reports, I only asked the agent to "organize them by person." I didn't write custom skills, I gave it no schemas. What it did was write itself a skill for handling this kind of data. Today it consistently maintains the structure by person, specialist, and date. Without me telling it how.

What convinces me isn't that the agent finds things. It's that it learns to find them better over time, without me telling it how.

The Same Pattern, Enterprise Edition

The same model is emerging in the enterprise projects I'm discussing. An Italian asset management firm I've been in conversation with these past weeks is articulating an "industrial" version of the same thesis: voice capture from advisors as the primary source, automatic enrichment from public sources via cron jobs, a knowledge base of capabilities, a client-facing AI assistant. It's the same pattern as me logging dietary deviations, except that instead of writing, they speak. Voice is the first natural input for people who don't write.

The Blind Spot They All Share

And they share the same blind spot. When the corpus grows large enough, all these systems do the same thing: they write down whatever crosses their mind in the moment, into files of uneven value. They're verbose. They memorize too much. And then they forget the obvious.

Today's personal memory systems are all hippocampi without a hippocampus: they write a lot and forget little, while the human brain does exactly the opposite.

What's missing from every current approach (including Karpathy's, including mine) is a mechanism for reinforcement and forgetting of connections. Not of information: of connections. A connection X ↔ Y that gets used and proves useful should strengthen. One that never gets used, or proves wrong, should fade. Today no system I know of does this: everything that was connected once stays connected forever.

Interactive demo (connection-decay simulation): drag time forward and a link that is never reactivated decays, from retrievable through weakening to effectively gone. Both notes stay on the shelf; only the link between them fades. Reuse the link because it proved useful and it holds; flag it as wrong and it drops hard. The time slider dramatizes one half of the bet, time-decay; the two buttons dramatize the other half, success and failure validation. This is the proposed mechanism, not how systems work today. The bet: a link that earns its keep stays.

If I had to bet on how this gets built, I'd lean toward two families of mechanisms. Success-based: the connection that led to a useful or user-validated output gets stronger, the one that produced noise or a correction gets weaker. Time-decay, Ebbinghaus-style: connections that aren't reactivated decay naturally, unless frequency of use brings them back up. Probably an explicit combination of both. I don't know yet. But I suspect anyone who wants to build personal memory that holds up over time will have to come back to these two ingredients.
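To make the two ingredients concrete, here is a minimal sketch of how they might combine. Everything in it is an assumption for illustration: the `Link` class, the half-life of 30 turns, the reinforcement gain of 0.3, and the penalty factor of 0.25 are hypothetical values, not measurements and not a description of any existing system. Decay is modeled as a simple exponential, which is only one possible reading of "Ebbinghaus-style".

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A connection between two notes; the notes themselves are never deleted."""
    source: str
    target: str
    strength: float = 1.0       # 1.0 = fresh, close to 0.0 = effectively gone
    last_used_turn: int = 0
    half_life: float = 30.0     # assumed: turns of disuse that halve the strength

    def current_strength(self, now_turn: int) -> float:
        """Time-decay: strength halves every `half_life` turns without reuse."""
        idle = max(0, now_turn - self.last_used_turn)
        return self.strength * 0.5 ** (idle / self.half_life)

    def reinforce(self, now_turn: int, gain: float = 0.3) -> None:
        """Success-based: a link that proved useful moves part of the way back to 1.0."""
        decayed = self.current_strength(now_turn)
        self.strength = decayed + gain * (1.0 - decayed)
        self.last_used_turn = now_turn

    def penalize(self, now_turn: int, factor: float = 0.25) -> None:
        """Failure-based: a link flagged as wrong drops sharply."""
        self.strength = self.current_strength(now_turn) * factor
        self.last_used_turn = now_turn


link = Link("blood-test.pdf", "dinner-note.md")
idle_strength = link.current_strength(now_turn=30)   # one half-life of disuse
link.reinforce(now_turn=30)                           # the link proved useful
reinforced_strength = link.strength
link.penalize(now_turn=30)                            # later flagged as wrong
penalized_strength = link.strength
```

Only the `Link` changes here. Nothing in the sketch touches the two notes, which matches the idea that forgetting applies to connections and not to information.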

On the Failures of the Past

Even the "easy" case (a vertical, single-purpose, customer-care agent) isn't trivial. In June 2024 McDonald's shut down the AI drive-thru it had developed with IBM after months of viral order mistakes. In February 2024 a Canadian tribunal held Air Canada liable for promises its chatbot made to a passenger. These are the pattern, not the exception.

When an agent speaks on behalf of a brand, every answer is a contractual promise. Current systems don't know that, and it shows.

Personal memory is, in some ways, simpler: one user, one domain, one confidentiality constraint. But even there I'm not optimistic it "stands up" without incidents. It's a system you work with, not a system you launch.

Where I'm Looking

The five hypotheses I have under evaluation: hierarchical indexes (I'm already experimenting with a map.md file in my personal vault); distillation files that condense discussions into key points; vertical branches on tags that narrow the scope at query time; embeddings as a second layer of semantic retrieval; periodic compaction that rewrites accumulated summaries. None of the five is the solution. Probably the answer is a combination, and probably it's specific to the unit of memory. Which is another open problem: surely not one memory shared across all agents, and not one per agent either; maybe something built around the person, but not flat on the person.
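Two of these hypotheses, tag branches that narrow scope and a second layer of semantic retrieval, can be sketched together. Treat this as an illustration under loud assumptions: `Note`, `retrieve`, the tags, and the note contents are all hypothetical, and a bag-of-words cosine similarity stands in for real embeddings so that the example runs with the standard library only.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    path: str
    tags: frozenset[str]
    text: str


def bag_of_words(text: str) -> Counter:
    """Stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(notes: list[Note], query: str, tag: str, top_k: int = 3) -> list[str]:
    """Narrow by tag first, then rank what is left by similarity to the query."""
    in_scope = [n for n in notes if tag in n.tags]
    q = bag_of_words(query)
    ranked = sorted(in_scope, key=lambda n: cosine(q, bag_of_words(n.text)), reverse=True)
    return [n.path for n in ranked[:top_k]]


# Hypothetical vault contents.
notes = [
    Note("health/blood-test.md", frozenset({"health"}),
         "blood test one value slightly above threshold"),
    Note("health/dinner-note.md", frozenset({"health", "nutrition"}),
         "heavy dinner at the restaurant"),
    Note("work/offsite.md", frozenset({"work"}),
         "team dinner at the restaurant after the offsite"),
]

results = retrieve(notes, "dinner at the restaurant", tag="health")
```

The work note shares most of its words with the query, so similarity alone would surface it. The tag filter keeps it out of a health question before any ranking happens, which is the kind of false positive or inappropriate revelation the scoping is meant to prevent.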

Then there's the federation problem: how do you pass relevant information from one memory to another, from one agent to another, when the future is agent-to-agent? All of that work is still to be done.

I'm not publishing a solution. I'm publishing an exploration. The difference, for whoever reads this, matters: what you find here may change as the problem becomes clearer.

If you have insights, corrections, or failures you think are worth sharing, write to me.

Reinforcement, forgetting, "an answer is a contractual promise": these are all things you can measure, and almost nobody does. That's the work I'm turning into Tessera: an open, MCP-native benchmark for accuracy, provenance, and correct refusal over fragmented enterprise knowledge.

*Still running the vault. Still verbose, still a little amnesic. This post is a photograph of where I am today, not a declaration of a solution.*