I run six or seven projects at any given time. Each one has at least one AI agent running in a terminal: Claude Code, sometimes Aider, sometimes something else. They read files, write code, run tests, ask me questions. All at once.
And I have no idea what any of them are doing.
Not because they're secretive. Because the tools I use, great tools, mind you, were built for a world where one person uses one terminal for one thing. That world ended about a year ago.
The Actual Problem
Here's what a typical afternoon looks like. Claude Code is refactoring authentication in project A. Another instance is writing tests in project B. A third one hit a compilation error in project C ten minutes ago and is stuck in a retry loop. I don't know that, because it's on a different desktop and I haven't checked.
Meanwhile, project D's agent finished its task twelve minutes ago and is waiting for my approval. I'll find out when I eventually Cmd+Tab my way over there and see the prompt blinking at me.
This is the org chart problem from my last piece, made visceral. I have employees. They're working. I have no dashboard. No standup. No Slack channel. Just a grid of desktops I cycle through like a security guard checking monitors.
Why Terminals Don't Know
Ghostty is excellent. Fast, GPU-accelerated, beautiful. But it renders characters on a screen. That's what a terminal does. It doesn't know, and can't know, that the characters scrolling past are an AI agent editing your authentication middleware.
tmux gives you panes. Split, resize, scroll. It's a spatial organizer for text streams. It doesn't know that one pane is stuck and another is waiting for input. It shows you the same green bar regardless.
The existing tools that try to fix this (Claude Squad, Agent Deck) are tmux wrappers. They manage processes. But they still treat agents as opaque text streams. And they look like 1997.
The gap isn't features. It's perspective. These tools see terminals. They don't see agents.
Observe, Never Orchestrate
So the missing piece has a shape. Call it a cockpit: a mission control for the fleet of agents that now live on your machine.
The core idea is one sentence: show me what all my agents are doing without touching any of them.
That constraint matters more than any feature. I don't want a layer that sends commands to my agents, auto-approves their requests, or decides what to do on my behalf. I've thought too much about delegation gone wrong to want that. The agent works. I supervise. The cockpit is the glass between me and the factory floor, not a hand reaching onto it.
Every time I'm tempted by a "smart" feature (auto-approve safe operations, chain agents together, pre-fill responses) I ask one question: would a good mission control operator want the console to start pressing buttons on its own? The answer is always no.
Reading the Room
For a cockpit to be useful it has to understand what it's looking at. And here's the surprising part: it doesn't need much.
Four states are enough. Inactive, working, needs input, error. That's the whole vocabulary. Which agent is thinking, which is waiting for you, which is stuck. Those four states, at a glance, across every project, change the entire experience.
You can get there without an LLM. Pattern-matching the terminal output is enough, and it's instant. The agent is "working" while output is streaming. It "needs input" when it asks a question and goes quiet. It's in "error" when the same failure keeps repeating: three identical errors in a row is enough to call it a loop.
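To make that concrete, here is a minimal sketch in Python of what such a classifier could look like. Everything in it is an assumption for illustration: the regexes, the `QUIET_AFTER` threshold, and the `classify` function are hypothetical, and real agents phrase their prompts and errors differently. It also reads "three times in a row" as "the last three error lines are identical", ignoring chatter in between.

```python
import re
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"
    WORKING = "working"
    NEEDS_INPUT = "needs input"
    ERROR = "error"


# Hypothetical patterns; tune them to the agents you actually run.
ERROR_RE = re.compile(r"\b(error|failed|exception)\b", re.IGNORECASE)
PROMPT_RE = re.compile(r"(\?\s*$|\[y/n\]\s*$)", re.IGNORECASE)

QUIET_AFTER = 5.0    # assumed: seconds of silence before the agent counts as quiet
LOOP_THRESHOLD = 3   # from the text: the same error three times in a row


def classify(lines, seconds_since_output):
    """Guess an agent's state from its recent output lines (oldest first)."""
    recent = [line.strip() for line in lines if line.strip()]
    if not recent:
        return State.INACTIVE

    # Error: the last few error lines are all the same message.
    errors = [line for line in recent if ERROR_RE.search(line)]
    tail = errors[-LOOP_THRESHOLD:]
    if len(tail) == LOOP_THRESHOLD and len(set(tail)) == 1:
        return State.ERROR

    # Working: output is still streaming.
    if seconds_since_output < QUIET_AFTER:
        return State.WORKING

    # Needs input: the agent asked something and then went quiet.
    if PROMPT_RE.search(recent[-1]):
        return State.NEEDS_INPUT

    return State.INACTIVE
```

Calling `classify(["Apply these changes? [y/n]"], 30)` returns `State.NEEDS_INPUT`. No model call, no network, just string matching on a buffer that is already in memory.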
And once you have states, you want a narrative. Not "working" but "Editing auth module, 8 files, 2m." Not "error" but "Stuck: compile error (5x, 3m)." The raw state tells you the color; the narrative tells you whether to walk over.
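Turning a state into that kind of one-liner is mostly string formatting. A rough sketch, assuming the cockpit already tracks a short detail string, an elapsed time, a file count, and a repeat count per agent (the `narrate` function and all of its parameters are hypothetical names):

```python
def format_duration(seconds):
    """Render a duration as a compact label such as 45s, 2m, or 1h05m."""
    seconds = int(seconds)
    if seconds < 60:
        return f"{seconds}s"
    minutes = seconds // 60
    if minutes < 60:
        return f"{minutes}m"
    return f"{minutes // 60}h{minutes % 60:02d}m"


def narrate(state, detail, elapsed_seconds, files_touched=0, repeats=0):
    """Build a one-line status from a raw state plus a few tracked counters."""
    duration = format_duration(elapsed_seconds)
    if state == "working":
        parts = [detail]
        if files_touched:
            plural = "" if files_touched == 1 else "s"
            parts.append(f"{files_touched} file{plural}")
        parts.append(duration)
        return ", ".join(parts)
    if state == "error":
        return f"Stuck: {detail} ({repeats}x, {duration})"
    if state == "needs input":
        return f"Waiting for you: {detail} ({duration})"
    return f"Idle ({duration})"


print(narrate("working", "Editing auth module", 120, files_touched=8))
print(narrate("error", "compile error", 180, repeats=5))
```

The hard part is not the formatting. It is deciding which counters are worth tracking so the line stays short enough to read at a glance.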
The Activity Stream
The feature I want most is the one I never find in these tools: a structured timeline of everything every agent did. Files created, commands run, errors hit, tasks completed, sub-agents spawned. Across all projects.
Because the real question, the one I ask twenty times a day, is: "Wait, what did the agent in project C do while I was focused on project A?" Today the answer is always the same: scroll up, read a wall of text, reconstruct it yourself.
A cockpit should do that reconstruction for you. Passively. No cooperation from the agent required. Just watch the output, extract the structure, keep the log.
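A sketch of that passive extraction, again in Python. The output formats it matches ("Created ...", "$ ...", "error: ...") are invented for the example. Real agents print different things, and the `Event` and `ActivityLog` names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float
    project: str
    kind: str
    detail: str


# Hypothetical line formats; each agent needs its own set of patterns.
PATTERNS = [
    ("file_created", re.compile(r"^(?:Created|Wrote) (?P<detail>\S+)$")),
    ("command_run", re.compile(r"^\$ (?P<detail>.+)$")),
    ("error", re.compile(r"^error: (?P<detail>.+)$", re.IGNORECASE)),
]


def extract_event(project, line, timestamp):
    """Return an Event if the line matches a known pattern, else None."""
    text = line.strip()
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return Event(timestamp, project, kind, match.group("detail"))
    return None


class ActivityLog:
    """Append-only timeline built purely from observed output lines."""

    def __init__(self):
        self._events = []

    def observe(self, project, line, timestamp):
        event = extract_event(project, line, timestamp)
        if event is not None:
            self._events.append(event)

    def since(self, project, timestamp):
        """Everything a project's agent did at or after the given time."""
        return [
            e for e in self._events
            if e.project == project and e.timestamp >= timestamp
        ]
```

With this in place, "what did project C do while I was away?" becomes `log.since("C", when_i_left)` instead of scrolling up. The agent never has to know the log exists.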
The Brain and the Window
I built a version of this for myself. A real one: a native app, a window with every project in a grid. It worked. And using it taught me the thing I didn't expect.
The valuable part wasn't the window. It was what ran behind the window: the pattern matching, the state detection, the event extraction. The window was just one way to look at what that layer already knew.
So the cockpit is really two things pretending to be one: an intelligence layer that watches and understands, and a rendering layer that shows it to you. And the moment you separate them, something clicks.
When observation lives inside a GUI, observation only happens while the GUI is open. Pull it out into something always-on, a daemon that runs as infrastructure the way a database or a web server does, and observation becomes continuous. The screen becomes just one client. A dashboard, a script, a notification, even another agent: anything can read the same stream.
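The shape of that separation fits in a few lines. This is an in-process stand-in, not a real daemon: an actual one would expose the stream over a socket or something similar, and the `ObservationStream` class and its methods are hypothetical names. What it shows is the property that matters: events are recorded whether or not any client is attached, and a client that connects late can replay what it missed.

```python
class ObservationStream:
    """Records every event and fans it out to whatever clients are attached."""

    def __init__(self):
        self._history = []
        self._subscribers = []

    def publish(self, event):
        # Recording happens first and does not depend on anyone watching.
        self._history.append(event)
        for callback in list(self._subscribers):
            callback(event)

    def subscribe(self, callback, replay=True):
        """Attach a client; optionally replay everything recorded so far."""
        if replay:
            for event in self._history:
                callback(event)
        self._subscribers.append(callback)

        def unsubscribe():
            if callback in self._subscribers:
                self._subscribers.remove(callback)

        return unsubscribe


stream = ObservationStream()

# No window open yet: the event is still recorded.
stream.publish({"project": "C", "state": "error"})

# A dashboard attaches later and catches up through replay.
dashboard = []
stream.subscribe(dashboard.append)

# A notifier only cares about what happens from now on.
notifications = []
stream.subscribe(notifications.append, replay=False)

stream.publish({"project": "D", "state": "needs input"})
```

After this runs, `dashboard` holds both events and `notifications` holds only the second. Neither client is special, and closing either one changes nothing about what gets recorded.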
A good mission control room has instruments that record whether or not anyone is watching the screens. The instruments are the real system. The screens are a convenience.
So What?
I wrote a few weeks ago that running multiple agents makes you your own middle manager. The cockpit is the management layer that job needs, and right now it mostly doesn't exist.
But sitting with the problem taught me something. It isn't that agents are hard to manage. It's that we don't even have the vocabulary for what managing them means. We borrow from DevOps (monitoring, orchestration), from management theory (delegation, oversight), from UX (dashboards, notifications), and none of it quite fits.
This is a new kind of work. Not coding. Not managing. Something in between that doesn't have a name yet. You're reading output, judging quality, deciding priorities, context-switching across domains, all while the agents do the actual typing.
The cockpit is still missing. People are building it now, myself included, in the open, with others. I don't think it ends as an app. It ends as a layer: always-on, observing, indifferent to how you choose to look at it. The window is the easy part. The instruments are the work.
*Still running agents in parallel every day. Still cycling desktops more than I'd like. Still figuring out what the cockpit actually is.*