Anatomy of a Cognitive Daemon: How Cashmere Thinks While You Sleep
Most AI is a function call. A real personal agent is a process — long-running, self-pacing, and reasoning while you're not looking. Here's how the Cashmere daemon is wired.
Most "AI" is a function call. You hand it a prompt, it returns a string, the conversation ends. The thing that's been quietly missing for the last three years is the process — a long-running program that thinks on its own clock.
A daemon, in operating-systems terms, is a process that keeps running with no terminal attached. It wakes up, does work, goes back to sleep. Cashmere is a daemon that thinks.
Why a process, not a function?
The cloud-AI mental model is "request in, response out." That model is fine for one-shot tasks, but it's architecturally incapable of doing anything between requests. The moment you want an agent that notices things, prepares things, follows up, watches a feed, or writes you a briefing in the morning, the function-call model breaks. There's nobody home when you're not looking.
The daemon model gets you four properties at once:
- Persistence. The agent has a memory of its own activity, not just yours.
- Initiative. Work can happen without a triggering prompt.
- Pacing. Cheap tasks run often, expensive ones run rarely, and the agent decides.
- Recovery. When something fails, it retries. When the machine reboots, it picks up.
These are basic OS-process properties, but no chatbot has them. Cashmere does because it is one.
The tick
At the heart of cashmere/daemon/loop.py is a small infinite loop. Every tick — roughly once
a minute, configurable — it asks one question: what's due?
Some things are due on wall-clock cadences: the daily briefing fires once a day, the cross-thread synthesis fires every few hours, watchlist checks fire on per-watch intervals. Some things are due because they were enqueued by an earlier turn — a chat just happened, the system wants to extract memories from it. Some things are due because a watcher tripped: a Chrome tab triggered a research follow-up, an inbound Telegram message triggered an interactive response.
The scheduler doesn't care. It walks the table of pending tasks, picks the ones whose preconditions are met, and enqueues them. The workers do the rest.
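That walk-the-table-and-enqueue pass can be sketched in a few lines. This is a minimal illustration of the idea, not Cashmere's actual scheduler API; every name here (`Scheduler`, `add`, `tick`) is hypothetical:

```python
class Scheduler:
    """Minimal 'what's due?' pass: walk pending tasks, enqueue the ready ones.
    Illustrative only -- the real scheduler in cashmere/daemon/scheduler.py
    will have a different shape."""

    def __init__(self):
        self.pending = []   # (task_type, due_at, precondition)
        self.queue = []     # what the workers drain

    def add(self, task_type, due_at, precondition=lambda: True):
        self.pending.append((task_type, due_at, precondition))

    def tick(self, now):
        """One loop iteration: move every due task onto the work queue."""
        still_pending = []
        for task_type, due_at, precond in self.pending:
            if now >= due_at and precond():
                self.queue.append(task_type)   # hand off to a worker
            else:
                still_pending.append((task_type, due_at, precond))
        self.pending = still_pending

sched = Scheduler()
sched.add("daily_briefing", due_at=100)
sched.add("deep_research", due_at=500)
sched.tick(now=120)   # daily_briefing is enqueued; deep_research still waits
```

The point of the shape: the tick itself stays dumb. All the intelligence lives in the `due_at` values and preconditions, which is what lets cadence-driven, chat-enqueued, and watcher-tripped tasks share one loop.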
Workers, the durable kind
Each worker is a typed handler — a class subclassing BaseWorker with a task_type
string and an execute method. There's one for daily briefings, one for cross-thread synthesis,
one for watchlist checks, one for memory consolidation, one for deep research, and so on. New cognitive
behaviors are added by writing one of these and registering it. That's the entire extension surface.
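A sketch of what that extension surface could look like. The `BaseWorker`/`task_type`/`execute` names come from the article; the self-registering mechanism and the `dispatch` helper are assumptions for illustration:

```python
class BaseWorker:
    """Typed handler base. Subclasses declare a task_type string and an
    execute method; here they self-register via __init_subclass__.
    A sketch of the pattern, not Cashmere's exact implementation."""
    registry = {}
    task_type = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        BaseWorker.registry[cls.task_type] = cls

    def execute(self, task):
        raise NotImplementedError

class DailyBriefingWorker(BaseWorker):
    task_type = "daily_briefing"

    def execute(self, task):
        # real worker: gather watch alerts, action items, cross-thread links
        return f"briefing for {task['date']}"

def dispatch(task):
    """Route a dequeued task to the worker registered for its type."""
    worker_cls = BaseWorker.registry[task["type"]]
    return worker_cls().execute(task)
```

Adding a new cognitive behavior is then just one more subclass; nothing in the loop or scheduler changes.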
Tasks are SQLite rows, not in-memory queues. That choice does a few things:
- If the daemon crashes mid-task, the row is still there when it comes back.
- If you reboot the laptop overnight, the queue picks up exactly where it left off.
- The HTTP API can see what's pending and surface it in the dashboard.
- You can debug the system by reading the database with any SQLite tool.
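The tasks-as-rows idea fits in a screenful of `sqlite3`. Schema and function names below are illustrative, not Cashmere's real ones; a real daemon would point `connect` at a file on disk so the commit actually survives a crash:

```python
import sqlite3

# In-memory DB for the sketch; the daemon would use a file path instead.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS tasks (
        id        INTEGER PRIMARY KEY,
        task_type TEXT NOT NULL,
        status    TEXT NOT NULL DEFAULT 'pending',
        payload   TEXT
    )
""")

def enqueue(task_type, payload=""):
    conn.execute("INSERT INTO tasks (task_type, payload) VALUES (?, ?)",
                 (task_type, payload))
    conn.commit()  # with a disk-backed DB, the task now survives a crash

def claim_next():
    """Claim the oldest pending task by flipping its status to 'running'."""
    row = conn.execute(
        "SELECT id, task_type, payload FROM tasks "
        "WHERE status = 'pending' ORDER BY id LIMIT 1").fetchone()
    if row:
        conn.execute("UPDATE tasks SET status = 'running' WHERE id = ?",
                     (row[0],))
        conn.commit()
    return row

enqueue("memory_extraction", "chat-42")
```

Because the queue is just a table, "debug the system" really does mean `sqlite3 tasks.db "SELECT * FROM tasks WHERE status = 'pending'"`.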
"Durable by default" is unusual in agent frameworks. Most assume the agent is a single process holding everything in RAM. Real agents — agents that run for years — can't make that assumption.
Pacing without a human
A daemon left to run flat-out will turn your machine into a small space heater. The hard problem of autonomous systems isn't getting them to work — it's getting them to work at the right rate.
Cashmere paces itself with a self-observing health pulse. Every loop iteration, it scores its recent activity along a few axes: how engaged is the user (briefings opened, chats sent), how reliable is the extraction pipeline, how stale is the prod binary, what's the noise rate on proactive briefings. If things are healthy and quiet, the loop slows down. If something's drifting — say, the briefings are accumulating without being read — it backs off the briefing cadence rather than doubling down.
This is the part that turns a script into something that feels like a colleague. A script doesn't know whether it's being useful. The daemon does, and adjusts.
What "thinking while you sleep" looks like
A normal day for the daemon: you go to bed at midnight. Around 1am, the consolidation worker wakes, looks at the day's chats and extracted entities, promotes a few things to semantic memory, prunes noise. At 4am, the deep-research worker picks the top-priority watchlist item and runs a multi-step research loop against its query, writes a briefing if anything changed. At 6am, the daily briefing worker pulls the morning summary together — what happened in the watch alerts, what action items are pending, what cross-thread connections emerged. At 7am, when you pick up your phone, there's a Telegram message waiting.
The chatbot waits for you to say something. The daemon waits for there to be something worth saying.
The daemon model is the agent model
"Agent" has been a marketing word for two years. Underneath the marketing is a real architectural distinction: an agent is a long-running process with memory, goals, and the ability to act between your prompts. That requires a daemon. There is no shortcut — a chatbot wrapped in a cron job is not an agent, it's a chatbot wrapped in a cron job.
Build the process first. Everything else — memory, tools, skills, interfaces — composes onto it.
Inside Cashmere: the loop lives in cashmere/daemon/loop.py. Workers are
in cashmere/daemon/workers.py. Scheduling logic is in cashmere/daemon/scheduler.py.
Everything is observable through the dashboard at localhost:8420.