Cashmere Journal

Field notes from a sovereign AI.

Essays on local-first intelligence, personal agents, the death of the chat tab, and the hardware revolution sitting on your desk.

Featured · Architecture · 9 min read

The Memory Problem: Why Personal AI Needs a Brain, Not a Buffer

A 2-million-token context window is a hack. The future of personal AI is structured memory that knows you the way a colleague does — and grows for a decade, not a session.

Read essay
Architecture · 11 min read

Anatomy of a Cognitive Daemon: How Cashmere Thinks While You Sleep

Most AI is a function call. A real personal agent is a process — long-running, self-pacing, and reasoning while you're not looking. Here's how the Cashmere daemon is wired.

Hardware · 8 min read

Your Mac Mini Is a Datacenter

A $599 box on your desk now runs a 26B-parameter language model 24/7, silently, on the power budget of a light bulb. The question isn't whether local AI is fast enough. It's why we ever rented it.

Architecture · 10 min read

Tools, Skills, Agents: A Composable Stack for Cognition

The agent isn't the smart thing. The agent is the empty room — what makes it useful is what you put inside. Three layers, named carefully, do the work.

Privacy · 9 min read

The Privacy Premium: What You Pay When AI Is “Free”

Cloud AI doesn't have a price. It has a cost. The cost is the part you don't see on the invoice — your data, your context, your conversation, your model of yourself.

Economics · 8 min read

The Last Subscription You'll Ever Need

The cloud-AI subscription is a transitional artifact. Once the model fits on your hardware and the runtime is open source, the only honest price for personal intelligence is zero.

Interfaces · 8 min read

Beyond the Browser Tab: Why Personal AI Lives in Telegram, Not Chrome

The chat tab was a useful prototype, but it was always a prototype. A real personal agent meets you where you already are — in your messenger, your terminal, your editor.

Observability · 10 min read

Self-Improving Agents: How an AI Watches Its Own Work

An agent that runs forever needs an inner critic. Cashmere has one — a pulse that scores its own behavior, names its own regressions, and routes its own attention.

Memory · 11 min read

Knowledge Graphs Beat Vector Stores for Personal Memory

Vector embeddings made “long-term memory” tractable for the first generation of LLM apps. They are the wrong primary index for personal AI. The right index is a knowledge graph.

Models · 9 min read

The Open-Weight Revolution: Why Llama and Gemma Won the Edge

In 2023 the open-weight models were a curiosity. In 2026 they run on every consumer Mac and they're the foundation of the entire local-AI stack.

Essay · 6 min read

The Death of the Prompt: Why Local LLMs Are the Key to True Agentic AI

If we want to move from chatbots to agents — systems that don't just answer questions but actually observe, reason, and act on our behalf — the current cloud-based model is fundamentally broken.

Essay · 6 min read

The End of Token Anxiety: Why the Future of AI is Local, Proactive, and Private

If you're a power user, you've felt it — the creeping realization that your AI subscription is a mirage. The industry is trapped in a token-deficit crisis.

Essay · 8 min read

The Case for Sovereign Intelligence: Why the Future of AI is Local

For the past two years, the narrative around AI has been dominated by scale. But the real revolution isn't bigger models — it's models that run on hardware you own.

Run a sovereign agent on your own hardware.

Cashmere is open source under MIT. No accounts, no telemetry, no upsell. Bring a Mac, install Ollama, and your agent moves in.