Architecture · 10 min read

Tools, Skills, Agents: A Composable Stack for Cognition

The agent isn't the smart thing. The agent is the empty room — what makes it useful is what you put inside. Three layers, named carefully, do the work.

[Stack diagram]
AGENTS — router · retriever · compiler · responder
SKILLS — summarize · translate · research_brief · …
TOOLS  — memory · web · fs · skills · MCP proxies

"Build an agent" is bad advice. The interesting question is what's inside the agent. An empty agent is a chat loop with a system prompt. A capable agent is the same loop with a stack of layers underneath it. Cashmere's stack has three: tools, skills, and agents. They look similar from the outside. They are not the same.

Get this layering wrong and you end up with one of two failure modes: a single god-agent with a thousand-tool prompt that picks the wrong one half the time, or a maze of micro-agents that hand work to each other faster than any of them can do it. Get it right and the agent becomes legible — you can read its trace and understand what it did and why.

Layer 1: Tools

A tool is a function. That's the entire definition. It has a name, a typed parameter schema, and an execute() implementation. The model picks one, the harness runs it, the result comes back.

Tools are the primitive operations the agent can perform on the world. read_file. web_search. memory_search. list_briefings. They should be small, sharp, and obvious. A tool with three modes and a discriminator argument is two tools that haven't been split yet.

The hard discipline at this layer is the error contract. A tool that raises an exception inside the harness loop crashes the agent's reasoning. A tool that returns a descriptive error string lets the model see the problem and adjust on the next iteration. Cashmere's rule: every tool's execute() returns a string, including in failure cases. "User-input errors" return an explanation. "Resource not found" returns a description. Only genuine infrastructure failures with no recovery path are allowed to raise.

class ListBriefingsTool(Tool):
    @property
    def definition(self) -> ToolDefinition:
        return ToolDefinition(
            name="list_briefings",
            description="Read recent daemon-generated briefings",
            parameters=[
                ToolParameter("match", "string", required=False),
                ToolParameter("briefing_type", "string", required=False),
                ToolParameter("limit", "integer", required=False, default=20),
            ],
        )

    async def execute(self, **kwargs) -> str:
        try:
            # Clamp user-supplied limits to a sane ceiling.
            limit = min(int(kwargs.get("limit", 20)), 100)
            rows = await db.list_briefings(...)
            return format_briefings(rows)
        except ValueError as e:
            # Per the error contract: describe the problem as a string
            # the model can read, rather than raising into the loop.
            return f"Error: invalid limit ({e})"
That's the entire shape of a tool. ~30 lines. The hardest part is the description — the sentence the model reads when deciding whether to call it. A tool with a fuzzy description is a tool the model gets wrong half the time.
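
The other half of the error contract lives in the harness. A minimal sketch of the dispatch step (names like `dispatch` and the registry shape are illustrative, not Cashmere's actual API):

```python
# Sketch of a harness dispatch step: because every tool returns a
# string, the loop never has to special-case failures — even a
# wrong tool name comes back as text the model can react to.
async def dispatch(registry: dict, name: str, args: dict) -> str:
    tool = registry.get(name)
    if tool is None:
        # A wrong tool name is a model error, not an infrastructure
        # one: report it as text so the model can pick again.
        return f"Error: no tool named '{name}'. Available: {sorted(registry)}"
    try:
        return await tool.execute(**args)
    except Exception as e:
        # Last-resort guard: an unexpected raise still becomes a
        # string visible on the next iteration.
        return f"Error: {name} failed unexpectedly ({e})"
```

With this in place, "only genuine infrastructure failures raise" becomes enforceable: anything that escapes a tool is caught once, at the boundary.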

Layer 2: Skills

A skill is one notch up: a tool whose implementation is "another LLM call, with its own system prompt and its own tool subset." It's a unit of delegated cognition.

Cashmere ships skills as markdown files with YAML frontmatter:

---
name: research_brief
description: Web-research a topic and produce a brief with citations
tools: [web_search, web_fetch]
model: deep
max_iterations: 8
arguments:
  - name: topic
    type: string
    required: true
---

You are a thorough research analyst. When given a topic, search
the web, follow up on the most relevant sources, and produce
a structured brief with citations…

The frontmatter is the skill's contract — its name, its tools, its model alias, its argument schema. The body is the system prompt. Drop the file in skills/, hit POST /api/v1/skills/reload, and the skill is live. No code change. No restart.
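
A loader for this format needs only a frontmatter split and a YAML parse. A minimal sketch, assuming PyYAML is available — the `Skill` dataclass and `load_skill` function are hypothetical names, but the fields mirror the frontmatter above:

```python
from dataclasses import dataclass
import yaml  # PyYAML, assumed available

@dataclass
class Skill:
    name: str
    description: str
    tools: list
    model: str
    max_iterations: int
    system_prompt: str

def load_skill(text: str) -> Skill:
    # Split "---\n<yaml>\n---\n<body>" into contract and prompt.
    _, front, body = text.split("---", 2)
    meta = yaml.safe_load(front)
    return Skill(
        name=meta["name"],
        description=meta["description"],
        tools=meta.get("tools", []),
        model=meta.get("model", "default"),
        max_iterations=meta.get("max_iterations", 5),
        system_prompt=body.strip(),
    )
```

Everything after the second `---` is the prompt, untouched — which is what makes the file authorable in any text editor.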

The point of the skill abstraction is that any agent can use a skill via a single tool — delegate_skill. The calling agent doesn't have to know whether "research_brief" is a tool, a skill, an external service, or a different model entirely. It calls the skill, gets a string back, keeps reasoning. This is the same modularity OS shells get from $PATH.
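
From the caller's side, that indirection can be sketched as a single tool whose execute() runs the skill's own loop. This is illustrative, not Cashmere's actual implementation — the `run_loop` callable stands in for whatever drives the skill's LLM iterations:

```python
# Sketch of a delegate_skill tool: the calling agent sees one tool;
# the skill runs as its own LLM loop and returns plain text.
class DelegateSkillTool:
    def __init__(self, skills: dict, run_loop):
        self.skills = skills      # name -> Skill
        self.run_loop = run_loop  # async fn(skill, args) -> str

    async def execute(self, skill: str, **args) -> str:
        s = self.skills.get(skill)
        if s is None:
            # String error per the tool contract: let the model retry.
            return f"Error: unknown skill '{skill}'. Known: {sorted(self.skills)}"
        # The skill runs with its OWN prompt, model, and tool subset;
        # the caller only ever sees the final string.
        return await self.run_loop(s, args)
```

Because the return type is always a string, the caller genuinely cannot tell whether the work was done by a function, a cheap model, or a deep one.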

Layer 3: Agents

An agent is a typed loop with a defined role. Cashmere has four built-in: router (decides what kind of question this is), retriever (pulls relevant context), compiler (assembles the prompt), responder (writes the user-visible reply). Each is a BaseAgent subclass with a model alias, a Jinja system prompt template, a max_iterations budget, and an explicit list of allowed tool names.

Notice the constraint: an agent doesn't have access to every tool. It has a curated subset. A router doesn't get filesystem write access. A retriever doesn't get tools that mutate state. The blast radius of any single agent is the union of its tools — and you can audit that list in one glance.
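
One way to express that constraint is as plain data on the agent class. A sketch — the field names are illustrative and Cashmere's real BaseAgent may differ in detail:

```python
# Sketch of the allow-list pattern: each agent declares its tool
# subset as data, so the blast radius is auditable at a glance.
class BaseAgent:
    model_alias: str
    prompt_template: str
    max_iterations: int
    allowed_tools: list[str] = []

    def tool_palette(self, registry: dict) -> dict:
        # An agent only ever sees its curated slice of the registry.
        return {n: t for n, t in registry.items() if n in self.allowed_tools}

class RetrieverAgent(BaseAgent):
    model_alias = "fast"
    prompt_template = "retriever.j2"  # Jinja system prompt
    max_iterations = 4
    # Read-only palette: nothing here mutates state.
    allowed_tools = ["memory_search", "web_search", "list_briefings"]
```

Auditing "can the retriever write files?" is then a one-line check of `allowed_tools`, not a read of the whole codebase.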

Layer   Unit of work                                   Authored as            Hot-swap?
Tool    A single deterministic function                Python class           No (restart)
Skill   A delegated LLM call with its own prompt       Markdown + YAML        Yes (reload endpoint)
Agent   A multi-iteration reasoning loop with a role   Python class + Jinja   No (restart)

Why three layers, not one or five?

There's a temptation to make this simpler — "everything is just a tool" — and a counter-temptation to make it more elaborate — "tools, skills, agents, workflows, projects, plans…" Both are wrong, and the reasons are worth being explicit about.

Collapsing tools and skills into one layer loses the "hot-swappable, prompt-defined behavior" property. A skill is a thing a domain expert can author in a text editor and reload at runtime. If you forced every skill to be a Python class, you'd make the agent's behavior changeable only by people who can edit Python and restart the server. The whole point is that knowledge work is editable.

Adding more layers — workflows, plans, programs, etc. — usually fails for the opposite reason. Each new layer needs a clear semantic distinction from its neighbors, and beyond three the distinctions become arbitrary. A "workflow" that's just a sequence of skill calls is better expressed as a skill that calls delegate_skill in a loop. Don't add a layer to model something you can already model with the layers you have.

The MCP twist

The most important thing this stack does is make MCP integration almost free. The Model Context Protocol is an open standard for tool-providing servers. Cashmere connects to MCP servers as a client, registers each remote tool as a proxy Tool in the same registry, and from that point on every agent and every skill can use them as if they were native. Filesystem from mcp-server-filesystem? A tool. GitHub from the official server? A tool. Slack? A tool. They all show up in the same list, get picked by the same router, and stream their results back into the same memory store.
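
The proxy pattern can be sketched in a few lines. This is illustrative, not the real cashmere/mcp/client.py — the `session` object stands in for an open MCP client session:

```python
# Sketch of an MCP proxy tool: a remote tool is wrapped so it looks
# identical to a native Tool in the registry.
class MCPProxyTool:
    def __init__(self, session, remote_name: str, description: str):
        self.session = session          # open MCP client session
        self.remote_name = remote_name  # tool name on the remote server
        self.description = description

    async def execute(self, **kwargs) -> str:
        try:
            # Forward the call over the wire; stringify the result so
            # the proxy honors the same contract as native tools.
            result = await self.session.call_tool(self.remote_name, kwargs)
            return str(result)
        except ConnectionError as e:
            # Failures come back as text, same as everywhere else.
            return f"Error: MCP server unreachable ({e})"
    ```

One wrapper class, instantiated once per remote tool at connect time, and the rest of the stack never learns the word "MCP".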

Tools that are uniformly typed are tools that compose. The MCP win isn't that you got more capabilities — it's that you got more capabilities without changing the layering.

Build the layers once. Add capability without rebuilding. That's what "composable" actually means.

What this looks like in practice

A user asks: "Summarize the Cashmere repo's README and check if there are any open issues touching the daemon."

The router agent classifies the question as a multi-step research task. The retriever pulls in relevant memories ("user is the Cashmere founder," "daemon is the long-running process"). The compiler assembles a prompt with that context plus a tool palette. The responder agent runs the loop: it calls the GitHub MCP tool to fetch the README, calls delegate_skill to invoke summarize on the contents, calls the GitHub tool again to list issues filtered by the daemon label, and produces a final response.

Four agents, one skill, three tool calls, eight LLM calls. About 12 seconds wall-clock on local hardware. The user sees: one paragraph and a bulleted list. Underneath: a clean stack, doing exactly what it's meant to do.


Inside Cashmere: tools live in cashmere/tools/, skills in skills/, agents in cashmere/agents/. The MCP client manager is at cashmere/mcp/client.py. The skill loader auto-reloads on a single endpoint hit so authors can iterate without restarting.