Knowledge Graphs Beat Vector Stores for Personal Memory
Vector embeddings made “long-term memory” tractable for the first generation of LLM apps. They are the wrong primary index for personal AI. The right index is a knowledge graph — and the right system uses both.
"Vector store" became shorthand for "AI memory" sometime in 2023, and the phrase has been doing quiet damage ever since. For a first-generation chatbot remembering a long conversation, vectors work fine. For a personal agent that needs to know the people in your life, the projects you've worked on, the topics you've researched, and the connections between them — vectors fall over, and they fall over in ways that look like the system is "just being dumb today."
The fix isn't to throw out vectors. It's to demote them from the primary index to one signal among several, and to build the actual primary index out of entities and edges — a knowledge graph.
What vector stores are good at
Vector embeddings catch semantic similarity. Two sentences that mean similar things land near each other in the embedding space. The retrieval algorithm — top-K nearest neighbors by cosine — is fast, simple, and survives paraphrase. Ask "what did I say about pricing?" and a vector store will surface conversations where you said "I'm worried about the cost" even if you never used the word "pricing."
That's a real win. It's also a small one.
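For concreteness, the top-K-by-cosine retrieval described above fits in a few lines. This is a toy sketch with hand-made 3-dimensional vectors standing in for real model embeddings; the function name and corpus are illustrative, not any particular library's API:

```python
import numpy as np

def top_k_cosine(query_vec, doc_vecs, k=3):
    """Return indices of the k nearest documents by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                         # cosine similarity per document
    return np.argsort(scores)[::-1][:k]    # highest-scoring indices first

# Toy "embeddings" stand in for real model output.
docs = np.array([
    [0.9, 0.1, 0.0],   # "I'm worried about the cost"
    [0.1, 0.9, 0.0],   # "Lunch at noon on Tuesday"
    [0.8, 0.2, 0.1],   # "The pricing page needs work"
])
query = np.array([0.85, 0.15, 0.05])   # "what did I say about pricing?"
print(top_k_cosine(query, docs, k=2))  # → [0 2]
```

Note that the query never contains the word "cost," yet the cost chunk ranks first — that paraphrase-survival is exactly the win described above.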
What vector stores are bad at
Vector retrieval has at least four structural weaknesses that show up exactly when personal-agent use cases get interesting:
- No notion of identity. The string "Jamie" and the string "Jamie Kim, the analyst who reviewed the Q3 deck" embed differently. The vector store treats them as separate facts. They are the same person.
- No notion of relationship. Knowing that Jamie works at Acme and that Acme is a portfolio company doesn't surface together unless both happen to be in the chunk you're retrieving from. The graph version of this is one edge.
- Recency collapse. Six chunks talking about "the OPEC project" all embed to the same neighborhood. The retrieval has no native way to prefer the most recent unless you bolt on a separate scoring layer.
- Aggregation impossibility. "How many times has Jamie been mentioned in the last month?" is a trivial graph query and a near-impossible vector query. Vectors can find similar things; they can't count things.
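The counting case is worth making concrete. With entities and mentions in SQLite, the question is one query; the table and column names below are illustrative, not Cashmere's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (id TEXT PRIMARY KEY, kind TEXT, name TEXT);
    CREATE TABLE mentions (entity_id TEXT, memory_id TEXT, at TEXT);
""")
conn.execute("INSERT INTO entities VALUES ('person:jamie', 'person', 'Jamie Kim')")
conn.executemany("INSERT INTO mentions VALUES ('person:jamie', ?, ?)", [
    ("chat-101", "2025-01-05"),
    ("chat-114", "2025-01-19"),
    ("email-07", "2024-11-02"),   # outside the window, not counted
])

# "How many times has Jamie been mentioned in the last month?"
(count,) = conn.execute("""
    SELECT COUNT(*) FROM mentions
    WHERE entity_id = 'person:jamie'
      AND at >= date('2025-01-20', '-1 month')
""").fetchone()
print(count)  # → 2
```

The equivalent vector query doesn't exist: nearest-neighbor search returns similar chunks, and no threshold on similarity turns "similar" into "count."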
Consider a concrete query against a memory that knows the same project as both "the OPEC watch" and "the oil watch."

Vector-only retrieval:
- Returns 5 chunks containing the substring "OPEC."
- Misses 4 chunks about "the oil watch" (different name, same entity).
- Misses an email from Jamie referencing the same watch (no keyword overlap).
- Misses 3 watch-alert briefings tagged with the watch ID (an FK relation, invisible to embeddings).

Graph-first retrieval:
- Resolve "OPEC watch" → entity watch:opec via the name index.
- Walk edges: 12 briefings (FK), 4 emails (mentioned), 7 chats (referenced).
- Vector-search the residue for paraphrase ("oil watch"): +3 chunks.
- Rank by recency, surface the top 8.
What a knowledge graph adds
A knowledge graph models the world as entities (typed nodes — people, projects, topics, places, events) and relationships (typed edges — "mentioned in," "owned by," "occurred during"). That sounds heavyweight; in practice it's two SQLite tables and an extraction step.
The killer feature is that retrieval becomes graph traversal. Ask about an entity, walk the edges, surface everything connected. The system doesn't need to guess at semantic similarity when it has a literal pointer.
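Here's what "two SQLite tables" plus "walk the edges" can look like in miniature. The schema and relation names are illustrative sketches, not Cashmere's real migrations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Typed nodes: people, projects, topics, places, events.
    CREATE TABLE entities (
        id   TEXT PRIMARY KEY,   -- e.g. 'person:jamie'
        kind TEXT NOT NULL,
        name TEXT NOT NULL
    );
    -- Typed edges between nodes (or from a node to a memory chunk).
    CREATE TABLE edges (
        src TEXT NOT NULL,
        rel TEXT NOT NULL,       -- e.g. 'works_at', 'mentioned_in'
        dst TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO entities VALUES (?, ?, ?)", [
    ("person:jamie", "person", "Jamie Kim"),
    ("org:acme", "org", "Acme"),
    ("topic:q3-deck", "topic", "Q3 deck"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("person:jamie", "works_at", "org:acme"),
    ("person:jamie", "reviewed", "topic:q3-deck"),
])

# "Ask about an entity, walk the edges": one hop out from Jamie.
rows = conn.execute(
    "SELECT rel, dst FROM edges WHERE src = ?", ("person:jamie",)
).fetchall()
print(rows)
```

One `SELECT` over `edges` is the "literal pointer": no embedding, no similarity guess, just a join key.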
Cashmere's entity extraction runs as part of the daemon loop. After every chat, after every document ingest, an extraction pass identifies the entities mentioned and adds them as nodes and edges in the graph. Stable entities (the user's main project, recurring collaborators) accumulate density. Transient ones (a one-off mention) stay sparse and decay.
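The shape of that loop — extract, upsert, link — can be sketched without the model in the way. Here the graph is a plain dict and `extract_entities` is a keyword-matching stub standing in for the local-model extraction call; none of these names are Cashmere's actual API:

```python
from collections import defaultdict

# Toy graph: entity id -> set of (relation, target) edges.
graph = defaultdict(set)

def extract_entities(text):
    """Stub for the local-model extraction pass (illustrative only):
    a real pass would prompt the model for entities and relations."""
    known = {"Jamie": "person:jamie", "Acme": "org:acme", "OPEC": "topic:opec"}
    return [eid for name, eid in known.items() if name in text]

def ingest(memory_id, text):
    """Run after every chat turn or document ingest in the daemon loop."""
    for eid in extract_entities(text):
        # Stable entities accumulate edges across ingests; one-offs stay sparse.
        graph[eid].add(("mentioned_in", memory_id))

ingest("chat-42", "Jamie flagged the OPEC exposure in Acme's portfolio.")
ingest("chat-43", "Jamie wants the deck by Friday.")
print(len(graph["person:jamie"]))  # → 2 (Jamie now linked to two memories)
```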
Why "either/or" is the wrong frame
The most common reaction to "vectors aren't enough" is "okay, then we use a graph instead." That's also wrong. The graph and the vectors are good at different things and the right system uses both.
A graph is great when you know the entity. It's bad when you don't. If a user says "what was that thing about distillation?" the graph can't help unless "distillation" already exists as an entity node. The vector store can — it'll surface chunks that talk about model distillation even if the user never named the topic exactly.
Cashmere's retrieval is therefore three signals, blended by a small reranker:
- Knowledge graph. Resolves named entities, walks relationships, surfaces structurally connected memories.
- Vector embeddings. Catches paraphrase and topic-similar content the entity layer doesn't have edges for.
- Full-text search (FTS5). Anchors on exact-string matches that embeddings sometimes float past — file paths, identifiers, quoted phrases.
The reranker is a few hundred lines of Python that takes the top results from each signal, deduplicates, scores by recency + importance + signal-overlap, and returns the top N. It's not sophisticated; it doesn't need to be. The signals do most of the work.
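A toy version of that blend, with made-up weights and result fields (the real reranker's scoring will differ):

```python
from collections import defaultdict

def blend(signal_results, top_n=5):
    """signal_results: {'graph': [...], 'vector': [...], 'fts': [...]},
    each result a dict with 'id', 'age_days', and 'importance' (0-1).
    Score = recency + importance + a bonus per extra agreeing signal."""
    seen = {}
    hits = defaultdict(int)
    for signal, results in signal_results.items():
        for r in results:
            seen[r["id"]] = r          # dedupe across signals
            hits[r["id"]] += 1
    def score(r):
        recency = 1.0 / (1.0 + r["age_days"])    # newer → higher
        overlap = 0.25 * (hits[r["id"]] - 1)     # multi-signal agreement bonus
        return recency + r["importance"] + overlap
    return sorted(seen.values(), key=score, reverse=True)[:top_n]

results = blend({
    "graph":  [{"id": "m1", "age_days": 2,  "importance": 0.9}],
    "vector": [{"id": "m1", "age_days": 2,  "importance": 0.9},
               {"id": "m2", "age_days": 40, "importance": 0.3}],
    "fts":    [{"id": "m3", "age_days": 1,  "importance": 0.2}],
})
print([r["id"] for r in results])  # → ['m1', 'm3', 'm2']
```

m1 wins not because any one signal scored it highest but because two signals agree on it — that overlap term is the cheapest form of cross-signal validation.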
What this gives the user
The downstream effect is invisible if you've never seen it work, and obvious the moment you do. Three behaviors that vector-only systems can't reliably produce:
- "Who have I talked to about this?" The graph knows. It walks from the topic node to the person nodes, returns the list, sorted by recency.
- "What's the latest on Jamie's project?" The graph resolves Jamie → projects → most recent activity. Vectors would surface the most embedding-similar chunks, which often means the longest ones, not the most recent.
- "Have I changed my mind about X?" The graph can find every memory tagged with X and rank them chronologically. Diffing the early ones against the late ones produces an actual "you used to think A, now you think B" answer. Vectors can't do this — every chunk is a static embedding.
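The "changed my mind" behavior reduces to a chronological sort once the graph has done the tagging. A minimal sketch, assuming the memories for topic X have already been fetched via the graph (the data is invented):

```python
from datetime import date

# Hypothetical memories already retrieved for one topic via graph edges.
memories = [
    {"at": date(2024, 9, 12), "text": "Distilled 8B model matched the teacher on our evals."},
    {"at": date(2024, 3, 1),  "text": "Distillation loses too much quality to bother with."},
    {"at": date(2025, 1, 4),  "text": "Default plan: distill everything we ship."},
]

timeline = sorted(memories, key=lambda m: m["at"])
early, late = timeline[0], timeline[-1]
# A real agent would hand both snippets to the model with a prompt like
# "contrast the user's early and recent positions"; here we just pair them.
print(f"Then ({early['at']}): {early['text']}")
print(f"Now  ({late['at']}): {late['text']}")
```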
The cost question
People resist graphs because "extracting entities is expensive." That was true when extraction meant calling a frontier model on every paragraph. With a local model running 24/7, extraction is free at the margin — it's just one more thing the daemon does between user requests.
The economics flip the moment inference is local. Vector-only stores existed because, in the cloud era, the cheapest possible memory layer was "embed once, query later." Once the model runs on your hardware, the cheapest possible memory layer is "let the agent reason about every new piece of input as it lands." Graphs are what you get when you can afford to think.
Vectors find similar things. Graphs find connected things. A personal agent needs both, and uses each for what it's good at.
Look at the schema, not the marketing
A simple test for any "AI memory" system: ask what the schema looks like. If the answer is "embeddings in Pinecone," you have a vector store with marketing on top. If the answer is "typed entities, typed relationships, plus an embedding column for fuzzy fallback," you have a knowledge graph with vectors as a co-signal. The second one will know your life. The first one will keep finding similar-sounding chunks.
Build the graph. The vectors are a feature, not the substrate.
Inside Cashmere: the schema lives in cashmere/migrations/. The extraction pipeline is in cashmere/memory/extraction.py. The retrieval blender is in cashmere/memory/retrieval.py. SQLite + sqlite-vec + FTS5 — all on disk, all yours.