Your second brain — and it stays yours. Forever. Local.
A sovereign personal knowledge system. The conversations, decisions, documents, and half-formed thoughts you produce across Claude, ChatGPT, Gemini, Copilot — and the files you accumulate in real life — all come home to one place that outlives any vendor and obeys only you.
When you think hard today, you often think with an LLM in the loop. School, work, authorities, court cases, taxes, family, health, relationships — these conversations contain your most private thinking. More intimate than any diary.
And then they evaporate:
- Your subscription lapses or you switch providers → history gone
- The provider retires a model or rewrites their ToS → answers no longer reproducible
- An account ban, a provider going under, a country blocking the service → everything lost
- The data sits on a vendor's servers, fed into training, served on subpoena, exposed in the next breach
HiveMem is built around the opposite stance:
- Sovereignty — Your data lives in your instance. Postgres + SeaweedFS, on hardware you control. No vendor sees the contents unless you explicitly route a single LLM call through them.
- Persistence — Everything is append-only with valid_from/valid_until. No subscription change can revoke access. No retention policy you didn't author can delete what's yours.
- Portability — A HiveMem instance packs into one encrypted archive (Postgres dump + binary store + config) and restores anywhere. Vendor lock-in: zero.
- Aggregation — What you write in Claude.ai, ChatGPT, Gemini, Claude Code, Copilot lands in HiveMem too. Those tools become front-ends; HiveMem holds the truth.
- Privacy by realm — Strict separation per life area (legal, medical, private, work). Per-realm routing rules: anything touching authorities or health stays on local models, never reaches a cloud provider.
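The valid_from/valid_until idea behind the Persistence point can be illustrated with a small sketch. This is not HiveMem's schema or code: the `Fact` class, the `supersede` and `as_of` helpers, and the in-memory list are hypothetical stand-ins for what would live in Postgres. The sketch only shows that an update closes the old version's validity window and appends a new version, so earlier states stay queryable.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One immutable version of a fact (hypothetical shape, not HiveMem's schema)."""
    key: str
    value: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means "still valid"


def supersede(log: list[Fact], key: str, value: str, at: datetime) -> list[Fact]:
    """Return a new log in which the open version of `key` is closed at `at`
    and a new version is appended. No version is ever removed; closing the
    validity window is the only change made to an existing entry."""
    out = []
    for f in log:
        if f.key == key and f.valid_until is None:
            out.append(replace(f, valid_until=at))
        else:
            out.append(f)
    out.append(Fact(key, value, valid_from=at))
    return out


def as_of(log: list[Fact], key: str, when: datetime) -> Optional[str]:
    """Value of `key` as it stood at `when`, or None if it did not exist yet."""
    for f in log:
        still_open = f.valid_until is None or when < f.valid_until
        if f.key == key and f.valid_from <= when and still_open:
            return f.value
    return None
```

Calling `supersede` twice for the same key leaves two versions in the log, and `as_of` picks whichever one was valid at the requested moment.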
The long-term goal is a periodic agent — the Queen — that wakes on a schedule, surveys your knowledge, and dispatches specialized worker agents (Bees) to flag isolated cells, stale facts, duplicate candidates, and realms drifting from their blueprint. Everything risky stays a proposal that flows through the existing approval workflow; you keep the kill switch.
Today the Queen and the isolated-cell Bee already run on the Vistierie agent
runtime — scheduled (cron), dispatched with per-run cost accounting and a
per-tenant kill switch — and their proposals flow through the approval workflow
as pending tunnels. An admin-only Queen log UI (/queen) shows run history,
event timelines, and the proposal queue. Still to come: a conversation UI that
teaches the Queen your per-realm preferences, and further Bee types (stale-fact,
duplicate-cell, blueprint-drift).
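The gating described here (agents only propose, admins decide, and a kill switch stops agent writes) could be modeled roughly as follows. This is a sketch under assumptions, not the Vistierie runtime or HiveMem's approval code: `ApprovalQueue`, `Proposal`, `Status`, and the role string `"admin"` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    agent: str
    description: str
    status: Status = Status.PENDING  # every agent write starts as pending


@dataclass
class ApprovalQueue:
    kill_switch: bool = False  # when True, agents may not submit anything
    proposals: list[Proposal] = field(default_factory=list)

    def submit(self, agent: str, description: str) -> Proposal:
        """Record an agent's suggestion; it has no effect until approved."""
        if self.kill_switch:
            raise RuntimeError("kill switch engaged: agent writes are disabled")
        proposal = Proposal(agent, description)
        self.proposals.append(proposal)
        return proposal

    def decide(self, proposal: Proposal, role: str, approve: bool) -> None:
        """Approve or reject a pending proposal; only admins may decide."""
        if role != "admin":
            raise PermissionError("only admins may approve or reject")
        if proposal.status is not Status.PENDING:
            raise ValueError("proposal already decided")
        proposal.status = Status.APPROVED if approve else Status.REJECTED
```

Keeping the decision in a separate method that checks the caller's role means an agent cannot approve its own proposal, even by accident.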
→ Roadmap — what's planned, what's partial, and the order of work.
→ Scientific foundations — the cognitive-science and PKM theory HiveMem's design is built on (Working Memory, Cognitive Load, Extended Mind, Forgetting Curve, Zettelkasten, PARA).
Docker images: ghcr.io/visterion/hivemem:main for the rolling main branch, plus semver tags such as ghcr.io/visterion/hivemem:9.1.5 for cut releases.
- 6-Signal Ranked Search — Semantic similarity, keyword, recency, importance, popularity, and graph proximity — combined into one ranked result.
- Temporal Knowledge Graph — Facts with valid_from/valid_until and multi-hop graph traversal; query the graph as it stood at any point in time.
- Progressive Summarization — Four layers per cell: content, summary, key points, and insight. Never lose nuance.
- Document & Scan Pipeline — the end-to-end picture of how any file becomes searchable knowledge: two entry points (watched folder + REST upload), one shared ingest core (hash → parse → dedup → store → cell), and four async enrichment paths (OCR · Vision · Kroki · Summarizer). The map that ties the doc features below together.
- Long cells stay searchable — auto-summarizer turns multi-page documents into curated summaries that are embedded for semantic search; cost-capped, opt-in.
- Scanned PDFs become searchable — Tesseract OCR extracts text from scan-only PDFs; combined with the auto-summarizer, even paper-mailed documents are findable by semantic search.
- Consumption folder — auto document separation — Drop a stack of mixed scans into a network folder (SMB); HiveMem OCRs each page and uses a Vistierie LLM agent to split multi-page batches into individual documents by content — no separator or barcode sheets. The USP over Paperless-ngx; live in production.
- Document-Type Extraction — invoices, contracts, and other typed documents are auto-classified during summarization; typed facts (vendor, amount, parties, dates) land in the knowledge graph.
- Kroki + Vision — Diagram thumbnails (Mermaid/PlantUML/Graphviz/D2) and image description via Claude Haiku — async, opt-in, budget-capped.
- Append-Only Versioning + Time Machine — No data is ever deleted. Query your knowledge at any point in time.
- Agent Fleet + Approval Workflow — Agents write pending suggestions; only admins approve. Every agent write is human-gated.
- Auto-Inject Hook for Claude Code — Relevant memories injected into every session automatically, before you even ask.
- Full instance portability — Export the entire HiveMem instance (Postgres + attachments + identity) into one tar.gz, restore it on another host with one command. Mission promise made provable.
- Bilingual UI (German/English, German-first) with a backend-configured default language.
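To make the 6-signal ranking in the list above concrete, here is one way six signals could be folded into a single score. It is a sketch under assumptions: the source says only that the signals are combined into one ranked result, so the weighted sum, the weights in `WEIGHTS`, and the `Candidate` type are hypothetical, and each signal is assumed to be pre-normalized to the range 0 to 1.

```python
from dataclasses import dataclass

# Hypothetical weights; the source does not state how the signals are weighted.
WEIGHTS = {
    "semantic": 0.35,
    "keyword": 0.20,
    "recency": 0.15,
    "importance": 0.10,
    "popularity": 0.10,
    "graph_proximity": 0.10,
}


@dataclass
class Candidate:
    cell_id: str
    signals: dict[str, float]  # each signal assumed normalized to [0, 1]


def score(c: Candidate) -> float:
    """Weighted sum of the six signals; a missing signal counts as 0."""
    return sum(w * c.signals.get(name, 0.0) for name, w in WEIGHTS.items())


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates sorted best-first by combined score."""
    return sorted(candidates, key=score, reverse=True)
```

With a weighted sum, a cell that is only moderately similar but recent and close in the graph can outrank a cell that matches on semantics alone.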
Honest snapshot of what is shipping today versus what the surrounding prose describes as the long-term shape. See the roadmap for details on every 🟡 / 🔴 row.
| Feature | Status | Notes |
|---|---|---|
| 6-Signal Ranked Search | ✅ Stable | semantic + keyword + recency + importance + popularity + graph proximity, all wired into one SQL ranker |
| Progressive Summarization | ✅ Stable | content / summary / key points / insight, all four populated automatically |
| Auto-Summarizer for long cells | ✅ Stable | summary is embedded for semantic search, cost-capped per realm |
| OCR for scanned PDFs | ✅ Stable | Tesseract, async backfill, Vision fallback |
| Document-Type Extraction | ✅ Stable | invoices/contracts/etc → typed facts in the knowledge graph |
| Kroki + Vision | ✅ Stable | diagram thumbnails + Claude Haiku image description, opt-in, budget-capped |
| Append-Only Versioning + Time Machine | ✅ Stable | time_machine queries by event time and ingestion time |
| Agent Approval Workflow | ✅ Stable | every agent write lands as pending until an admin approves |
| Auto-Inject Hook (Claude Code) | ✅ Stable | 6-stage filter pipeline, Bearer-token auth |
| Full Instance Portability | ✅ Stable | one-command tar.gz of Postgres + attachments + identity |
| OAuth Custom Connector | ✅ Stable | RFC 8414 / 9728 discovery, PKCE |
| Temporal Knowledge Graph | 🟡 Partial | bi-temporal facts and multi-hop traversal ship; automatic contradiction detection is not yet implemented |
| Privacy by Realm — model routing | 🟡 Partial | data segregation by realm works; per-realm enforcement of "stays on local models" is not yet wired into the LLM call path |
| Queen + Bees periodic agent | 🟡 Partial | Queen + isolated-cell-Bee run on Vistierie's agent runtime (cron, subagent dispatch, run/cost audit, kill switch); proposals land as pending tunnels via the approval workflow. An admin-only Queen-log UI (/queen) shows runs + event timelines and the proposal approval queue. Still missing: preference UI, further Bee types. |
| Consumption folder — auto document separation | ✅ Stable | Drop a stack of mixed scans into a network folder; HiveMem ingests off a bounded worker pool, OCRs each page (auto-oriented), and uses a Vistierie LLM agent to split by content — no separator/barcode sheets. High-confidence splits → committed, low-confidence → pending. The HiveMem→Vistierie run contract is reconciled; live in production. Reassembly of non-contiguous/shuffled pages is a separate roadmap item. |
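The consumption-folder row above says high-confidence splits are committed and low-confidence splits stay pending. A minimal sketch of that routing step follows. The threshold value, the `ProposedSplit` type, and the `route` function are hypothetical; the source does not say how confidence is measured or where the cut-off lies.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.85  # hypothetical cut-off, not taken from HiveMem


@dataclass
class ProposedSplit:
    pages: list[int]    # page numbers of one proposed document in the batch
    confidence: float   # agent-reported confidence, assumed to be in [0, 1]


def route(splits: list[ProposedSplit], threshold: float = CONFIDENCE_THRESHOLD):
    """Partition proposed splits into (committed, pending) by confidence."""
    committed, pending = [], []
    for s in splits:
        if s.confidence >= threshold:
            committed.append(s)
        else:
            pending.append(s)  # left for human review
    return committed, pending
```

Anything routed to `pending` would then wait for a human decision rather than being written directly.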

| Document | Description |
|---|---|
| Vision | Cognitive-science and PKM foundations behind HiveMem's design |
| Getting Started | Prerequisites, embedding service, token creation, connect to Claude |
| The Structure | Realms, signals, topics, cells, tunnels — the knowledge hierarchy |
| Architecture | System diagram, data model, security matrix |
| Tools | All 46 MCP tools, the parallel REST attachment API, search signals, progressive summarization |
| Authentication | Roles, token management, security details |
| OAuth + Custom Connector | Add HiveMem as a Claude.ai/ChatGPT Custom Connector |
| Backup + Portability | Export and restore entire instances, disaster recovery, cloning |
| Hook Integration | Auto-inject context into Claude Code sessions |
| Operations | Deployment, migrations, debugging |
| Roadmap | What's planned, what's partial, order of work |
| Document & Scan Pipeline | End-to-end overview: entry points, shared ingest core, the four enrichment paths |
| Consumption Folder | Scan-to-folder ingest, automatic content-based document separation, config reference |
HiveMem is fair-code licensed under the Sustainable Use License. Free for personal and internal business use. See LICENSING.md for details.