TreeDB is a pre-alpha persistent storage engine built around a copy-on-write B+Tree, a persistent value log, and command-WAL recovery. It includes a native wire server, a Mongo-compatible gateway, a collection/document layer, secondary indexes, and benchmark tooling for comparing TreeDB against MongoDB and other engines.
TreeDB is the main focus of this repository. The repo also contains HashDB, an
older mmap-backed hash engine used for experiments and comparison; see
HashDB/README.md.
TreeDB is pre-alpha:
- APIs and on-disk formats may change without backward-compatibility guarantees.
- New binaries may intentionally reject old DB directories.
- Benchmark DB directories should be rebuilt from scratch unless a report says otherwise.
These checked-in reports use different workloads, profiles, and caveats. Treat each workload as scoped evidence from its linked benchmark, not as one combined benchmark suite.
External go-ycsb, local loopback TCP, recordcount=100000,
operationcount=10000, threadcount=16, BSON document format, and zero YCSB
operation errors. Run rows use the median total-throughput repeat from the
latest-main June 3 HST / June 4 UTC report.
| target | profile | load ops/sec | run ops/sec | run avg us | run p99 us |
|---|---|---|---|---|---|
| MongoDB 8 | baseline | 38,755.4 | 26,494.1 | 595.0 | 1,275.0 |
| TreeDB nativewire | command_wal_durable | 83,217.0 | 135,318.6 | 113.0 | 367.0 |
| TreeDB Mongo gateway | command_wal_durable | 68,218.2 | 80,628.4 | 199.0 | 649.0 |
Full report, commands, host context, run repeats, and artifact paths: June 3 latest-main YCSB report.
Two secondary indexes, latest-main June 4 HST / June 4 UTC rerun, 100000
documents, batch size 16000, and command_wal_relaxed for TreeDB. docs/sec
is the timed insert measurement for the value-log outer-leaf layout. Compacted
B/doc uses the byte-minimized exhaustive_compact row for TreeDB template-v1 and
SQLite after VACUUM; TreeDB JSON is omitted from the README compacted-size
headline until the canonical exhaustive fixture covers that format.
| engine / format | layout | docs/sec | compacted B/doc |
|---|---|---|---|
| TreeDB template-v1 | data and index outer leaves in value log | 697,350 | 22.8 |
| TreeDB JSON | data and index outer leaves in value log | 475,511 | — |
| SQLite native columns | WAL normal | 332,116 | 156.7 |
| SQLite JSON | WAL normal | 282,885 | 231.7 |
Source: June 4 exhaustive-compact two-index insert rerun.
Two secondary indexes, April 27 collection/SQLite matrix.
| operation | TreeDB template-v1 ops/sec | TreeDB JSON ops/sec | SQLite native columns ops/sec | SQLite JSON ops/sec |
|---|---|---|---|---|
| Primary read | 771,803 | 524,843 | 357,398 | 473,634 |
| Unique secondary lookup | 815,661 | 845,785 | 484,574 | 442,478 |
| Nonunique secondary lookup | 243,625 | 257,356 | 68,815 | 39,399 |
Source: April 27 collection/SQLite matrix.
The collection insert/read/lookup rows above are single benchmark-driver rows.
For concurrent collection-layer reads and mixed read/write evidence, use the
separate concurrency report. In the June 4 run, TreeDB template-v1 primary reads
into a caller buffer measured 332.3 ns/op at GOMAXPROCS=12; mixed primary
reads with one writer measured 2.96M reader ops/sec and 44.3k writer docs/sec.
Source: June 4 collection concurrency report.
Dated Tier S exact-FP32 no-document snapshot: Apple M3 (darwin/arm64),
2026-06-05, commit 2feb1f0e35459d1b3d044008203d0c8afcf5630f, 10000
documents, 64 dimensions, M=16, efConstruction=128, efSearch=128,
topK=10, query stream length 16, BENCHTIME=1000x, and COUNT=3. TreeDB
rows use warmed persisted column_graph / hnsw_search_pack_v1 no-document
routes. USearch rows are pure in-memory external ANN baselines, not
persistence-equivalent storage rows.
| row | cpu | median ns/op | derived ops/sec | B/op | allocs/op |
|---|---|---|---|---|---|
| TreeDB Collection.SearchVectorIndexWithBuffer | 1 | 43,049 | 23,229 | 0 | 0 |
| TreeDB OpenVectorIndexSearcher + SearchWithBuffer parallel row | 8 | 8,610 | 116,144 | 0 | 0 |
| USearch Search | 1 | 30,065 | 33,261 | 136 | 3 |
| USearch SearchParallel | 8 | 6,906 | 144,802 | 139 | 3 |
Source and reproduction workflow:
TreeDB vs USearch vector benchmark workflow.
API chooser, route guardrails, and runnable exact-only demo:
high-QPS collection vector-search guide
and cmd/treedb_vector_highqps_demo.
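
The `derived ops/sec` column in the table above is a unit conversion of the median latency: one second is 10^9 ns, so ops/sec = 10^9 / (ns/op). The following is a minimal Go sketch that re-derives the four rows from the table values. The helper name `opsPerSec` is illustrative and is not part of the TreeDB API.

```go
package main

import (
	"fmt"
	"math"
)

// opsPerSec converts a per-operation latency in nanoseconds into a
// throughput figure: ops/sec = 1e9 / (ns/op).
func opsPerSec(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

func main() {
	// Median ns/op values copied from the table above.
	rows := []struct {
		name    string
		nsPerOp float64
	}{
		{"TreeDB Collection.SearchVectorIndexWithBuffer (cpu=1)", 43049},
		{"TreeDB OpenVectorIndexSearcher + SearchWithBuffer (cpu=8)", 8610},
		{"USearch Search (cpu=1)", 30065},
		{"USearch SearchParallel (cpu=8)", 6906},
	}
	for _, r := range rows {
		fmt.Printf("%-60s %8.0f ops/sec\n", r.name, math.Round(opsPerSec(r.nsPerOp)))
	}
}
```

Rounded to whole operations, the results match the `derived ops/sec` column, so that column carries no information beyond the median ns/op.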
Gateway-shaped BSON documents, 200000 documents, batch size 1000,
16 insert producers, two secondary indexes (email, city), TreeDB
command_wal_relaxed, settled read state, and 16 concurrent readers for the
read phases. This benchmark keeps TreeDB storage constant while changing the
client/protocol boundary.
| TreeDB access path | load docs/sec | concurrent _id reads/sec | concurrent indexed email reads/sec | _id p95 us | email p95 us |
|---|---|---|---|---|---|
| Direct collection API | 299,777 | 2,026,834 | 230,846 | 6 | 813 |
| Mongo raw wire over TCP | 283,472 | 276,187 | 65,365 | 81 | 627 |
| TreeDB native wire over TCP | 271,948 | 115,712 | 66,917 | 213 | 618 |
| Mongo driver raw command | 276,503 | 124,961 | 70,215 | 197 | 580 |
| Mongo driver command | 253,315 | 130,387 | 69,213 | 191 | 596 |
| Mongo driver CRUD | 228,906 | 115,262 | 62,586 | 221 | 643 |
| Mongo driver unacknowledged writes | 233,169 | 114,026 | 68,454 | 219 | 601 |
Direct collection API is the TreeDB collection/storage ceiling, not a
Mongo-compatible protocol row. Mongo driver unacknowledged writes is not a
durability-equivalent default. Use this table to compare TreeDB access-path
overhead; compare against MongoDB only when client mode and acknowledgement
semantics match.
Source: June 4 fast Mongo/native client-shape matrix.
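
One way to read the access-path table is to express each row as a fraction of the Direct collection API ceiling. The sketch below does that for three values copied from the table; `relative` is a hypothetical helper name, and the choice of the direct API row as the denominator is an illustration rather than a method taken from the report.

```go
package main

import "fmt"

// relative returns a path's throughput as a fraction of the ceiling row.
func relative(path, ceiling float64) float64 {
	return path / ceiling
}

func main() {
	// Direct collection API values from the table above.
	const (
		directLoad    = 299777.0  // load docs/sec
		directIDReads = 2026834.0 // concurrent _id reads/sec
	)

	// Mongo raw wire over TCP and TreeDB native wire over TCP rows.
	fmt.Printf("Mongo raw wire load:      %.1f%% of direct\n", 100*relative(283472, directLoad))
	fmt.Printf("Mongo raw wire _id reads: %.1f%% of direct\n", 100*relative(276187, directIDReads))
	fmt.Printf("Native wire _id reads:    %.1f%% of direct\n", 100*relative(115712, directIDReads))
}
```

Read this way, the batched load phase stays close to the direct ceiling, while the per-request `_id` reads show most of the protocol overhead.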
Local Apple M3 snapshot for 10k x 1536 vectors, topK=10, M=16,
efConstruction=128, and efSearch=128. TreeDB rows use DB-demo no-document
search; USearch is an in-memory library comparator; pgvector is a PostgreSQL
server HNSW comparator.
| system | recall@10 | c=1 avg / QPS | c=8 avg / QPS |
|---|---|---|---|
| TreeDB exact FP32 | 0.9859 | 418 µs / 2,391 | 852 µs / 9,386 |
| TreeDB scalar_u8 rerank32 | 0.9828 | 165 µs / 6,072 | 511 µs / 15,571 |
| USearch f32 HNSW | 0.8938 | 725 µs / 1,380 | 160 µs / 6,259 |
| PostgreSQL+pgvector HNSW | 0.9859 | 2.67 ms / 374 | 4.29 ms / 1,864 |
USearch c=1/c=8 averages are from batch searches with threads=1/8, while
TreeDB and pgvector report per-query latency samples. Source and caveats:
June 8 vector external comparison.
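
As a rough cross-check of the per-query rows, a closed-loop client model gives QPS ≈ c / average latency, where each of the c clients keeps one query in flight. This relation is an assumption about how the two numbers in each cell relate, not something stated in the report, and `closedLoopQPS` is an illustrative helper name. A minimal Go sketch using the TreeDB and pgvector values from the table:

```go
package main

import "fmt"

// closedLoopQPS estimates throughput for `clients` workers that each keep
// exactly one query in flight: QPS ≈ clients / average latency.
// avgMicros is the average per-query latency in microseconds.
func closedLoopQPS(clients int, avgMicros float64) float64 {
	return float64(clients) * 1e6 / avgMicros
}

func main() {
	// Latencies copied from the table above; tableQPS is the reported
	// QPS, kept alongside for comparison.
	rows := []struct {
		name      string
		clients   int
		avgMicros float64
		tableQPS  float64
	}{
		{"TreeDB exact FP32, c=1", 1, 418, 2391},
		{"TreeDB exact FP32, c=8", 8, 852, 9386},
		{"TreeDB scalar_u8 rerank32, c=1", 1, 165, 6072},
		{"TreeDB scalar_u8 rerank32, c=8", 8, 511, 15571},
		{"PostgreSQL+pgvector HNSW, c=1", 1, 2670, 374},
		{"PostgreSQL+pgvector HNSW, c=8", 8, 4290, 1864},
	}
	for _, r := range rows {
		fmt.Printf("%-32s estimate %8.0f QPS, table %8.0f QPS\n",
			r.name, closedLoopQPS(r.clients, r.avgMicros), r.tableQPS)
	}
}
```

The USearch rows are left out on purpose. Their averages come from batch searches, so the c=8 figure of 160 µs behaves like an amortized per-query time (10^6 / 160 ≈ 6,250, close to the reported 6,259 QPS) rather than a per-query latency sample.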
Offline compacted-size comparison from the June 2 Celestia application.db
rerun.
| engine | compacted size | workflow |
|---|---|---|
| TreeDB | 1.690 GiB | command_wal_relaxed, rebuild, CompactStorageFull, offline index vacuum |
| PebbleDB | 2.108 GiB | snappy, 64 KiB blocks, 64 MiB target files, full compact |
| goleveldb | 2.221 GiB | snappy, 64 KiB blocks, restart interval 256, full compact |
Source: June 2 density rerun.
Current-context #2564 benchmark on an active Apple M3 laptop, 256 JSON
documents, scalar indexes on tenant/region, a lexical title/body text
index, and an exact cosine column graph (dims=16, M=8). The insert row times
InsertBatch + Flush + RebuildVectorIndex; search rows build/index the
fixture before timing and then time the search API call only.
| row | timed boundary | ns/op avg | ops/sec | B/op | allocs/op | key counters |
|---|---|---|---|---|---|---|
| Indexed insert/readiness | 256-doc batch insert + flush + vector rebuild | 79,928,531 | 12.5 ops/sec / 3,202.9 docs/sec | 11,014,446 | 113,174 | 174,590 insert ns/doc; 137,618 vector rebuild ns/doc |
| Text candidates | SearchHybridTextCandidates, no docs | 275,850 | 3,625.2 | 425,584 | 7,591 | 64 text candidates; 0 docs fetched; 0 fail/fallback |
| Vector candidates | SearchHybridVectorCandidates, no docs | 20,475 | 48,840.0 | 36,408 | 82 | 64 vector candidates; 0 docs fetched; 0 fail/fallback |
| Hybrid no-doc search | SearchHybrid + rare scalar filter, no docs | 328,450 | 3,044.6 | 514,723 | 7,791 | 64 text + 64 vector candidates; 16 fused; 0 docs fetched |
| Hybrid final fetch | SearchHybrid + rare scalar filter + final topK fetch | 546,853 | 1,828.6 | 572,528 | 8,768 | 10 docs fetched at topK=10; 112 scalar rejections; 0 fail/fallback |
Source, command, artifact paths, host-load caveats, and reproduction workflow: TreeDB indexed insertion/search benchmark.
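
The indexed insert/readiness row reports one timed operation per 256-document batch, so its two throughput figures and its per-document counters can be related by simple arithmetic. The sketch below reproduces that arithmetic from the table values. The helper name `docsPerSec` is illustrative, and treating the two ns/doc counters as the main components of the batch time is an assumption, not a statement from the report.

```go
package main

import "fmt"

// docsPerSec converts a timed batch (documents inserted, total ns) into docs/sec.
func docsPerSec(docs int, batchNs float64) float64 {
	return float64(docs) * 1e9 / batchNs
}

func main() {
	// Values copied from the indexed insert/readiness row above.
	const (
		docs            = 256
		batchNs         = 79928531 // ns/op avg for the whole batch
		insertNsPerDoc  = 174590   // insert ns/doc counter
		rebuildNsPerDoc = 137618   // vector rebuild ns/doc counter
	)

	fmt.Printf("batches/sec: %.1f\n", 1e9/float64(batchNs))
	fmt.Printf("docs/sec:    %.1f\n", docsPerSec(docs, batchNs))

	// Sum of the per-document counters scaled to the batch, compared with
	// the measured batch time.
	accounted := (insertNsPerDoc + rebuildNsPerDoc) * docs
	fmt.Printf("accounted ns: %d of %d (residual %d ns)\n",
		accounted, batchNs, batchNs-accounted)
}
```

Under that assumption the two counters account for nearly all of the batch time, with a residual of a few microseconds per batch.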
- Persistent B+Tree index with copy-on-write root publishing.
- Persistent value log for large values and leaf/value placement experiments.
- Command WAL for collection and raw-key redo/recovery.
- Snapshot-isolated readers and exclusive process-level DB directory locking.
- Collection/document APIs with BSON, template-v1, secondary indexes, and vector search experiments.
- Native wire protocol through cmd/treedb-native-server.
- Mongo-compatible gateway through TreeDB/mongo_gateway.
- Benchmark and profiling scripts for YCSB, collections, vector search, and storage-engine comparison.
Build the primary TreeDB servers:

    mkdir -p bin
    go build -o bin/treedb-native-server ./cmd/treedb-native-server
    go build -o bin/treedb-mongo-gateway ./TreeDB/mongo_gateway/server.go

Run the native server:

    ./bin/treedb-native-server \
      -dir /tmp/treedb-native \
      -profile command_wal_durable \
      -addr 127.0.0.1:17130

Run the Mongo-compatible gateway:

    ./bin/treedb-mongo-gateway \
      -dir /tmp/treedb-mongo \
      -profile command_wal_durable \
      -document-format bson \
      -addr 127.0.0.1:27130

Minimal Go usage:

    package main

    import (
    	"fmt"
    	"log"

    	treedb "github.com/snissn/gomap/TreeDB"
    )

    func main() {
    	// Open the database directory with the durable command-WAL profile.
    	opts := treedb.OptionsFor(treedb.ProfileCommandWALDurable, "./my-db")
    	db, err := treedb.Open(opts)
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer db.Close()

    	// Write one key, then read it back.
    	if err := db.Set([]byte("key"), []byte("value")); err != nil {
    		log.Fatal(err)
    	}
    	value, err := db.Get([]byte("key"))
    	if err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println(string(value))
    }

The current public TreeDB profile surface is intentionally small:
- command_wal_durable: recommended server default; command WAL enabled with durable sync/checksum settings.
- command_wal_relaxed: command WAL enabled with relaxed sync/read-integrity settings for high-throughput ingest and comparative benchmarks.
- bench: explicit no-WAL benchmark-only ceiling.

Legacy/raw profile names are retained only for compatibility and focused low-level tests. They should not be used as public server defaults.
More detail: docs/TREEDB_PROFILES.md and docs/TREEDB_WRITE_PATHS.md.
Current YCSB status and rerun commands:
- docs/benchmarks/ycsb_mongodb_treedb_current.md
- docs/benchmarks/ycsb_latest_main_2026-06-03.md
- scripts/ycsb_compare_mongodb_treedb.sh

Collection, vector-search, and engine benchmark runbooks:
- docs/benchmarks/collections_insert_two_index_exhaustive_main_2026-06-04.md
- docs/benchmarks/mongo_gateway_fast_client_matrix_2026-06-04.md
- TreeDB/docs/guides/vector-search-high-qps-collection-api.md
- TreeDB/docs/guides/vector-search-benchmark-workflow.md
- cmd/treedb_vector_highqps_demo/README.md
- docs/benchmarks/treedb_canonical_benchmark_runbook.md
- docs/benchmarks/collections_canonical_benchmark.md
- cmd/unified_bench/README.md
- cmd/benchprof/README.md
Profile capture workflow:

    OUT=$(mktemp -d /tmp/gomap_profiles_XXXXXX)
    ./bin/unified-bench ... -profile-dir "$OUT"
    ./bin/benchprof -profiles-dir "$OUT"

- TreeDB canonical spec: TreeDB/docs/spec/README.md
- TreeDB guides: TreeDB/docs/guides/README.md
- TreeDB concepts: docs/TREEDB_CONCEPTS.md
- TreeDB storage format: docs/TREEDB_STORAGE_FORMAT.md
- TreeDB recovery: docs/TREEDB_RECOVERY.md
- TreeDB collection quickstart: docs/TREEDB_COLLECTION_QUICKSTART.md
- Contracts: docs/contracts/README.md
- Full docs index: docs/README.md
- TreeDB/: TreeDB storage engine, collection layer, native APIs, command WAL, and Mongo gateway.
- cmd/treedb-native-server/: native wire server.
- TreeDB/mongo_gateway/: Mongo-compatible TreeDB gateway.
- cmd/unified_bench/: cross-engine benchmark harness.
- cmd/benchprof/: profile/result summarizer.
- HashDB/: mmap-backed hash-index engine used for experiments and comparison.

Run the tests:

    go test ./...
    go test ./TreeDB/... ./cmd/treedb-native-server ./TreeDB/mongo_gateway

For large benchmark runs, prefer a fresh DB directory and record the exact commit, host, profile, command, and artifact path in the report.
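
Because old DB directories may be rejected and benchmark runs should start from a fresh directory, a benchmark driver can create a new empty directory for each run. The following is a minimal Go sketch using only the standard library. The helper names `freshDBDir` and `isEmptyDir` and the `treedb-bench-` prefix are hypothetical and are not part of TreeDB.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// freshDBDir creates a new, uniquely named, empty directory under the
// system temp directory and returns its path.
func freshDBDir(prefix string) (string, error) {
	// The trailing "*" is replaced by a random string by os.MkdirTemp.
	return os.MkdirTemp("", prefix+"*")
}

// isEmptyDir reports whether path is a readable directory with no entries.
func isEmptyDir(path string) bool {
	entries, err := os.ReadDir(path)
	return err == nil && len(entries) == 0
}

func main() {
	dir, err := freshDBDir("treedb-bench-")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("fresh DB dir:", dir, "empty:", isEmptyDir(dir))
	// Use dir as the -dir flag value, or as the path passed to
	// treedb.OptionsFor, and record it in the benchmark report.
}
```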