OptiMap

Status: Experiment

A Rust library providing a matrix of SIMD-accelerated hash map implementations with different performance trade-offs, all benchmarked against hashbrown (Rust's std HashMap) and std::BTreeMap.

Individual designs are composed from a small set of orthogonal parameters — tag strategy, overflow strategy, group width, indexing method, and deletion model. Together they explore a design space: tombstone-free O(1) miss termination vs wider tags for fewer false matches; 8-bit overflow channels vs 1-bit vs embedded; 15-slot vs 16-slot vs 32-slot vs 64-slot groups; shift-based vs AND-based group indexing.

The library provides a single entrypoint OptiMap wrapper type as well. It is a smart wrapper around the individual hash map implementations, providing a consistent interface and policy engine for backend selection. There is some measured overhead from the dispatch logic due to the enum variant dispatch overhead, but it's small and usually easy for branch prediction to mitigate.

On AI

This library is heavily AI-generated. The vast majority of container code is pretty formulaic so it should be fine. The exception is the hot path for the core operations.

High-level design and access strategies were all human-written. The core SIMD-accelerated paths were also reviewed/tweaked/optimized by hand as well, though I'm sure I've missed some opportunities for further SIMD optimization.

Correctness is ensured through fuzzing, benchmarks, and extensive testing.

OptiMap: the smart wrapper

OptiMap<K, V> wraps all hash backends (plus FlatBTree) behind a 7-variant enum:

Ufm | Splitsies | Ipo | Tomb | Gaps | Ipo64 | FlatBTree

Every operation dispatches through a match — this is the only overhead vs using a raw type directly. The dispatch is ~1–2 ns (a single predicted branch after the inner method inlines). The OptiMapBench* wrappers in the benchmark harness measure pinned backends through this dispatch so the overhead is visible in benchmarks.

Policy engine: OptiMap::new() uses Policy::auto(), which picks MapType::Tomb (HighTag128_TombMap) as a single-band default — currently the best all-rounder per sweep-2026-05-13 data. Users can configure multi-band policies that transition backends on resize:

use optimap::{OptiMap, Policy, MapType};

let policy = Policy::new()
    .band(0, MapType::Splitsies)
    .band(1024, MapType::Tomb);
let mut map = OptiMap::<u64, u64>::with_policy(policy);

Pinned constructors (OptiMap::ipo(), OptiMap::tomb(), etc.) never transition. Sorted containers go through OptiMap::flat_btree() or Hint::Sorted.

The HashedMap trait covers the full std::HashMap interface for all hash backends, and SortedMap covers ordered operations (first/last, range, pop_first/pop_last). OptiSet<T> wraps OptiMap<T, ()> with set-specific API.

Usage

use optimap::{OptiMap, HashedMap};

// Let the policy choose (defaults to HighTag128_TombMap)
let mut map = OptiMap::new();
map.insert("key", 42);
assert_eq!(map.get(&"key"), Some(&42));

// Pin a specific backend
let mut map = OptiMap::ipo();       // InPlaceOverflow (254-value tag)
let mut map = OptiMap::tomb();      // HighTag128_TombMap (128-value tag)
let mut map = OptiMap::splitsies(); // Splitsies
let mut map = OptiMap::flat_btree();// FlatBTree (sorted)

// Workload hints
use optimap::Hint;
let mut map = OptiMap::<u64, u64>::with_hint(Hint::Churn);

// Raw types (no dispatch overhead)
use optimap::{UnorderedFlatMap, Splitsies, InPlaceOverflow, Gaps, SoaMap, FlatBTree};
let mut map = Splitsies::new();
map.insert(1, "one");

Named designs

These are the "brand" types exported at the crate root. They are representative of each design family and used in the default benchmark tables.

Yes the naming is a mess, I'm sorry.

HighTag128_TombMap tends to be the best overall performing and is used by default under the name "Tomb". It is the exact same design as hashbrown/swisstables.

Design	Family	Groups	Deletion	Best at
HighTag128_TombMap	Tombstone	16-slot	Tombstone (128-value tag)	Hit + remove (default)
InPlaceOverflow	Tombstone	16-slot	Tombstone (254-value tag)	Lookup hit, insert
UnorderedFlatMap	Overflow-bit	15-slot	Tombstone-free (embedded)	High-load miss, churn
Splitsies	Overflow-bit	16-slot	Tombstone-free (separate)	Balanced miss+insert
IPO64	Tombstone	64-slot (AVX-512)	Tombstone (254-value tag)	High-load resilience
Gaps	Overflow-bit	15-slot + power-of-2 stride	Tombstone-free (embedded)	Iteration
SoaMap	Overflow-bit	16-slot SoA	Tombstone-free	Large-value workloads
FlatBTree	Tree	256-byte B+ tree nodes	—	Sorted iteration, range queries

What the names mean

Names encode the parameterization:

<Region><Kind><Width> — Region is Low (tag from the bottom of the hash) or High (from the top); Kind is Tag (overflow-bit family) or Tomb (tombstone family, implied 254 values); Width is the distinct-tag-value count. HighTomb = top byte, 254 values (1/254 false-match rate). HighTag128 = top byte, 128 values (hashbrown-equivalent; 1/128 rate but a one-instruction tag extraction). LowTomb = bottom byte, 254 values (used by IPO64 with shift indexing).
Overflow strategy — how displaced entries are tracked:
- _8bit = 8 independent overflow channels (ByteSeparate)
- _1bit = single shared overflow bit (BitSeparate)
- _Emb = embedded overflow byte (UFM/Gaps style, byte 15 of the group)
- _EmbP2 = embedded overflow with power-of-2 bucket stride (Gaps style)
Indexing — how the hash maps to a group:
- (no suffix) = shift-based (h >> shift), uses top hash bits
- And suffix = AND-based (h & mask), uses bottom hash bits — 1 instruction vs 2; requires tags from top bits (HighTag*)
Width — SIMD group slot count:
- (no suffix) = 16-slot (SSE2)
- 32 suffix = 32-slot (AVX2), 64 suffix = 64-slot (AVX-512)
ChSafe suffix (e.g. HighTag128ChSafe) = tag ⊥ channel ⊥ group index — overflow channels sourced from bits disjoint from both the tag and the AND index, needed for 8-bit overflow under AND indexing (and, on the low side, for tag↔channel decorrelation under shift indexing).
Tomb suffix = tombstone-family table (IPO/IPO64 family, TombstoneTag trait instead of the overflow-bit TagStrategy).

The design matrix (parameter space)

Designs are not hand-coded — they're composed from traits:

Axis	Trait	Implementations
Tag extraction	`TagStrategy` / `TombstoneTag`	`LowTag255`, `LowTag128`, `LowTomb`, `LowTag255ChSafe`, `HighTag128`, `HighTag255`, `HighTomb`, `HighTag128ChSafe`, `HighTag255ChSafe`
Overflow storage	`OverflowStrategy`	`ByteSeparate` (8-channel), `BitSeparate` (1-bit), `UfmEmbedded` (byte 15)
Group indexing	`GroupLayout::GROUP_INDEX_REGION`	`HashRegion::High` = shift-based (`h >> shift`, default) or `HashRegion::Low` = AND-based (`h & mask`)
Group width	`GroupOps`	15-slot, 16-slot, 32-slot, 64-slot
Load factor	`GroupLayout::LOAD_FACTOR_NUM/DEN`	Default 7/8 (87.5%), customizable

New variants are ~30 lines: a type alias composing these traits. The matrix_types module in src/lib.rs holds experimental combinations (~70 type aliases). The sweep benchmark's --design-set all exercises every one of them.

Default benchmark tables

The project uses two tiers of benchmark tables:

Sweep benchmarks (cargo bench --bench sweep) — continuous N-curves from 100 to 10M entries, measuring per-operation latency as N grows. The default --design-set curated picks 12 representative designs: one or two best-of-cluster per design family, plus hashbrown and OptiMap-Auto. This is the authoritative data for performance comparisons across the full size range.

Throughput benchmarks (cargo bench --bench throughput) — fixed-size (13K and 107K entries, 70% load) single-operation latency using Criterion. Runs all named designs plus matrix variants (~20 rows), giving precise microbenchmarks at two sizes.

Performance claims in this README reference sweep data at representative N values, not a single cherry-picked size.

Key architectural properties

foldhash by default (avalanching, fast)
Overflow-bit designs (UFM, Splitsies, Gaps) are tombstone-free — deletion clears the slot and adjusts max_load, giving O(1) miss termination
Tombstone designs (IPO, IPO64, Tomb) use tombstones like hashbrown, with EMPTY-based probe termination — faster hit/insert when cache-resident, but lookup_miss degrades at DRAM scale (>1M entries) as tombstones accumulate
Two memory layouts: mid-pointer (ctrl between buckets and metadata, eliminates one address computation from the hot path) used by tombstone and 16-slot overflow-bit designs; traditional layout for embedded-overflow designs (UFM, Gaps) where overflow lives inside the group
70% default load factor (87.5% of slots filled before resize)
Raw entry API for interning and custom-equality lookups

Building

# Requires Rust nightly (SIMD intrinsics) + Nix flake (direnv auto-activates)
cargo test
cargo bench

# Sweep benchmarks (N-curves → CSV + PNG plots)
./scripts/sweep-bench.sh

License

AGPL-3.0-or-later, as discussed in Moral AI Licensing.

Name		Name	Last commit message	Last commit date
Latest commit History 289 Commits
benches		benches
docs		docs
examples		examples
fuzz		fuzz
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
.prettierignore		.prettierignore
.prettierrc		.prettierrc
AGENTS.md		AGENTS.md
Cargo.toml		Cargo.toml
LICENSE.txt		LICENSE.txt
README.md		README.md
flake.lock		flake.lock
flake.nix		flake.nix
justfile		justfile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OptiMap

On AI

OptiMap: the smart wrapper

Usage

Named designs

What the names mean

The design matrix (parameter space)

Default benchmark tables

Key architectural properties

Building

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OptiMap

On AI

OptiMap: the smart wrapper

Usage

Named designs

What the names mean

The design matrix (parameter space)

Default benchmark tables

Key architectural properties

Building

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages