Resplix CDC Research Cluster

Data is an Asset. Movement is a Liability.

This repository contains high-performance implementations of Content-Defined Chunking (CDC) algorithms, specifically focusing on the FastCDC (2016) paper: FastCDC: a Fast and Efficient Content-Defined Chunking Approach for Data Deduplication. Engineered for industrial-scale data deduplication and verifiable movement within the Resplix ecosystem.

/// Architecture Manifestos

The Gear Hash Advantage

Unlike traditional Rabin Fingerprinting, FastCDC utilizes the Gear Hash algorithm. Gear Hash is designed for SIMD acceleration, allowing us to process multiple window offsets in parallel. This eliminates the CPU bottleneck often associated with CDC, pushing throughput towards hardware limits.

FastCDC (2016) Key Innovations

Gear Hash: A simplified rolling hash that utilizes a pre-computed lookup table to map byte values to random 64-bit integers.
Normalized Chunk Distribution: Solving the "chunk size variance" problem by using a dual-threshold mask to keep chunk sizes within a predictable, optimal range.
Cut-point Skipping: Accelerating the scanning process by skipping a minimum distance after each chunk boundary.
Content-Addressing: Leveraging BLAKE3 for cryptographically secure chunk identification, enabling verifiable data movement.
Streaming Support: Implementations for io.Reader (Go) and Read (Rust) allowing for zero-copy, memory-efficient chunking of multi-terabyte streams.

/// Performance Targets

Metric	Target	Status
Hash Throughput	10GB/s+ per core	Optimizing (Core Logic Done)
Dedupe Efficiency	> 90% (Standard Datasets)	Validating
Indexing Latency	< 0.002ms	Engineering Beta
CPU Overhead	< 1.2% Under Load	Benchmarking

/// Cross-Platform CI Benchmarks

We continuously verify our FastCDC SIMD implementations on GitHub Actions runners. Below are the median throughputs for raw Gear Hashing and the full FastCDC chunking pipeline (CDC cut-point search + per-chunk BLAKE3, default ~16KB avg chunks) across different architectures:

Architecture	CPU / Instance	SIMD Target	Raw Hash (GiB/s)	Pipeline (MiB/s)
Linux ARM64	Ampere Altra (Neoverse N1)	`NEON`	28.18	~963
macOS ARM64	Apple M1	`NEON`	12.30	~844
Linux x64	AMD EPYC 7763 (Zen 3)	`Scalar`*	10.97	~1125
Windows x64	AMD EPYC 7763 (Zen 3)	`Scalar`*	10.91	~1105

Note on x86_64: On AMD Zen 3 architectures, the scalar implementation heavily outperforms AVX2. This is a known micro-architectural quirk where Zen 3 implements the vpgatherqq (AVX2 gather) instruction using dozens of micro-ops, whereas our aggressively unrolled scalar loop is perfectly pipelined. On Intel architectures (Skylake/Ice Lake), AVX2 gather is fully hardware-accelerated.

Note on ARM64: Our NEON implementation uses a 4-byte unrolled loop with bounds-check elimination and vshl/vadd intrinsics. This yields a massive +58% throughput increase over the scalar path on Neoverse N1 processors!

/// Project Structure

.
├── go/             # Go (Golang) implementation with SIMD/Assembler optimizations
└── rust/           # Rust implementation utilizing AVX-512 / NEON intrinsics

/// Getting Started

Rust Implementation

cd rust
cargo bench

Go Implementation

cd go
go test -v ./...

/// Technical Deep-Dive

For a detailed exploration of the mathematics behind our implementation, see the Resplix Architectural Manifestos.

SIMD Acceleration: Leveraging AVX-512 and NEON to saturate 10GbE links.
Atomic Resumability: Cryptographic proof of data parity, ensuring zero restarts from zero.
Xor-Filter Indexing: Sub-microsecond shard discovery with zero false-positives for static data sets.

The 64-Bit Gear Table

Why 64 Bits?

The Gear Table consists of 256 entries, each 64 bits wide. This is not accidental—it's a deliberate optimization across three dimensions: hardware architecture, collision resistance, and cache efficiency.

1. Native CPU Word Size

Modern CPUs are 64-bit architectures. All general-purpose registers (RAX, RBX, RCX, etc.) operate natively on 64-bit values. Using 64-bit integers means:

XOR, SHIFT, AND operations execute in 1 cycle
No masking or splitting across multiple registers
Full utilization of the ALU (Arithmetic Logic Unit)

; Single instruction for the entire rolling hash update
xor rax, qword ptr [gear_table + rdx*8]

2. Cache Line Alignment

CPU cache lines are 64 bytes wide. Since each table entry is 8 bytes:


Cache line (64 bytes) = 8 × (64-bit gear entries)

This means:

-   Loading one entry pulls 7 neighboring entries into cache for free
   
-   The entire 2KB table occupies exactly 32 cache lines
   
-   Sequential byte access maintains near-100% cache hit rates

3. Collision Resistance (The Birthday Paradox)

Bit Width	Possible Values	50% Collision Probability After
32-bit	4.29 billion	~77,000 chunks
64-bit	18.4 quintillion	~5 billion chunks
128-bit	3.4×10³⁸	Impractical to compute

For a system processing petabytes of data, 64-bit provides mathematical certainty that:

Two different data patterns will not produce the same rolling hash
Chunk boundaries are deterministic and reproducible
No false deduplication matches

4. Memory Footprint vs. Coverage

Table Size	Memory	Input Coverage	L1 Fit?	L2 Fit?
256×8 bytes	2KB	256 values (1 byte)	✅	✅
256×4 bytes	1KB	256 values	✅	✅
256×16 bytes	4KB	256 values	⚠️	✅
65536×8 bytes	512KB	65536 values (2 bytes)	❌	⚠️

The 64-bit choice maximizes entropy per byte while maintaining optimal cache residency.

5. Practical Throughput Impact

With 64-bit gear table:

L1 hit: 1-3 cycles
L2 hit: 10-15 cycles
L3 hit: 30-50 cycles
Average lookup: <10ns on modern hardware

→ Sustained throughput: 10-15GB/s per core

Why Not Smaller? (32-bit)

A 32-bit gear table would:

Collide after ~77,000 chunks (unacceptable for production)
Require masking operations to stay within 32 bits
Waste ALU capability (CPU still processes 64 bits)

Why Not Larger? (128-bit)

A 128-bit gear table would:

Require SIMD instructions (AVX) for efficient operation
Not fit in L1 cache (4KB vs 32KB total)
Provide diminishing returns (2⁶⁴ is already astronomical)

The Bottom Line

Important

64-bit × 256 entries = 2KB is the mathematical "sweet spot" where hardware constraints (cache lines, registers, ALU) align perfectly with statistical guarantees. Memory footprint remains trivial (0.006% of a typical 32KB L1 cache) while providing collision resistance that exceeds any realistic workload.

This is why FastCDC, and every production CDC system, uses 64-bit gear tables. It's not arbitrary—it's optimal.

Name		Name	Last commit message	Last commit date
Latest commit History 105 Commits
.github/workflows		.github/workflows
go		go
rust		rust
tests/data		tests/data
.gitignore		.gitignore
README.md		README.md
RESEARCH.md		RESEARCH.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Resplix CDC Research Cluster

/// Architecture Manifestos

The Gear Hash Advantage

FastCDC (2016) Key Innovations

/// Performance Targets

/// Cross-Platform CI Benchmarks

/// Project Structure

/// Getting Started

Rust Implementation

Go Implementation

/// Technical Deep-Dive

The 64-Bit Gear Table

Why 64 Bits?

1. Native CPU Word Size

2. Cache Line Alignment

3. Collision Resistance (The Birthday Paradox)

4. Memory Footprint vs. Coverage

5. Practical Throughput Impact

Why Not Smaller? (32-bit)

Why Not Larger? (128-bit)

The Bottom Line

© 2026 Resplix Infrastructure. Sharding the future of data mobility.

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Resplix CDC Research Cluster

/// Architecture Manifestos

The Gear Hash Advantage

FastCDC (2016) Key Innovations

/// Performance Targets

/// Cross-Platform CI Benchmarks

/// Project Structure

/// Getting Started

Rust Implementation

Go Implementation

/// Technical Deep-Dive

The 64-Bit Gear Table

Why 64 Bits?

1. Native CPU Word Size

2. Cache Line Alignment

3. Collision Resistance (The Birthday Paradox)

4. Memory Footprint vs. Coverage

5. Practical Throughput Impact

Why Not Smaller? (32-bit)

Why Not Larger? (128-bit)

The Bottom Line

© 2026 Resplix Infrastructure. Sharding the future of data mobility.

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages