A Zig implementation of the One Billion Row Challenge: processing 1 billion temperature measurements (a 13 GB file) as fast as possible. This implementation uses multi-threading, memory-mapped files, SIMD vector operations, and cache-friendly data structures to achieve high-performance parsing and aggregation.
The challenge: read a text file with 1 billion rows of city name and temperature pairs (one `city;temperature` pair per line), calculate the min/mean/max temperature per city, and output the results sorted alphabetically by city name.
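As a reference point for what the program computes, here is a minimal single-threaded sketch in Python. It is not the Zig implementation. It assumes each row looks like `Hamburg;12.0` with exactly one fractional digit and keeps temperatures as integer tenths of a degree. Output uses the `{city=min/mean/max, ...}` layout of the original challenge. Rounding the mean with Python's `:.1f` is an assumption and may differ from the official reference in tie cases.

```python
def aggregate(lines):
    """Return {city: [min, max, sum, count]} with temperatures in integer tenths."""
    stats = {}
    for line in lines:
        city, temp = line.split(";")
        # "12.3" -> 123; assumes exactly one digit after the decimal point
        t = int(temp.replace(".", ""))
        s = stats.get(city)
        if s is None:
            stats[city] = [t, t, t, 1]
        else:
            s[0] = min(s[0], t)
            s[1] = max(s[1], t)
            s[2] += t
            s[3] += 1
    return stats


def format_results(stats):
    """Format cities alphabetically as {city=min/mean/max, ...} with one decimal."""
    parts = []
    for city in sorted(stats):
        lo, hi, total, count = stats[city]
        mean = total / count / 10
        parts.append(f"{city}={lo / 10:.1f}/{mean:.1f}/{hi / 10:.1f}")
    return "{" + ", ".join(parts) + "}"


rows = ["Hamburg;12.0", "Bulawayo;8.9", "Hamburg;34.2", "Bulawayo;-3.1"]
print(format_results(aggregate(rows)))
# {Bulawayo=-3.1/2.9/8.9, Hamburg=12.0/23.1/34.2}
```

Keeping values as integer tenths means the hot loop only does integer comparisons and additions. Division happens once per city at output time.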
All results were measured on my local Apple M3 Pro (12 cores) with 32 GB of RAM. Times are shown as mm:ss.sss.
- Baseline implementation: 02:43.260
- Winning implementation (thomaswue): 00:01.212
- Working single-threaded implementation: 00:21.981
  - Optimized buffer parsing
  - Stack-allocated slice with forward probing as the hash map for storage
  - Minimized branching
  - Zero heap allocations
- Switch to custom integer parsing: 00:20.654
- Implement multi-threading: 00:05.136
- Use a memory-mapped file: 00:02.489
- Remove several memory-copying scenarios: 00:02.232
- Switch from modulo to bitwise AND with a power-of-two hash table: 00:02.184
- Merge the Slot and Stat structs to avoid cache misses: 00:02.171
- Use SIMD for semicolon and newline lookup: 00:01.572
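The sketch below combines three list items in Python: the storage idea from the single-threaded step (a fixed slot array with forward probing), the power-of-two masking, and the Slot/Stat merge. It is not the Zig code. Some details are assumptions:

- FNV-1a is a stand-in hash, since the README does not say which hash is used.
- The slot layout `[name, min, max, sum, count]` and the class name `StationTable` are hypothetical.
- The table must be sized larger than the number of distinct cities, because probing a full table for a missing key never terminates.

```python
class StationTable:
    """Open-addressing table with forward (linear) probing over a fixed slot array."""

    def __init__(self, capacity=1 << 14):
        # Capacity must be a power of two so `h & mask` equals `h % capacity`
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.mask = capacity - 1
        # Each slot keeps the key and its stats together: [name, min, max, sum, count]
        self.slots = [None] * capacity

    @staticmethod
    def hash_name(name: bytes) -> int:
        # 32-bit FNV-1a; a stand-in, not necessarily the hash the Zig code uses
        h = 0x811C9DC5
        for b in name:
            h ^= b
            h = (h * 0x01000193) & 0xFFFFFFFF
        return h

    def _find(self, name: bytes) -> int:
        """Return the index of name's slot, or of the empty slot where it belongs."""
        i = self.hash_name(name) & self.mask
        while self.slots[i] is not None and self.slots[i][0] != name:
            i = (i + 1) & self.mask  # probe forward, wrapping at the end
        return i

    def add(self, name: bytes, temp: int) -> None:
        i = self._find(name)
        slot = self.slots[i]
        if slot is None:
            self.slots[i] = [name, temp, temp, temp, 1]
        else:
            slot[1] = min(slot[1], temp)
            slot[2] = max(slot[2], temp)
            slot[3] += temp
            slot[4] += 1

    def get(self, name: bytes):
        """Return the slot for name, or None if it was never added."""
        return self.slots[self._find(name)]
```

With a power-of-two capacity, computing the slot index is a single AND instead of an integer division. Keeping the name and its stats in one slot means a lookup and the following update touch the same memory, which is the cache-miss motivation the list gives for merging the structs.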
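Custom integer parsing avoids floating point entirely. The challenge guarantees values from -99.9 to 99.9 with exactly one fractional digit. Under that guarantee, a hypothetical `parse_temp` helper can read the digits straight from the byte buffer into tenths of a degree. This Python sketch shows the logic, not the Zig code.

```python
def parse_temp(buf: bytes, pos: int) -> tuple[int, int]:
    """Parse a temperature such as b'-12.3' starting at buf[pos].

    Returns (value in tenths of a degree, index just past the trailing newline).
    Assumes an optional '-', one or two integer digits, a '.', exactly one
    fractional digit, and then a single newline byte.
    """
    negative = buf[pos] == ord("-")
    if negative:
        pos += 1
    value = 0
    while buf[pos] != ord("."):
        value = value * 10 + (buf[pos] - ord("0"))
        pos += 1
    # buf[pos] is '.', so the single fractional digit is at pos + 1
    value = value * 10 + (buf[pos + 1] - ord("0"))
    pos += 3  # skip '.', the fractional digit, and the newline
    return (-value if negative else value), pos
```

Returning the position after the newline lets the caller continue straight into the next row without searching for the line end again.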
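Python has no portable SIMD, so the sketch below swaps in SWAR ("SIMD within a register"). It checks eight bytes at a time for a delimiter using the classic has-zero-byte bit trick on a 64-bit word. The Zig implementation uses SIMD vectors instead. This only illustrates the shared idea: compare many bytes per step, then take the position of the first match. `first_match` and `find_delimiter` are hypothetical names.

```python
ONES = 0x0101010101010101   # 0x01 in every byte
HIGHS = 0x8080808080808080  # 0x80 in every byte


def first_match(chunk: bytes, target: int) -> int:
    """Index of the first byte equal to target in an 8-byte chunk, or -1."""
    word = int.from_bytes(chunk, "little")
    x = word ^ (target * ONES)  # bytes equal to target become 0x00
    # The high bit is set for each zero byte. Bits above the first zero byte can
    # be false positives caused by borrows, but the lowest set bit is exact.
    t = (x - ONES) & ~x & HIGHS
    if t == 0:
        return -1
    return ((t & -t).bit_length() - 1) // 8


def find_delimiter(buf: bytes, start: int, target: int) -> int:
    """Position of the next target byte at or after start, or -1."""
    i = start
    while i + 8 <= len(buf):
        j = first_match(buf[i:i + 8], target)
        if j >= 0:
            return i + j
        i += 8
    return buf.find(bytes([target]), i)  # plain scan of the final < 8 bytes
```

For a row, one call with `ord(";")` finds the end of the city name. A second call with `ord("\n")`, starting just after the semicolon, finds the end of the temperature.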
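The multi-threading and memory-mapping steps can be illustrated by splitting the mapped file into one byte range per worker. Each boundary is moved forward to just past a newline, so no row is cut in half, and the per-worker results are merged at the end. This Python sketch has some assumptions and limits:

- It assumes per-thread tables that are merged at the end. The README does not describe how the Zig threads combine results.
- `chunk_bounds`, `aggregate_range`, `merge`, and `process_file` are hypothetical names.
- In standard CPython, the GIL keeps these threads from parsing in parallel, so the sketch shows the partitioning, not the speed-up.

```python
import mmap
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(buf, workers: int) -> list[tuple[int, int]]:
    """Split buf into up to `workers` [start, end) ranges, each ending after a newline."""
    size = len(buf)
    bounds = []
    start = 0
    for k in range(1, workers + 1):
        if start >= size:
            break
        if k == workers:
            end = size
        else:
            # Jump to the proportional split point, then forward to the next newline
            nl = buf.find(b"\n", max(start, size * k // workers))
            end = size if nl == -1 else nl + 1
        bounds.append((start, end))
        start = end
    return bounds


def aggregate_range(buf, start: int, end: int) -> dict:
    """Per-worker {name: [min, max, sum, count]} in tenths of a degree."""
    stats = {}
    for line in buf[start:end].splitlines():  # slicing copies; fine for a sketch
        name, temp = line.split(b";")
        t = int(temp.replace(b".", b""))  # assumes one fractional digit
        s = stats.setdefault(name, [t, t, 0, 0])
        s[0] = min(s[0], t)
        s[1] = max(s[1], t)
        s[2] += t
        s[3] += 1
    return stats


def merge(total: dict, part: dict) -> None:
    """Fold one worker's stats into the combined table."""
    for name, (lo, hi, sm, cnt) in part.items():
        s = total.setdefault(name, [lo, hi, 0, 0])
        s[0] = min(s[0], lo)
        s[1] = max(s[1], hi)
        s[2] += sm
        s[3] += cnt


def process_file(path: str, workers: int = 12) -> dict:
    """Map the file read-only and aggregate it in `workers` threads (file must be non-empty)."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = pool.map(lambda r: aggregate_range(mm, *r), chunk_bounds(mm, workers))
            total = {}
            for part in parts:
                merge(total, part)
    return total
```

Mapping the file lets every worker read its range directly from the page cache without a shared read buffer. Aligning boundaries to newlines is what lets the workers run without coordinating until the final merge.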
To build and run:

```sh
zig build -Doptimize=ReleaseFast
time zig-out/bin/1brc measurements.txt
```