A high-performance, atomic-based rate limiting ecosystem for Rust services.
This project is split into three specialized crates:
shot-limit: The core engine. It contains the logic for rate-limiting strategies (Generic Cell Rate Algorithm (GCRA), Token Bucket, Fixed Window, Sliding Window). It is zero-cost, generic, and can be used independently of any networking framework.tower-shot: The integration layer forasyncRust. It providestower::Layerandtower::Serviceimplementations that wrapshot-limitstrategies, making them easy to use withaxum,tonic, or any other Tower-compatible stack.py-shot-limit: Python bindings for the core engine, making the high-performance strategies available in Python.
Shot was created to provide a flexible rate-limiting solution for Rust applications, focusing on configurability and performance.
It offers several tower layers and helpers to suit different needs:
make_timeout_svc: Maximizes throughput by retrying requests within a timeout.make_latency_svc: Provides aggressive load shedding to protect service latency.RateLimitLayer: A drop-in replacement for the existingtowerrate-limiting layer, designed for maximum performance.
Because the underlying shot-limit strategies are lock-free, these crates are designed to scale linearly with your hardware. Whether you are running on a single-core edge function or a 128-core bare-metal server, Shot ensures that rate limiting is never the bottleneck in your stack.
The performance of the various strategies and layers is demonstrated in the included benchmarks. The results are good on my Mac M1 laptop, but I encourage anyone using this to verify the results are good in their environment.
The core shot-limit strategies demonstrate extremely low overhead, with single-threaded operations typically in the low nanoseconds. For instance:
- Fixed Window: ~2.58 ns (single-threaded)
- GCRA: ~2.32 ns (single-threaded)
These strategies scale efficiently, maintaining high performance under multi-threaded loads (e.g., Fixed Window at ~0.42 ns per operation with 8 threads). For full details, see shot-limit/README.md.
tower-shot introduces minimal overhead while providing robust rate-limiting middleware.
- Standard
tower-shotLayer: ~99.4 µs latency in saturated tests (precisely matching the 10k req/s target). - Managed
tower-shotLayer: ~120 ns latency (Load Shedding), offering a massive speedup compared totower's buffered approach (~3,000 ms P99) and outperforminggovernor(~265 ns).
When compared against governor in a fully Tower-compliant "Wait-until-Ready" configuration, tower-shot performs on par (~99.7 µs vs ~99.8 µs), while maintaining its architectural advantage over the native Tower implementation.
Under high contention (1,000 concurrent tasks), tower-shot managed layers maintain excellent performance, processing requests in approximately 172 µs, compared to ~317 µs for governor and ~13.5 ms for the native Tower implementation. For full details, see tower-shot/README.md.
I've written most of the code in this repository, but, because I was interested in seeing how useful AI would be, a substantial part of the code is written by AI (Gemini). Some of the documentation was added by AI and most of the tests were generated by AI as was most of the Python binding. I've focussed my time on the core Rate Limiting Algorithms and the implementation of the various tower services, layers and utility functions.
Licensed under either of Apache License, Version 2.0 or MIT license at your option.