Perception

A high-throughput Rust binary for real-time computer vision: object detection, face recognition, and OCR. Designed for edge deployment on surveillance systems, drones, and roadside units with powerful GPU/CPU hardware.

What It Does

Object Detection — Detects and classifies objects (vehicles, people, bicycles, etc.) using YOLO26 via ONNX Runtime
Face Recognition — Detects faces (SCRFD) and matches them against a known faces database (ArcFace embeddings)
OCR — Detects and reads text in the scene (license plates, signs) using PaddleOCR
Multi-Object Tracking — ByteTrack assigns stable IDs across frames, avoiding redundant processing
High-Throughput Storage — SQLite WAL for edge-local storage (~100k events/sec), optional Postgres batch sync
Flexible Input — Single images, video files, live cameras, or RTSP streams

Quick Start

# Build (CPU-only)
cargo build --release

# Download models
./target/release/perception download

# Run on an image
./target/release/perception run -c perception.toml --source image --path photo.jpg

# Run on a video file
./target/release/perception run -c perception.toml --source video --path highway.mp4

# Run on a live camera
./target/release/perception run -c perception.toml --source camera --path 0

# Run on an RTSP stream
./target/release/perception run -c perception.toml --source rtsp --path "rtsp://192.168.1.100:554/stream"

Building with GPU Support

# NVIDIA CUDA
cargo build --release --features cuda

# NVIDIA TensorRT (maximum inference speed)
cargo build --release --features tensorrt

# Apple Silicon (CoreML)
cargo build --release --features coreml

# With live preview window
cargo build --release --features preview

# With Postgres sync support
cargo build --release --features postgres

# Kitchen sink
cargo build --release --features "cuda,preview,postgres"

Build Dependencies

Rust 1.82+
OpenCV 4.x (with videoio, imgcodecs, imgproc modules)
CUDA Toolkit 12.x (if using cuda or tensorrt features)
CMake (for opencv-rust build)

macOS

brew install opencv

Ubuntu/Debian

sudo apt install libopencv-dev

Configuration

Copy perception.example.toml to perception.toml and edit to your needs. See PLAN.md for the full configuration reference.

Key configuration sections:

Section	Purpose
`[capture]`	Input source type, path, FPS limit
`[pipeline]`	Enable/disable detection, face, OCR; confidence thresholds
`[pipeline.detection]`	Model variant, target classes
`[pipeline.face]`	Detection/recognition models, known faces directory, similarity threshold
`[pipeline.ocr]`	Model variant, languages
`[pipeline.tracker]`	ByteTrack parameters (max age, IOU threshold)
`[storage]`	Backend (sqlite/postgres), paths, crop saving
`[storage.sync]`	Optional Postgres batch sync interval
`[preview]`	Live preview window toggle
`[models]`	Model cache directory, auto-download toggle

CLI Reference

perception [OPTIONS] <COMMAND>

Commands:
  run        Run the perception pipeline
  download   Download/update models
  faces      Manage known faces database
  info       Show system info (GPU, available EPs, loaded models)

Options:
  -c, --config <PATH>    Config file path [default: perception.toml]
  -v, --verbose          Increase log verbosity (-v, -vv, -vvv)

Managing Known Faces

# Add a known face (provide name + photo with a clear face)
perception faces add "John Doe" john.jpg

# List all known faces
perception faces list

# Remove a known face
perception faces remove "John Doe"

Architecture

See PLAN.md for the full architecture, concurrency model, and implementation plan.

Pipeline overview:

Capture (OS thread) → Router (async) → Detection Workers (spawn_blocking) → Tracker → Storage (async batched)

Each pipeline stage (object detection, face recognition, OCR) runs concurrently. The tracker ensures expensive models only run on newly-appearing objects, not every frame. This is the key to handling thousands of objects at highway speeds.

Supported Models

Task	Model	Input Size	Speed
Object Detection	YOLO26n	640x640	~5ms (GPU)
Object Detection	YOLO26s	640x640	~10ms (GPU)
Face Detection	SCRFD 2.5G	640x640	~3ms (GPU)
Face Recognition	ArcFace R50	112x112	~2ms (GPU)
Text Detection	PaddleOCR DBNet	640x640	~5ms (GPU)
Text Recognition	PaddleOCR CRNN	48x320	~3ms (GPU)

Models are automatically downloaded on first run (configurable via [models] section).

Feature Flags

Flag	Default	Description
`cpu`	yes	CPU inference (always available as fallback)
`cuda`	no	NVIDIA CUDA execution provider
`tensorrt`	no	NVIDIA TensorRT execution provider
`coreml`	no	Apple CoreML execution provider
`preview`	no	OpenCV highgui live preview window
`postgres`	no	PostgreSQL sync support

Storage Schema

Events are stored in SQLite with the following schema:

CREATE TABLE events (
    id TEXT PRIMARY KEY,
    timestamp TEXT NOT NULL,
    frame_number INTEGER NOT NULL,
    track_id INTEGER,
    detection_kind TEXT NOT NULL,    -- 'object', 'face', 'text'
    label TEXT,                      -- 'car', 'person', 'John Doe', etc.
    confidence REAL NOT NULL,
    bbox_x1 REAL, bbox_y1 REAL, bbox_x2 REAL, bbox_y2 REAL,
    ocr_text TEXT,                   -- recognized text (if OCR)
    face_identity TEXT,              -- matched face name (if face)
    face_similarity REAL,
    crop_path TEXT,                  -- path to saved image crop
    synced INTEGER DEFAULT 0         -- 0=pending, 1=synced to Postgres
);

Testing

# Unit tests (no models needed)
cargo test

# Integration tests (requires downloaded models)
perception download
cargo test -- --ignored

# Benchmarks
cargo bench

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.github/workflows		.github/workflows
benches		benches
docs		docs
src		src
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
PLAN.md		PLAN.md
README.md		README.md
perception.example.toml		perception.example.toml
perception.toml		perception.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Perception

What It Does

Quick Start

Building with GPU Support

Build Dependencies

macOS

Ubuntu/Debian

Configuration

CLI Reference

Managing Known Faces

Architecture

Supported Models

Feature Flags

Storage Schema

Testing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Perception

What It Does

Quick Start

Building with GPU Support

Build Dependencies

macOS

Ubuntu/Debian

Configuration

CLI Reference

Managing Known Faces

Architecture

Supported Models

Feature Flags

Storage Schema

Testing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages