Venkat Raman (Venkat2811)

Mechanical Sympathy is All You Need

Hi 👋, I'm Venkat!

Built and scaled systems from 5k to 250k RPS without breaking a sweat.

Got into model serving and inference, and enjoyed solving cold starts, intelligent routing, and GPU cluster utilization. Did a bit of RAG & agents infra. Currently in ML infra: training, inference, comms collectives, storage, compiler backends, custom kernel optimizations, and researching novel techniques.

High-agency individual deep in agentic-engineering mode. AI tools have enabled me to work across infra end to end, from user-facing APIs to tensors to metal. Always looking to maximize my learning curve 📈

ʕ•ᴥ•ʔ venkat.systems

Highlights

Projects

🐨 WombatKV - KV blocks survive restarts, save prefill flops - Object-storage-native KV cache for Inference.
⚡ myelon - HFT-grade LMAX-Disruptor multiprocess IPC over SHM & mmap. 240 ns P99 · 5.58 M ops/s · 92.6 GB/s.
🐘 YALI - Ultra-low-latency GPU comms collective. Outperforms NVIDIA NCCL P2P by 1.2-2.4x.
🪢 GPU Kernel Batcher - Batches identical GEMMs into one cuBLAS call. 90%+ fewer launches, 22% faster FP16 workloads.
⏲️ Metered Compute - 5 reference architectures for reliably metering sync and async compute.
🔍 Inference Assayer - Compiler-driven, deterministic, fast simulator lab for analyzing inference performance across models and hardware.