Drop-in OpenAI-compatible gateway. Every request has a failover cascade of 36 engines from 7 providers behind it. When the primary returns 429 or 5xx, the next engine fires in under 50 milliseconds. Round-robin key rotation, six specialised modes (Smart, Reasoner, Live, Fast, Coder, Vision), an MCP-shape tool catalog, persistent user memory, FLUX image generation with key rotation, plus TTS / STT cascades. Built so an internal AI product never sees an outage the way a single-provider wrapper does.
```mermaid
%%{init: {'theme':'dark','themeVariables':{'primaryColor':'#0d2e4f','primaryTextColor':'#e6f5ff','lineColor':'#22d3ee','primaryBorderColor':'#22d3ee','actorBkg':'#1e3a5f','actorBorder':'#22d3ee','actorTextColor':'#ffffff'}}}%%
sequenceDiagram
    autonumber
    participant Client
    participant Router as Intent Router
    participant PA as Primary Engine
    participant PB as Failover Engine
    participant Mem as Memory + Tools
    Client->>Router: POST /api/v1/chat
    Router->>Router: classify intent (Smart / Live / Coder / ...)
    Router->>PA: dispatch primary
    PA-->>Router: 429 Too Many Requests
    Note over Router,PB: handoff in under 50ms
    Router->>PB: retry on next engine
    PB->>Mem: recall facts + tools
    Mem-->>PB: context window
    PB-->>Router: 200 streaming
    Router-->>Client: SSE first token ~120ms
```
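The handoff in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the gateway's actual code: `Engine`, `complete` and the `send` callback are hypothetical names, and the transport is injected so the failover and key-rotation logic can run without any provider SDK.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Callable, Iterator

# Statuses that trigger a handoff: 429 plus every 5xx.
RETRYABLE = {429} | set(range(500, 600))


@dataclass
class Engine:
    """Hypothetical engine record: a name plus the API keys to rotate through."""
    name: str
    keys: list[str]
    _rotation: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self._rotation = cycle(self.keys)

    def next_key(self) -> str:
        # Round-robin: each call hands out the next key, wrapping at the end.
        return next(self._rotation)


# Assumed transport signature: send(engine_name, api_key, prompt) -> (status, body)
Send = Callable[[str, str, str], tuple[int, str]]


def complete(prompt: str, engines: list[Engine], send: Send) -> tuple[str, str]:
    """Try engines in order and move to the next one on 429 or 5xx."""
    failures = []
    for engine in engines:
        status, body = send(engine.name, engine.next_key(), prompt)
        if status == 200:
            return engine.name, body
        if status not in RETRYABLE:
            # A 4xx other than 429 is the caller's problem; failing over will not help.
            raise RuntimeError(f"{engine.name} returned non-retryable {status}")
        failures.append(f"{engine.name}: {status}")
    raise RuntimeError("all engines failed: " + ", ".join(failures))


def fake_send(name: str, key: str, prompt: str) -> tuple[int, str]:
    # Stand-in transport: the primary is rate limited, everything else answers.
    if name == "primary":
        return 429, "rate limited"
    return 200, f"{name} answered using {key}"


engines = [Engine("primary", ["k1", "k2"]), Engine("backup", ["k3"])]
print(complete("hello", engines, fake_send))
# prints ('backup', 'backup answered using k3')
```

Keeping the transport behind a callback means the cascade order and the rotation can be tested against a fake, and a non-retryable status stops the cascade instead of burning quota on engines that would reject the same request.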
| Slot | Models or sources |
|---|---|
| 5 engines | GPT-OSS 120B + 20B |
| 4 engines | DeepSeek V3.2 |
| 3 engines | Qwen 3 235B |
| 4 engines | 2.5 Flash + 3 |
| 17 engines | Nemotron + GLM |
| images | klein 9B + 4B |
| live | weather + FX |
First major release. Fourteen sp_* tools replace whole-file reads with scoped symbol pulls, reproducible ~95% per-read savings via pnpm benchmark. A React + Vite + d3 dashboard with nine routed views including an interactive code dependency graph. A cross-tab agent bus that lets multiple Claude Code tabs on one project coordinate at turn boundaries. A cold-start knowledge feed on every SessionStart so no session begins blank. Dollar cost of tokens saved, downloadable session reports, a memory doctor, the insights band, the project knowledge brief, and a 75-skill methodology library.
Six editor install paths · 321 tests · MIT
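To make the idea of a scoped symbol pull concrete, here is a small Python sketch. It is not how the sp_* tools are implemented: `read_symbol` and `savings` are hypothetical helpers, the parser is Python's standard `ast` module, and character count stands in for token count as a rough assumption.

```python
import ast


def read_symbol(source: str, name: str) -> str:
    """Return only the source of the top-level function or class called `name`."""
    tree = ast.parse(source)
    for node in tree.body:
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        if is_def and node.name == name:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise KeyError(name)


def savings(source: str, name: str) -> float:
    """Fraction of characters avoided by the scoped read versus the whole file."""
    return 1 - len(read_symbol(source, name)) / len(source)


# A synthetic file: twenty helpers the agent does not need, one symbol it does.
SAMPLE = "\n".join(
    [f"def helper_{i}(x):\n    return x + {i}\n" for i in range(20)]
    + ["def target(x):\n    return x * 2\n"]
)

print(read_symbol(SAMPLE, "target"))
print(f"saved {savings(SAMPLE, 'target'):.0%} of the characters")
```

The saving depends entirely on how large the file is relative to the symbol, which is why a benchmark over real files is a more honest measure than a synthetic sample like this one.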
Bring-your-own-subscription. Echo never asks for an API key. It dispatches each prompt to whichever subscription-backed CLI you already pay for (claude, codex or gemini), picked by a router that scores capability, remaining quota and freshness. Voice in. Voice out. Vision when it helps. Memory across years. Translucent multi-monitor HUD planned. Cross-platform from one Rust core. MIT. Local-first.
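One way a router could combine those three signals is a weighted score, sketched here in Python for brevity even though Echo itself has a Rust core. Everything in the block is an assumption made for illustration: the `Brain` record, the 0-to-1 scales, the weights, and the reading of "freshness" as how recent the model behind the CLI is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brain:
    """Hypothetical view of one subscription-backed CLI at dispatch time."""
    name: str          # CLI name, e.g. "claude"
    capability: float  # 0..1, fit for this kind of prompt (assumed scale)
    quota_left: float  # 0..1, share of the subscription quota still unused
    freshness: float   # 0..1, higher means a more recent model (assumed meaning)


# Illustrative weights for capability, quota and freshness; not taken from Echo.
WEIGHTS = (0.5, 0.3, 0.2)


def score(brain: Brain) -> float:
    """Weighted sum of the three signals."""
    w_cap, w_quota, w_fresh = WEIGHTS
    return w_cap * brain.capability + w_quota * brain.quota_left + w_fresh * brain.freshness


def pick(brains: list[Brain]) -> Brain:
    """Choose the highest-scoring CLI that still has quota."""
    available = [b for b in brains if b.quota_left > 0]
    if not available:
        raise RuntimeError("no CLI has quota left")
    return max(available, key=score)


brains = [
    Brain("claude", capability=0.9, quota_left=0.0, freshness=0.8),  # exhausted
    Brain("codex", capability=0.7, quota_left=0.6, freshness=0.5),
    Brain("gemini", capability=0.6, quota_left=0.9, freshness=0.4),
]
print(pick(brains).name)
# prints gemini
```

Filtering out exhausted CLIs before scoring keeps a highly capable but rate-limited backend from winning on capability alone.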
Where it is now: Foundation + the orchestration layer are in and tested, 64 tests green. The brain router across claude/codex/gemini is wired and proven against a fake CLI; the file-based memory store with PreSession digests is live; an MCP skills bus runs weather / web search / files; the voice traits are defined and the macOS TTS adapter is real.
What is still landing: real Porcupine wake word, real cpal mic capture, real whisper.cpp speech-to-text, real Piper TTS as the cross-platform default, the wired end-to-end voice loop, the setup wizard, sqlite-vss vector memory.
Then: HUD polish + multi-monitor, calendar + mail over one-click OAuth, the senses, a proactive engine, autonomous workflows, signed installers.
I am Sarma. I build open-source software from a desk in the UK.
LLM infrastructure, coding agents, inference servers, storage engines, consensus protocols, WebAssembly sandboxes, platform tools. Every project lives on GitHub with a whitepaper, an architecture diagram and a quick-start guide on sarmalinux.com/products.
What pulls me back to the desk every weekend is the same thing that pulled me into the industry: the quiet thrill of building something from scratch. A blank repository, a problem worth solving, a system that did not exist yesterday and ships today.
When I am not at the desk, I write long-form essays about what I am learning, contribute to the open-source projects I rely on, and run a small weekend charity where I build free websites for local businesses in Hemel Hempstead.
| Date | What |
|---|---|
| 8 Jun 2026 | echo Phase 0 + brain-router scaffolding in: Brain trait + Claude/Codex/Gemini subprocess wrappers, capability-and-quota router, file-based memory with PreSession digests, MCP skills bus with weather/web-search/files, voice traits + macOS TTS. 64 tests green. Real wake word, mic, whisper.cpp and Piper are next. v1.0 still aimed at 1 July 2026. |
| 6 Jun 2026 | slipstream v1.0.0: first major release. React dashboard with nine views, interactive code graph, cross-tab agent bus, cold-start knowledge feed, reproducible pnpm benchmark showing ~95% per-read savings, dollar cost of tokens saved, memory doctor, 75-skill library, 321 tests. |
| 6 Jun 2026 | slipstream v0.27.0: production React dashboard (Vite + TypeScript + d3) with grouped sidebar (Now / History / Knowledge), typed JSON client and interactive knowledge graph. |
| 6 Jun 2026 | slipstream v0.24.0: reproducible token-savings benchmark. pnpm benchmark measures whole-file vs scoped reads on real files and prints a Markdown table. |
| 6 Jun 2026 | slipstream v0.8.0: dashboard insights band. Every data tab opens with a natural-language paragraph plus bullets, deterministic templates, zero LLM. |
| 4 Jun 2026 | slipstream v0.7.0: tabbed dashboard (Live, Project, Journal, Sessions, Memory) with 365-day heatmap, file leaderboard, kinds donut, distilled lessons. |
| 4 Jun 2026 | slipstream v0.6.0: cross-IDE parity (sp_digest + sp_resume + auto-mode-detect + slipstream-setup), nine backend features, redesigned glass-on-dark dashboard. |
| 3 Jun 2026 | NVIDIA Computex 2026 recap: Vera Rubin NVL72 in production, RTX Spark, Cosmos 3, Nemotron 3 Ultra. |
| 1 Jun 2026 | AI Engineer World's Fair 2026 recap: MCP took the year. Six themes that defined where AI engineering is going. |
| 31 May 2026 | echo repo opened, public launch scheduled 1 July 2026. |
| 3 May 2026 | Sarmalink-AI v2: intent auto-routing, MCP-shape tool catalog, TTS/STT cascades, image generation rotation. |
Every repo has a bespoke product trio on sarmalinux.com/products: whitepaper, architecture diagram, quick-start. All MIT.
The full eight-tier stack with every choice and why it earned a place lives at sarmalinux.com/technology. Boring tech, surgical complexity. No AWS, no Azure.
A handful of good entry points into the eighty-seven long-form engineering essays:
- NVIDIA Computex 2026, what AI engineers need to know, Vera Rubin NVL72, RTX Spark, Cosmos 3, Nemotron 3 Ultra
- AI Engineer World's Fair 2026, what mattered, six themes that defined the year
- SarmaLink-AI failover deep dive, how multi-engine fallback actually works in production
- Building Agent Orchestrator, the journaled-Postgres pattern behind deterministic replay
- Why I open-sourced 12 repos, the reasoning, the trade-offs
- Terraform Stack vs Pulumi vs SST, an honest comparison
- F1 2026 mid-season after the cancellation, because not everything is code
I am open to permanent, full-time PAYE software engineering roles across the United Kingdom. Remote, hybrid or on-site. Senior or mid-level individual contributor in AI infrastructure, AI engineering, platform engineering, backend or full-stack development. Not taking contract, consulting or agency subcontract work.
The full pitch with a capability matrix, recent ships and selected open-source work lives at sarmalinux.com/hire-me.
Built by sarmalinux · UK · All projects MIT licensed · Updated daily


