Sarma Linux sarmakska

Sarmalink-AI · one endpoint, thirty-six engines, zero surprise bills

Drop-in OpenAI-compatible gateway. Every request can cascade across 36 engines from 7 providers. When the primary returns 429 or 5xx, the next engine fires in under 50 milliseconds. Round-robin key rotation, six specialised modes (Smart, Reasoner, Live, Fast, Coder, Vision), an MCP-shape tool catalog, persistent user memory, FLUX image generation with key rotation, plus TTS / STT cascades. Built so an internal AI product never sees an outage the way a single-provider wrapper does.
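As a rough illustration of the round-robin key rotation mentioned above, here is a minimal Python sketch. It is not the gateway's actual code: the `KeyRotator` class and the placeholder keys are assumptions made for this example, and the real project may rotate keys differently (for instance per provider, or with cooldowns).

```python
from itertools import cycle
from threading import Lock


class KeyRotator:
    """Hypothetical helper: hands out API keys in round-robin order."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one key is required")
        self._keys = cycle(keys)  # endless iterator over the key list
        self._lock = Lock()       # keeps the order consistent across threads

    def next_key(self):
        with self._lock:
            return next(self._keys)


# Placeholder keys, not real credentials.
rotator = KeyRotator(["key-a", "key-b", "key-c"])
picked = [rotator.next_key() for _ in range(5)]
print(picked)
```

Each request takes the next key in the ring, so load spreads evenly and a single rate-limited key only affects every third request in this three-key example.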

How a request flows

%%{init: {'theme':'dark','themeVariables':{'primaryColor':'#0d2e4f','primaryTextColor':'#e6f5ff','lineColor':'#22d3ee','primaryBorderColor':'#22d3ee','actorBkg':'#1e3a5f','actorBorder':'#22d3ee','actorTextColor':'#ffffff'}}}%%
sequenceDiagram
    autonumber
    participant Client
    participant Router as Intent Router
    participant PA as Primary Engine
    participant PB as Failover Engine
    participant Mem as Memory + Tools
    Client->>Router: POST /api/v1/chat
    Router->>Router: classify intent (Smart / Live / Coder / ...)
    Router->>PA: dispatch primary
    PA-->>Router: 429 Too Many Requests
    Note over Router,PB: handoff in under 50ms
    Router->>PB: retry on next engine
    PB->>Mem: recall facts + tools
    Mem-->>PB: context window
    PB-->>Router: 200 streaming
    Router-->>Client: SSE first token ~120ms
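The handoff in the diagram can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the project's implementation: `Engine`, `is_retryable` and `chat_with_failover` are hypothetical names, the engines are stubbed with lambdas instead of real HTTP calls, and streaming, memory recall and timing are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Engine:
    """Hypothetical engine handle: a name plus a callable returning (status, body)."""
    name: str
    call: Callable[[str], Tuple[int, str]]


def is_retryable(status: int) -> bool:
    # The README names 429 and 5xx as the failover triggers.
    return status == 429 or 500 <= status <= 599


def chat_with_failover(engines: List[Engine], prompt: str):
    """Try engines in order and return the first non-retryable response."""
    attempts = []
    for engine in engines:
        status, body = engine.call(prompt)
        attempts.append((engine.name, status))
        if not is_retryable(status):
            # Success, or an error that another engine would not fix.
            return engine.name, body, attempts
    raise RuntimeError(f"every engine was rate limited or failing: {attempts}")


# Stub engines standing in for real providers.
engines = [
    Engine("primary", lambda prompt: (429, "Too Many Requests")),
    Engine("failover", lambda prompt: (200, f"echo: {prompt}")),
]
served_by, body, attempts = chat_with_failover(engines, "hello")
print(served_by, attempts)
```

The caller only ever sees the final response; the 429 from the primary stays inside the gateway, which is the behaviour the diagram describes.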

Seven providers, thirty-six engines, six modes

5 engines · GPT-OSS 120B + 20B
4 engines · DeepSeek V3.2
3 engines · Qwen 3 235B
4 engines · 2.5 Flash + 3
17 engines · Nemotron + GLM
images · klein 9B + 4B
live · weather + FX

$  cp | ctx 12%* ok | mem 4 | obs 37 | opt 71% | skill scoped-read    (~12 steps)

slipstream · Claude Code plugin + cross-IDE MCP toolkit

First major release. Fourteen sp_* tools replace whole-file reads with scoped symbol pulls, reproducible ~95% per-read savings via pnpm benchmark. A React + Vite + d3 dashboard with nine routed views including an interactive code dependency graph. A cross-tab agent bus that lets multiple Claude Code tabs on one project coordinate at turn boundaries. A cold-start knowledge feed on every SessionStart so no session begins blank. Dollar cost of tokens saved, downloadable session reports, a memory doctor, the insights band, the project knowledge brief, and a 75-skill methodology library.
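To make the idea of a scoped symbol pull concrete, here is a small Python sketch that returns one function from a source string instead of the whole file. It is an illustration only: `read_symbol` is a hypothetical helper, not one of the sp_* tools, and character count is used as a crude stand-in for tokens. The ~95% figure comes from the project's own benchmark on real files and is not reproduced by this toy input.

```python
import ast

SOURCE = '''\
import math

def area(radius):
    """Area of a circle."""
    return math.pi * radius ** 2

def perimeter(radius):
    return 2 * math.pi * radius

class Shape:
    def describe(self):
        return "a shape"
'''


def read_symbol(source: str, name: str) -> str:
    """Hypothetical scoped read: return only the source of one named symbol."""
    wanted = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, wanted) and node.name == name:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise KeyError(name)


scoped = read_symbol(SOURCE, "perimeter")
saving = 1 - len(scoped) / len(SOURCE)  # fraction of characters not sent
print(scoped)
print(f"characters saved: {saving:.0%}")
```

The saving grows with file size: the larger the file relative to the symbol requested, the less of it needs to enter the context window.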

Six editor install paths · 321 tests · MIT

echo · the open Jarvis you actually own

Bring-your-own-subscription. Echo never asks for an API key. It dispatches each prompt to whichever subscription-backed CLI you already pay for, claude, codex or gemini, picked by a router that scores capability, quota remaining and freshness. Voice in. Voice out. Vision when it helps. Memory across years. Translucent multi-monitor HUD planned. Cross-platform from one Rust core. MIT. Local-first.
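A router that scores capability, quota remaining and freshness could look roughly like the Python sketch below. Everything specific here is an assumption for illustration: the `Brain` record, the 0 to 1 normalisation, the weights and the sample numbers are invented, the source does not define how freshness is measured, and echo itself is built on a Rust core rather than Python.

```python
from dataclasses import dataclass


@dataclass
class Brain:
    """Hypothetical view of one subscription-backed CLI."""
    name: str
    capability: float       # assumed 0..1 fit for the current prompt
    quota_remaining: float  # assumed 0..1 fraction of quota left
    freshness: float        # assumed 0..1, meaning not specified by the source


# Assumed weights; the real router may weigh these very differently.
WEIGHTS = {"capability": 0.5, "quota_remaining": 0.3, "freshness": 0.2}


def score(brain: Brain) -> float:
    return (
        WEIGHTS["capability"] * brain.capability
        + WEIGHTS["quota_remaining"] * brain.quota_remaining
        + WEIGHTS["freshness"] * brain.freshness
    )


def pick_brain(brains):
    """Skip brains with no quota left, then take the highest score."""
    available = [b for b in brains if b.quota_remaining > 0]
    if not available:
        raise RuntimeError("no brain has quota remaining")
    return max(available, key=score)


# Invented sample values.
brains = [
    Brain("claude", capability=0.9, quota_remaining=0.0, freshness=0.8),
    Brain("codex", capability=0.7, quota_remaining=0.6, freshness=0.5),
    Brain("gemini", capability=0.6, quota_remaining=0.9, freshness=0.9),
]
chosen = pick_brain(brains)
print(chosen.name)
```

In this sample the most capable brain is skipped because its quota is exhausted, which shows why capability alone is not enough to pick a backend.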

Where it is now: Foundation + the orchestration layer are in and tested, 64 tests green. The brain router across claude/codex/gemini is wired and proven against a fake CLI; the file-based memory store with PreSession digests is live; an MCP skills bus runs weather / web search / files; the voice traits are defined and the macOS TTS adapter is real.

What is still landing: real Porcupine wake word, real cpal mic capture, real whisper.cpp speech-to-text, real Piper TTS as the cross-platform default, the wired end-to-end voice loop, the setup wizard, sqlite-vss vector memory.

Then: HUD polish + multi-monitor, calendar + mail over one-click OAuth, the senses, a proactive engine, autonomous workflows, signed installers.

About me

I am Sarma. I build open-source software from a desk in the UK.

LLM infrastructure, coding agents, inference servers, storage engines, consensus protocols, WebAssembly sandboxes, platform tools. Every project lives on GitHub with a whitepaper, an architecture diagram and a quick-start guide on sarmalinux.com/products.

What pulls me back to the desk every weekend is the same thing that pulled me into the industry: the quiet thrill of building something from scratch. A blank repository, a problem worth solving, a system that did not exist yesterday and ships today.

When I am not at the desk, I write long-form essays about what I am learning, contribute to the open-source projects I rely on, and run a small weekend charity where I build free websites for local businesses in Hemel Hempstead.

Recent ships

Date	What
8 Jun 2026	echo Phase 0 + brain-router scaffolding in: `Brain` trait + Claude/Codex/Gemini subprocess wrappers, capability-and-quota router, file-based memory with PreSession digests, MCP skills bus with weather/web-search/files, voice traits + macOS TTS. 64 tests green. Real wake word, mic, whisper.cpp and Piper are next. v1.0 still aimed at 1 July 2026.
6 Jun 2026	slipstream v1.0.0: first major release. React dashboard with nine views, interactive code graph, cross-tab agent bus, cold-start knowledge feed, reproducible `pnpm benchmark` showing ~95% per-read savings, dollar cost of tokens saved, memory doctor, 75-skill library, 321 tests.
6 Jun 2026	slipstream v0.27.0: production React dashboard (Vite + TypeScript + d3) with grouped sidebar (Now / History / Knowledge), typed JSON client and interactive knowledge graph.
6 Jun 2026	slipstream v0.24.0: reproducible token-savings benchmark. `pnpm benchmark` measures whole-file vs scoped reads on real files and prints a Markdown table.
6 Jun 2026	slipstream v0.8.0: dashboard insights band. Every data tab opens with a natural-language paragraph plus bullets, deterministic templates, zero LLM.
4 Jun 2026	slipstream v0.7.0: tabbed dashboard (Live, Project, Journal, Sessions, Memory) with 365-day heatmap, file leaderboard, kinds donut, distilled lessons.
4 Jun 2026	slipstream v0.6.0: cross-IDE parity (`sp_digest` + `sp_resume` + auto-mode-detect + `slipstream-setup`), nine backend features, redesigned glass-on-dark dashboard.
3 Jun 2026	NVIDIA Computex 2026 recap: Vera Rubin NVL72 in production, RTX Spark, Cosmos 3, Nemotron 3 Ultra.
1 Jun 2026	AI Engineer World's Fair 2026 recap: MCP took the year. Six themes that defined where AI engineering is going.
31 May 2026	echo repo opened, public launch scheduled 1 July 2026.
3 May 2026	Sarmalink-AI v2: intent auto-routing, MCP-shape tool catalog, TTS/STT cascades, image generation rotation.

The portfolio · nineteen MIT-licensed projects

Flagships

Sarmalink-AI · Multi-provider OpenAI-compatible AI gateway with 36-engine failover across 7 providers, intent-based plugin auto-routing, MCP-shape tool catalog and Manus webhook persistence.
slipstream · v1.0 shipped. Claude Code plugin and cross-IDE MCP toolkit. Fourteen sp_* tools, self-building memory, lossless compaction, React dashboard with nine views and an interactive code dependency graph, cross-tab agent bus, cold-start knowledge feed, 75-skill methodology library. 321 tests, MIT.

Coming next

echo · An open Jarvis. Brain-agnostic across Claude Code, Codex CLI, Gemini CLI, Ollama and LM Studio. Translucent multi-monitor HUD planned. Phase 0 + Phase 1 orchestration scaffolding in, 64 tests; real audio I/O and the setup wizard ship next. Public v1.0 on 1 July 2026.

AI infrastructure

agent-orchestrator · Durable multi-agent workflows in TypeScript, deterministic replay, journaled Postgres state, BullMQ step queue, Inspector UI.
voice-agent-starter · Sub-second full-duplex WebRTC voice loop, mediasoup SFU, Fastify model worker, pluggable STT, LLM, TTS adapters.
ai-eval-runner · Evals as code. Python 3.12, Typer CLI, DuckDB store, FastAPI + HTMX viewer.
forge-infer · Minimal LLM inference server in Rust with paged KV-cache, continuous batching and speculative decoding.

MCP and AI applications

mcp-server-toolkit · Production Model Context Protocol server starter in Python and FastAPI.
local-llm-router · OpenAI-compatible proxy routing between Ollama and cloud LLMs by policy.
rag-over-pdf · A minimal, production-shaped RAG starter with cited streaming answers.
receipt-scanner · Vision OCR receipts to Zod-validated JSON.

Systems software

lsmdb · Log-structured merge-tree storage engine in Go. WAL, SSTables, bloom filters, MVCC snapshots.
raftkv · Raft KV store in Go with a fault-injection harness proving linearizability under partitions.
sandboxd · WebAssembly sandbox in Rust with a deny-by-default host ABI and strict CPU, wall-clock and memory bounds.

Platform engineering

terraform-stack · Vercel, Supabase, Cloudflare and DigitalOcean modules in one Terraform repo.
k8s-ops-toolkit · Helm chart for shipping Next.js to Kubernetes with full observability pre-wired.
shipyard · Multi-tenant SaaS scaffold in TypeScript. Tenant isolation, RBAC, billing, audit log, rate limits.

Tools

webhook-to-email · Webhook receiver that forwards events to email via Resend.
staff-portal · Open-source HR and ops portal. Leave, attendance, expenses, kiosk mode.

Every repo has a bespoke product trio on sarmalinux.com/products: whitepaper, architecture diagram, quick-start. All MIT.

Stack

The full eight-tier stack with every choice and why it earned a place lives at sarmalinux.com/technology. Boring tech, surgical complexity. No AWS, no Azure.

Writing

A handful of good entry points into the eighty-seven long-form engineering essays:

NVIDIA Computex 2026, what AI engineers need to know, Vera Rubin NVL72, RTX Spark, Cosmos 3, Nemotron 3 Ultra
AI Engineer World's Fair 2026, what mattered, six themes that defined the year
SarmaLink-AI failover deep dive, how multi-engine fallback actually works in production
Building Agent Orchestrator, the journaled-Postgres pattern behind deterministic replay
Why I open-sourced 12 repos, the reasoning, the trade-offs
Terraform Stack vs Pulumi vs SST, an honest comparison
F1 2026 mid-season after the cancellation, because not everything is code

Hiring

I am open to permanent, full-time PAYE software engineering roles across the United Kingdom. Remote, hybrid or on-site. Senior or mid-level individual contributor in AI infrastructure, AI engineering, platform engineering, backend or full-stack development. Not taking contract, consulting or agency subcontract work.

The full pitch with a capability matrix, recent ships and selected open-source work lives at sarmalinux.com/hire-me.

Built by sarmalinux · UK · All projects MIT licensed · Updated daily
