sarmakska/README.md


Sarmalink-AI · one endpoint, thirty-six engines, zero surprise bills

Drop-in OpenAI-compatible gateway. Every request can be routed across 36 engines from 7 providers: when the primary returns 429 or 5xx, the next engine fires in under 50 milliseconds. It adds round-robin key rotation, six specialised modes (Smart, Reasoner, Live, Fast, Coder, Vision), an MCP-shape tool catalog, persistent user memory, FLUX image generation with key rotation, and TTS / STT cascades. Built so an internal AI product never sees an outage the way a single-provider wrapper does.
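Round-robin key rotation is simple to picture in code. The sketch below is illustrative only and is not taken from the Sarmalink-AI source: the `KeyRotator` class and the key names are hypothetical, and it is written in Python for brevity.

```python
import itertools
import threading


class KeyRotator:
    """Hands out API keys in round-robin order (hypothetical helper)."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one key is required")
        self._cycle = itertools.cycle(list(keys))
        self._lock = threading.Lock()

    def next_key(self):
        # The lock keeps the rotation order stable under concurrent requests.
        with self._lock:
            return next(self._cycle)


# Placeholder key names, not real credentials.
rotator = KeyRotator(["key-a", "key-b", "key-c"])
picked = [rotator.next_key() for _ in range(4)]
print(picked)
```

Each call takes the next key and wraps around after the last one, so load spreads evenly across whatever keys are configured.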


How a request flows

%%{init: {'theme':'dark','themeVariables':{'primaryColor':'#0d2e4f','primaryTextColor':'#e6f5ff','lineColor':'#22d3ee','primaryBorderColor':'#22d3ee','actorBkg':'#1e3a5f','actorBorder':'#22d3ee','actorTextColor':'#ffffff'}}}%%
sequenceDiagram
    autonumber
    participant Client
    participant Router as Intent Router
    participant PA as Primary Engine
    participant PB as Failover Engine
    participant Mem as Memory + Tools
    Client->>Router: POST /api/v1/chat
    Router->>Router: classify intent (Smart / Live / Coder / ...)
    Router->>PA: dispatch primary
    PA-->>Router: 429 Too Many Requests
    Note over Router,PB: handoff in under 50ms
    Router->>PB: retry on next engine
    PB->>Mem: recall facts + tools
    Mem-->>PB: context window
    PB-->>Router: 200 streaming
    Router-->>Client: SSE first token ~120ms
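The handoff in the diagram can be sketched as a loop over an ordered engine list. This is a minimal illustration under stated assumptions, not the gateway's actual code: the `Engine` type, the `dispatch` function and the engine names are hypothetical, and engines are modelled as plain callables returning an HTTP status and a body.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Statuses that trigger a handoff: 429 plus any 5xx.
RETRYABLE = {429} | set(range(500, 600))


@dataclass
class Engine:
    name: str                               # hypothetical engine label
    call: Callable[[str], Tuple[int, str]]  # returns (http_status, body)


def dispatch(prompt: str, engines: List[Engine]) -> Tuple[str, str]:
    """Try engines in order and move on when one answers 429 or 5xx."""
    last_status = None
    for engine in engines:
        status, body = engine.call(prompt)
        if status == 200:
            return engine.name, body
        if status not in RETRYABLE:
            # Other errors (for example 400) will not improve on another engine.
            raise RuntimeError(f"{engine.name} returned non-retryable status {status}")
        last_status = status
    raise RuntimeError(f"all engines failed, last status {last_status}")


# Stand-in engines: the primary is rate limited, the failover answers.
primary = Engine("primary", lambda prompt: (429, ""))
failover = Engine("failover", lambda prompt: (200, f"echo: {prompt}"))
used, reply = dispatch("hello", [primary, failover])
print(used, reply)
```

A real gateway would stream the response and time the handoff. The sketch shows only the ordering logic.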

Seven providers, thirty-six engines, six modes

Provider · Engines · Highlights
Groq · 5 engines · GPT-OSS 120B + 20B
SambaNova · 4 engines · DeepSeek V3.2
Cerebras · 3 engines · Qwen 3 235B
Gemini · 4 engines · 2.5 Flash + 3
OpenRouter · 17 engines · Nemotron + GLM
Cloudflare · images · klein 9B + 4B
Tavily · live · weather + FX

36 engines · 7 providers · <50ms failover · MIT · wiki

Read the long plan · Clone and deploy



$  cp | ctx 12%* ok | mem 4 | obs 37 | opt 71% | skill scoped-read    (~12 steps)

slipstream · Claude Code plugin + cross-IDE MCP toolkit

First major release. Fourteen sp_* tools replace whole-file reads with scoped symbol pulls, for a reproducible ~95% per-read token saving measured by pnpm benchmark. A React + Vite + d3 dashboard with nine routed views including an interactive code dependency graph. A cross-tab agent bus that lets multiple Claude Code tabs on one project coordinate at turn boundaries. A cold-start knowledge feed on every SessionStart so no session begins blank. Also included: dollar cost of tokens saved, downloadable session reports, a memory doctor, the insights band, the project knowledge brief, and a 75-skill methodology library.
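To see why a scoped symbol pull is cheaper than a whole-file read, here is a small sketch. It is not slipstream's implementation and does not use the sp_* tools: `scoped_read` is a hypothetical helper built on Python's standard `ast` module, the sample source is invented, and it counts characters rather than tokens.

```python
import ast

# Invented sample file, for illustration only.
SOURCE = '''
import os

def load(path):
    with open(path) as fh:
        return fh.read()

def parse(text):
    return [line.split(",") for line in text.splitlines()]

def summarise(rows):
    return {"rows": len(rows), "cols": len(rows[0]) if rows else 0}
'''


def scoped_read(source: str, symbol: str) -> str:
    """Return only the source of one top-level function or class."""
    wanted = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.parse(source).body:
        if isinstance(node, wanted) and node.name == symbol:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise KeyError(symbol)


snippet = scoped_read(SOURCE, "parse")
saving = 1 - len(snippet) / len(SOURCE)
print(f"whole file: {len(SOURCE)} chars, scoped: {len(snippet)} chars, saving: {saving:.0%}")
```

The saving grows with file size, since the scoped read stays roughly constant while the whole-file read does not. The ~95% figure above comes from the project's own benchmark on real files, not from this toy input.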

Six editor install paths · 321 tests · MIT


v1.0.0 · 321 tests · 14 sp tools · 75 skills · MIT · changelog · wiki

Read the long plan · Install in six editors




echo · the open Jarvis you actually own

Bring-your-own-subscription. Echo never asks for an API key. It dispatches each prompt to whichever subscription-backed CLI you already pay for, claude, codex or gemini, picked by a router that scores capability, quota remaining and freshness. Voice in. Voice out. Vision when it helps. Memory across years. Translucent multi-monitor HUD planned. Cross-platform from one Rust core. MIT. Local-first.
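A router that scores capability, quota remaining and freshness could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Brain` type, the 0 to 1 scales, the weights and the sample values are hypothetical, and Echo's real router lives in its Rust core, not in Python.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Brain:
    name: str
    capability: float       # 0..1, fit for this kind of prompt (assumed scale)
    quota_remaining: float  # 0..1, share of the subscription quota left
    freshness: float        # 0..1, assumed to mean how recent the signal is


# Hypothetical weights, chosen only to make the example concrete.
WEIGHTS = {"capability": 0.5, "quota_remaining": 0.3, "freshness": 0.2}


def score(brain: Brain) -> float:
    return (
        WEIGHTS["capability"] * brain.capability
        + WEIGHTS["quota_remaining"] * brain.quota_remaining
        + WEIGHTS["freshness"] * brain.freshness
    )


def pick(brains: List[Brain]) -> Brain:
    # A brain with no quota left cannot serve the prompt at all.
    usable = [b for b in brains if b.quota_remaining > 0]
    if not usable:
        raise RuntimeError("no brain has quota remaining")
    return max(usable, key=score)


candidates = [
    Brain("claude", capability=0.9, quota_remaining=0.1, freshness=0.8),
    Brain("codex", capability=0.8, quota_remaining=0.9, freshness=0.7),
    Brain("gemini", capability=0.7, quota_remaining=0.0, freshness=0.9),
]
chosen = pick(candidates)
print(chosen.name)
```

With these sample numbers the most capable brain loses to one with more quota left, which is the trade-off a quota-aware router is meant to make.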

Where it is now: Foundation + the orchestration layer are in and tested, 64 tests green. The brain router across claude/codex/gemini is wired and proven against a fake CLI; the file-based memory store with PreSession digests is live; an MCP skills bus runs weather / web search / files; the voice traits are defined and the macOS TTS adapter is real.

What is still landing: real Porcupine wake word, real cpal mic capture, real whisper.cpp speech-to-text, real Piper TTS as the cross-platform default, the wired end-to-end voice loop, the setup wizard, sqlite-vss vector memory.

Then: HUD polish + multi-monitor, calendar + mail over one-click OAuth, the senses, a proactive engine, autonomous workflows, signed installers.


coming 1 July 2026 · Phase 0 + 1 foundation in, 64 tests · phases landing daily · Long plan



About me

I am Sarma. I build open-source software from a desk in the UK.

LLM infrastructure, coding agents, inference servers, storage engines, consensus protocols, WebAssembly sandboxes, platform tools. Every project lives on GitHub with a whitepaper, an architecture diagram and a quick-start guide on sarmalinux.com/products.

What pulls me back to the desk every weekend is the same thing that pulled me into the industry: the quiet thrill of building something from scratch. A blank repository, a problem worth solving, a system that did not exist yesterday and ships today.

When I am not at the desk, I write long-form essays about what I am learning, contribute to the open-source projects I rely on, and run a small weekend charity where I build free websites for local businesses in Hemel Hempstead.



Recent ships

Date · What
8 Jun 2026 · echo Phase 0 + brain-router scaffolding in: Brain trait + Claude/Codex/Gemini subprocess wrappers, capability-and-quota router, file-based memory with PreSession digests, MCP skills bus with weather/web-search/files, voice traits + macOS TTS. 64 tests green. Real wake word, mic, whisper.cpp and Piper are next. v1.0 still aimed at 1 July 2026.
6 Jun 2026 · slipstream v1.0.0: first major release. React dashboard with nine views, interactive code graph, cross-tab agent bus, cold-start knowledge feed, reproducible pnpm benchmark hitting ~95% per-read savings, dollar cost of tokens saved, memory doctor, 75-skill library, 321 tests.
6 Jun 2026 · slipstream v0.27.0: production React dashboard (Vite + TypeScript + d3) with grouped sidebar (Now / History / Knowledge), typed JSON client and interactive knowledge graph.
6 Jun 2026 · slipstream v0.24.0: reproducible token-savings benchmark. pnpm benchmark measures whole-file vs scoped reads on real files and prints a Markdown table.
6 Jun 2026 · slipstream v0.8.0: dashboard insights band. Every data tab opens with a natural-language paragraph plus bullets, built from deterministic templates with zero LLM calls.
4 Jun 2026 · slipstream v0.7.0: tabbed dashboard (Live, Project, Journal, Sessions, Memory) with 365-day heatmap, file leaderboard, kinds donut, distilled lessons.
4 Jun 2026 · slipstream v0.6.0: cross-IDE parity (sp_digest + sp_resume + auto-mode-detect + slipstream-setup), nine backend features, redesigned glass-on-dark dashboard.
3 Jun 2026 · NVIDIA Computex 2026 recap: Vera Rubin NVL72 in production, RTX Spark, Cosmos 3, Nemotron 3 Ultra.
1 Jun 2026 · AI Engineer World's Fair 2026 recap: MCP took the year. Six themes that defined where AI engineering is going.
31 May 2026 · echo repo opened, public launch scheduled 1 July 2026.
3 May 2026 · Sarmalink-AI v2: intent auto-routing, MCP-shape tool catalog, TTS/STT cascades, image generation rotation.


The portfolio · nineteen MIT-licensed projects

Flagships

  • Sarmalink-ai · Multi-provider OpenAI-compatible AI gateway with 36-engine failover across 7 providers, intent-based plugin auto-routing, MCP-shape tool catalog and Manus webhook persistence.
  • slipstream · v1.0 shipped. Claude Code plugin and cross-IDE MCP toolkit. Fourteen sp_* tools, self-building memory, lossless compaction, React dashboard with nine views and an interactive code dependency graph, cross-tab agent bus, cold-start knowledge feed, 75-skill methodology library. 321 tests, MIT.

Coming next

  • echo · An open Jarvis. Brain-agnostic across Claude Code, Codex CLI, Gemini CLI, Ollama and LM Studio. Translucent multi-monitor HUD planned. Phase 0 + Phase 1 orchestration scaffolding in, 64 tests; real audio I/O and the setup wizard ship next. Public v1.0 on 1 July 2026.

AI infrastructure

  • agent-orchestrator · Durable multi-agent workflows in TypeScript, deterministic replay, journaled Postgres state, BullMQ step queue, Inspector UI.
  • voice-agent-starter · Sub-second full-duplex WebRTC voice loop, mediasoup SFU, Fastify model worker, pluggable STT, LLM, TTS adapters.
  • ai-eval-runner · Evals as code. Python 3.12, Typer CLI, DuckDB store, FastAPI + HTMX viewer.
  • forge-infer · Minimal LLM inference server in Rust with paged KV-cache, continuous batching and speculative decoding.

MCP and AI applications

  • mcp-server-toolkit · Production Model Context Protocol server starter in Python and FastAPI.
  • local-llm-router · OpenAI-compatible proxy routing between Ollama and cloud LLMs by policy.
  • rag-over-pdf · A minimal, production-shaped RAG starter with cited streaming answers.
  • receipt-scanner · Vision OCR receipts to Zod-validated JSON.

Systems software

  • lsmdb · Log-structured merge-tree storage engine in Go. WAL, SSTables, bloom filters, MVCC snapshots.
  • raftkv · Raft KV store in Go with a fault-injection harness proving linearizability under partitions.
  • sandboxd · WebAssembly sandbox in Rust with a deny-by-default host ABI and strict CPU, wall-clock and memory bounds.

Platform engineering

  • terraform-stack · Vercel, Supabase, Cloudflare and DigitalOcean modules in one Terraform repo.
  • k8s-ops-toolkit · Helm chart for shipping Next.js to Kubernetes with full observability pre-wired.
  • shipyard · Multi-tenant SaaS scaffold in TypeScript. Tenant isolation, RBAC, billing, audit log, rate limits.

Tools

  • webhook-to-email · Webhook receiver that forwards events to email via Resend.
  • staff-portal · Open-source HR and ops portal. Leave, attendance, expenses, kiosk mode.

Every repo has a bespoke product trio on sarmalinux.com/products: whitepaper, architecture diagram, quick-start. All MIT.



Stack

Languages and frameworks
Infrastructure and tooling



AI infrastructure · MCP · Agent orchestration · Voice · RAG · Inference · Storage · Consensus · Sandboxing · Evals · Platform

The full eight-tier stack with every choice and why it earned a place lives at sarmalinux.com/technology. Boring tech, surgical complexity. No AWS, no Azure.



Stats

19 OSS projects · 87 essays


Writing

A handful of good entry points into the eighty-seven long-form engineering essays:



Hiring

I am open to permanent, full-time PAYE software engineering roles across the United Kingdom. Remote, hybrid or on-site. Senior or mid-level individual contributor in AI infrastructure, AI engineering, platform engineering, backend or full-stack development. Not taking contract, consulting or agency subcontract work.

The full pitch with a capability matrix, recent ships and selected open-source work lives at sarmalinux.com/hire-me.


Read the full pitch · Email



Built by sarmalinux · UK · All projects MIT licensed · Updated daily
