14 years building production systems, 5 of them leading a quant trading team where correctness and latency left no room for hand-waving. I now work AI-native and full-stack, focused on a question that keeps getting more urgent: can we trust what AI agents actually do?
Most of my recent work is tooling that makes agent behavior measurable, safe, and verifiable — and I ship all of it, across Python, TypeScript, and Rust.
- stateful-guardrails: catching slow-burn, multi-turn agent threats before they escalate
- agentscore: scoring the health of AI agent dev environments
- vali: linting hallucinations and slop out of AI-generated code

| Project | What it does | Stack |
|---|---|---|
| stateful-guardrails | Catches slow-burn, multi-turn agent threats by accumulating risk across a conversation instead of judging each message alone. Validated with McNemar tests & bootstrap CIs — negative results reported honestly. | Python |
| mycelium | Local-first hybrid RAG + GraphRAG over any markdown vault. Dense + BM25 (Korean tokenizer) fused with RRF, source-cited answers, fully local on Ollama. | Python |
| agentscore | Lighthouse for AI agent dev environments — a CLI that scores the health of Claude Code MCP / plugin setups. | Python |
| vali | Linter that flags hallucinations, slop, and over-engineering in AI-generated code. | TypeScript |
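
The "Dense + BM25 fused with RRF" step listed for mycelium refers to reciprocal rank fusion. Below is a minimal sketch of that general technique, not mycelium's actual code: the function name, the file names, and the constant `k = 60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    with rank starting at 1. k = 60 is an assumed default, not a value
    taken from mycelium.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense retriever and a BM25 retriever.
dense_hits = ["note-a.md", "note-b.md", "note-c.md"]
bm25_hits = ["note-b.md", "note-d.md", "note-a.md"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Because only ranks enter the formula, the dense and BM25 scores never need to be normalized onto a shared scale, which is the usual reason to prefer rank-based fusion for hybrid retrieval.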

| Project | What it does | Stack |
|---|---|---|
| evmscope | EVM intelligence toolkit for AI agents — 23 MCP tools across 7 chains, zero config. | TypeScript |
| chain-eye | Real-time EVM transaction monitoring TUI. | Rust |

| Project | What it does | Stack |
|---|---|---|
| drift | Dependency health monitor — scores your project's dependency survival probability. | Rust |
| git-vibe | Vibe-check your codebase — git history analysis with emoji-based reports. | Rust |
| dotfig | One config to rule them all: generate ESLint, Prettier, TypeScript & EditorConfig from a single dotfig.yml. | TypeScript |
