arablex/llm-ux-patterns

LLM UX Patterns

Production-tested UX patterns for AI products: streaming, RAG citations, token-cost transparency, quota fallback, evals, and AI empty states. Each pattern is shown with a real production screen, not a mockup.

Maintained by Aleksey Stepikin (Stepikin Studio — solo+AI design studio). Full annotated version with problem/pattern/why breakdowns: stepikin.com/llm-ux-patterns

Most AI products fail at the interface, not the model. The model works — but the screen around it hides cost, hides sources, breaks under rate limits, and leaves new users staring at a blank page. These six patterns address that.


01 — Stream the reasoning, not just the answer

Screenshot: AI agent live trace with streaming reasoning steps, tool calls with arguments, and a live token count and latency.

Problem: a spinner for 20 seconds, then a wall of text. The user can't tell if the model is thinking, stuck, or about to be wrong. Trust collapses in the silence.

Pattern: stream the steps, not just tokens — the brief as understood, the plan, each tool call with its actual arguments, each result, in order. Add a live token/latency counter.

Why: watching the work makes AI feel competent instead of magic, and makes failure legible — the user sees where a run went wrong, not just a bad answer.
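
One way to picture this pattern in code is a stream of typed step events that the UI folds into trace lines with running totals. The Python sketch below is illustrative only: the `TraceEvent` type, its field names, and the demo events are assumptions made for this example, not the format Atlas uses.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class TraceEvent:
    """One step of an agent run (hypothetical shape for this sketch)."""
    kind: str            # e.g. "brief", "plan", "tool_call", "tool_result"
    text: str            # what to show the user for this step
    tokens: int = 0      # tokens spent producing this step
    elapsed_ms: int = 0  # time since the previous event


def render_trace(events: Iterable[TraceEvent]) -> Iterator[str]:
    """Yield one display line per step, with running token and latency totals."""
    total_tokens = 0
    total_ms = 0
    for step, event in enumerate(events, start=1):
        total_tokens += event.tokens
        total_ms += event.elapsed_ms
        yield (f"{step:02d} [{event.kind}] {event.text} "
               f"({total_tokens} tok, {total_ms / 1000:.1f}s)")


# Demo data: invented events, shown in the order they would arrive.
demo_events = [
    TraceEvent("brief", "Summarise Q3 churn drivers", tokens=40, elapsed_ms=300),
    TraceEvent("tool_call", 'search_docs(query="Q3 churn")', tokens=25, elapsed_ms=200),
    TraceEvent("tool_result", "3 documents found", tokens=0, elapsed_ms=900),
]
trace_lines = list(render_trace(demo_events))
```

Because `render_trace` is a generator, a UI could print each line as soon as its event arrives instead of waiting for the whole run to finish.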

02 — Make every AI source auditable (RAG)

Screenshot: RAG knowledge base with indexed sources and chunk counts, embedding config, and per-source health, including a failed source.

Problem: RAG systems answer confidently from documents the user can't see. A stale or silently-failed source produces wrong answers that look identical to right ones.

Pattern: make the knowledge layer a first-class screen that shows what's indexed, how it is chunked, which embedding and rerank models are in use, and how fresh each source is. Surface a failed source in red on that screen, not in a log.

Why: in domains where being wrong has a cost, the audit trail is the product. Let users disagree with the AI by giving them everything needed to check it.
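
As a rough sketch of the per-source health idea, the Python below classifies each indexed source as failed, stale, or healthy and sorts failures to the top of the list. The `IndexedSource` fields, the seven-day freshness window, and the demo sources are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class IndexedSource:
    """A knowledge-base source as the UI might see it (hypothetical shape)."""
    name: str
    chunk_count: int
    last_indexed: datetime
    last_error: Optional[str] = None  # set when the last sync failed


def source_status(source: IndexedSource, now: datetime,
                  max_age: timedelta = timedelta(days=7)) -> str:
    """Return "failed", "stale", or "healthy" for one source."""
    if source.last_error is not None:
        return "failed"
    if now - source.last_indexed > max_age:
        return "stale"
    return "healthy"


def sources_for_display(sources: List[IndexedSource],
                        now: datetime) -> List[Tuple[str, IndexedSource]]:
    """Pair each source with its status; failed first, then stale, then healthy."""
    order = {"failed": 0, "stale": 1, "healthy": 2}
    rows = [(source_status(s, now), s) for s in sources]
    return sorted(rows, key=lambda row: (order[row[0]], row[1].name))


# Demo data: invented sources and a fixed "now" so the result is reproducible.
demo_now = datetime(2024, 6, 15)
demo_sources = [
    IndexedSource("handbook", 420, datetime(2024, 6, 14)),
    IndexedSource("pricing", 35, datetime(2024, 5, 1)),
    IndexedSource("crm-notes", 0, datetime(2024, 6, 14), last_error="connector returned 401"),
]
demo_rows = sources_for_display(demo_sources, demo_now)
```

Sorting by severity is one possible design choice; the point is that the failed source is the first thing the user sees rather than something buried in a log.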

03 — Put token cost where the work happens

Screenshot: AI billing dashboard with month-to-date inference spend, forecast, cost per run, and a breakdown by agent, model, and tool.

Problem: AI features have real marginal cost per use, hidden until the invoice. Users can't reason about a tool whose cost is invisible.

Pattern: a readable cost surface — MTD spend, forecast vs budget, cost per run, breakdown by agent / model / tool. The expensive model and the chatty tool become obvious.

Why: cost transparency is what lets a buyer say yes. It turns "AI is unpredictably expensive" into a number they can plan around.
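
The arithmetic behind such a cost surface is simple, which is part of the argument for showing it. The sketch below computes cost per run from token counts, groups spend by agent or model, and makes a naive linear month-end forecast. The model names and per-million-token prices are invented placeholders, and the linear forecast is an assumption rather than a recommended forecasting method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical prices in USD per million tokens: (input, output).
PRICE_PER_MILLION = {
    "large-model": (10.0, 30.0),
    "small-model": (0.5, 1.5),
}


@dataclass
class Run:
    agent: str
    model: str
    input_tokens: int
    output_tokens: int


def run_cost(run: Run) -> float:
    """Cost of a single run in USD."""
    in_price, out_price = PRICE_PER_MILLION[run.model]
    return (run.input_tokens * in_price + run.output_tokens * out_price) / 1_000_000


def spend_by(runs: List[Run], key: str) -> Dict[str, float]:
    """Total spend grouped by a Run attribute such as "agent" or "model"."""
    totals: Dict[str, float] = defaultdict(float)
    for run in runs:
        totals[getattr(run, key)] += run_cost(run)
    return dict(totals)


def forecast_month_end(mtd_spend: float, day_of_month: int, days_in_month: int) -> float:
    """Naive forecast: assume the average daily spend so far continues."""
    return mtd_spend / day_of_month * days_in_month


# Demo data: two invented runs by the same agent on different models.
demo_runs = [
    Run("triage", "large-model", 100_000, 10_000),
    Run("triage", "small-model", 200_000, 100_000),
]
demo_by_model = spend_by(demo_runs, "model")
```

Grouping the same runs by `"agent"` or by `"model"` is what makes the expensive model stand out next to the cheap one.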

04 — Design for quota limits and graceful fallback

Screenshot: AI quota screen showing provider throttling, agents auto-routing to a fallback model, and three explicit remediation options.

Problem: provider rate limits are not an edge case — they're Tuesday. Most products render a 429 as a generic error toast and a dead feature.

Pattern: treat the limit as a first-class state. Show which provider is throttling, what is already happening automatically (fallback routing), and the explicit user choices (raise the tier, stay on fallback, or throttle non-essential work), each with its trade-off priced out.

Why: graceful degradation separates products that survive a spike from products that just break. Never make the user guess whether the AI is down or busy.
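
A minimal sketch of the routing half of this pattern: try providers in order, remember which one throttled, and hand the UI enough state to say what happened. `RateLimited`, `RoutedResult`, and the fake provider functions are hypothetical names for this example; a real provider client would raise its own error type on HTTP 429.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


class RateLimited(Exception):
    """Hypothetical error a provider client raises when it receives HTTP 429."""

    def __init__(self, provider: str, retry_after_s: float):
        super().__init__(f"{provider} is throttling requests")
        self.provider = provider
        self.retry_after_s = retry_after_s


@dataclass
class RoutedResult:
    text: str
    served_by: str
    throttled: Optional[str] = None        # first provider that was rate-limited
    retry_after_s: Optional[float] = None  # its suggested wait, in seconds


def complete_with_fallback(
    prompt: str, routes: Sequence[Tuple[str, Callable[[str], str]]]
) -> RoutedResult:
    """Try each (name, call) route in order and record the first throttled provider."""
    throttled, retry_after = None, None
    for name, call in routes:
        try:
            text = call(prompt)
        except RateLimited as err:
            if throttled is None:
                throttled, retry_after = err.provider, err.retry_after_s
            continue
        return RoutedResult(text, name, throttled, retry_after)
    raise RuntimeError("all providers are throttling")


def status_banner(result: RoutedResult) -> str:
    """Text for the quota state, so the user never has to guess."""
    if result.throttled is None:
        return f"Served by {result.served_by}."
    return (f"{result.throttled} is rate-limited (retry in ~{result.retry_after_s:.0f}s). "
            f"Auto-routed to {result.served_by}.")


# Demo: a fake primary that always throttles and a fake fallback that answers.
def fake_primary(prompt: str) -> str:
    raise RateLimited("primary", 30.0)


def fake_fallback(prompt: str) -> str:
    return f"fallback answer to: {prompt}"


demo_result = complete_with_fallback(
    "hello", [("primary", fake_primary), ("fallback", fake_fallback)]
)
demo_banner = status_banner(demo_result)
```

The explicit user choices (raise the tier, stay on fallback, throttle non-essential work) would sit next to this banner; they are product decisions and are left out of the sketch.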

05 — Make evals a first-class surface

Screenshot: AI evals dashboard with a weighted quality score over time, regression detection against a threshold, and judges with their disagreement rate.

Problem: every team gets asked "is the AI getting better or worse?", and few can answer it on screen. Prompt changes ship, and quality silently regresses.

Pattern: a visible quality ledger: weighted score over time, pass threshold, regressions flagged when crossed, judges (model + human) with disagreement rate, each regression traced to the change that caused it.

Why: evals on screen turn AI quality from a vibe into a defensible number — for the team, the buyer, and the regulator.
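
The three numbers on such a ledger can be sketched in a few lines of Python: a weighted score, a check for the change at which the score first drops below the pass threshold, and the rate at which model and human judges disagree. The criteria names, weights, change IDs, and scores below are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores (each in the range 0..1)."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight


def find_regressions(history: Sequence[Tuple[str, float]], threshold: float) -> List[str]:
    """Return the change IDs at which the score crossed below the threshold.

    `history` is a list of (change_id, score) pairs in the order changes shipped.
    """
    flagged = []
    previous = None
    for change_id, score in history:
        crossed = score < threshold and (previous is None or previous >= threshold)
        if crossed:
            flagged.append(change_id)
        previous = score
    return flagged


def disagreement_rate(model_verdicts: Sequence[bool], human_verdicts: Sequence[bool]) -> float:
    """Share of cases where the model judge and the human judge differ."""
    pairs = list(zip(model_verdicts, human_verdicts))
    if not pairs:
        return 0.0
    return sum(1 for m, h in pairs if m != h) / len(pairs)


# Demo data: invented change IDs and scores.
demo_history = [
    ("prompt-v1", 0.86),
    ("prompt-v2", 0.84),
    ("prompt-v3", 0.78),  # first score below the 0.80 threshold
    ("prompt-v4", 0.79),  # still below, but not a new crossing
]
demo_regressions = find_regressions(demo_history, threshold=0.80)
```

Flagging only the crossing, not every score below the threshold, is what ties a regression to the single change that caused it.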

06 — The empty state is the hardest AI screen

Screenshot: honest AI product empty state with no fake metrics, one clear primary action, and a suggested gentlest first step.

Problem: a new user opens an AI product to nothing — no data, no examples, no idea what good looks like. This is where most AI tools lose people.

Pattern: be honest and directive. Don't fake activity, explain what the screen becomes after the first action, offer one unambiguous primary CTA, and suggest the gentlest first step (a starter template, not a blank canvas).

Why: the first run is the highest-stakes screen in the product. A tool that respects the user's intelligence on a quiet day earns the right to a loud one.
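
In code, the rule often comes down to one branch in the view model: with zero runs, return no metrics at all plus a single action, rather than placeholder numbers. The sketch below assumes a hypothetical `DashboardView` type, and the copy strings are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DashboardView:
    """What the dashboard should render (hypothetical shape for this sketch)."""
    headline: str
    metrics: List[str]
    primary_action: Optional[str] = None
    suggested_first_step: Optional[str] = None


def build_dashboard_view(run_count: int, metrics: List[str]) -> DashboardView:
    """Return an honest empty state when there is no data, else the real metrics."""
    if run_count == 0:
        return DashboardView(
            headline="No runs yet. Traces, cost, and evals appear here after your first run.",
            metrics=[],  # never show sample or fake numbers
            primary_action="Create your first agent",
            suggested_first_step="Start from a starter template",
        )
    return DashboardView(headline=f"{run_count} runs recorded", metrics=metrics)


demo_empty = build_dashboard_view(0, ["p95 latency: 2.1s"])
demo_active = build_dashboard_view(12, ["p95 latency: 2.1s"])
```

Note that the empty branch discards any metrics passed in, so stale or sample data cannot leak onto a screen that has no real activity behind it.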


About the screens

All screenshots are from Atlas, an AI-agent control room (observability, traces, evals, billing, multi-tenant) designed by Aleksey Stepikin. More AI product work: Vigilo — global risk monitor, 44 live sources, 198 countries, built solo.

Contributing

Found a pattern that belongs here, or a production example that does one of these better? Open an issue or PR.

License

Text: CC BY 4.0 — cite stepikin.com/llm-ux-patterns. Screenshots: © Aleksey Stepikin, used here for documentation; ask before reuse.
