LLM UX Patterns

Production-tested UX patterns for AI products: streaming, RAG citations, token-cost transparency, quota fallback, evals, and AI empty states. Each pattern is shown with a real production screen, not a mockup.

Maintained by Aleksey Stepikin (Stepikin Studio — solo+AI design studio). Full annotated version with problem/pattern/why breakdowns: stepikin.com/llm-ux-patterns

Most AI products fail at the interface, not the model. The model works — but the screen around it hides cost, hides sources, breaks under rate limits, and leaves new users staring at a blank page. These six patterns address that.

01 — Stream the reasoning, not just the answer

Problem: a spinner for 20 seconds, then a wall of text. The user can't tell if the model is thinking, stuck, or about to be wrong. Trust collapses in the silence.

Pattern: stream the steps, not just tokens — the brief as understood, the plan, each tool call with its actual arguments, each result, in order. Add a live token/latency counter.

Why: watching the work makes AI feel competent instead of magic, and makes failure legible — the user sees where a run went wrong, not just a bad answer.
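
As a rough illustration of the pattern, a run can be modeled as an ordered stream of typed step events that the UI renders one line at a time, with a running token and latency counter. This is a minimal sketch, not the Atlas implementation: `StepEvent`, `render_stream`, the event kinds, and the sample tool name `search_docs` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Iterator


@dataclass
class StepEvent:
    """One visible step of a run (hypothetical shape, for illustration)."""
    kind: str                 # e.g. "brief", "plan", "tool_call", "tool_result"
    text: str
    tokens: int = 0           # tokens consumed by this step
    elapsed_ms: int = 0       # time spent on this step
    args: dict[str, Any] = field(default_factory=dict)


def render_stream(events: Iterable[StepEvent]) -> Iterator[str]:
    """Yield one UI line per step, each with a running token/latency counter."""
    total_tokens = 0
    total_ms = 0
    for ev in events:
        total_tokens += ev.tokens
        total_ms += ev.elapsed_ms
        detail = ev.text
        if ev.kind == "tool_call":
            # Show the actual arguments, not just the tool name.
            shown = ", ".join(f"{k}={v!r}" for k, v in ev.args.items())
            detail = f"{ev.text}({shown})"
        yield f"[{ev.kind}] {detail} | {total_tokens} tok, {total_ms} ms"


run = [
    StepEvent("brief", "Summarize Q3 churn drivers", tokens=40, elapsed_ms=300),
    StepEvent("tool_call", "search_docs", tokens=25, elapsed_ms=120,
              args={"query": "Q3 churn"}),
    StepEvent("tool_result", "3 documents found", elapsed_ms=850),
]
lines = list(render_stream(run))
```

Because `render_stream` is a generator, each line can be pushed to the screen as soon as its event arrives, so a stalled tool call shows up as a visible pause at a named step rather than as silence.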

02 — Make every AI source auditable (RAG)

Problem: RAG systems answer confidently from documents the user can't see. A stale or silently-failed source produces wrong answers that look identical to right ones.

Pattern: make the knowledge layer a first-class screen that shows what's indexed, how it is chunked, which embedding and rerank models are in use, and how fresh each source is. Surface a failed source in red on that screen, not in a log.

Why: in domains where being wrong has a cost, the audit trail is the product. Let users disagree with the AI by giving them everything needed to check it.
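
One way to back such a screen is to compute an explicit status per source and sort the worst ones to the top. The sketch below assumes a hypothetical `Source` record and a single freshness window (`max_age`); real systems would likely track more (chunk counts, model versions), and the source names and error string are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Source:
    """A single indexed source (hypothetical shape, for illustration)."""
    name: str
    last_indexed: datetime
    last_error: Optional[str] = None  # set when the last sync failed


def source_status(src: Source, now: datetime, max_age: timedelta) -> str:
    """Classify a source as 'failed', 'stale', or 'fresh'."""
    if src.last_error is not None:
        return "failed"
    if now - src.last_indexed > max_age:
        return "stale"
    return "fresh"


# Lower number = more urgent, so failed sources sort to the top of the screen.
SEVERITY = {"failed": 0, "stale": 1, "fresh": 2}

now = datetime(2025, 1, 10, 12, 0)
sources = [
    Source("handbook.pdf", datetime(2025, 1, 10, 9, 0)),
    Source("pricing-wiki", datetime(2024, 12, 1, 9, 0)),
    Source("crm-export", datetime(2025, 1, 9, 9, 0), last_error="401 from connector"),
]
statuses = {s.name: source_status(s, now, timedelta(days=7)) for s in sources}
ordered = sorted(sources, key=lambda s: SEVERITY[statuses[s.name]])
```

A failure takes precedence over age here on purpose: a source that synced recently but errored on its last attempt is the one the user most needs to see.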

03 — Put token cost where the work happens

Problem: AI features have real marginal cost per use, hidden until the invoice. Users can't reason about a tool whose cost is invisible.

Pattern: a readable cost surface showing month-to-date (MTD) spend, forecast vs budget, cost per run, and a breakdown by agent / model / tool. The expensive model and the chatty tool become obvious.

Why: cost transparency is what lets a buyer say yes. It turns "AI is unpredictably expensive" into a number they can plan around.
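
The arithmetic behind such a cost surface is simple enough to sketch. The example below assumes per-model prices quoted in USD per million tokens and a naive linear forecast; the model names, prices, and the `Run` record are hypothetical placeholders, not real provider rates.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; substitute your provider's rates.
PRICES = {
    "large-model": {"input": 10.0, "output": 30.0},
    "small-model": {"input": 0.5, "output": 1.5},
}


@dataclass
class Run:
    agent: str
    model: str
    input_tokens: int
    output_tokens: int


def run_cost(run: Run) -> float:
    """Cost of one run in USD."""
    p = PRICES[run.model]
    return (run.input_tokens * p["input"] + run.output_tokens * p["output"]) / 1_000_000


def breakdown(runs: list[Run], key: str) -> dict[str, float]:
    """Total spend grouped by a Run attribute such as 'agent' or 'model'."""
    totals: dict[str, float] = defaultdict(float)
    for r in runs:
        totals[getattr(r, key)] += run_cost(r)
    return dict(totals)


def forecast_month_end(mtd_spend: float, day_of_month: int, days_in_month: int) -> float:
    """Naive linear forecast: assume the daily average so far continues."""
    return mtd_spend / day_of_month * days_in_month


runs = [
    Run("researcher", "large-model", 200_000, 50_000),
    Run("triage", "small-model", 400_000, 100_000),
]
mtd = sum(run_cost(r) for r in runs)
by_model = breakdown(runs, "model")
forecast = forecast_month_end(mtd, day_of_month=10, days_in_month=30)
```

In this made-up data the `triage` agent uses twice the tokens of `researcher` but costs a tenth as much, which is exactly the kind of fact a per-model breakdown makes visible.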

04 — Design for quota limits and graceful fallback

Problem: provider rate limits are not an edge case — they're Tuesday. Most products render a 429 as a generic error toast and a dead feature.

Pattern: treat the limit as a first-class state: which provider throttles, what's already happening automatically (fallback routing), and explicit user choices — raise the tier, stay on fallback, throttle non-essential work — each with its trade-off priced out.

Why: graceful degradation separates products that survive a spike from products that just break. Never make the user guess whether the AI is down or busy.
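
A sketch of what "limit as a state" could look like in code: instead of letting a 429 bubble up as an exception, the call returns a state object the UI can render. Everything here is an assumption for illustration: `RateLimited` stands in for whatever error your provider client raises, and `QuotaState` and `call_with_fallback` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical error a provider client raises on HTTP 429."""


@dataclass
class QuotaState:
    served_by: Optional[str]            # provider that answered, or None
    throttled: list[str] = field(default_factory=list)
    answer: Optional[str] = None

    @property
    def banner(self) -> str:
        """Text for the UI; empty when nothing was throttled."""
        if not self.throttled:
            return ""
        limited = ", ".join(self.throttled)
        if self.served_by is None:
            return f"{limited} rate-limited; no fallback available."
        return f"{limited} rate-limited; answered by {self.served_by} (fallback)."


def call_with_fallback(
    prompt: str, providers: list[tuple[str, Callable[[str], str]]]
) -> QuotaState:
    """Try providers in order, recording which ones were throttled."""
    throttled: list[str] = []
    for name, call in providers:
        try:
            return QuotaState(served_by=name, throttled=throttled, answer=call(prompt))
        except RateLimited:
            throttled.append(name)
    return QuotaState(served_by=None, throttled=throttled)


def primary(prompt: str) -> str:
    raise RateLimited()  # simulate a throttled provider


def backup(prompt: str) -> str:
    return f"echo: {prompt}"


state = call_with_fallback("hi", [("primary", primary), ("backup", backup)])
```

The user-facing choices (raise the tier, stay on fallback, throttle non-essential work) would hang off this state in the UI; they are left out here to keep the sketch short.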

05 — Make evals a first-class surface

Problem: "is the AI getting better or worse?" — every team is asked, few can answer on screen. Prompt changes ship, quality silently regresses.

Pattern: keep a visible quality ledger with the weighted score over time, a pass threshold, regressions flagged when the score drops below that threshold, judges (model + human) with their disagreement rate, and each regression traced to the change that caused it.

Why: evals on screen turn AI quality from a vibe into a defensible number — for the team, the buyer, and the regulator.
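
The ledger's core numbers can be sketched in a few functions: a weighted score per eval run, a check for runs that cross from passing to failing, and a disagreement rate between model and human judges. The criteria, weights, threshold, and change IDs below are hypothetical, and the scoring scheme is one simple choice among many.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    change_id: str             # hypothetical ID of the change under test
    scores: dict[str, float]   # per-criterion score in [0, 1]


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion scores."""
    total = sum(weights.values())
    return sum(scores[k] * w for k, w in weights.items()) / total


def find_regressions(
    runs: list[EvalRun], weights: dict[str, float], threshold: float
) -> list[str]:
    """Return change IDs whose run crossed from passing to failing."""
    flagged = []
    prev_passing = True
    for run in runs:
        passing = weighted_score(run.scores, weights) >= threshold
        if prev_passing and not passing:
            flagged.append(run.change_id)
        prev_passing = passing
    return flagged


def disagreement_rate(model_verdicts: list[bool], human_verdicts: list[bool]) -> float:
    """Share of cases where the model judge and the human judge differ."""
    pairs = list(zip(model_verdicts, human_verdicts))
    if not pairs:
        return 0.0
    return sum(m != h for m, h in pairs) / len(pairs)


weights = {"accuracy": 2.0, "tone": 1.0}
runs = [
    EvalRun("prompt-v1", {"accuracy": 0.9, "tone": 0.9}),
    EvalRun("prompt-v2", {"accuracy": 0.6, "tone": 0.9}),
    EvalRun("prompt-v3", {"accuracy": 0.9, "tone": 0.6}),
]
regressions = find_regressions(runs, weights, threshold=0.75)
rate = disagreement_rate([True, True, False, True], [True, False, False, True])
```

Flagging only the crossing (pass to fail) rather than every failing run keeps the ledger pointing at the change that introduced the regression, which is what the tracing step needs.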

06 — The empty state is the hardest AI screen

Problem: a new user opens an AI product to nothing — no data, no examples, no idea what good looks like. This is where most AI tools lose people.

Pattern: be honest and directive. Don't fake activity, explain what the screen becomes after the first action, offer one unambiguous primary CTA, and suggest the gentlest first step (a starter template, not a blank canvas).

Why: the first run is the highest-stakes screen in the product. A tool that respects the user's intelligence on a quiet day earns the right to a loud one.

About the screens

All screenshots are from Atlas, an AI-agent control room (observability, traces, evals, billing, multi-tenant) designed by Aleksey Stepikin. More AI product work: Vigilo — global risk monitor, 44 live sources, 198 countries, built solo.

Contributing

Found a pattern that belongs here, or a production example that does one of these better? Open an issue or PR.

License

Text: CC BY 4.0 — cite stepikin.com/llm-ux-patterns. Screenshots: © Aleksey Stepikin, used here for documentation; ask before reuse.
