
karankashyap/resolver


Resolver

Event-driven AI customer-support copilot. Per message: triage → RAG retrieve → grounded draft → policy/QA guard → escalate-or-suggest. Runs 100% locally at $0.



What it does

A customer message arrives → the Go API persists it and publishes a message.created event → a Python LangGraph worker consumes it, classifies intent, retrieves grounded knowledge (pgvector), drafts a cited reply, runs a safety/QA guard, and either suggests a draft to a human agent or escalates when confidence is low. Results stream back to a Next.js console via GraphQL subscriptions.

Grounding is mandatory, humans stay in the loop, and an eval harness gates quality in CI so nothing untrustworthy reaches a customer.

Screenshots

Screenshots of the console: Dashboard, Queue, Conversation, Draft panel, Human approval, Escalation, Sent.

Architecture

flowchart LR
    UI["Next.js console"]
    R["Go API · gqlgen<br/>(thin resolvers)"]
    DB[("Postgres<br/>+ pgvector")]
    BUS{{"Redis Streams<br/>(event bus)"}}
    LLM[("LLM<br/>Ollama / OpenAI")]
    TOOLS["MCP tools<br/>(read-only)"]

    subgraph WK["Python LangGraph worker"]
        direction LR
        T["triage"] --> RT["retrieve"] --> D["draft"] --> G["guard"] --> DEC{"decision"}
        DEC -->|repair ×1| D
    end

    UI -- "mutation" --> R
    R -- "persist" --> DB
    R == "message.created" ==> BUS
    BUS == "consume" ==> T
    WK == "draft.ready / escalated" ==> BUS
    BUS -- "bridge" --> R
    R -. "subscription (live)" .-> UI
    RT -. "vector + FTS" .-> DB
    D -. "grounded gen" .-> LLM
    D -. "order / policy" .-> TOOLS
  • Go API (services/api) — thin: validate, persist, publish. No LLM/agent logic.
  • Python worker (workers/agent) — LangGraph state machine; nodes are pure-ish and schema-validated.
  • Contract — API ↔ worker talk only via the typed event schema (packages/events/events.schema.json) on Redis Streams. The request's trace id rides on the event, so one OpenTelemetry trace spans API → worker → LLM.
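To make the contract concrete, here is a minimal Python sketch of the worker side. It decodes a message.created envelope, rejects it if required fields are missing, and skips redeliveries by event id, which is the idempotency listed under Phase 3. The field names (event_id, type, trace_id, conversation_id, payload), Event, parse_event, and IdempotentConsumer are hypothetical and are not taken from events.schema.json. The in-memory set stands in for a durable dedupe store, and the Redis consumer-group plumbing is omitted.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical required fields; the real contract lives in
# packages/events/events.schema.json and may differ.
REQUIRED_FIELDS = ("event_id", "type", "trace_id", "conversation_id", "payload")


@dataclass(frozen=True)
class Event:
    event_id: str
    type: str
    trace_id: str
    conversation_id: str
    payload: dict[str, Any]


def parse_event(raw: str) -> Event:
    """Decode a JSON envelope and reject it if any required field is missing."""
    data = json.loads(raw)
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"event missing fields: {missing}")
    return Event(**{f: data[f] for f in REQUIRED_FIELDS})


class IdempotentConsumer:
    """Runs the handler at most once per event_id (in-memory stand-in for a durable store)."""

    def __init__(self, handler: Callable[[Event], None]) -> None:
        self._handler = handler
        self._seen: set[str] = set()

    def handle(self, raw: str) -> bool:
        event = parse_event(raw)
        if event.event_id in self._seen:
            return False  # duplicate delivery: acknowledge without reprocessing
        self._handler(event)
        self._seen.add(event.event_id)  # mark only after the handler succeeds
        return True
```

The id is marked only after the handler returns, so a handler that raises leaves the event unmarked and a retry can process it again.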

Tech stack

Go + gqlgen · Python + LangGraph · Postgres + pgvector · Redis Streams · MCP-style tools · Next.js + Tailwind + shadcn/ui · Ollama (default) / OpenAI-compatible · OpenTelemetry · GitHub Actions + Docker.

Quickstart

Prerequisites: Docker. (For local development outside Docker you also need Go 1.25+, Python 3.12+, and Node 22+.)

cd resolver_code
cp .env.example .env   # defaults run fully local / $0 — no keys required

make up                # build + boot full stack (postgres+pgvector, redis, ollama, migrate, api, worker, web)
make models            # pull qwen2.5:3b, qwen2.5:7b, nomic-embed-text (first run only)
make ingest            # Bitext -> KB + embeddings + held-out golden set
make eval              # run the eval harness -> report + eval_runs row

On a CPU-only box, drafting with the local 3b/7b models is slow (minutes per draft). For fast responses, set LLM_PROVIDER=openai and point it at a hosted or OpenAI-compatible endpoint; see Models & providers.

The gqlgen-generated Go files are not committed; the Docker build and make gqlgen regenerate them from packages/graphql/schema.graphql.

Repo layout

resolver_code/
├── apps/web              Next.js agent console (streaming)        [Phase 4]
├── services/api          Go + gqlgen GraphQL API                 [Phase 1]
├── workers/agent         Python LangGraph graph, rag/, tools/     [Phase 3]
│   ├── graph/nodes       triage · retrieve · draft · guard · decision · repair
│   ├── rag/              embeddings, hybrid (vector + FTS) search, RRF + re-rank
│   ├── tools/            read-only MCP tools (audited, allow-listed)
│   └── llm/              provider adapters (ollama / OpenAI-compatible)
├── pipeline/             ingest_bitext.py + eval/                 [Phase 2/5]
├── packages/graphql      shared schema + codegen TS types
├── packages/events       events.schema.json (event contract)
├── db/migrations         versioned SQL migrations
├── deploy/               docker-compose.yml
└── data/                 golden.jsonl, samples (large files gitignored)
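The rag/ package fuses vector and full-text hits with Reciprocal Rank Fusion (RRF). Below is a minimal sketch of the textbook formula, score(d) = Σ 1 / (k + rank_i(d)) summed over each ranked list i, using the commonly cited k = 60. The constant Resolver actually uses and its lexical re-rank step are not shown and may differ. The doc ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of doc ids; ranks are 1-based."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["kb-12", "kb-7", "kb-3"]  # e.g. pgvector nearest neighbours
fts_hits = ["kb-7", "kb-40", "kb-12"]    # e.g. Postgres full-text matches
fused = reciprocal_rank_fusion([vector_hits, fts_hits])
print([doc_id for doc_id, _ in fused])  # ['kb-7', 'kb-12', 'kb-40', 'kb-3']
```

RRF uses only ranks, so vector distances and FTS relevance scores never need to be put on a common scale. Documents found by both retrievers (kb-7, kb-12) rise above those found by only one.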

Models & providers

The LLM provider is an env switch behind one interface — no code change to swap.

  • Default (local, $0): Ollama. Model tiering reflects cost/quality: qwen2.5:3b for triage/classification, qwen2.5:7b for drafting and the eval judge, nomic-embed-text (768-dim) for embeddings. Pull them with make models.
  • Hosted: set LLM_PROVIDER=openai and OPENAI_API_KEY (optionally OPENAI_BASE_URL for any OpenAI-compatible endpoint). The worker's chat + embeddings switch with no code change; a missing key fails loudly at startup.

Tradeoffs: local 3b/7b on CPU is slow (minutes per draft) but free and private; grounding/guard are deterministic so safety holds regardless of model strength. A hosted model raises answer quality and speed at a per-token cost (tracked per draft as cost_cents). Generation length is bounded by DRAFT_NUM_PREDICT to cap latency/cost.

Observability

OpenTelemetry traces span the whole path: the API starts a trace per request and stamps its trace id into the message.created event, so the worker continues the same trace across the bus (API → worker → graph/LLM). Per-draft tokens, cost, and latency are recorded on the draft and as span attributes; structured JSON logs carry conversation/trace ids (no secrets/PII at info).
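A common way to carry a trace across a message bus is the W3C Trace Context traceparent string (version-traceid-parentid-flags). The sketch below assumes the event stores such a string in a hypothetical traceparent field. This README only says the trace id is stamped on the event, so the exact field and format may differ. OpenTelemetry's TraceContext propagator normally reads and writes this string; the hand-rolled helpers here only illustrate the format.

```python
import re
import secrets

# W3C Trace Context: version(2 hex)-trace-id(32 hex)-parent-id(16 hex)-flags(2 hex)
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """API side: format the current span context for the outgoing event."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(value: str) -> tuple[str, str, bool]:
    """Worker side: recover (trace_id, parent_span_id, sampled) from the event."""
    match = TRACEPARENT_RE.match(value)
    if not match:
        raise ValueError(f"malformed traceparent: {value!r}")
    _version, trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace or span id is invalid")
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


# Hypothetical event field name "traceparent"; the real schema may differ.
trace_id = secrets.token_hex(16)  # 32 hex chars
span_id = secrets.token_hex(8)    # 16 hex chars
event = {"type": "message.created", "traceparent": make_traceparent(trace_id, span_id)}
worker_trace_id, parent_span_id, sampled = parse_traceparent(event["traceparent"])
```

The worker starts its spans as children of parent_span_id under the same trace id, which is what makes API → worker → LLM appear as one trace.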

Exporter is env-controlled (OTEL_TRACES_EXPORTER): console (default — spans in logs, $0, no extra service), otlp (ships to OTEL_EXPORTER_OTLP_ENDPOINT), or none. For a trace UI: docker compose --profile observability up jaeger, set OTEL_TRACES_EXPORTER=otlp, and open Jaeger at localhost:16686.

Why it's built this way

The design choices, and what they demonstrate:

  • "LLM proposes, evals + guards dispose." Every generated answer must cite retrieved KB sources; a deterministic guard (grounding + tone + a forbidden-action allow-list) and a confidence threshold decide suggest vs escalate. An eval harness gates groundedness/routing/safety in CI. Quality is enforced by code, not vibes.
  • The LangGraph state machine is the source of truth for control flow (triage → retrieve → draft → guard → decision → {finalize | repair | escalate}). Nodes are pure-ish and schema-validated, so each is unit-testable and the whole graph is inspectable.
  • Event-driven Go ↔ Python contract. The thin Go API never calls an LLM; it validates, persists, and publishes a typed event. All AI work lives in the Python worker. They communicate only through the versioned event schema on Redis Streams — independently deployable, independently scalable.
  • Human-in-the-loop safety by construction. Nothing auto-sends below the confidence threshold; irreversible actions (refunds, cancellations) are never executed — only proposed as a human task. Tools are read-only, allow-listed, and audited.
  • Cost/model tiering and local-first. Small model for triage, stronger for drafting; embeddings cached; per-draft tokens/cost recorded. Runs 100% locally at $0 on Ollama, or switches to a hosted provider with one env var.
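The suggest-vs-escalate rule in the first bullet can be sketched as two pure functions. The guard checks grounding (every citation points at a retrieved source) and forbidden actions. The decision step then applies the confidence threshold, with at most one repair pass as in the architecture diagram. The 0.7 threshold, the action names, and the Draft/GuardReport shapes are illustrative assumptions, and tone checks are omitted.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.7  # hypothetical value
# Irreversible actions: may be proposed as a human task, never executed.
FORBIDDEN_ACTIONS = {"issue_refund", "cancel_order"}  # hypothetical names


@dataclass
class Draft:
    text: str
    citations: list[str]          # KB source ids the draft cites
    proposed_actions: list[str]
    confidence: float


@dataclass
class GuardReport:
    grounded: bool
    forbidden: list[str] = field(default_factory=list)


def guard(draft: Draft, retrieved_ids: set[str]) -> GuardReport:
    """Deterministic checks: at least one citation, all citations retrieved, no forbidden actions."""
    grounded = bool(draft.citations) and set(draft.citations) <= retrieved_ids
    forbidden = sorted(set(draft.proposed_actions) & FORBIDDEN_ACTIONS)
    return GuardReport(grounded=grounded, forbidden=forbidden)


def decide(report: GuardReport, confidence: float, repairs_done: int) -> str:
    """Return 'finalize' (persist a SUGGESTED draft for a human), 'repair', or 'escalate'."""
    if report.forbidden:
        return "escalate"  # blocked deterministically; never finalized
    if not report.grounded:
        return "repair" if repairs_done < 1 else "escalate"
    return "finalize" if confidence >= CONFIDENCE_THRESHOLD else "escalate"
```

Because both functions are pure and use no model, the safety outcome for a given draft is the same regardless of which LLM produced it.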

Status

Built phase-by-phase:

  • Phase 0 — foundation & local infra: monorepo skeleton, docker-compose stack, DB migrations (pgvector + HNSW), typed event contract. ✅
  • Phase 1 — Go GraphQL API: schema-first gqlgen API (thin resolvers → service → pgx store), Redis Streams pubsub bridge, ingestMessage persists + publishes message.created, draft subscription wiring, graceful shutdown, containerized via docker compose up. ✅
  • Phase 2 — Dataset → KB & RAG ingestion: make ingest loads Bitext, holds out a stratified golden set (data/golden.jsonl), builds deduped KB docs, embeds them with provider-agnostic embeddings (Ollama nomic-embed-text, 768-dim), and upserts to pgvector with an HNSW index. ✅
  • Phase 3 — LangGraph worker: consumes message.created (Redis consumer group, idempotent by event id, retries + dead-letter), runs the agent graph triage → retrieve → draft → guard → decision → {finalize | repair | escalate} with schema-validated node outputs, persists a grounded SUGGESTED draft (or ESCALATED) with citations + guard report + token cost, and publishes draft.ready / draft.escalated. Forbidden actions are blocked deterministically and are never finalized. ✅
  • Phase 4 — Web agent console: Next.js (App Router) + Tailwind + shadcn/ui console with a typed urql GraphQL client (codegen off the shared schema). Queue with status filter and pagination, conversation view (message thread + full draft panel: confidence meter, grounding sources, guard report), live draftUpdates streaming over graphql-ws, and human-in-the-loop actions (approve/edit → SENT, reject, escalate). Verified live end-to-end: ingest → triage/draft streams in → approve. ✅
  • Phase 5 — Eval harness & CI gate: make eval runs the real agent graph over the held-out golden set and scores routing (category), retrieval recall@k, groundedness, LLM-judge answer quality, safety (zero forbidden actions), and cost/latency — writing pipeline/eval/reports/REPORT.md and an eval_runs row. Gated on the PRD §4 numbers (groundedness ≥90%, routing ≥85%, safety 0); the run exits non-zero otherwise. GitHub Actions CI runs Go/Python/web tests, builds all images, and runs the sampled eval gate. ✅
  • Phase 6 — P1 enhancements: hybrid retrieval (pgvector + Postgres FTS → Reciprocal Rank Fusion → lexical rerank); read-only, allow-listed, audited MCP tools wired into drafting; priority queue ordering (urgency/sentiment, composite-cursor pagination); a quality dashboard (auto-draft/escalation rates, cost, p95, eval trend); hosted/local LLM provider switch via env; and OpenTelemetry tracing end-to-end (the API trace id propagates onto the event so the worker continues the same trace). ✅
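The Phase 5 gate reduces to comparing a few aggregate metrics against the PRD thresholds and exiting non-zero on any miss. This minimal sketch uses the three thresholds quoted above. The metric names and the EvalSummary shape are hypothetical, and the real harness also scores recall@k, judge quality, and cost/latency.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSummary:
    groundedness: float      # fraction of drafts fully grounded, 0..1
    routing_accuracy: float  # fraction of golden items routed to the correct category
    forbidden_actions: int   # forbidden actions that reached a draft


def gate_failures(s: EvalSummary) -> list[str]:
    """Return human-readable reasons the run fails the gate (empty list means pass)."""
    failures = []
    if s.groundedness < 0.90:
        failures.append(f"groundedness {s.groundedness:.1%} < 90%")
    if s.routing_accuracy < 0.85:
        failures.append(f"routing {s.routing_accuracy:.1%} < 85%")
    if s.forbidden_actions != 0:
        failures.append(f"safety: {s.forbidden_actions} forbidden action(s), expected 0")
    return failures


def main(summary: EvalSummary) -> int:
    """Print each failure to stderr and return the process exit code."""
    failures = gate_failures(summary)
    for reason in failures:
        print(f"GATE FAIL: {reason}", file=sys.stderr)
    return 1 if failures else 0
```

A CI entry point would end with sys.exit(main(summary)), so any missed threshold fails the job.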

License

MIT — see LICENSE.

