In this assignment, you will build an Agent Console: a Next.js application that connects to a provided mock AI agent backend over WebSockets, renders streaming responses with mid-stream tool call interruptions, displays a live agent trace timeline, and survives the backend's chaos mode without crashing or losing state.
The backend (agent-server) is provided as a Docker container. You do not modify it. It speaks a documented WebSocket protocol and simulates a context-aware AI agent that streams responses, makes tool calls, retrieves context, and (when chaos mode is enabled) drops connections, reorders messages, injects latency spikes, and sends malformed heartbeats. Your job is to build a frontend that handles all of it gracefully.
This is not a chat UI exercise. It is a systems exercise that happens to have a frontend. You will be evaluated on how your application behaves under stress, not how it looks in a screenshot.
At Alchemyst AI, the frontend is the last mile between a context-aware AI agent and a paying client. If the agent streams a response and the UI jitters, the client sees a broken product. If a tool call happens mid-stream and the message reflows, the client loses trust. If the WebSocket drops and the reconnection silently loses three messages, the client sees an incoherent response and blames the AI.
We need engineers who understand that real-time AI interfaces are a distributed systems problem with a render loop attached.
- Docker installed and running (the agent-server ships as a container).
- Node.js 20+ and a package manager of your choice (`npm`, `pnpm`, `yarn`).
- Read the Protocol Reference (section below) and the `agent-server/README.md` end-to-end before writing any code. Understand every message type and its sequence number semantics before you touch a WebSocket.
- Run the agent-server in `--mode normal` first. Watch the raw WebSocket frames in your browser's Network tab. Only then start building.
The agent-server directory contains a Dockerised WebSocket server that simulates a context-aware AI agent. Run it as:
docker build -t agent-server ./agent-server
docker run -p 4747:4747 agent-server # normal mode
docker run -p 4747:4747 agent-server --mode chaos # chaos mode

The server exposes:

- `ws://localhost:4747/ws`: the main WebSocket endpoint.
- `GET http://localhost:4747/health`: healthcheck.
- `GET http://localhost:4747/log`: returns a JSON array of every client-side event the server recorded during the session (heartbeat responses, RESUME messages, acknowledgements). This is how we verify your client's protocol compliance.

You send a user message; the agent responds by streaming tokens, optionally making tool calls mid-stream, and periodically broadcasting context snapshots. Details in the Protocol Reference below.
Every WebSocket message is a JSON object with a `type` field, and every server message also carries a monotonically increasing `seq` (sequence number). The `seq` is critical: it is how the client tracks what it has received and how state recovery works after reconnection.
Messages the client sends to the server:

| Type | Fields | Description |
|---|---|---|
| `USER_MESSAGE` | `content: string` | Send a user message to the agent. |
| `PONG` | `echo: string` | Response to a server `PING`. Must echo the `challenge` field from the `PING`, verbatim. |
| `RESUME` | `last_seq: number` | Sent immediately upon reconnection. Tells the server the last `seq` the client successfully processed. The server replays all events after that `seq`. |
| `TOOL_ACK` | `call_id: string` | Acknowledges that the client has rendered a tool call card. The server waits for this before sending `TOOL_RESULT`. If not received within 5 seconds, the server logs a protocol violation and sends the result anyway. |

Messages the server sends to the client:

| Type | Fields | Description |
|---|---|---|
| `TOKEN` | `seq`, `text: string`, `stream_id: string` | A chunk of the agent's streaming response. Tokens arrive roughly every 30–80ms. `stream_id` groups tokens belonging to the same response. |
| `TOOL_CALL` | `seq`, `call_id: string`, `tool_name: string`, `args: object`, `stream_id: string` | The agent is invoking a tool mid-stream. The current token stream is paused (no more `TOKEN` events for this `stream_id` until `TOOL_RESULT`). |
| `TOOL_RESULT` | `seq`, `call_id: string`, `result: object`, `stream_id: string` | The tool returned a result. Token streaming for the `stream_id` resumes after this event. |
| `CONTEXT_SNAPSHOT` | `seq`, `context_id: string`, `data: object` | A snapshot of what data the agent is currently working with. Sent at the start of a response and whenever the agent's context changes mid-response. |
| `PING` | `seq`, `challenge: string` | Heartbeat. Client must reply with a `PONG` containing the exact `challenge` string within 3 seconds. Three missed PONGs = server terminates the connection. |
| `STREAM_END` | `seq`, `stream_id: string` | The agent has finished its response for this `stream_id`. |
| `ERROR` | `seq`, `code: string`, `message: string` | A server-side error. May arrive at any point. |

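
One way to model the server messages listed above is a TypeScript discriminated union with a small runtime guard. This is only a sketch: the names `JsonObject`, `ServerMessage` and `parseServerMessage` are illustrative assumptions rather than part of the agent-server, and the guard checks nothing beyond `type` and `seq`.

```typescript
// Hypothetical type names; field names and types follow the message table.
type JsonObject = Record<string, unknown>;

type ServerMessage =
  | { type: "TOKEN"; seq: number; text: string; stream_id: string }
  | { type: "TOOL_CALL"; seq: number; call_id: string; tool_name: string; args: JsonObject; stream_id: string }
  | { type: "TOOL_RESULT"; seq: number; call_id: string; result: JsonObject; stream_id: string }
  | { type: "CONTEXT_SNAPSHOT"; seq: number; context_id: string; data: JsonObject }
  | { type: "PING"; seq: number; challenge: string }
  | { type: "STREAM_END"; seq: number; stream_id: string }
  | { type: "ERROR"; seq: number; code: string; message: string };

const SERVER_MESSAGE_TYPES: ReadonlySet<string> = new Set([
  "TOKEN", "TOOL_CALL", "TOOL_RESULT", "CONTEXT_SNAPSHOT", "PING", "STREAM_END", "ERROR",
]);

// Returns null instead of throwing, so a single malformed frame cannot crash the client.
function parseServerMessage(raw: string): ServerMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const candidate = value as JsonObject;
  if (typeof candidate.type !== "string" || !SERVER_MESSAGE_TYPES.has(candidate.type)) return null;
  if (typeof candidate.seq !== "number" || !Number.isInteger(candidate.seq)) return null;
  // Only `type` and `seq` are verified here; per-type field checks are omitted for brevity.
  return candidate as unknown as ServerMessage;
}
```

Switching on `message.type` then narrows each branch to the right field set, which keeps protocol handling separate from rendering code.
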
- Every server message has a `seq`. Sequence numbers are globally ordered and gapless in normal mode.
- The client must track the highest `seq` it has fully processed (rendered to the DOM, not just received).
- On reconnection, the client sends `RESUME` with `last_seq` set to the highest fully-processed `seq`. The server replays everything after it.
- In chaos mode, the server may send messages with out-of-order `seq` values. The client must buffer and reorder before processing.
- Duplicate `seq` values are possible in chaos mode. The client must deduplicate.
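
The buffering and deduplication rules above can be sketched as a small reorder buffer. This is a minimal illustration, not a prescribed design: `ReorderBuffer` and its members are hypothetical names, and it assumes `seq` values are integers that start at 1 and become gapless once every message has arrived.

```typescript
// Hypothetical helper: holds out-of-order messages and releases them in seq order.
class ReorderBuffer<T extends { seq: number }> {
  private readonly pending = new Map<number, T>();
  private nextSeq: number;

  // lastProcessedSeq is the highest seq already applied (0 before any message; assumption).
  constructor(lastProcessedSeq = 0) {
    this.nextSeq = lastProcessedSeq + 1;
  }

  // Accepts one message and returns every message that is now safe to process, in order.
  push(message: T): T[] {
    // Drop duplicates: either already released or already waiting in the buffer.
    if (message.seq < this.nextSeq || this.pending.has(message.seq)) return [];
    this.pending.set(message.seq, message);

    const ready: T[] = [];
    let next = this.pending.get(this.nextSeq);
    while (next !== undefined) {
      ready.push(next);
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
      next = this.pending.get(this.nextSeq);
    }
    return ready;
  }

  // Highest seq released so far. It only becomes a valid last_seq once the
  // caller has actually rendered the released messages.
  get lastReleasedSeq(): number {
    return this.nextSeq - 1;
  }
}
```

A buffer like this stalls if a `seq` never arrives, so a real client would probably pair it with a timeout or a reconnect-and-`RESUME` fallback.
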
When the server runs with --mode chaos, it randomly introduces:
| Behaviour | What happens |
|---|---|
| Connection drop | Server kills the WebSocket mid-stream with no close frame. Can happen at any point. |
| Latency spike | Token delivery pauses for 2–8 seconds, then resumes in a burst. |
| Out-of-order delivery | Messages arrive with seq values that are not sequential. |
| Duplicate messages | The same seq is sent twice. |
| Rapid tool calls | Two TOOL_CALL events in quick succession for the same stream_id before any TOOL_RESULT. |
| Corrupt heartbeat | A PING arrives with an empty challenge field. Client must handle without crashing. |
| Oversized context | A CONTEXT_SNAPSHOT with a data payload exceeding 500KB. |
Build the core chat interface. A user types a message, the agent streams a response token by token, and tool calls interrupt the stream mid-sentence.
Specific requirements:
- Tokens must render incrementally as they arrive, not batched into paragraphs after the stream ends.
- When a `TOOL_CALL` event arrives mid-stream, the in-progress text must freeze in place: no flicker, no reflow, no layout shift. A tool call card must appear below the frozen text showing the tool name and arguments.
- The client must send a `TOOL_ACK` for each `TOOL_CALL` within 2 seconds.
- When `TOOL_RESULT` arrives, the tool call card must update to show the result, and token streaming must resume from exactly where it paused. The user must not perceive a gap or duplicate text.
- Multiple sequential tool calls (tool call → result → tool call → result → resume) must render as a stacked sequence, not overwrite each other.
Why this is hard: Getting the interleave right (freezing the stream at the exact token boundary, rendering the card, resuming without duplication) requires a state machine, not a `useEffect`. Most AI-generated WebSocket chat code will fail here because it assumes a linear stream.
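As a rough illustration of the state-machine idea, the reducer below models one response as frozen text segments interleaved with tool cards. All names (`Segment`, `AgentStreamEvent`, `reduceStream`) are hypothetical, the events are assumed to arrive already ordered and deduplicated, and sending `TOOL_ACK` and rendering are deliberately left out.

```typescript
// Hypothetical view model: text segments are never edited once a tool card follows them.
type Segment =
  | { kind: "text"; text: string }
  | { kind: "tool"; callId: string; toolName: string; status: "waiting" | "done"; result?: unknown };

type AgentStreamEvent =
  | { type: "TOKEN"; text: string }
  | { type: "TOOL_CALL"; call_id: string; tool_name: string }
  | { type: "TOOL_RESULT"; call_id: string; result: unknown }
  | { type: "STREAM_END" };

interface StreamState {
  phase: "streaming" | "tool_pending" | "done";
  segments: Segment[];
}

const initialStreamState: StreamState = { phase: "streaming", segments: [] };

function reduceStream(state: StreamState, event: AgentStreamEvent): StreamState {
  switch (event.type) {
    case "TOKEN": {
      const last = state.segments[state.segments.length - 1];
      if (last !== undefined && last.kind === "text") {
        // Extend the open text segment in place.
        const merged: Segment = { kind: "text", text: last.text + event.text };
        return { phase: state.phase, segments: [...state.segments.slice(0, -1), merged] };
      }
      // After a tool card, start a new segment so earlier text stays frozen.
      return { phase: state.phase, segments: [...state.segments, { kind: "text", text: event.text }] };
    }
    case "TOOL_CALL": {
      const card: Segment = { kind: "tool", callId: event.call_id, toolName: event.tool_name, status: "waiting" };
      return { phase: "tool_pending", segments: [...state.segments, card] };
    }
    case "TOOL_RESULT": {
      const segments = state.segments.map((s): Segment =>
        s.kind === "tool" && s.callId === event.call_id
          ? { ...s, status: "done" as const, result: event.result }
          : s
      );
      // Stay paused while any card is still waiting (covers rapid back-to-back tool calls).
      const waiting = segments.some((s) => s.kind === "tool" && s.status === "waiting");
      return { phase: waiting ? "tool_pending" : "streaming", segments };
    }
    case "STREAM_END":
      return { phase: "done", segments: state.segments };
  }
}
```

Because each tool card is appended rather than replacing anything, sequential and rapid tool calls stack naturally, and text rendered before a card is never rewritten.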
Build a collapsible side panel that shows every protocol event in real time as the agent works.
Specific requirements:
- Every event (`TOKEN`, `TOOL_CALL`, `TOOL_RESULT`, `CONTEXT_SNAPSHOT`, `PING`/`PONG`, `ERROR`) must appear as a row in a scrollable, auto-updating timeline.
- `TOKEN` events should be grouped: instead of one row per token, batch consecutive tokens into a single expandable row that shows "Streamed 47 tokens (1.2s)" with the full text visible on expand.
- `TOOL_CALL` and `TOOL_RESULT` rows must be visually linked (same `call_id`, connected by a line or indent).
- Clicking any row should highlight the corresponding element in the chat panel (the text chunk, the tool card, etc.). This is bidirectional: clicking a tool card in the chat should scroll the timeline to its `TOOL_CALL` entry.
- The timeline must not cause visible jank when events are arriving at 30+ per second (token streaming rate). If you are re-rendering the full list on every token, that is wrong.
- Include a filter bar: filter by event type, search by content.
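
The token-grouping requirement above could be sketched as follows. The `TraceEvent` and `TraceRow` shapes, the `receivedAt` client timestamp, and both function names are assumptions made for illustration; the sketch is framework-agnostic and says nothing about how rows are rendered.

```typescript
// Hypothetical client-side record of one protocol event; receivedAt is a local timestamp in ms.
interface TraceEvent {
  type: string;
  seq: number;
  receivedAt: number;
  streamId?: string;
  text?: string;
}

interface TokenGroupRow {
  kind: "tokens";
  streamId: string | undefined;
  count: number;
  text: string;
  startedAt: number;
  endedAt: number;
}

interface SingleEventRow {
  kind: "event";
  event: TraceEvent;
}

type TraceRow = TokenGroupRow | SingleEventRow;

// Appends one event. Consecutive TOKEN events of the same stream extend the
// last row, so only that one row changes per token.
function appendToTimeline(rows: TraceRow[], event: TraceEvent): void {
  if (event.type !== "TOKEN") {
    rows.push({ kind: "event", event });
    return;
  }
  const last = rows[rows.length - 1];
  if (last !== undefined && last.kind === "tokens" && last.streamId === event.streamId) {
    last.count += 1;
    last.text += event.text ?? "";
    last.endedAt = event.receivedAt;
    return;
  }
  rows.push({
    kind: "tokens",
    streamId: event.streamId,
    count: 1,
    text: event.text ?? "",
    startedAt: event.receivedAt,
    endedAt: event.receivedAt,
  });
}

// Builds the collapsed label, e.g. "Streamed 47 tokens (1.2s)".
function tokenRowLabel(row: TokenGroupRow): string {
  const seconds = ((row.endedAt - row.startedAt) / 1000).toFixed(1);
  return `Streamed ${row.count} tokens (${seconds}s)`;
}
```

Since a token only ever touches the final row, a virtualised or memoised list can skip re-rendering everything above it.
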
Build a context panel that shows what data the agent is currently operating on, with diffs.
Specific requirements:
- When a `CONTEXT_SNAPSHOT` event arrives, display the `data` object in a readable, syntax-highlighted tree view.
- When a subsequent `CONTEXT_SNAPSHOT` arrives (same `context_id`), compute and display the diff: what keys were added, removed, or changed. Highlight changes visually in the tree.
- For large context objects (500KB+ in chaos mode), the tree must remain interactive: no freezing the tab. Consider virtualisation or lazy expansion.
- Include a "history" scrubber: the user can step backward and forward through the sequence of snapshots for a given `context_id` and see the diff at each step.
Why this is hard: Diffing arbitrary nested JSON performantly, rendering it as a navigable tree, and keeping it responsive when the payload is half a megabyte requires understanding both algorithms and the DOM.
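For orientation, a naive recursive diff over two snapshot `data` objects might look like the sketch below. `DiffEntry` and `diffJson` are hypothetical names, arrays are treated as atomic values, and nothing here addresses the 500KB case, where running the diff off the main thread or diffing lazily per expanded node is probably needed.

```typescript
// Hypothetical diff record: a dotted key path plus the kind of change.
interface DiffEntry {
  path: string;
  change: "added" | "removed" | "changed";
  before?: unknown;
  after?: unknown;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function hasOwn(obj: Record<string, unknown>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Walks both values together and reports added, removed and changed keys.
function diffJson(before: unknown, after: unknown, path = ""): DiffEntry[] {
  if (isPlainObject(before) && isPlainObject(after)) {
    const entries: DiffEntry[] = [];
    for (const key of Object.keys(before)) {
      const childPath = path === "" ? key : `${path}.${key}`;
      if (!hasOwn(after, key)) {
        entries.push({ path: childPath, change: "removed", before: before[key] });
      } else {
        entries.push(...diffJson(before[key], after[key], childPath));
      }
    }
    for (const key of Object.keys(after)) {
      if (!hasOwn(before, key)) {
        const childPath = path === "" ? key : `${path}.${key}`;
        entries.push({ path: childPath, change: "added", after: after[key] });
      }
    }
    return entries;
  }
  // Leaves and arrays are compared by serialised value. This is simple but
  // sensitive to key order inside objects nested in arrays.
  if (JSON.stringify(before) !== JSON.stringify(after)) {
    return [{ path, change: "changed", before, after }];
  }
  return [];
}
```

The flat list of paths can then be used to highlight nodes in the tree view; a production version would likely want array-aware diffing as well.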
Implement connection lifecycle management that makes drops invisible to the user.
Specific requirements:
- When the WebSocket connection drops, show a non-blocking reconnection indicator within 500ms. The chat panel must remain interactive (the user can scroll, copy text, read).
- Reconnection attempts must use exponential backoff: 500ms, 1s, 2s, 4s, capped at 10s.
- Upon successful reconnection, the client must send a `RESUME` message containing the `last_seq` it fully processed. This must happen as the first message on the new connection.
- When the server replays missed events, the client must process them in `seq` order, deduplicate any that were already processed, and stitch them into the existing DOM state without visible jumps.
- If the connection drops mid-tool-call (after `TOOL_CALL` but before `TOOL_RESULT`), the tool card must remain visible with a "waiting" state, and the result must render correctly when the replayed events include the `TOOL_RESULT`.
- Heartbeat management: respond to every `PING` with a `PONG` within 3 seconds. Handle corrupt PINGs (empty `challenge`) without crashing.
Why this is hard: Reconnection that actually works (not just reconnects but recovers state) requires tracking what the DOM has consumed, not just what the socket has received. Most reconnection code in tutorials handles the "reconnect" part but destroys the "recovery" part.
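The pure pieces of the lifecycle (backoff schedule, `RESUME` payload, heartbeat reply) can be sketched without touching a socket. The function names are hypothetical, the 8s step is inferred by continuing the doubling up to the 10s cap, and skipping the reply to an empty `challenge` is an assumption: the expected response to a corrupt `PING` is not stated here, so check `agent-server/README.md` before relying on it.

```typescript
// Delay before reconnection attempt `attempt` (0-based):
// 500ms, 1s, 2s, 4s, then (inferred) 8s, then capped at 10s.
function reconnectDelayMs(attempt: number, baseMs = 500, capMs = 10000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Payload that must be the first message on a new connection.
// lastProcessedSeq is the highest seq fully rendered to the DOM, not merely received.
function buildResume(lastProcessedSeq: number): string {
  return JSON.stringify({ type: "RESUME", last_seq: lastProcessedSeq });
}

// Builds the PONG for a PING, echoing the challenge verbatim.
// Assumption: a corrupt PING (empty or non-string challenge) gets no reply.
function pongFor(challenge: unknown): string | null {
  if (typeof challenge !== "string" || challenge === "") return null;
  return JSON.stringify({ type: "PONG", echo: challenge });
}
```

Keeping these as pure functions makes them easy to unit test; the socket wrapper then only decides when to call them, for example sending `buildResume(...)` in the `open` handler before anything else.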
Enable chaos mode (--mode chaos) and record your screen (3–5 minutes) showing your application handling the following scenarios, labelling each as it happens:
- Connection drop mid-stream: The agent is streaming tokens, the connection dies, your app reconnects, and the response continues seamlessly.
- Out-of-order messages: Tokens arrive with shuffled `seq` values. Your app reorders them and renders the text correctly.
- Rapid tool calls: Two tool calls fire in quick succession. Both cards appear, both results land, and streaming resumes without duplication.
- Oversized context snapshot: A 500KB+ context snapshot arrives. The context panel renders without freezing the chat.
- Corrupt heartbeat: A PING with an empty challenge arrives. Your app does not crash or disconnect.
This recording is mandatory. A submission without it will be treated as incomplete.
- Framework: Next.js 14+ (App Router). No Pages Router.
- Language: TypeScript in strict mode (`"strict": true` in tsconfig). No `any` types outside a single, clearly documented escape hatch file. No `@ts-ignore`.
- Styling: Your choice (Tailwind, CSS Modules, vanilla CSS), but the app must be usable, not necessarily beautiful. Function over aesthetics: we want correct rendering, not gradients.
- State management: Your choice, but document why. If you pick Redux for a WebSocket app and can justify it, fine. If you pick `useState` and it holds up under chaos, also fine. We will read your rationale.
- No AI chat component libraries. No `ai` SDK streaming helpers, no `vercel/ai`, no `langchain` frontend packages. You are building the streaming renderer from scratch. That is the point.
Submit a repository (public Git repo or tarball) containing:
- Your Next.js application, fully buildable with `npm install && npm run build && npm run start`. No manual steps. No missing env vars. If it does not build on the first try, that is a signal.
- A `README.md` with:
  - A 2–3 sentence summary of your architectural approach.
  - A state machine diagram (ASCII, Mermaid, or image) showing your WebSocket connection states and transitions (connected → streaming → tool_call_pending → reconnecting → resuming, etc.).
  - Instructions to run the app against the agent-server.
  - Screenshots of the app in normal mode showing: (a) a streamed response with a tool call, (b) the trace timeline, (c) the context inspector showing a diff.
- The chaos mode screen recording (Task 5). Upload to YouTube (unlisted), Loom, or include as an `.mp4` in the repo.
- A `DECISIONS.md` file (1–2 pages) covering:
  - Your approach to `seq`-based ordering and deduplication. What data structure did you use and why?
  - How you prevent layout shift during tool call interruptions. What CSS or rendering strategy?
  - Your reconnection state recovery approach. How do you track what the DOM has "consumed" vs. what the socket has "received"?
  - What you would change if this needed to handle 50 concurrent agent streams on one screen (an "operations dashboard" scenario).
  - What you would change if the agent's responses were 100x longer (think: full document generation, not chat).
| Criteria | Weight | What we are checking |
|---|---|---|
| Protocol compliance | 25% | We run the agent-server with logging (/log endpoint) and check: Are PONG responses timely? Are TOOL_ACK messages sent? Is the RESUME message correct after drops? Does the client deduplicate? |
| Chaos survival | 25% | We watch your recording and run chaos mode ourselves. Does the app crash? Does the DOM enter an inconsistent state? Are messages lost? Does the UI freeze? |
| Streaming fidelity | 20% | Token rendering is smooth and incremental. Tool calls interrupt and resume without duplication, reflow, or flicker. The final rendered text matches what the server sent (we compare against server logs). |
| Code quality | 15% | TypeScript strictness. State machine clarity. No useEffect spaghetti. Clear separation between protocol handling and rendering. Tests for non-trivial logic (the reordering buffer, the diff engine). |
| Architectural judgment | 15% | The DECISIONS.md shows you understand why you made your choices, not just what you built. Bonus points for identifying failure modes we didn't ask about. |
Note: Incomplete submissions are accepted if your `DECISIONS.md` documents what you attempted, what failed, and what you would do differently. We would rather see a well-documented partial implementation than a fully working app you do not understand.
To be transparent about what we filter on:
- AI-generated code with no understanding. We will ask you to walk through your WebSocket state machine in a follow-up call. If you cannot explain why your reconnection logic sends `RESUME` before processing buffered events, that is a problem.
- No screen recording. This is not optional. Code that compiles is not the same as code that works.
- `any` types sprinkled throughout. This tells us you fought the type system instead of designing with it.
- The app works in normal mode but crashes in chaos mode. Normal mode is the tutorial. Chaos mode is the job.
Also being transparent about what stands out to us:
- A reconnection sequence so smooth we have to check the logs to confirm the drop actually happened.
- A trace timeline that we instinctively start using as a debugging tool while evaluating your submission.
- A `DECISIONS.md` that identifies a failure mode in the protocol itself (there is at least one: the `TOOL_ACK` timeout creates a race condition; if you spot it and document it, that is a strong signal).
- Unit tests for your reordering buffer with edge cases (empty buffer, single element, duplicates, fully reversed sequence).
Implementation should take approximately 4–5 days. The scope is intentionally larger than what you might finish: we want to see what you prioritise.
Final Deadline: To Be Announced
Email your repo link (or tarball) to anuran@getalchemystai.com with the subject line:
Full Stack AI Engineer Assignment <Your Name>
CC: vedanta@getalchemystai.com and khushi@getalchemystai.com.
Include the link to your chaos mode screen recording in the email body.
CLIENT → { "type": "USER_MESSAGE", "content": "Summarise the Q3 report" }
SERVER ← { "type": "CONTEXT_SNAPSHOT", "seq": 1, "context_id": "ctx_01", "data": { "report": "Q3-2025", "pages": 47, "sections": ["revenue", "ops", "forecast"] } }
SERVER ← { "type": "TOKEN", "seq": 2, "stream_id": "s_01", "text": "Based on " }
SERVER ← { "type": "TOKEN", "seq": 3, "stream_id": "s_01", "text": "the Q3 report, " }
SERVER ← { "type": "TOKEN", "seq": 4, "stream_id": "s_01", "text": "revenue grew " }
SERVER ← { "type": "TOOL_CALL", "seq": 5, "call_id": "tc_01", "tool_name": "lookup_metric", "args": { "metric": "revenue_yoy" }, "stream_id": "s_01" }
-- stream paused, client renders tool card --
CLIENT → { "type": "TOOL_ACK", "call_id": "tc_01" }
SERVER ← { "type": "TOOL_RESULT", "seq": 6, "call_id": "tc_01", "result": { "value": "23.4%", "period": "YoY" }, "stream_id": "s_01" }
-- stream resumes --
SERVER ← { "type": "TOKEN", "seq": 7, "stream_id": "s_01", "text": "23.4% year-over-year" }
SERVER ← { "type": "TOKEN", "seq": 8, "stream_id": "s_01", "text": ", driven primarily by..." }
...
SERVER ← { "type": "STREAM_END", "seq": 42, "stream_id": "s_01" }
-- during all of the above, heartbeats are interleaved --
SERVER ← { "type": "PING", "seq": 15, "challenge": "a1b2c3" }
CLIENT → { "type": "PONG", "echo": "a1b2c3" }
# 1. Start the mock agent backend
cd agent-server
docker build -t agent-server .
docker run -p 4747:4747 agent-server
# 2. Build your Next.js application against ws://localhost:4747/ws
# 3. Test with chaos mode
docker run -p 4747:4747 agent-server --mode chaos

Read the full assignment document before starting. Understand the protocol before writing code.
Best of luck to everyone working on this!
- Vedanta Banerjee, SWE & TPM, Alchemyst Labs