Build a Reliable Streaming Chat UI
Company: OpenAI
Role: Software Engineer
Category: Software Engineering Fundamentals
Difficulty: hard
Interview Round: HR Screen
You are building a React-based chat interface where assistant responses stream into the UI **token by token** in real time (the same UX as a typical LLM chat product). The product allows a user to send a new message while a previous response is still streaming, and may show more than one conversation thread.
Before writing any code, walk the interviewer through how you would **design and build this feature**. Treat it as a verbal design discussion: explain your architecture, the React state model, and how you keep the UI correct and smooth under concurrency and failure. Your answer should address:
- The **architecture and data flow** for live updates (transport, identifiers, how a chunk reaches the right message).
- **Two pieces of state that exist only while a stream is active** and must be cleaned up when it finishes.
- How to manage React state so streamed tokens **do not flicker or overwrite** existing content.
- What can **go wrong when multiple responses stream at once**, and how to **prevent race conditions** between concurrent streams.
- When you would reach for **`useRef`** versus **`useState`** in this scenario.
- How you would keep the feature **reliable**: if the same network request is accidentally sent twice (double-submit, retry, reconnect), how do you prevent it from being processed/rendered twice?
```hint Where to start
Frame the problem around **identity and ownership of a chunk**. Before any React detail, decide: what is the transport (SSE / WebSocket / streamed `fetch` `ReadableStream`), and what stable IDs (`conversationId`, `messageId`, `streamId`, optional `seq`) ride on every chunk so it can always be routed to the correct message.
```
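One possible shape for the per-chunk contract is sketched below. The `StreamChunk` field names and the `parseChunk` / `routeChunk` helpers are assumptions made for illustration, not a real server API. The point is only that every chunk carries enough identity to be routed by `messageId`, and that malformed or unroutable chunks are dropped rather than guessed at.

```typescript
// Hypothetical wire format: every field name here is an assumption.
interface StreamChunk {
  conversationId: string;
  messageId: string;
  streamId: string;
  seq: number;
  delta: string;
  done: boolean;
}

// messageId -> accumulated assistant text
type MessageText = Map<string, string>;

// Validate an incoming JSON payload; return null instead of throwing on bad input.
function parseChunk(raw: string): StreamChunk | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const { conversationId, messageId, streamId, seq, delta, done } =
    value as Record<string, unknown>;
  if (
    typeof conversationId !== "string" ||
    typeof messageId !== "string" ||
    typeof streamId !== "string" ||
    typeof seq !== "number" ||
    typeof delta !== "string"
  ) {
    return null;
  }
  return { conversationId, messageId, streamId, seq, delta, done: done === true };
}

// Append a delta to the message it belongs to, keyed by messageId (never by index).
// Returns the same map instance when the target message is unknown.
function routeChunk(messages: MessageText, chunk: StreamChunk): MessageText {
  const current = messages.get(chunk.messageId);
  if (current === undefined) return messages;
  const next = new Map(messages);
  next.set(chunk.messageId, current + chunk.delta);
  return next;
}
```

Returning the original map for an unknown `messageId` keeps the update a no-op, so a stray chunk cannot create or corrupt a message.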
```hint React state model
Separate **render state** (the message list and visible partial text) from **operational bookkeeping** that should NOT cause a re-render on every change. Think about updating messages by a stable `messageId` with a *functional* updater, and whether a `useReducer` over stream events (`STREAM_STARTED` / `CHUNK_RECEIVED` / `STREAM_COMPLETED` / `STREAM_FAILED`) is cleaner than ad-hoc `setState` calls from inside async callbacks.
```
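A reducer over the stream events named in this hint might look like the following minimal sketch. The `ChatState` and `Message` shapes are assumptions chosen for illustration. It is a pure function, so it can be passed to `useReducer` or unit-tested without React.

```typescript
type Status = "streaming" | "complete" | "error";

interface Message {
  id: string;
  role: "user" | "assistant";
  text: string;
  status: Status;
}

// Assumed normalized shape: display order plus lookup by stable id.
interface ChatState {
  order: string[];
  byId: Record<string, Message>;
}

type StreamEvent =
  | { type: "STREAM_STARTED"; messageId: string }
  | { type: "CHUNK_RECEIVED"; messageId: string; delta: string }
  | { type: "STREAM_COMPLETED"; messageId: string }
  | { type: "STREAM_FAILED"; messageId: string };

function chatReducer(state: ChatState, event: StreamEvent): ChatState {
  switch (event.type) {
    case "STREAM_STARTED": {
      // Idempotent: a repeated start for the same message changes nothing.
      if (state.byId[event.messageId]) return state;
      const message: Message = {
        id: event.messageId,
        role: "assistant",
        text: "",
        status: "streaming",
      };
      return {
        order: [...state.order, message.id],
        byId: { ...state.byId, [message.id]: message },
      };
    }
    case "CHUNK_RECEIVED": {
      const message = state.byId[event.messageId];
      // Ignore chunks for unknown or already-finished messages.
      if (!message || message.status !== "streaming") return state;
      // Append to the latest state, never to a snapshot captured by a closure.
      const updated: Message = { ...message, text: message.text + event.delta };
      return { ...state, byId: { ...state.byId, [message.id]: updated } };
    }
    case "STREAM_COMPLETED":
    case "STREAM_FAILED": {
      const message = state.byId[event.messageId];
      if (!message || message.status !== "streaming") return state;
      const status: Status = event.type === "STREAM_COMPLETED" ? "complete" : "error";
      return { ...state, byId: { ...state.byId, [message.id]: { ...message, status } } };
    }
  }
}
```

Because each transition derives the next state from the state it is given, async callbacks only need to dispatch events and never read `messages` themselves.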
```hint Concurrency & stale updates
The classic bug: an async callback closes over a **stale `messages`** snapshot, or a slow old stream writes after a newer one started. Consider per-stream metadata keyed by `streamId`, ignoring chunks whose stream is no longer active, and a "latest request id" guard to drop responses you no longer care about.
```
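The per-stream metadata and "latest request id" guard could be kept in a small registry like the one sketched here. `StreamRegistry` and its method names are hypothetical, and the sketch assumes a supersede-on-new-send policy with one active stream per conversation. In a component, one instance would typically live in a `useRef` so that updating it never triggers a render.

```typescript
interface ActiveStream {
  conversationId: string;
  messageId: string;
  controller: AbortController;
}

class StreamRegistry {
  private active = new Map<string, ActiveStream>();
  // conversationId -> streamId of the most recent request
  private latest = new Map<string, string>();

  // Register a new stream, superseding any earlier one in the same conversation.
  start(conversationId: string, streamId: string, messageId: string): AbortSignal {
    const previous = this.latest.get(conversationId);
    if (previous !== undefined) this.stop(previous);
    const controller = new AbortController();
    this.active.set(streamId, { conversationId, messageId, controller });
    this.latest.set(conversationId, streamId);
    return controller.signal;
  }

  // Resolve which message a chunk may write to; null means "drop this chunk".
  targetFor(streamId: string): string | null {
    return this.active.get(streamId)?.messageId ?? null;
  }

  // Abort and forget a stream. Safe to call more than once.
  stop(streamId: string): void {
    const entry = this.active.get(streamId);
    if (entry === undefined) return;
    entry.controller.abort();
    this.active.delete(streamId);
    if (this.latest.get(entry.conversationId) === streamId) {
      this.latest.delete(entry.conversationId);
    }
  }
}
```

A chunk handler would call `targetFor(chunk.streamId)` first and return early on `null`, so a slow old stream cannot write after a newer one has started. The returned `AbortSignal` can be passed to `fetch` so that stopping a stream also cancels its request.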
```hint Idempotency
For the double-send case, think about a client-generated **idempotency key** (request UUID) sent with the request, plus server-side dedup, and a client-side guard so a duplicate response can't create a second assistant message.
```
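The client-side half of this guard can be very small. In the sketch below, `SubmitGuard` is a hypothetical helper that maps each idempotency key to the one assistant message it is allowed to create. Generating the key (for example with `crypto.randomUUID()`) and sending it to the server in a request header are assumed to happen elsewhere, and the server-side dedup is outside the sketch.

```typescript
interface SubmitResult {
  messageId: string;
  duplicate: boolean;
}

class SubmitGuard {
  // idempotencyKey -> assistant messageId created for that key
  private seen = new Map<string, string>();

  // Call before sending. A repeated key returns the original messageId
  // with duplicate = true, so the caller can skip creating a new message.
  begin(idempotencyKey: string, newMessageId: () => string): SubmitResult {
    const existing = this.seen.get(idempotencyKey);
    if (existing !== undefined) {
      return { messageId: existing, duplicate: true };
    }
    const messageId = newMessageId();
    this.seen.set(idempotencyKey, messageId);
    return { messageId, duplicate: false };
  }

  // Call when the user explicitly retries after a failure and a fresh
  // attempt under the same key should be allowed again.
  release(idempotencyKey: string): void {
    this.seen.delete(idempotencyKey);
  }
}
```

The key should be generated once per user intent (one click of Send) and reused across automatic retries and reconnects, so that every copy of the request resolves to the same assistant message.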
### Constraints & Assumptions
- Frontend is **React** (function components + hooks); assume a modern React (18+) with concurrent rendering and `StrictMode` double-invocation of effects in development.
- Responses arrive as a sequence of small text deltas; a single response can be hundreds to thousands of tokens, so the rendering cost of a naive "re-render per token" approach must be considered.
- The user can **start a new message before the previous stream finishes**, and can **cancel/stop** a streaming response.
- The network is unreliable: requests can time out, the connection can drop mid-stream, and a retry or React StrictMode remount can cause the **same request to be sent twice**.
- Assume a backend exists that can stream a response; you are not designing the model, only the client (you may state minimal contract requirements you need from the server).
### Clarifying Questions to Ask
- What is the streaming **transport** — Server-Sent Events, WebSocket, or a streamed HTTP body via `fetch`? Is it fixed or my choice?
- Can a user have **multiple responses streaming concurrently** (e.g. across threads), or is it strictly one active response per conversation?
- On a new submit mid-stream, should the previous response be **cancelled/superseded** or allowed to finish in the background?
- Does the server guarantee **in-order delivery** of chunks, or do I need a sequence number to reorder?
- What does the server send to mark **completion vs. error**, and does it echo back the `messageId`/`streamId` and any sequence numbers I send?
- Is there a **server-side idempotency/dedup** mechanism I can rely on, or must the client be the sole defense against double-processing?
### What a Strong Answer Covers
- **Transport & contract:** names a concrete transport and the per-chunk contract (stable IDs, completion/error markers, optional `seq`); routes chunks by **`messageId`**, never by array index.
- **Lifecycle of a message:** optimistic assistant message on send → append deltas as chunks arrive → mark complete/error on the terminal event; explicit handling of cancel, timeout, and mid-stream disconnect.
- **Ephemeral vs durable state:** correctly identifies stream-only state (e.g. `AbortController`, token buffer, expected `seq`, `isStreaming`/`activeStreamIds`) and explains *when* it is torn down.
- **Flicker-free rendering:** functional `setState`/`useReducer` updates, appending deltas (not replacing from a stale snapshot), and buffering/flushing on `requestAnimationFrame` or an interval to avoid a render per token.
- **Concurrency correctness:** enumerates the real failure modes (out-of-order chunks, slow old stream overwriting a newer one, wrong-message writes, cancelling the wrong request) and concrete mitigations (per-`streamId` map, ignore-inactive-stream guard, latest-request-id check, supersede-on-new-send when single-active).
- **`useRef` vs `useState` reasoning:** a clear rule — render-affecting data in state/reducer, mutable operational handles (controllers, registries, buffers, timers, socket/EventSource) in refs to avoid re-renders and stale closures.
- **Reliability / idempotency:** client-generated idempotency key, server dedup, and a client guard so a duplicated request never produces a duplicate assistant message; awareness of StrictMode double-effect as a *source* of accidental double-sends.
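The buffering idea under flicker-free rendering can be sketched as a small batching helper. `TokenBuffer` is a hypothetical name, and the scheduler is injected so the same code works with `requestAnimationFrame` in a browser or with a manual scheduler in tests. Tokens accumulate per `messageId` and are handed to the render layer at most once per scheduled tick.

```typescript
type Schedule = (callback: () => void) => void;
type Flush = (batch: Map<string, string>) => void;

class TokenBuffer {
  private pending = new Map<string, string>();
  private scheduled = false;
  private readonly schedule: Schedule;
  private readonly flush: Flush;

  constructor(schedule: Schedule, flush: Flush) {
    this.schedule = schedule;
    this.flush = flush;
  }

  // Queue a delta; schedules at most one flush per tick.
  push(messageId: string, delta: string): void {
    this.pending.set(messageId, (this.pending.get(messageId) ?? "") + delta);
    if (!this.scheduled) {
      this.scheduled = true;
      this.schedule(() => this.drain());
    }
  }

  // Hand everything queued so far to the flush callback. Also useful on
  // stream completion, so the final tokens are not left waiting for a tick.
  drain(): void {
    this.scheduled = false;
    if (this.pending.size === 0) return;
    const batch = this.pending;
    this.pending = new Map();
    this.flush(batch);
  }
}
```

In a browser, the scheduler could be `(cb) => requestAnimationFrame(cb)`, and the flush callback could dispatch one `CHUNK_RECEIVED`-style event per message in the batch. The buffer instance itself belongs in a ref, since its contents should not drive rendering directly.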
### Follow-up Questions
- In development, React 18 `StrictMode` runs each effect's setup, then its cleanup, then its setup again on mount. How does that interact with opening a stream in `useEffect`, and how do you make stream setup/teardown idempotent so you don't open two connections?
- The connection drops at token 400 of an 800-token response. How do you recover — resume from an offset, restart, or surface a partial message — and what does the server contract need to support your choice?
- Rendering every token re-renders a long message list and tanks performance. Walk through how you'd diagnose this and the specific techniques (buffering, memoization, virtualization, isolating the streaming message) you'd apply.
- You decide to move the whole streaming state machine out of React component state. Where would it live (a store / external state container / custom hook), and what do you gain and lose versus keeping it in `useReducer`?
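For the mid-stream disconnect follow-up, one option is to track sequence numbers on the client so that a reconnect can ask for "everything from `seq` N". The sketch below assumes chunks carry contiguous integer `seq` values starting at 0 and that the server can resume from a given offset; both are assumptions about the contract, and `SeqAssembler` is a hypothetical name. It also tolerates out-of-order and duplicated chunks.

```typescript
class SeqAssembler {
  private nextSeq = 0;
  private held = new Map<number, string>(); // chunks that arrived early
  private text = "";

  // Accept a chunk in any order. Duplicates and already-applied chunks are ignored.
  accept(seq: number, delta: string): void {
    if (seq < this.nextSeq || this.held.has(seq)) return;
    this.held.set(seq, delta);
    // Apply every chunk that is now contiguous with what has been applied.
    let next = this.held.get(this.nextSeq);
    while (next !== undefined) {
      this.text += next;
      this.held.delete(this.nextSeq);
      this.nextSeq += 1;
      next = this.held.get(this.nextSeq);
    }
  }

  // Text assembled so far, containing only the contiguous prefix.
  get value(): string {
    return this.text;
  }

  // First sequence number not yet applied: the offset to request on reconnect.
  get resumeFrom(): number {
    return this.nextSeq;
  }
}
```

If the server cannot resume, the same counter still helps: the client can restart the response and discard replayed chunks below `resumeFrom`, or surface the partial `value` with an error state.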
Quick Answer: This question tests real-time front-end architecture and React state management: handling transient streaming state, controlling concurrency between streams, keeping the UI consistent during token-by-token updates, and choosing between `useRef` and `useState`.