System Design: ChatGPT‑Style Homepage with Streaming
Goal
Design a ChatGPT‑style web homepage end to end. Users should type a prompt and see the model’s response stream token‑by‑token in the browser.
Requirements
- Frontend
  - Render a chat UI (messages, input box, streaming cursor, retry/stop, multi‑tab resilience).
  - Stream tokens to the UI with low latency and graceful reconnection.
  - Persist conversations and support pagination/search.
- Backend
  - Provide a server endpoint that calls a Chat Completions API with streaming.
  - Authenticate users and protect provider credentials.
  - Rate limit users and enforce token/concurrency budgets.
  - Store conversation state (messages, metadata, token counts).
  - Stream tokens to the browser (SSE or WebSockets) with backpressure, retries, and timeouts.
  - Log, emit metrics, and trace requests end‑to‑end; scale under load with cost controls.
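One small but concrete piece of the backend streaming requirement is framing each model token as an SSE event before writing it to the response. A minimal sketch (helper names `sseFrame`/`sseDone` and the `[DONE]` sentinel are our assumptions, not a fixed standard; only the `id:`/`data:` wire format comes from the SSE spec):

```typescript
// Illustrative SSE framing helpers. Each token becomes one SSE event;
// the `id:` field lets a reconnecting client resume via the
// Last-Event-ID request header.
function sseFrame(token: string, id: number): string {
  // SSE data lines must not contain raw newlines, so multi-line
  // payloads are split across multiple `data:` lines.
  const dataLines = token
    .split("\n")
    .map((line) => `data: ${line}`)
    .join("\n");
  return `id: ${id}\n${dataLines}\n\n`;
}

// Terminal sentinel so the client can distinguish a clean end of stream
// from a dropped connection.
function sseDone(): string {
  return "data: [DONE]\n\n";
}
```

The server would write `sseFrame(token, seq++)` for each upstream delta on a response with `Content-Type: text/event-stream`, then `sseDone()` when the model finishes.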
Streaming Transport Comparison
Describe when to use SSE vs WebSockets for streaming tokens, including trade‑offs in:
- Latency
- Reliability and ordering
- Backpressure handling
- Reconnection semantics
- Browser/proxy support and operational complexity
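On reconnection semantics: `EventSource` reconnects automatically (and resends `Last-Event-ID`), while WebSockets leave reconnection entirely to the application, which typically means capped exponential backoff with jitter. A sketch of that schedule, assuming "full jitter" (the function name and defaults are ours):

```typescript
// Capped exponential backoff for WebSocket reconnects.
// `random` is injectable so the schedule is deterministic in tests;
// it defaults to Math.random in real use.
function reconnectDelayMs(
  attempt: number, // 0-based reconnect attempt
  baseMs = 500,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  // Exponential growth, capped so long outages don't produce huge waits.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // "Full jitter": pick uniformly in [0, ceiling) so many clients
  // dropped at once don't reconnect in a thundering herd.
  return Math.floor(random() * ceiling);
}
```

A WebSocket client would call this on each `close` event and also reset `attempt` to 0 after a healthy connection, then replay from its last acknowledged sequence number to restore ordering.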
Integration Details
Show how you would integrate a Chat Completions API, including:
- Authentication (user identity and server‑to‑provider secrets)
- Rate limiting (per user/IP, concurrency, token budgets)
- Conversation state storage (schema, summarization, limits)
- Streaming tokenization path (upstream to backend to browser)
- Error handling and retries (transient vs. permanent)
- Observability (logs, metrics, traces)
- Scalability and cost considerations
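The rate-limiting item above is commonly answered with a per-user token bucket: capacity bounds bursts, the refill rate bounds sustained throughput. A minimal in-memory sketch (class and method names are ours; a real deployment would likely keep buckets in Redis so limits hold across instances):

```typescript
// Per-user token bucket. Time is passed in explicitly (milliseconds)
// so the logic is deterministic and easy to test.
class TokenBucket {
  private tokens: number;
  private lastMs: number;

  constructor(
    private capacity: number,     // max burst size
    private refillPerSec: number, // sustained allowance
    nowMs: number,
  ) {
    this.tokens = capacity;
    this.lastMs = nowMs;
  }

  // Try to spend `cost` tokens (e.g. 1 per request, or the prompt's
  // token count for token budgets); false means respond with HTTP 429.
  tryAcquire(cost: number, nowMs: number): boolean {
    const elapsedSec = (nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSec,
    );
    this.lastMs = nowMs;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

The same structure covers concurrency budgets if `cost` is acquired at stream start and returned when the stream closes.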
Provide concrete architectural choices, a sequence of events, and concise code snippets or pseudocode illustrating the streaming path for both SSE and WebSockets.
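As a starting point for that comparison, the core of the streaming path can be sketched as one upstream token stream adapted to either transport. The shapes below are assumptions (an `AsyncIterable<string>` of token deltas standing in for the provider SDK's stream, and our own `[DONE]` sentinel and message schema), not a specific provider's API:

```typescript
type Token = string;

// Adapt upstream token deltas to SSE wire frames, with ids for resumption.
async function* toSse(upstream: AsyncIterable<Token>): AsyncGenerator<string> {
  let id = 0;
  for await (const tok of upstream) {
    yield `id: ${id++}\ndata: ${JSON.stringify(tok)}\n\n`;
  }
  yield "data: [DONE]\n\n"; // clean end-of-stream marker
}

// Adapt the same upstream to WebSocket JSON messages with explicit
// sequence numbers, since WS framing alone carries no ordering metadata
// the application can resume from.
async function* toWs(upstream: AsyncIterable<Token>): AsyncGenerator<string> {
  let seq = 0;
  for await (const tok of upstream) {
    yield JSON.stringify({ type: "token", seq: seq++, token: tok });
  }
  yield JSON.stringify({ type: "done", seq });
}
```

The server then drives whichever adapter matches the connection, e.g. `for await (const frame of toSse(upstream)) res.write(frame)` for SSE (awaiting slow writes is where backpressure enters) or `ws.send(msg)` per message for WebSockets.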