Real-Time Systems, WebSockets, and Long-Lived Connections

What's being tested

Interviewers are probing whether you can design stateful, low-latency distributed systems where clients maintain long-lived connections instead of issuing independent request/response calls. Google cares because many systems—chat, collaborative editing, multiplayer presence, notifications, live dashboards—require correctness under failures, high fanout, uneven load, mobile networks, and regional outages. A strong Software Engineer answer should show you understand connection management, message delivery semantics, backpressure, pub/sub routing, load balancing, and how to reason about `p50`, `p99`, capacity, and failure recovery. The key is not just “use WebSockets,” but explaining what breaks when millions of clients stay connected for hours.

Core knowledge

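WebSocket is a full-duplex protocol, standardized in RFC 6455, that begins as an HTTP/1.1 request carrying the `Upgrade: websocket` header and then switches to its own framing over the same TCP connection. It works well for bidirectional, low-latency traffic such as chat, presence, games, and collaborative editing, but it requires servers to hold per-connection state.
Server-Sent Events (SSE) are simpler than WebSockets when updates flow one way, from server to client. SSE streams a regular HTTP response as `text/event-stream`; the browser's `EventSource` reconnects automatically and sends the last event ID it saw (the `Last-Event-ID` header) so the server can resume. SSE is also easier to proxy, but the client cannot send messages on the same stream, so client-to-server writes need a separate HTTP request.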
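
As a minimal sketch of the opening handshake, the function below computes the `Sec-WebSocket-Accept` value that RFC 6455 requires a server to return: the SHA-1 of the client's `Sec-WebSocket-Key` concatenated with a fixed GUID, Base64-encoded. It is illustrative only. It skips validating the other request headers and does not implement frame parsing, which a real server library handles.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 appends to the client's key during the handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key header."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(sec_websocket_key: str) -> str:
    """Minimal 101 response that completes the HTTP/1.1 upgrade."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )


# Sample key from the opening-handshake example in RFC 6455.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```
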
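
A sketch of the `text/event-stream` wire format follows, assuming a hypothetical in-memory event log of `(id, data)` pairs with integer IDs. Each event carries an `id:` line so a reconnecting `EventSource` can send it back as `Last-Event-ID`, and the hypothetical `replay_since` helper resends only the newer events.

```python
from typing import Iterable, Iterator, Optional, Tuple


def format_sse_event(data: str, event_id: Optional[str] = None, event: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream format."""
    lines = []
    if event_id is not None:
        # The browser remembers this and sends it back as Last-Event-ID on reconnect.
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload needs one "data:" field per line.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def replay_since(log: Iterable[Tuple[int, str]], last_event_id: Optional[str]) -> Iterator[str]:
    """Hypothetical resume helper: re-send events newer than the client's Last-Event-ID."""
    after = int(last_event_id) if last_event_id else 0
    for event_id, data in log:
        if event_id > after:
            yield format_sse_event(data, event_id=str(event_id))
```
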
Long polling is a fallback for restrictive networks or legacy clients. The client issues a request that the server holds until data arrives or a timeout occurs. It is easier to deploy than WebSockets but creates more HTTP request overhead and worse tail latency under high update rates.
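
The sketch below shows the core of a long-poll handler using `asyncio`: the request is parked on a condition until a message past the client's cursor exists or the timeout fires. `LongPollChannel` and its integer-cursor scheme are hypothetical; a real deployment would keep messages in shared storage so that any server can answer the poll.

```python
import asyncio
from typing import List, Tuple


class LongPollChannel:
    """Hypothetical in-memory message channel for long-poll clients."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._changed = asyncio.Condition()

    async def publish(self, message: str) -> None:
        async with self._changed:
            self._messages.append(message)
            self._changed.notify_all()  # wake every parked poll request

    async def poll(self, cursor: int, timeout: float) -> Tuple[List[str], int]:
        """Hold the request until messages after `cursor` exist or `timeout` seconds pass."""
        async with self._changed:
            try:
                await asyncio.wait_for(
                    self._changed.wait_for(lambda: len(self._messages) > cursor), timeout
                )
            except asyncio.TimeoutError:
                # Empty response; the client re-polls with the same cursor.
                return [], cursor
            return self._messages[cursor:], len(self._messages)
```

Every empty timeout response costs a full HTTP round trip, which is the request overhead the paragraph above mentions.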
Connection capacity is often bounded by file descriptors, memory per connection, kernel buffers, TLS state, and heartbeat traffic. A rough first estimate is $\text{connections per node} \approx \frac{\text{available memory}}{\text{memory per connection}}$; then separately validate CPU, network, and `p99` event-loop latency.
Load balancing is harder for long-lived connections because connections do not rebalance on their own: a newly added or recovered node receives only new connections, while existing ones stay where they first landed. Use L4 load balancing for raw TCP efficiency or L7 for routing features; plan for draining, reconnect storms, sticky sessions, and uneven shard load.
Stateful gateway servers terminate WebSockets and maintain connection registries such as `user_id -> connection_id -> gateway_id`. Gateways should be horizontally scalable and keep durable application state elsewhere, typically in a database, cache, or log-backed messaging layer.
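
Here is a worked version of the estimate under assumed numbers: 32 GiB of memory usable for connections and about 64 KiB per connection (socket buffers plus TLS and application state; measure this on your own stack). The helper also applies an assumed 70% headroom and caps the result at an assumed file-descriptor limit.

```python
GIB = 1024 ** 3
KIB = 1024


def connections_per_node(
    available_memory_bytes: int,
    memory_per_connection_bytes: int,
    headroom_pct: int = 70,       # assumed safety margin
    fd_limit: int = 1_000_000,    # assumed per-process file-descriptor limit
) -> int:
    """Memory-bound estimate with a safety margin, capped by the file-descriptor limit."""
    usable = available_memory_bytes * headroom_pct // 100
    return min(usable // memory_per_connection_bytes, fd_limit)


def nodes_needed(total_connections: int, per_node: int) -> int:
    """Ceiling division: gateways required to hold every connection."""
    return -(-total_connections // per_node)


# Assumed inputs: 32 GiB usable, ~64 KiB per connection.
print(connections_per_node(32 * GIB, 64 * KIB, headroom_pct=100))  # 524288
print(connections_per_node(32 * GIB, 64 * KIB))                    # 367001
print(nodes_needed(10_000_000, connections_per_node(32 * GIB, 64 * KIB)))  # 28
```

Under these assumptions, 10 million concurrent connections need about 28 gateways before adding spare capacity for failures, and only if the CPU, network, and `p99` event-loop checks also pass.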
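
Because existing connections never move on their own, placement at connect time matters. A minimal sketch, assuming the load balancer or a hypothetical connection broker can see each gateway's live connection count: pick the least-loaded gateway that is not draining, and compute how many connections each overloaded gateway could ask to reconnect to even out load.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Gateway:
    gateway_id: str
    active_connections: int
    draining: bool = False  # set during deploys so no new connections land here


def pick_gateway(gateways: List[Gateway]) -> Optional[Gateway]:
    """Least-connections placement for a new long-lived connection."""
    candidates = [g for g in gateways if not g.draining]
    if not candidates:
        return None
    return min(candidates, key=lambda g: g.active_connections)


def excess_connections(gateways: List[Gateway]) -> Dict[str, int]:
    """How many connections each overloaded gateway could shed to reach the even share."""
    active = [g for g in gateways if not g.draining]
    if not active:
        return {}
    target = sum(g.active_connections for g in active) // len(active)
    return {
        g.gateway_id: g.active_connections - target
        for g in active
        if g.active_connections > target
    }
```

Shedding should be gradual: ask clients on an overloaded gateway to reconnect in small batches, relying on their jittered backoff, rather than closing every excess connection at once and triggering a reconnect storm.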
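
A minimal in-memory sketch of the registry follows. `ConnectionRegistry` is hypothetical; in production this mapping would typically live in a shared store so any service can find which gateways currently hold a user's connections.

```python
from typing import Dict, Set


class ConnectionRegistry:
    """In-memory stand-in for a shared user_id -> connection_id -> gateway_id map."""

    def __init__(self) -> None:
        self._by_user: Dict[str, Dict[str, str]] = {}

    def register(self, user_id: str, connection_id: str, gateway_id: str) -> None:
        self._by_user.setdefault(user_id, {})[connection_id] = gateway_id

    def unregister(self, user_id: str, connection_id: str) -> None:
        connections = self._by_user.get(user_id)
        if connections is None:
            return
        connections.pop(connection_id, None)
        if not connections:
            del self._by_user[user_id]  # user has no live connections left

    def gateways_for(self, user_id: str) -> Set[str]:
        """Gateways a message must be routed to in order to reach every device of the user."""
        return set(self._by_user.get(user_id, {}).values())
```

A user can have several connections (phone, laptop), so each user maps to a set of connections rather than a single gateway. Entries should also expire unless refreshed by heartbeats, so that a crashed gateway does not leave stale routes behind.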
Pub/sub decouples message producers from connected clients. Common patterns include Redis Pub/Sub for smaller systems, Kafka or Pulsar for durable streams, and custom fanout services for very large-scale notification delivery. The tradeoff is latency versus durability and replay.
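
To illustrate the decoupling, the toy broker below routes events by topic (for example, a hypothetical `user:<id>` topic per connected user) to whichever gateway handlers subscribed. Like Redis Pub/Sub, it is fire-and-forget: an event published while nobody is subscribed is simply lost, which is the durability gap a log-backed system such as Kafka or Pulsar closes.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives (topic, event); in a real system it would run on a gateway.
Handler = Callable[[str, dict], None]


class InMemoryPubSub:
    """Toy fire-and-forget broker: no persistence, no replay."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        if handler in self._subscribers.get(topic, []):
            self._subscribers[topic].remove(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver to current subscribers only; returns how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, event)
        return len(handlers)
```
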
Delivery semantics must be explicit: at-most-once may drop messages, at-least-once may duplicate them, and exactly-once is usually approximated with idempotency, sequence numbers, and deduplication. For chat, use message IDs and client-side ACKs rather than promising perfect delivery.
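
The receiver side of at-least-once delivery can be sketched as follows: remember recently seen message IDs, apply each ID at most once, and ACK duplicates anyway so the sender stops retrying. The bounded window size is an assumption chosen to keep memory fixed.

```python
from collections import OrderedDict
from typing import Any, List


class IdempotentReceiver:
    """Applies each message ID at most once within a bounded memory window."""

    def __init__(self, window: int = 10_000) -> None:
        self._window = window
        self._seen: "OrderedDict[str, None]" = OrderedDict()
        self.applied: List[Any] = []

    def receive(self, message_id: str, payload: Any) -> bool:
        """Return True if newly applied. The caller ACKs either way so retries stop."""
        if message_id in self._seen:
            return False  # duplicate caused by a retry or redelivery
        self._seen[message_id] = None
        if len(self._seen) > self._window:
            self._seen.popitem(last=False)  # forget the oldest ID
        self.applied.append(payload)
        return True
```

The window is the weak point: an ID evicted from memory would be applied again, so messages that matter usually also need a durable check, for example a uniqueness constraint on the message ID in storage.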
Ordering is usually guaranteed only within a partition, room, document, or conversation. If messages are routed through multiple shards, include monotonic sequence numbers per stream. Global total ordering is expensive and rarely necessary; prefer scoped ordering with clear invariants.
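
A sketch of scoped ordering, assuming each stream (room, document, or conversation) stamps events with a per-stream sequence number starting at 1: the buffer holds out-of-order arrivals, releases events only in sequence, and reports gaps that the client could fill through replay.

```python
from typing import Any, Dict, List


class OrderedStream:
    """Releases events for one stream strictly in per-stream sequence order."""

    def __init__(self, next_seq: int = 1) -> None:
        self.next_seq = next_seq
        self._pending: Dict[int, Any] = {}

    def accept(self, seq: int, event: Any) -> List[Any]:
        """Buffer `event` and return every event that is now deliverable in order."""
        if seq < self.next_seq or seq in self._pending:
            return []  # already delivered or already buffered: a duplicate
        self._pending[seq] = event
        ready = []
        while self.next_seq in self._pending:
            ready.append(self._pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self) -> List[int]:
        """Sequence numbers to request via replay before buffered events can be released."""
        if not self._pending:
            return []
        return [s for s in range(self.next_seq, max(self._pending)) if s not in self._pending]
```
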
Heartbeats detect dead connections and keep NAT mappings alive. A common design sends ping/pong every 20–60 seconds and disconnects after a few missed heartbeats. Heartbeats that are too frequent waste battery and bandwidth; heartbeats that are too infrequent delay presence updates and resource cleanup.
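
A minimal sketch of missed-heartbeat detection, assuming a 30-second interval and disconnection after three missed heartbeats (both tunable). Time is passed in explicitly to keep the example deterministic; a real server would read a monotonic clock such as `time.monotonic()`.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Flags connections whose last pong is older than max_missed intervals."""

    def __init__(self, interval_s: float = 30.0, max_missed: int = 3) -> None:
        self.interval_s = interval_s
        self.max_missed = max_missed
        self._last_pong: Dict[str, float] = {}

    def record_pong(self, connection_id: str, now: float) -> None:
        self._last_pong[connection_id] = now

    def dead_connections(self, now: float) -> List[str]:
        """Connections to close and clean up (registry entries, presence, queues)."""
        deadline = self.interval_s * self.max_missed
        return sorted(c for c, seen in self._last_pong.items() if now - seen > deadline)

    def remove(self, connection_id: str) -> None:
        self._last_pong.pop(connection_id, None)
```
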
Backpressure prevents slow clients from exhausting server memory. Track per-connection outbound queue size, drop low-priority events, compress batches, or disconnect clients that exceed thresholds. Never allow unbounded queues behind a slow mobile client.
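
The sketch below bounds a per-connection outbound queue and distinguishes low-priority events (such as typing indicators) from reliable ones. The policy is an assumption for illustration: shed low-priority events first, and when the queue is full of reliable messages, report that the client should be disconnected so it can later resume from its last sequence number.

```python
from collections import deque
from typing import Deque, List, Tuple


class OutboundQueue:
    """Bounded per-connection send queue with low-priority shedding."""

    def __init__(self, max_messages: int = 1000) -> None:
        self.max_messages = max_messages
        self._queue: Deque[Tuple[bool, str]] = deque()  # (is_low_priority, message)
        self.dropped = 0

    def enqueue(self, message: str, low_priority: bool = False) -> bool:
        """Return False when the client is too slow and should be disconnected."""
        if len(self._queue) < self.max_messages:
            self._queue.append((low_priority, message))
            return True
        if low_priority:
            self.dropped += 1  # shed the new ephemeral event
            return True
        # Full: make room for a reliable message by evicting the oldest low-priority one.
        for i, (is_low, _) in enumerate(self._queue):
            if is_low:
                del self._queue[i]
                self.dropped += 1
                self._queue.append((False, message))
                return True
        return False  # only reliable messages queued: disconnect; client resumes by sequence number

    def drain(self, n: int) -> List[str]:
        """Pop up to n messages for the socket writer, oldest first."""
        out = []
        while self._queue and len(out) < n:
            out.append(self._queue.popleft()[1])
        return out
```
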
Reconnect behavior is a first-class design concern. Clients should use exponential backoff with jitter, send the last received sequence number, and resume missed events if the server supports replay. Without jitter, a regional outage can cause a thundering herd of reconnects.
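
A minimal sketch of client reconnect timing using the “full jitter” variant of exponential backoff: each delay is drawn uniformly between zero and a capped exponential ceiling, which spreads clients that disconnected at the same instant across the whole window instead of retrying in lockstep. The base and cap values and the resume message shape are assumptions.

```python
import random
from typing import Dict, Optional


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter delay in seconds before reconnect attempt number `attempt` (0-based)."""
    # Clamp the exponent so very long outages cannot overflow the float math.
    ceiling = min(cap_s, base_s * (2 ** min(attempt, 32)))
    return (rng or random).uniform(0.0, ceiling)


def resume_request(last_seq: int) -> Dict[str, object]:
    """Hypothetical first frame after reconnecting: ask the server to replay after last_seq."""
    return {"type": "resume", "last_seq": last_seq}
```
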

Worked example

For the prompt “Design a Real-Time Chat System,” a strong candidate would start by clarifying scope: one-on-one or group chat, expected concurrent connections, message retention, offline delivery, ordering requirements, and a latency target such as “deliver to online recipients within 200 ms at `p99`.” The first framing decision is to separate connection gateways from message storage and fanout, so WebSocket servers remain replaceable rather than becoming the source of truth. The answer can be organized around four pillars: client connection lifecycle, message write path, online delivery path, and failure/retry behavior.

A reasonable skeleton is: clients connect to a WebSocket gateway; the gateway authenticates using a short-lived token; messages are written to durable storage with a unique message ID; a pub/sub layer routes the event to gateways holding recipient connections; gateways push messages to clients and receive ACKs. For group chat, the candidate should distinguish small groups, where fanout-on-write is simple, from huge rooms, where fanout-on-read or hierarchical fanout may be necessary. One explicit tradeoff to flag is using Kafka for durable ordered room streams versus Redis Pub/Sub for lower-latency but non-durable fanout. The former helps replay after reconnects; the latter is simpler but loses messages if a gateway is down unless storage and polling compensate.
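
The skeleton can be condensed into a single-process toy model, with dictionaries standing in for durable storage, the connection registry, and gateway push queues (every name here is hypothetical). It shows the order of steps: an idempotent write keyed by a client-supplied message ID with a per-conversation sequence number, then fanout-on-write to online recipients' gateways, ACK tracking, and catch-up reads for clients that were offline.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class StoredMessage:
    message_id: str
    conversation_id: str
    seq: int  # per-conversation order, assigned at write time
    sender: str
    body: str


class ChatBackend:
    """Toy model of the write path and online delivery path; every name is hypothetical."""

    def __init__(self, members: Dict[str, List[str]], online: Dict[str, str]) -> None:
        self.members = members  # conversation_id -> member user_ids
        self.online = online    # user_id -> gateway_id (stand-in for the connection registry)
        self.log: DefaultDict[str, List[StoredMessage]] = defaultdict(list)  # stand-in for durable storage
        self._by_id: Dict[str, StoredMessage] = {}
        self.gateway_outbox: DefaultDict[str, List[Tuple[str, StoredMessage]]] = defaultdict(list)
        self.unacked: Dict[str, Set[str]] = {}  # message_id -> recipients yet to ACK

    def send(self, conversation_id: str, sender: str, body: str,
             message_id: Optional[str] = None) -> StoredMessage:
        message_id = message_id or str(uuid.uuid4())
        if message_id in self._by_id:
            return self._by_id[message_id]  # client retry: do not store or fan out twice
        log = self.log[conversation_id]
        msg = StoredMessage(message_id, conversation_id, len(log) + 1, sender, body)
        log.append(msg)  # 1. durable write first
        self._by_id[message_id] = msg
        recipients = {u for u in self.members[conversation_id] if u != sender}
        self.unacked[message_id] = set(recipients)
        for user in sorted(recipients):  # 2. fanout-on-write to online recipients
            gateway_id = self.online.get(user)
            if gateway_id is not None:
                self.gateway_outbox[gateway_id].append((user, msg))
            # Offline users catch up later through history_after().
        return msg

    def ack(self, message_id: str, user_id: str) -> None:
        self.unacked.get(message_id, set()).discard(user_id)

    def history_after(self, conversation_id: str, last_seq: int) -> List[StoredMessage]:
        """Catch-up read for a reconnecting client that reports its last seen sequence number."""
        return [m for m in self.log[conversation_id] if m.seq > last_seq]
```

Because the write lands before any fanout, a crashed gateway loses only pushes, never messages. In a real deployment, sequence assignment must also be atomic per conversation, for example by routing each conversation to a single writer partition.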

The candidate should also mention presence as approximate rather than perfectly consistent: heartbeat-based online status can lag, and that is acceptable if documented. They should close by saying that, with more time, they would cover multi-region replication, abuse prevention, encryption, and detailed capacity math for concurrent connections, message rate, and fanout.

A second angle

For the prompt “Design Real-Time Collaborative Editing,” the same long-lived connection concepts apply, but the core challenge shifts from message delivery to concurrent mutation consistency. WebSockets still provide bidirectional low-latency transport, but each document needs an algorithm such as Operational Transformation (OT) or CRDTs (conflict-free replicated data types) to merge edits from multiple clients. Ordering and replay become stricter because applying edits out of order can corrupt document state, whereas chat can often tolerate minor delivery delays. Backpressure is also different: dropping a typing indicator is fine, but dropping an edit operation is not. A strong answer would separate reliable document operations from ephemeral signals like cursors, selections, and “user is typing.”
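
To make “merge edits” concrete, below is a minimal Operational Transformation sketch restricted to concurrent inserts; deletes and the rest of a full OT or CRDT system are deliberately out of scope. `transform` shifts an insert past a concurrent insert that landed at an earlier position, breaking ties at the same position by an assumed site ID so every replica makes the same choice.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where text is inserted
    text: str
    site: int  # replica/client ID, used only to break ties deterministically


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Adjust `op` so it can be applied after the concurrent `against` has been applied."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return replace(op, pos=op.pos + len(against.text))
    return op


# Two replicas start from "abc" and apply concurrent edits in opposite orders.
doc = "abc"
a = Insert(pos=1, text="X", site=1)
b = Insert(pos=1, text="Y", site=2)
replica_1 = apply(apply(doc, a), transform(b, a))
replica_2 = apply(apply(doc, b), transform(a, b))
print(replica_1, replica_2)  # aXYbc aXYbc
```

Both replicas apply the two edits in different orders yet converge on the same text, which is the property OT must guarantee. Losing a single operation would break that convergence, whereas losing a cursor update would not.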

Common pitfalls

Pitfall: “Just use WebSockets and scale horizontally.”

This answer skips the main difficulty: WebSocket servers are stateful because they hold active connections. A better answer explains how gateways register connections, how messages find the right gateway, how reconnects work, and where durable state lives.

Pitfall: Overpromising exactly-once delivery.

In distributed systems, network retries, client reconnects, gateway crashes, and duplicate pub/sub delivery make true exactly-once extremely difficult. Say “at-least-once with idempotent message IDs and client ACKs” or “ordered per conversation using sequence numbers” instead of making a vague correctness claim.

Pitfall: Ignoring slow clients and reconnect storms.

Many designs work at average load but fail during mobile network drops, regional outages, or client bugs. Interviewers expect you to mention bounded outbound queues, heartbeat timeouts, exponential backoff with jitter, load-shedding, and graceful connection draining during deploys.

Connections

Interviewers may pivot from this topic into distributed consensus, event streaming, rate limiting, multi-region system design, or cache invalidation. It also connects naturally to API design tradeoffs among WebSocket, SSE, gRPC streaming, and plain HTTP polling.