Design a Direct Messaging System for an E-commerce Marketplace
Company: Whatnot
Role: Software Engineer
Category: System Design
Difficulty: Medium
Interview Round: Technical Screen
# Design a Direct Messaging System for an E-commerce Marketplace
Design a **direct messaging (DM)** system for an e-commerce marketplace. Buyers and sellers need to chat directly with one another (one-to-one conversations) about listings, orders, shipping, and returns. Either side can start a conversation, and either side can be online or offline at any time.
The product is **web-based only** (there is no native mobile app), but a single user may have the site open in **multiple browser tabs or on multiple computers at the same time**, and may log in from a new device at any moment.
Concretely, the system must satisfy:
- **Real-time delivery.** When the recipient is online, a message should reach them as quickly as possible (low end-to-end latency).
- **Offline notifications.** When the recipient has no active session, notify them out-of-band via **SMS or email** so they know a message is waiting.
- **Cross-device consistency / durability.** Message history must be durable and identical across all of a user's sessions. A message received or read on one device must **not disappear or be lost** on another device, including a device the user logs into later.
- **Resilience to mid-delivery disconnects.** If a recipient's device goes offline **while a message is being delivered**, the message must be queued and delivered when that device reconnects — never silently dropped.
- **Simultaneous multi-device reads.** The same user reading the same conversation from two devices at once must see a consistent view (ordering, read state) without duplicating or losing messages.
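
One way to reason about the read-state part of the last requirement is to treat the read position as a cursor that only moves forward, so concurrent updates from two devices cannot undo each other. The sketch below is a minimal in-memory illustration; `ReadState` and `advance` are hypothetical names, and a real store would need the same rule applied as an atomic conditional update.

```python
from dataclasses import dataclass


@dataclass
class ReadState:
    """Per-user, per-conversation read position (hypothetical model)."""
    last_read_seq: int = 0

    def advance(self, seq: int) -> bool:
        """Move the cursor forward only; return True if it changed."""
        if seq > self.last_read_seq:
            self.last_read_seq = seq
            return True
        return False  # stale or duplicate update, safely ignored


state = ReadState()
state.advance(7)  # one device reads up to sequence 7
state.advance(5)  # a late update from another tab does not move it back
```

Because the merge rule is "take the maximum", updates can arrive in any order or more than once and every device converges on the same read position.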
```hint Transport
A web client needs a live server-to-client push channel. Think about a persistent connection (WebSocket, or SSE / long-poll as a fallback) terminated at a stateful "connection gateway" layer that tracks, per user, which session connections are currently open.
```
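
The per-user session tracking described in the transport hint could look roughly like the registry below. This is a sketch under assumptions: `ConnectionRegistry` and its method names are illustrative, the connection object is an opaque placeholder, and a production gateway would keep this state in a shared store rather than in one process.

```python
from collections import defaultdict


class ConnectionRegistry:
    """Tracks which session connections are open for each user (illustrative)."""

    def __init__(self) -> None:
        # user_id -> {session_id: connection object}
        self._sessions: dict[str, dict[str, object]] = defaultdict(dict)

    def register(self, user_id: str, session_id: str, conn: object) -> None:
        self._sessions[user_id][session_id] = conn

    def unregister(self, user_id: str, session_id: str) -> None:
        sessions = self._sessions.get(user_id)
        if sessions is None:
            return
        sessions.pop(session_id, None)
        if not sessions:
            # Last session closed: the user now counts as offline.
            del self._sessions[user_id]

    def sessions_for(self, user_id: str) -> list[str]:
        return sorted(self._sessions.get(user_id, {}))

    def is_online(self, user_id: str) -> bool:
        return user_id in self._sessions


registry = ConnectionRegistry()
registry.register("seller-1", "tab-a", conn=None)
registry.register("seller-1", "laptop-b", conn=None)
```

The `is_online` check is what the offline-notification path would consult before deciding between a push and an SMS/email.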
```hint Source of truth vs. delivery
Separate the **durable, ordered message store** (the source of truth) from the **fan-out / push path**. Assign each message a monotonically increasing per-conversation sequence number on write; clients then sync by a cursor over that sequence. This is what makes multi-device consistency and "deliver on reconnect" fall out naturally.
```
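
To make the sequence-and-cursor idea concrete, here is a minimal in-memory sketch. `Conversation`, `append`, and `since` are hypothetical names, and the list stands in for a durable store. It assumes a single writer per conversation; in a sharded deployment the sequence allocation would have to be serialized per conversation (for example with a conditional write).

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Ordered message log for one conversation (illustrative, in memory)."""
    messages: list = field(default_factory=list)  # index i holds seq i + 1

    def append(self, sender: str, body: str) -> int:
        """Assign the next sequence number and store the message."""
        seq = len(self.messages) + 1
        self.messages.append({"seq": seq, "sender": sender, "body": body})
        return seq

    def since(self, cursor: int) -> list:
        """Return every message with seq greater than the client's cursor."""
        return self.messages[cursor:]


convo = Conversation()
convo.append("buyer", "Is this still available?")
convo.append("seller", "Yes, it is.")
convo.append("buyer", "Great, ordering now.")

# A device that last saw seq 1 reconnects and asks for everything after it.
missed = convo.since(1)
```

A newly logged-in device is just the special case of a cursor at 0, which is why multi-device consistency and deliver-on-reconnect share one code path.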
```hint Offline and reliability
Decide delivery semantics first (at-least-once + client-side dedup by message id is the usual choice). Offline notification is driven by "recipient has no live connection" — route to a notification service that sends SMS/email, while the message itself waits durably in the store and is pulled on reconnect via the cursor.
```
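
The two halves of this hint, routing on the server and dedup on the client, can be sketched as follows. All names here (`route`, `ClientInbox`, the message fields) are assumptions made for illustration; the message is assumed to be durably stored before `route` is called, and `push` and `notify` are caller-supplied callbacks.

```python
def route(message: dict, live_sessions: list, push, notify) -> str:
    """Push to every live session, or fall back to an out-of-band notification."""
    if live_sessions:
        for session_id in live_sessions:
            push(session_id, message)
        return "pushed"
    # No live connection: the message waits in the store; only a notice goes out.
    notify(message["recipient"], message["conversation_id"])
    return "notified"


class ClientInbox:
    """Client-side view that tolerates at-least-once delivery."""

    def __init__(self) -> None:
        self.seen_ids: set = set()
        self.rendered: list = []

    def receive(self, message: dict) -> bool:
        """Render a message once; return False for a redelivered duplicate."""
        if message["id"] in self.seen_ids:
            return False
        self.seen_ids.add(message["id"])
        self.rendered.append(message)
        return True


msg = {"id": "m-1", "recipient": "seller-1", "conversation_id": "c-9", "body": "Hi"}
inbox = ClientInbox()
inbox.receive(msg)
inbox.receive(msg)  # retry after a dropped acknowledgement; ignored
```

With dedup on the client, the server is free to retry aggressively, which is what makes at-least-once delivery a practical choice.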
### Constraints & Assumptions
Confirm with the interviewer; the following are reasonable working assumptions for sizing:
- ~50M registered users, ~5M daily active users; a user has at most a handful of concurrent sessions.
- Mostly 1:1 conversations (buyer <-> seller). Group chat is out of scope for v1.
- Average write volume of roughly 10k messages/sec, with peaks several times higher (e.g., during promotions or holiday sales).
- Messages are short text (plus optional small attachments / links to listings); assume a few hundred bytes to a few KB each.
- Target online delivery latency: a few hundred milliseconds (p99) end-to-end. Offline notification latency of seconds is acceptable.
- Message history is retained long-term and must be searchable/scrollable per conversation.
- Basic abuse/spam controls are assumed to exist; detailed anti-abuse design is out of scope.
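
A quick back-of-envelope pass over these numbers helps size storage and throughput. The figures below use the assumptions listed above plus two of my own, flagged in the comments: a 3x peak multiplier and a 1 KB average message size.

```python
AVG_MSGS_PER_SEC = 10_000
DAILY_ACTIVE_USERS = 5_000_000
PEAK_MULTIPLIER = 3        # assumption: "several times higher" taken as 3x
AVG_MSG_BYTES = 1_000      # assumption: ~1 KB, within "a few hundred bytes to a few KB"
SECONDS_PER_DAY = 86_400

msgs_per_day = AVG_MSGS_PER_SEC * SECONDS_PER_DAY     # 864,000,000 messages/day
bytes_per_day = msgs_per_day * AVG_MSG_BYTES          # 864 GB/day before replication
peak_msgs_per_sec = AVG_MSGS_PER_SEC * PEAK_MULTIPLIER  # 30,000 messages/sec
msgs_per_dau = msgs_per_day / DAILY_ACTIVE_USERS      # 172.8 messages per active user per day

print(f"{msgs_per_day:,} msgs/day, {bytes_per_day / 1e9:.0f} GB/day, "
      f"{peak_msgs_per_sec:,} msgs/sec at peak, {msgs_per_dau:.1f} msgs per DAU")
```

Note that the stated write rate implies roughly 173 messages per daily active user per day, which is high for a marketplace; this is a good number to sanity-check with the interviewer before sizing the store around it.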
### Clarifying Questions to Ask
- Is this strictly **1:1**, or do we eventually need group/multi-party threads (e.g., a buyer plus a support agent)? Does that change the data model now?
- What ordering and consistency guarantee is required — strict per-conversation ordering, and is "read your own writes" across your own devices a hard requirement?
- What delivery semantics are acceptable: **at-least-once with dedup**, or is exactly-once expected? Is message loss ever tolerable?
- What are the latency / availability targets (p99 delivery latency, uptime SLA), and what is the expected peak message rate?
- Do we need typing indicators, delivery/read receipts, presence (online/offline), and editing/deletion of messages — or just send/receive + history for v1?
- What are the retention, compliance, and privacy requirements (how long do we keep history, encryption at rest/in transit, regional data residency)?
- Are attachments / images in scope, and if so what size limits and storage (e.g., object store + CDN)?
### Follow-up Questions
- A device disconnects **in the middle of receiving** a message and the server has already marked it sent. Walk through exactly how the client recovers the missed message on reconnect with no loss and no duplicate.
- Two devices of the same user read the conversation simultaneously and both advance the read cursor. How do you keep read state consistent and avoid lost updates?
- How do you guarantee **strict per-conversation ordering** under concurrent sends and a sharded store? What breaks if two app servers assign sequence numbers for the same conversation at once?
- How would you extend the design to support **group conversations** (3+ participants) without rewriting the core delivery path?
- How do you make offline SMS/email notifications **idempotent and debounced** so a user who is offline for an hour with 20 incoming messages doesn't get 20 texts?
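
For the last follow-up, one possible starting point is a per-user debounce window: send at most one notification per window and let later messages ride on it. The sketch below is illustrative only; `NotificationDebouncer` is a hypothetical name, timestamps are passed in explicitly to keep it deterministic, and a real service would persist the last-sent time in a shared store.

```python
from typing import Optional


class NotificationDebouncer:
    """Allows at most one out-of-band notification per user per window."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self._last_sent: dict[str, float] = {}  # user_id -> time of last notification

    def should_notify(self, user_id: str, now: float) -> bool:
        last: Optional[float] = self._last_sent.get(user_id)
        if last is not None and now - last < self.window:
            return False  # still inside the window: suppress
        self._last_sent[user_id] = now
        return True


debouncer = NotificationDebouncer(window_seconds=3600)

# 20 messages arrive over an hour, one every 3 minutes, while the user is offline.
arrival_times = [i * 180 for i in range(20)]
sent = sum(debouncer.should_notify("buyer-7", t) for t in arrival_times)
```

Here only the first of the 20 messages triggers a text. Recording the last-sent time with an atomic check-and-set is also what would make a retried notification job idempotent.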
Quick Answer: This question evaluates a candidate's ability to design a real-time messaging system, covering distributed systems concepts like durability, ordering guarantees, and cross-device consistency. It is commonly used in system design interviews to assess how well someone reasons about delivery semantics, offline notification handling, and reconnection recovery at a practical, architectural level.