This is a classic system design question that tests your ability to handle real-time data at massive scale. Live streaming sits at the intersection of storage, networking, and distributed systems — and interviewers love it because there are so many interesting trade-offs to discuss.
Estimated time: 45 minutes
Step 1: Understand the Requirements (5 minutes)
Before you touch a single architecture box, you need to nail down what "live streaming" actually means for this system. Ask your interviewer clarifying questions — it shows maturity and prevents you from designing the wrong thing.
Functional Requirements
- Broadcasters can start a live stream and upload video in real time
- Viewers can watch live streams with near-real-time latency (a few seconds is acceptable)
- Live chat alongside each stream so viewers can interact
- Viewers can browse and discover active streams
- Support for stream recording so content is available as VOD (video on demand) after the stream ends
- Multiple quality levels so viewers on slow connections can still watch
Non-Functional Requirements
- Support millions of concurrent viewers on a single popular stream
- End-to-end latency under 10 seconds from broadcaster to viewer
- High availability: streams should not drop mid-broadcast
- The system should scale horizontally as viewership grows
- Chat should feel real-time (sub-second delivery for most users)
The key insight here is that live streaming is fundamentally a write-once, read-millions problem. One broadcaster produces content that millions consume simultaneously.
Out of Scope
- Payment processing and subscription management (mention it, don't design it)
- Content moderation ML pipelines
- Mobile app specifics
- Authentication and authorization details
Step 2: Back-of-the-Envelope Estimation (5 minutes)
Let's get a feel for the numbers. This helps you make informed decisions about architecture later.
Traffic Estimates
- Daily active users: 50 million
- Concurrent viewers at peak: 10 million
- Concurrent broadcasters at peak: 100,000
- Average stream duration: 2 hours
- Average viewers per stream: 100 (but top streams have millions)
Bandwidth Estimates
Video is a bandwidth monster. 1080p runs at 6 Mbps, 720p at 3 Mbps, 480p at 1.5 Mbps, 360p at 0.8 Mbps.
Ingest bandwidth (broadcaster to our servers): 100,000 broadcasters x 6 Mbps = 600 Gbps ingest.
Egress bandwidth (our servers to viewers): 10 million viewers x 3 Mbps (average) = 30 Tbps egress.
That 30 Tbps number is exactly why CDNs exist. No single data center can push that much traffic.
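The bandwidth arithmetic above is easy to sanity-check in a few lines. This is only a sketch that reuses the figures from this section; the 3 Mbps average delivered bitrate is the same assumption the estimate already makes.

```python
# Back-of-the-envelope bandwidth check using the figures from this section.
MBPS_PER_GBPS = 1_000
MBPS_PER_TBPS = 1_000_000

broadcasters = 100_000      # concurrent broadcasters at peak
ingest_mbps = 6             # each broadcaster uploads 1080p at 6 Mbps
viewers = 10_000_000        # concurrent viewers at peak
avg_view_mbps = 3           # assumed average delivered bitrate (720p)

ingest_gbps = broadcasters * ingest_mbps / MBPS_PER_GBPS
egress_tbps = viewers * avg_view_mbps / MBPS_PER_TBPS

print(f"Ingest: {ingest_gbps:,.0f} Gbps")
print(f"Egress: {egress_tbps:,.0f} Tbps")
# Egress dwarfs ingest, which is why delivery (not ingest) drives the design.
print(f"Egress / ingest: {egress_tbps * 1_000 / ingest_gbps:.0f}x")
```

Egress comes out 50 times larger than ingest, which is the read-heavy asymmetry in numeric form.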
Storage Estimates
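If we record all streams, and treat the 100,000 peak broadcasters as one day's worth of 2-hour streams (a deliberate simplification, since more streams run off-peak): 100,000 streams x 7,200 seconds x 6 Mbps = 4.32 x 10^9 Mb, or about 540 TB per day of source video. Storing every rendition (6 + 3 + 1.5 + 0.8 = 11.3 Mbps in total) is roughly 2x that, or ~1 PB/day. With 30-day retention for VOD: ~30 PB.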
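The storage estimate can be reproduced the same way. This is a rough sketch: it assumes 100,000 two-hour streams per day, decimal units (1 TB = 10^12 bytes), and that every rendition from the bitrate list above is stored.

```python
# Storage estimate for recorded streams, using the section's figures.
SECONDS_PER_HOUR = 3_600
BITS_PER_BYTE = 8

streams_per_day = 100_000           # assumed: one day's worth of streams
stream_hours = 2
source_mbps = 6                     # 1080p source
rendition_mbps = [6, 3, 1.5, 0.8]   # 1080p, 720p, 480p, 360p
retention_days = 30


def terabytes(mbps: float, seconds: int) -> float:
    """Size in TB of a stream at `mbps` lasting `seconds`."""
    bits = mbps * 1_000_000 * seconds
    return bits / BITS_PER_BYTE / 1e12


seconds = stream_hours * SECONDS_PER_HOUR
source_tb_per_day = streams_per_day * terabytes(source_mbps, seconds)
all_tb_per_day = streams_per_day * terabytes(sum(rendition_mbps), seconds)
retained_pb = all_tb_per_day * retention_days / 1_000

print(f"Source video:   {source_tb_per_day:,.0f} TB/day")
print(f"All renditions: {all_tb_per_day:,.0f} TB/day")
print(f"30-day VOD:     {retained_pb:,.1f} PB")
```

The all-renditions figure lands just above 1 PB/day, so the "roughly 2x" shortcut is close enough for an interview.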
Step 3: High-Level Design (10 minutes)
The system is a pipeline: video flows from the broadcaster through several stages before reaching viewers.
Core Components
- Video Ingestion Service: accepts live video from broadcasters via RTMP
- Transcoding Service: converts incoming video into multiple resolutions and bitrates
- CDN: distributes video chunks to edge servers close to viewers
- Chat Service: handles real-time messaging for each stream
- Stream Discovery and Metadata Service: manages stream listings and search
- VOD Service: records live streams for later playback
Step 4: Deep Dive (20 minutes)
The Video Pipeline
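The broadcaster pushes video over RTMP to the ingestion service. The stream is cut into 2-6 second segments, each segment is transcoded into multiple renditions, and viewers receive them via HLS adaptive bitrate streaming, where the player switches renditions as its bandwidth changes.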
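To make the adaptive bitrate step concrete, here is a minimal sketch of an HLS master playlist built from the renditions listed earlier, plus a simplified version of the choice a player makes. The pixel resolutions, the `playlist.m3u8` paths, and the 0.8 headroom factor are illustrative assumptions, and real players use more elaborate heuristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    bitrate_bps: int


# Bitrates come from the estimation section; resolutions are assumed.
RENDITIONS = [
    Rendition("1080p", 1920, 1080, 6_000_000),
    Rendition("720p", 1280, 720, 3_000_000),
    Rendition("480p", 854, 480, 1_500_000),
    Rendition("360p", 640, 360, 800_000),
]


def master_playlist(renditions):
    """Build an HLS master playlist that lists one variant per rendition."""
    lines = ["#EXTM3U"]
    for r in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bitrate_bps},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/playlist.m3u8")  # hypothetical path layout
    return "\n".join(lines) + "\n"


def pick_rendition(renditions, measured_bps, headroom=0.8):
    """Pick the highest bitrate that fits within a fraction of measured bandwidth."""
    budget = measured_bps * headroom
    affordable = [r for r in renditions if r.bitrate_bps <= budget]
    if affordable:
        return max(affordable, key=lambda r: r.bitrate_bps)
    # Nothing fits: fall back to the lowest rendition rather than stalling.
    return min(renditions, key=lambda r: r.bitrate_bps)


print(master_playlist(RENDITIONS))
print(pick_rendition(RENDITIONS, measured_bps=4_000_000).name)
```

A viewer measuring 4 Mbps gets a 3.2 Mbps budget and lands on 720p; a viewer on a very slow link falls back to 360p.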
CDN Architecture
Use a tiered cache architecture: a shield (mid-tier) layer sits between the origin and the edge servers, so edge cache misses are absorbed before they reach the origin. A multi-CDN strategy adds redundancy and global coverage.
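The effect of the shield tier is easiest to see in a toy simulation. The sketch below is an in-memory model, not a real CDN: the class names, the 50-edge fleet, and the segment key are all hypothetical, and it ignores TTLs, eviction, and concurrent-request coalescing.

```python
class Origin:
    """Stands in for the origin server; counts how often it is hit."""

    def __init__(self):
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return f"bytes-of-{key}"


class CacheTier:
    """A cache that falls through to its upstream on a miss."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.upstream.get(key)
        self.store[key] = value
        return value


origin = Origin()
shield = CacheTier("shield", origin)
edges = [CacheTier(f"edge-{i}", shield) for i in range(50)]

segment = "stream42/720p/seg_001.ts"  # hypothetical segment key
for edge in edges:
    for _ in range(1_000):            # 1,000 viewers per edge
        edge.get(segment)

print(f"viewer requests: {50 * 1_000}")
print(f"shield requests: {shield.hits + shield.misses}")
print(f"origin fetches:  {origin.fetches}")
```

In this model 50,000 viewer requests turn into 50 shield requests and a single origin fetch. Without the shield, each of the 50 edges would go to the origin on its first miss.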
Live Chat at Scale
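Viewers hold WebSocket connections to the chat service. The fan-out strategy scales with viewer count: direct delivery to every connection for small streams, moving to message sampling for the largest ones, where no viewer could read every message anyway.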
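One way to express the switch from direct delivery to sampling is a per-message delivery probability. The sketch below is illustrative only: the thresholds `DIRECT_LIMIT` and `TARGET_MSGS_PER_SEC` are made-up numbers, and a production system might sample per viewer or per chat shard instead.

```python
import random

# Hypothetical tuning knobs, not figures from any real system.
DIRECT_LIMIT = 10_000        # at or below this many viewers, deliver everything
TARGET_MSGS_PER_SEC = 20     # roughly what one viewer can read per second


def delivery_probability(incoming_msgs_per_sec: float, viewer_count: int) -> float:
    """Fraction of messages to fan out for a stream of this size."""
    if viewer_count <= DIRECT_LIMIT:
        return 1.0                      # small stream: direct delivery
    if incoming_msgs_per_sec <= TARGET_MSGS_PER_SEC:
        return 1.0                      # big but quiet stream: nothing to drop
    return TARGET_MSGS_PER_SEC / incoming_msgs_per_sec


def should_deliver(probability: float, rng: random.Random) -> bool:
    """Decide whether a single message is fanned out."""
    return rng.random() < probability


# Simulate a busy chat: 1M viewers, 2,000 messages/second, 5 seconds of traffic.
rng = random.Random(7)
p = delivery_probability(incoming_msgs_per_sec=2_000, viewer_count=1_000_000)
delivered = sum(should_deliver(p, rng) for _ in range(10_000))

print(f"delivery probability: {p:.2%}")
print(f"delivered {delivered} of 10000 messages")
```

Sampling keeps the per-viewer message rate roughly constant no matter how busy the chat gets, at the cost of each viewer seeing only a subset of messages.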
Stream Recording and VOD
An asynchronous pipeline stitches the recorded chunks together, re-transcodes them, generates thumbnails, and stores the result under lifecycle policies.
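At the playlist level, "stitching" can be as simple as listing the recorded segments in order and marking the playlist as finished. The sketch below assumes the live segments are kept as-is and shows only that step; the segment names are hypothetical, and re-transcoding, thumbnails, and lifecycle policies are not covered.

```python
import math


def vod_playlist(segments):
    """Build an HLS VOD media playlist.

    `segments` is a list of (uri, duration_seconds) in broadcast order.
    """
    if not segments:
        raise ValueError("cannot build a playlist from zero segments")
    # Target duration must be an integer no smaller than any segment's length.
    target = math.ceil(max(duration for _, duration in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")  # tells players the stream is complete
    return "\n".join(lines) + "\n"


recorded = [
    ("seg_000.ts", 4.0),
    ("seg_001.ts", 6.0),
    ("seg_002.ts", 2.5),   # final segment is often shorter
]
playlist = vod_playlist(recorded)
print(playlist)
```

The `#EXT-X-ENDLIST` tag is what distinguishes a finished recording from a live playlist that is still growing, so players can show a full seek bar.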
Wrap Up
Key trade-offs: latency vs quality, consistency vs availability, cost vs performance. The interviewer wants to see you navigate a complex system with many moving parts and make reasonable decisions at each layer.