System Design: Netflix‑Like Video Streaming Platform
Context
Design a large-scale video streaming platform that supports web, mobile, and TV clients for millions of concurrent viewers worldwide. The system should prioritize availability, low-latency playback, security, and cost efficiency.
Assume:
- Peak: 5–10 million concurrent streams globally.
- Target startup latency < 2 seconds; low rebuffering; 99.9%+ availability for playback.
- Content is a mix of video-on-demand (VOD) titles; live streaming is not required (optional considerations welcome).
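The assumptions above already allow a first back-of-the-envelope estimate. The sketch below turns the concurrent-stream range into peak egress bandwidth and the availability target into a monthly downtime budget. The average bitrate of 5 Mbps is an assumption for illustration only (the prompt does not give one), and the function names are hypothetical.

```python
# Back-of-the-envelope capacity sketch.
# ASSUMPTION: a blended average bitrate of 5 Mbps per stream; the prompt does
# not specify one, so adjust it to match the encoding ladder you propose.
ASSUMED_AVG_BITRATE_MBPS = 5.0


def peak_egress_tbps(concurrent_streams: int,
                     avg_bitrate_mbps: float = ASSUMED_AVG_BITRATE_MBPS) -> float:
    """Aggregate peak egress in terabits per second (1 Tbps = 1,000,000 Mbps)."""
    return concurrent_streams * avg_bitrate_mbps / 1_000_000


def allowed_downtime_minutes_per_month(availability: float, days: int = 30) -> float:
    """Downtime budget implied by an availability target over a month of `days` days."""
    return (1.0 - availability) * days * 24 * 60


if __name__ == "__main__":
    for streams in (5_000_000, 10_000_000):
        print(f"{streams:,} streams -> {peak_egress_tbps(streams):.0f} Tbps peak egress")
    budget = allowed_downtime_minutes_per_month(0.999)
    print(f"99.9% availability -> {budget:.1f} minutes of downtime per 30-day month")
```

Under the assumed 5 Mbps average, the stated peak works out to roughly 25 to 50 Tbps of egress, and 99.9% availability leaves about 43 minutes of playback downtime per 30-day month. Traffic at this scale is one reason the CDN and edge caching component carries so much weight in the design.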
Requirements
Design the system and cover the following components:
- User Authentication and Profiles
  - Sign-in, sessions, MFA, device limits, parental controls, multiple profiles per account.
- Content Catalog and Metadata
  - Titles, seasons/episodes, genres/tags, availability windows, rights/regions, subtitles/audio tracks.
- Search and Personalized Recommendations
  - Full-text search, browse pages, personalized rows, cold-start handling.
- Video Ingestion and Transcoding Pipeline
  - Secure uploader, QC, per-title encoding, thumbnails, captions, packaging to HLS/DASH.
- DRM and Adaptive Bitrate Streaming
  - Widevine/FairPlay/PlayReady, license services, CMAF, HLS/DASH manifests, LL-HLS optional.
- CDN and Edge Caching Strategy
  - Multi-CDN, edge authorization, origin shielding, cache warming/purges.
- Storage and Geo-Replication
  - Masters vs renditions, object storage layout, lifecycle tiers, cross-region replication.
- Playback Session Management
  - Session creation, concurrency enforcement, heartbeats, QoE metrics, resume playback.
- APIs and Service Discovery
  - External API gateway, internal gRPC, schema, rate limits, idempotency, service mesh.
- Multi-Region Failover and Data Consistency
  - Active-active vs active-passive, routing, data models with consistency choices.
- Observability and A/B Testing
  - Metrics/traces/logs, QoE SLOs, experimentation platform, guardrails.
- Cost Optimization
  - Encoding ladder strategy, codec choices, storage lifecycle, CDN offload, compute efficiency.
Deliver a high-level architecture and justify key trade-offs. Include back-of-the-envelope capacity estimates and call out pitfalls and edge cases.
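As one illustration of the edge cases worth calling out, the playback session component must enforce a per-account stream limit without locking users out after a crashed or disconnected player. A minimal sketch follows, assuming heartbeats with a time-to-live so that stale sessions expire on their own. The class name, the limit of 2 streams, and the 60-second TTL are all hypothetical, and a real deployment would likely hold this state in a replicated store with key expiry rather than in process memory.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SessionRegistry:
    """In-memory sketch of heartbeat-based concurrency enforcement (illustrative only)."""
    max_streams: int = 2           # hypothetical per-account stream limit
    heartbeat_ttl_s: float = 60.0  # hypothetical: a session expires after 60 s without a heartbeat
    # account_id -> {session_id: timestamp of last heartbeat}
    _sessions: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def _live(self, account_id: str, now: float) -> Dict[str, float]:
        """Drop sessions whose last heartbeat is older than the TTL and return the rest."""
        live = {sid: ts for sid, ts in self._sessions.get(account_id, {}).items()
                if now - ts < self.heartbeat_ttl_s}
        self._sessions[account_id] = live
        return live

    def start(self, account_id: str, session_id: str, now: Optional[float] = None) -> bool:
        """Admit a new session if the account is under its limit; re-admitting a live session is allowed."""
        now = time.monotonic() if now is None else now
        live = self._live(account_id, now)
        if session_id not in live and len(live) >= self.max_streams:
            return False
        live[session_id] = now
        return True

    def heartbeat(self, account_id: str, session_id: str, now: Optional[float] = None) -> bool:
        """Refresh a live session; returns False if it already expired and must be restarted."""
        now = time.monotonic() if now is None else now
        live = self._live(account_id, now)
        if session_id not in live:
            return False
        live[session_id] = now
        return True


if __name__ == "__main__":
    reg = SessionRegistry(max_streams=2, heartbeat_ttl_s=60.0)
    print(reg.start("acct", "tv", now=0.0))       # first stream admitted
    print(reg.start("acct", "phone", now=10.0))   # second stream admitted
    print(reg.start("acct", "tablet", now=20.0))  # over the limit, rejected
    print(reg.heartbeat("acct", "phone", now=50.0))
    print(reg.start("acct", "tablet", now=70.0))  # "tv" has expired, so a slot is free
```

The TTL is the main trade-off in this sketch: a short value frees slots quickly after a crash but makes playback sensitive to brief network drops, while a long value does the opposite.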