Design a Cloud-Native, Horizontally Scalable, Fault-Tolerant Web Application for Millions of Users
Context
You are asked to design, deploy, and operate a cloud-based web application that serves millions of users. Assume a major cloud provider, but keep the design cloud-agnostic, citing provider-specific examples only where they help. Focus on justifications, trade-offs, and operational excellence.
Requirements
- Compare cloud service models and usage:
  - Contrast IaaS, PaaS, and serverless, and explain when to use each.
- Compute layer selection:
  - Choose between VMs, containers, and functions as the primary compute. Justify your choice and mention where the others still fit.
- Traffic management and statelessness:
  - Describe load balancing, autoscaling policies, and stateless service design.
- Data layer design:
  - Select data stores for different workloads (relational, NoSQL, object storage, in-memory cache). Explain consistency, durability, and backup/restore strategies.
- Network and edge:
  - Outline VPC, subnets, security groups, firewalls/WAF, NAT, CDN, and apply zero-trust principles.
- Security and secrets:
  - Plan IAM, secrets management, encryption in transit/at rest, and key rotation.
- Availability and resilience:
  - Ensure high availability and disaster recovery (multi-AZ/region) with clear RTO/RPO targets.
- Observability and cost:
  - Implement metrics, logs, traces, SLOs, and cost governance controls.
- Delivery and operations:
  - Describe CI/CD, Infrastructure as Code (IaC), blue/green and canary releases, and rollback strategies.
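
To make the canary release and rollback requirement concrete, here is a minimal Python sketch of one possible approach. It is an illustration under stated assumptions, not a prescribed solution: the function names (routes_to_canary, should_roll_back), the 100-bucket hashing scheme, and the thresholds (max_ratio, min_requests, floor_rate) are all hypothetical choices made for this example. In practice, traffic splitting is usually done by the load balancer or service mesh rather than in application code.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically decide whether a user is served by the canary.

    A stable hash keeps each user on the same version across requests,
    and raising canary_percent only ever adds users to the canary.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 100)
    return bucket < canary_percent


def should_roll_back(
    canary_errors: int,
    canary_requests: int,
    baseline_errors: int,
    baseline_requests: int,
    max_ratio: float = 2.0,     # assumed: tolerate up to 2x the baseline error rate
    min_requests: int = 1000,   # assumed: minimum canary sample before judging
    floor_rate: float = 0.001,  # assumed: ignore error rates at or below 0.1%
) -> bool:
    """Return True when the canary's error rate is clearly worse than baseline."""
    if canary_requests < min_requests:
        return False  # not enough canary traffic to make a decision yet
    canary_rate = canary_errors / canary_requests
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    # The floor avoids rolling back on a handful of errors when baseline is ~0.
    return canary_rate > max(baseline_rate * max_ratio, floor_rate)
```

A deployment pipeline could call routes_to_canary with a gradually increasing percentage (for example 1, 5, 25, 100) and check should_roll_back between steps, reverting to the previous version when it returns True.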