System Design: Massive-Scale Event Management and Email Notifications
Context
Design a multi-tenant platform that manages large numbers of concurrent events and reliably sends notification emails to registered participants. The system must support at least 1,000,000 concurrent active events and handle wide fan-out email campaigns per event with strong deliverability, compliance, and observability.
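To make the scale target concrete, here is a back-of-envelope sketch of average and peak email throughput. Only the 1,000,000 active events figure comes from the statement above. The fan-out per event, emails per participant, time window, and peak-to-average ratio are hypothetical placeholders; replace them with your own assumptions.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class LoadAssumptions:
    active_events: int = 1_000_000       # from the stated requirement
    participants_per_event: int = 200    # ASSUMPTION: average fan-out per event
    emails_per_participant: int = 3      # ASSUMPTION: e.g. confirmation + 2 reminders
    window_days: int = 30                # ASSUMPTION: period over which emails are sent
    peak_to_average: float = 10.0        # ASSUMPTION: burstiness of reminder traffic


def total_emails(a: LoadAssumptions) -> int:
    """Total emails generated over the whole window."""
    return a.active_events * a.participants_per_event * a.emails_per_participant


def average_rate(a: LoadAssumptions) -> float:
    """Average emails per second if sends were spread evenly over the window."""
    return total_emails(a) / (a.window_days * SECONDS_PER_DAY)


def peak_rate(a: LoadAssumptions) -> float:
    """Peak emails per second, scaling the average by the assumed burst factor."""
    return average_rate(a) * a.peak_to_average


if __name__ == "__main__":
    a = LoadAssumptions()
    print(f"total emails:  {total_emails(a):,}")
    print(f"average rate:  {average_rate(a):,.1f} emails/s")
    print(f"peak rate:     {peak_rate(a):,.1f} emails/s")
```

Reminder traffic tends to cluster around round clock times, so the peak-to-average ratio has the largest effect on the result and is worth justifying explicitly.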
Requirements
- Core APIs
  - Create/update/cancel event
  - Register participant for event
  - Schedule/trigger emails (time-based reminders and event-driven notifications)
  - Query status (event, registrations, email jobs)
- Data Model and Storage Strategy
  - Schemas for tenants, events, participants, registrations, email jobs/messages, suppression/unsubscribe, bounces/complaints
  - Indexing, partitioning, and retention policies
- Email Delivery Pipeline (End-to-End)
  - Producers, queues, consumers, batching
  - Retries with backoff, idempotency keys, deduplication
  - Bounce/complaint handling and suppression updates
- Capacity Estimates and Scaling
  - QPS assumptions, fan-out per event, peak email throughput
  - Sharding keys and autoscaling triggers
  - Scheduling strategies (time-based vs event-driven), distributed scheduler, backpressure
- Rate Limiting and Quotas
  - Per-tenant and per-provider limits
  - Adherence to provider quotas (per-second, per-day) and throttling behavior
  - Delivery semantics: at-least-once vs exactly-once at recipient, DLQs, and reprocessing
- Observability and Reliability
  - Metrics, SLOs, tracing, and alerting
  - Abuse prevention and compliance (unsubscribe, suppression, double opt-in, CAN-SPAM/GDPR)
- Multi-Region and Disaster Recovery
  - Regional architecture, failover, RPO/RTO
  - Consistency trade-offs
- Cost Estimate and Build vs Buy
  - Managed services vs self-hosting trade-offs
- Failure Scenario Walkthrough
  - Example: regional outage or provider throttling and system recovery
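
As a starting point for the provider-throttling scenario, the sketch below shows one possible consumer-side pattern: retries with exponential backoff and full jitter, deduplication by idempotency key, and a dead-letter list once attempts are exhausted. All names here (`Sender`, `ProviderThrottled`, `send_fn`, the key format) are hypothetical and not tied to any real email provider. The in-memory set and list stand in for a durable store and a DLQ.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


class ProviderThrottled(Exception):
    """Hypothetical error a provider client raises when it rejects a send for rate limits."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


@dataclass
class Sender:
    send_fn: Callable[[str], None]                 # hypothetical provider call
    max_attempts: int = 5
    sleep: Callable[[float], None] = time.sleep    # injectable so tests need not wait
    sent_keys: Set[str] = field(default_factory=set)                  # stand-in for a durable dedup store
    dead_letters: List[Tuple[str, str]] = field(default_factory=list)  # stand-in for a DLQ

    def deliver(self, idempotency_key: str, message: str) -> str:
        # Skip messages already sent, e.g. after a queue redelivery.
        if idempotency_key in self.sent_keys:
            return "duplicate"
        for attempt in range(self.max_attempts):
            try:
                self.send_fn(message)
            except ProviderThrottled:
                # Back off before the next attempt; no sleep after the final failure.
                if attempt < self.max_attempts - 1:
                    self.sleep(backoff_delay(attempt))
                continue
            # A crash between the send above and this line would cause a resend
            # on redelivery, so this is at-least-once delivery, not exactly-once.
            self.sent_keys.add(idempotency_key)
            return "sent"
        self.dead_letters.append((idempotency_key, message))
        return "dead_lettered"


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_provider(message: str) -> None:
        # Simulated provider that throttles the first two calls.
        attempts["n"] += 1
        if attempts["n"] <= 2:
            raise ProviderThrottled()

    sender = Sender(send_fn=flaky_provider, sleep=lambda seconds: None)
    print(sender.deliver("event-1:user-1:reminder", "Your event starts soon"))
    print(sender.deliver("event-1:user-1:reminder", "Your event starts soon"))
```

A key derived from event, recipient, and notification type keeps duplicates from separate scheduler runs collapsing onto the same entry. A real design would also need to decide how long dedup entries are retained and how dead-lettered messages are reprocessed.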