Design: Maintain p99 Latency SLOs During Sudden Traffic Spikes
Context
You are designing a user-facing, read-heavy HTTP/gRPC service that occasionally experiences sudden traffic spikes (for example, due to push notifications or viral content). The service must maintain a p99 latency SLO (e.g., 200 ms) and degrade gracefully under overload.
Assume a typical architecture: Clients → L7 load balancer/reverse proxy → stateless application instances → critical dependencies (cache, DB, search, feature store). Autoscaling cannot fully mask spikes that develop within seconds, so the service must protect itself.
Tasks
Admission Control and Rate Limiting
- Describe how you would implement admission control at the load balancer and within the application.
- Include token-bucket rate limiting (global and per-tenant/key), concurrency limits, and burst handling.
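As an illustration of what this task is asking for, the following is a minimal sketch of application-level admission control that combines a global token bucket, per-tenant token buckets, and a concurrency cap. The class names (`TokenBucket`, `AdmissionController`) and all numbers (1000 req/s global with a burst of 200, 100 req/s per tenant with a burst of 20, 100 requests in flight) are hypothetical and chosen only for the example. The clock is passed in explicitly so the behavior is easy to reason about.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    rate: float      # tokens added per second (sustained request rate)
    capacity: float  # bucket size (largest burst admitted at once)
    tokens: float = field(init=False)
    last: float = 0.0  # time of the last refill, in seconds

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full so an initial burst is allowed

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now


class AdmissionController:
    """Hypothetical admission check: concurrency cap, then tenant and global buckets."""

    def __init__(self, global_bucket, tenant_rate, tenant_burst, max_in_flight):
        self.global_bucket = global_bucket
        self.tenant_rate = tenant_rate
        self.tenant_burst = tenant_burst
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.tenants = {}  # tenant key -> TokenBucket

    def try_admit(self, tenant: str, now: float) -> bool:
        if self.in_flight >= self.max_in_flight:
            return False  # concurrency cap reached: reject instead of queueing
        bucket = self.tenants.get(tenant)
        if bucket is None:
            bucket = TokenBucket(self.tenant_rate, self.tenant_burst)
            self.tenants[tenant] = bucket
        bucket.refill(now)
        self.global_bucket.refill(now)
        # Only spend tokens when both buckets can pay, so a global rejection
        # does not also drain the tenant's budget.
        if bucket.tokens < 1 or self.global_bucket.tokens < 1:
            return False
        bucket.tokens -= 1
        self.global_bucket.tokens -= 1
        self.in_flight += 1
        return True

    def release(self) -> None:
        # Call once per admitted request when it completes.
        self.in_flight = max(0, self.in_flight - 1)


controller = AdmissionController(
    TokenBucket(rate=1000, capacity=200),  # assumed global limit
    tenant_rate=100, tenant_burst=20,      # assumed per-tenant limit
    max_in_flight=100,                     # assumed concurrency cap
)
admitted = sum(controller.try_admit("tenant-a", now=0.0) for _ in range(50))
print(admitted)  # one tenant's instantaneous burst is capped at its bucket size
```

A rejected request would typically be answered immediately (for example with HTTP 429 or gRPC `RESOURCE_EXHAUSTED`) rather than held, so that rejection stays cheap under overload.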
Queueing, Priorities, Deadlines, and Timeouts
- Design request queues with priority classes (e.g., P0 interactive, P1 best-effort) and small, bounded backlogs.
- Explain how you propagate and enforce request deadlines and set timeouts to meet the p99 SLO.
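One way to reason about deadline propagation is to carry a single absolute deadline with the request and derive each downstream timeout from the budget that remains. The sketch below assumes a 200 ms end-to-end budget, a 20 ms reserve for serialization and the network return path, and a per-hop cap; the names `Deadline`, `hop_timeout`, and `should_start` are hypothetical. With gRPC the remaining budget is carried by the call's deadline; over plain HTTP a custom header would be one assumed alternative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deadline:
    expires_at: float  # absolute time in seconds on a monotonic clock

    def remaining(self, now: float) -> float:
        return max(0.0, self.expires_at - now)


def hop_timeout(deadline: Deadline, now: float, hop_cap: float,
                reserve: float = 0.020) -> float:
    """Timeout for one downstream call.

    Bounded by the per-hop cap and by the remaining budget minus a reserve
    kept for building and returning the response. Returns 0.0 when no useful
    budget is left, in which case the call should be skipped.
    """
    return max(0.0, min(hop_cap, deadline.remaining(now) - reserve))


def should_start(deadline: Deadline, now: float, expected_service_s: float) -> bool:
    # Do not begin work that cannot finish before the caller gives up.
    return deadline.remaining(now) > expected_service_s


deadline = Deadline(expires_at=0.200)            # request arrived at t=0 with a 200 ms budget
early = hop_timeout(deadline, now=0.000, hop_cap=0.080)  # limited by the 80 ms hop cap
late = hop_timeout(deadline, now=0.150, hop_cap=0.080)   # limited by the remaining budget
```

Deriving timeouts from the remaining budget, rather than using a fixed timeout per hop, keeps the sum of downstream waits from exceeding the end-to-end SLO.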
Load Shedding Strategies
- Compare and contrast: drop-new, drop-tail, random (e.g., RED), and deadline-aware shedding.
- Explain when each strategy is preferable, and where to apply it (load balancer vs. application).
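To make two of these strategies concrete, here is a small sketch of a bounded, two-class queue that rejects new arrivals when a class backlog is full and discards already-expired requests at dequeue time (deadline-aware shedding). The class name `BoundedPriorityQueue`, the backlog limits, and the convention that lower numbers mean higher priority are assumptions made for this example.

```python
from collections import deque


class BoundedPriorityQueue:
    def __init__(self, limits):
        # limits maps priority class -> maximum backlog, e.g. {0: 32, 1: 8}.
        self.limits = limits
        self.queues = {p: deque() for p in sorted(limits)}
        self.shed = 0  # count of requests dropped for any reason

    def offer(self, priority, request_id, deadline):
        """Enqueue, or reject the new arrival if its class backlog is full."""
        q = self.queues[priority]
        if len(q) >= self.limits[priority]:
            self.shed += 1
            return False
        q.append((request_id, deadline))
        return True

    def poll(self, now, min_budget=0.0):
        """Return the next request that can still meet its deadline.

        Classes are scanned in priority order; requests whose remaining
        budget is at or below min_budget are dropped instead of served.
        """
        for q in self.queues.values():
            while q:
                request_id, deadline = q.popleft()
                if deadline - now > min_budget:
                    return request_id
                self.shed += 1  # expired while waiting: serving it is wasted work
        return None


queue = BoundedPriorityQueue({0: 2, 1: 1})
queue.offer(1, "best-effort-1", deadline=1.0)
queue.offer(1, "best-effort-2", deadline=1.0)   # rejected: P1 backlog is full
queue.offer(0, "interactive-1", deadline=0.1)
queue.offer(0, "interactive-2", deadline=1.0)
first = queue.poll(now=0.5)                     # interactive-1 has expired and is skipped
```

Strict priority order as shown can starve the lower class during a long spike; whether that is acceptable is one of the trade-offs the comparison should address.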
Circuit Breakers and Backpressure
- Explain circuit breakers for service-to-service calls (open/half-open/closed, trip conditions, fallbacks).
- Describe how you provide backpressure to clients (HTTP/gRPC) and to internal queues.
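The three breaker states can be sketched as a small state machine. The version below is deliberately simplified: it trips on a count of consecutive failures and allows a single probe in the half-open state, whereas production breakers often trip on an error rate or latency over a sliding window. The class name, threshold, and cool-down period are hypothetical.

```python
class CircuitBreaker:
    def __init__(self, failure_threshold=5, open_seconds=10.0):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.open_seconds = open_seconds            # cool-down before probing again
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a call to the dependency may be attempted."""
        if self.state == "open":
            if now - self.opened_at >= self.open_seconds:
                self.state = "half_open"  # let exactly one probe through
                return True
            return False                  # fail fast; caller should use a fallback
        if self.state == "half_open":
            return False                  # a probe is already in flight
        return True

    def record(self, success: bool, now: float) -> None:
        """Report the outcome of an attempted call."""
        if success:
            self.failures = 0
            self.state = "closed"
            return
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now


breaker = CircuitBreaker(failure_threshold=3, open_seconds=10.0)
for _ in range(3):
    breaker.record(success=False, now=0.0)
print(breaker.state)  # open
```

When `allow` returns False the caller would serve a fallback (for example a cached or default response) instead of waiting on the failing dependency.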
Protecting Critical Dependencies
- Show how you isolate and protect caches/DBs/search under overload (bulkheads, quotas, fallbacks, precomputed or cached responses).
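A bulkhead can be approximated with a per-dependency semaphore: each dependency gets a fixed number of concurrent slots, and callers that find no free slot receive a fallback immediately rather than waiting. This is a minimal sketch; the `Bulkhead` class and the slot count are assumptions for illustration.

```python
import threading


class Bulkhead:
    def __init__(self, max_concurrent: int):
        # One semaphore per dependency keeps a slow dependency from
        # consuming every worker in the process.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, fallback):
        """Run fn if a slot is free; otherwise return fallback() without waiting."""
        if not self._slots.acquire(blocking=False):
            return fallback()
        try:
            return fn()
        finally:
            self._slots.release()


db_bulkhead = Bulkhead(max_concurrent=1)  # assumed quota for a database pool

# While the outer call holds the only slot, the nested call is shed to its fallback.
result = db_bulkhead.call(
    lambda: db_bulkhead.call(lambda: "fresh row", lambda: "cached row"),
    lambda: "outer fallback",
)
print(result)  # cached row
```

The fallback here stands in for a precomputed or cached response; choosing how stale that response may be is part of the design.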
Metrics and Alerts
- Specify the metrics, SLI/SLOs, and alerting you would use to validate the effectiveness of your design under spikes.
Include brief numeric examples where helpful (e.g., choosing token-bucket parameters, concurrency caps, and queue budgets) and call out key trade-offs and pitfalls (retry storms, head-of-line blocking, etc.).
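As an example of the kind of numeric reasoning requested, concurrency caps and queue budgets can be estimated with Little's law (average items in the system = arrival rate × average time in the system). The sketch below uses assumed per-instance figures: 400 req/s at peak, 50 ms mean service latency, 1.5× headroom, and at most 20 ms of the 200 ms budget spent waiting in the queue. The function names are hypothetical.

```python
import math


def concurrency_cap(peak_rps: float, mean_latency_s: float, headroom: float = 1.5) -> int:
    # Little's law: in-flight requests = arrival rate * time in system,
    # scaled by headroom to absorb variance. Rounding guards against float noise.
    return math.ceil(round(peak_rps * mean_latency_s * headroom, 6))


def queue_limit(drain_rps: float, max_wait_s: float) -> int:
    # A backlog of N requests drained at drain_rps delays the last one by
    # N / drain_rps, so N must stay at or below drain_rps * max_wait_s.
    return math.floor(round(drain_rps * max_wait_s, 6))


cap = concurrency_cap(peak_rps=400, mean_latency_s=0.050)  # 400 * 0.05 * 1.5
backlog = queue_limit(drain_rps=400, max_wait_s=0.020)     # 400 * 0.02
print(cap, backlog)  # 30 8
```

These are starting points rather than final values: mean latency usually rises under load, so caps derived from unloaded measurements should be checked against load tests.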