Design overload protection with load shedding

This is a System Design interview question from TikTok for Software Engineer roles. View the full question and solution on PracHub.

Question

Design: Maintain p99 Latency SLOs During Sudden Traffic Spikes

Context

You are designing a user-facing, read-heavy HTTP/gRPC service that occasionally experiences sudden traffic spikes (for example, due to push notifications or viral content). The service must maintain a p99 latency SLO (e.g., 200 ms) and degrade gracefully under overload.

Assume a typical architecture: Clients → L7 load balancer/reverse proxy → stateless application instances → critical dependencies (cache, DB, search, feature store). Autoscaling reacts too slowly to absorb spikes that build up within seconds, so the service must protect itself.

Tasks

Admission Control and Rate Limiting
- Describe how you would implement admission control at the load balancer and within the application.
- Include token-bucket rate limiting (global and per-tenant/key), concurrency limits, and burst handling.
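
As one illustration of what this task asks for, the sketch below combines a global token bucket, lazily created per-tenant buckets, and an in-flight concurrency cap. It is a minimal single-threaded sketch, not a reference answer: the names `TokenBucket` and `AdmissionController` and all parameter values are hypothetical, and the clock is passed in explicitly so the behavior is easy to check.

```python
class TokenBucket:
    """Token bucket: `rate` tokens per second, at most `capacity` tokens (the burst)."""

    def __init__(self, rate, capacity, start=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = start

    def allow(self, now, cost=1.0):
        # Refill for the elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class AdmissionController:
    """Hypothetical in-process admission check: concurrency cap, then tenant, then global."""

    def __init__(self, global_bucket, tenant_rate, tenant_burst, max_inflight):
        self.global_bucket = global_bucket
        self.tenant_rate = tenant_rate
        self.tenant_burst = tenant_burst
        self.max_inflight = max_inflight
        self.inflight = 0
        self.tenants = {}

    def try_admit(self, tenant, now):
        if self.inflight >= self.max_inflight:
            return False  # concurrency cap reached: reject instead of queueing
        if tenant not in self.tenants:
            self.tenants[tenant] = TokenBucket(self.tenant_rate, self.tenant_burst, start=now)
        # Tenant check comes first so a noisy tenant cannot drain the global bucket.
        # Simplification: a tenant token is still spent if the global check then fails.
        if not self.tenants[tenant].allow(now):
            return False
        if not self.global_bucket.allow(now):
            return False
        self.inflight += 1
        return True

    def release(self):
        # Call when an admitted request finishes, whether it succeeded or failed.
        self.inflight -= 1
```

A production version would need locking or atomics, eviction of idle tenant buckets, and a shared or approximated global limit across instances; the sketch only shows the order of the checks.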
Queueing, Priorities, Deadlines, and Timeouts
- Design request queues with priority classes (e.g., P0 interactive, P1 best-effort) and small, bounded backlogs.
- Explain how you propagate and enforce request deadlines and set timeouts to meet the p99 SLO.
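
The queueing task can be pictured with a small sketch: one bounded FIFO per priority class, with expired work discarded at dequeue time rather than executed. The class name `BoundedPriorityQueues`, the per-class limits, and the `min_service_time` parameter are assumptions made for illustration; deadlines are absolute timestamps on the same clock as `now`.

```python
from collections import deque


class BoundedPriorityQueues:
    """One small bounded FIFO per priority class; a lower number is served first."""

    def __init__(self, limits):
        # limits maps priority -> max backlog, e.g. {0: 32, 1: 8} (hypothetical values).
        self.limits = dict(limits)
        self.queues = {p: deque() for p in sorted(self.limits)}
        self.expired = 0

    def offer(self, priority, request_id, deadline):
        q = self.queues[priority]
        if len(q) >= self.limits[priority]:
            return False  # backlog full: caller should reject the request now
        q.append((deadline, request_id))
        return True

    def poll(self, now, min_service_time):
        # Serve the highest-priority class first. Skip requests whose remaining
        # time budget is too small to finish; running them would waste capacity.
        for q in self.queues.values():
            while q:
                deadline, request_id = q.popleft()
                if deadline - now >= min_service_time:
                    return request_id
                self.expired += 1
        return None
```

Strict priority as shown can starve the best-effort class during a long spike, which may be acceptable for P1 traffic but is a trade-off worth stating.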
Load Shedding Strategies
- Compare and contrast: drop-new, drop-tail, random (e.g., RED), and deadline-aware shedding.
- Explain when each strategy is preferable, and where to apply it (load balancer vs. application).
Circuit Breakers and Backpressure
- Explain circuit breakers for service-to-service calls (open/half-open/closed, trip conditions, fallbacks).
- Describe how you provide backpressure to clients (HTTP/gRPC) and to internal queues.
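
For the circuit-breaker part of this task, a minimal state machine might look like the sketch below. It assumes a trip condition of N consecutive failures and a fixed reset timeout, which is only one of several reasonable choices (error-rate or latency-based trips are also common). The class name `CircuitBreaker` and its parameters are hypothetical.

```python
class CircuitBreaker:
    """closed -> open after N consecutive failures; open -> half_open after a timeout."""

    def __init__(self, failure_threshold=5, reset_timeout=10.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow(self, now):
        if self.state == "open":
            if now - self.opened_at >= self.reset_timeout:
                self.state = "half_open"  # let exactly one probe request through
                return True
            return False  # fail fast; the caller should use its fallback
        if self.state == "half_open":
            return False  # a probe is already in flight
        return True

    def record_success(self):
        # Any success (including a successful probe) closes the circuit.
        self.state = "closed"
        self.failures = 0

    def record_failure(self, now):
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now
```

The sketch assumes every allowed call eventually reports success or failure; a probe that never reports would leave the breaker stuck in `half_open`, so a real implementation would also time out the probe.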
Protecting Critical Dependencies
- Show how you isolate and protect caches/DBs/search under overload (bulkheads, quotas, fallbacks, precomputed or cached responses).
Metrics and Alerts
- Specify the metrics, SLI/SLOs, and alerting you would use to validate the effectiveness of your design under spikes.

Include brief numeric examples where helpful (e.g., choosing token-bucket parameters, concurrency caps, and queue budgets) and call out key trade-offs and pitfalls (retry storms, head-of-line blocking, etc.).
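
As an example of the kind of numeric reasoning requested, the sketch below derives a concurrency cap from Little's law (in-flight requests = throughput × mean service time), a queue budget from the latency left over after p99 service time, and a token-bucket burst from a short burst window. Only the 200 ms SLO comes from the question; every other number (500 requests/s per instance, 40 ms mean and 120 ms p99 service time, 20% headroom, 100 ms burst window) is an assumption chosen for illustration, and the function name `size_limits` is hypothetical.

```python
def size_limits(capacity_rps, mean_service_ms, p99_service_ms, slo_ms,
                headroom_pct=120, burst_window_ms=100):
    """Rough per-instance sizing using integer arithmetic (times in milliseconds)."""
    # Little's law with headroom, rounded up: rps * seconds * headroom.
    numerator = capacity_rps * mean_service_ms * headroom_pct
    concurrency_cap = -(-numerator // (1000 * 100))  # ceiling division

    # Time a request may spend waiting in the queue and still meet the SLO.
    queue_wait_budget_ms = slo_ms - p99_service_ms

    # A backlog of L requests drains in about L / capacity_rps seconds.
    max_queue_len = capacity_rps * queue_wait_budget_ms // 1000

    # Token bucket: refill at the sustainable rate, burst of one short window.
    bucket_rate = capacity_rps
    bucket_burst = capacity_rps * burst_window_ms // 1000

    return {
        "concurrency_cap": concurrency_cap,
        "queue_wait_budget_ms": queue_wait_budget_ms,
        "max_queue_len": max_queue_len,
        "bucket_rate": bucket_rate,
        "bucket_burst": bucket_burst,
    }


limits = size_limits(capacity_rps=500, mean_service_ms=40,
                     p99_service_ms=120, slo_ms=200)
print(limits)
```

With these assumed inputs the cap comes out to 24 in-flight requests, an 80 ms queue budget, and a backlog of at most 40 requests per instance. The queue bound is an upper limit: the backlog would usually be kept smaller so that queued requests retain enough budget to complete.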
