Design an ad frequency capping system
Company: Netflix
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Onsite
Design a real-time **ad frequency capping** system for an ads platform.
When the ad server is deciding whether to show an ad to a user, it must enforce **caps** that limit how often that user sees a given ad. For example:
- **Per user, per campaign**: at most `K` impressions in the last `T` hours (a **rolling window**, not a calendar window).
- Optionally, the same idea on other dimensions — per user per advertiser, per user per creative, etc. These are independent caps, and a request must satisfy **all** applicable caps to serve.
### Core requirements
- **Multiple time windows** must be supported simultaneously on the same dimension (e.g., ≤2/1h **and** ≤6/24h **and** ≤10/7d).
- **Two APIs** at minimum:
1. **Check-and-reserve** — called at decision time to ask "may I show this ad to this user right now?" and reserve the slot if so.
2. **Impression logging** — called to record an impression that was actually shown.
- **Consistency**: explicitly **define what consistency you need** (strict vs. best-effort) and **justify the tradeoffs**.
### What to cover
Walk through the full design, addressing each of:
- **Data model** — how you represent and answer the rolling-window count cheaply.
- **Storage choices** — what backs the hot serving path and why.
- **Write and read paths** — the flow for both check-and-reserve and impression logging.
- **Deduplication** — handling duplicate/retried events and concurrent decisions.
- **TTL / expiry** — how counts and reservations age out.
- **Sharding strategy** — how you partition for the scale above.
- **Monitoring & alerting** — how you detect both **correctness** issues (over-/under-capping) and **system health** issues.
```hint Framing the hot-path question
This is fundamentally a **rolling-window counter** problem on the serving path. A useful anchor: *"How do I answer 'count in the last `T`' for several different `T` at once, in a single round trip, within a single-digit-ms budget?"* Note that **how consistent the cap must be** — strict everywhere versus some tolerated slack — is itself a decision you have to make and justify, and it shapes almost every downstream choice. Treat it as something to resolve in your answer, not a given.
```
```hint Counting cost
Reason about what each candidate representation costs. A raw per-event record — say a deque of impression timestamps per `(user, entity)` — is exact, but think through its memory cost at this cardinality and whether exactness is actually required. How do production rate limiters answer "how many in the last `T`" *without* keeping every event, and what accuracy do they trade away to do it? Whatever scheme you choose, be explicit about its read cost, write cost, and the size and bound of any approximation error.
```
```hint The reserve→commit race
Check-and-reserve and impression-logging are two phases, so a slot that's been approved but not yet logged must not be re-approvable. Ask: what property must the "read the counts, compare to `K`, record the decision" step have when two auctions for the *same user* run concurrently — and where does your partitioning/key scheme need to place a single user's data for that property to be cheap to achieve? Separately, think about how a retried or duplicated impression is kept from being counted twice.
```
### Clarifying Questions to Ask
A strong candidate scopes the problem before designing. Good questions here:
- **Consistency target**: is a *strict* global "never exceed `K`" required, or is *bounded* over-delivery (occasionally `K+1`/`K+2`) acceptable? Is over-delivery or under-delivery the worse business outcome?
- **Window semantics**: rolling window from "now" vs. calendar window ("today")? How precise must the window boundary be — is minute-scale slop on a 24h cap acceptable?
- **Cap dimensions & windows**: which dimensions are capped (campaign, advertiser, creative, ad group)? How many simultaneous windows per dimension, and are caps independent (all must pass)?
- **Identity**: what user identity does the ad server hand us (logged-in id, device id, hashed cookie)? Is cross-device stitching in scope on the hot path?
- **Source of truth**: is there a durable impression/billing log the cap store can be rebuilt from, or is the cap store itself authoritative?
- **Failure posture**: when the cap store is unavailable, should we fail open (serve) or fail closed (block)? Does it differ for hard/regulated caps vs. soft pacing caps?
### Constraints & Assumptions
Anchor the design with explicit numbers (use these unless the interviewer offers others):
- **Latency**: the cap check is a slice of the auction budget — target **p99 ≤ ~1–2 ms** for the whole check-and-reserve.
- **Throughput**: on the order of **millions of serving decisions/sec** at peak (e.g. ~2M/s), each potentially evaluating several cap dimensions × several windows.
- **Cardinality**: hundreds of millions of active users × thousands of active campaigns. Part of the design is reasoning about how much state actually has to exist at any moment versus the full cross product.
- **Late/out-of-order events**: impression events are emitted on every impression and may arrive **late or out of order** relative to the serving decision.
### What a Strong Answer Covers
The interviewer is looking for these *dimensions* (not the answers themselves):
- **Back-of-the-envelope sizing** that actually drives the storage choice (QPS, cardinality, per-key payload, latency budget).
- A **rolling-window data model** with an explicit read-cost / write-cost / exactness tradeoff, and awareness that a naive per-event log doesn't scale.
- **API design** for check-and-reserve, impression-commit, and reservation lifecycle (expiry/cancel), with idempotency.
- A clear **consistency stance** that is *stated and justified* against the latency/QPS budget — strict vs. best-effort, and where the relaxation lives.
- **Atomicity** of the reserve→commit transition (no double-count, no double-reserve) and how that maps onto the storage primitive.
- **Sharding/partitioning** that keeps a single user's decision on a single shard, plus hot-key and resharding handling.
- **Failure modes** (store outage, shard loss, cross-region split, lost commit, clock skew, bad spec) with concrete mitigations and a fail-open/fail-closed policy.
- **Monitoring** that separates **correctness** SLOs (over-/under-capping measured against a source of truth) from **system-health** SLOs (latency, QPS, hot shards).
- Handling of **late / out-of-order** impressions without corrupting past decisions, and **multi-region** behavior.
### Follow-up Questions
Be ready for the interviewer to push deeper:
- **Multi-region**: a user is served simultaneously in two regions — how do caps stay correct, and what's the bound on over-delivery? Why is last-write-wins the wrong merge for a counter?
- **Scale jump**: at 10× QPS / 10× cardinality, what breaks first, and how do you find and relieve the hot shard?
- **Exactness**: a particular cap must be exact (regulated category, no over-delivery) — how does your design change for *that* cap without slowing down the rest?
- **Recovery**: a shard (or the whole hot store) is lost or flushed — how do you rebuild counts, and what's the transient correctness impact while you do?
Quick Answer: This question evaluates competency in distributed systems architecture, real-time stateful services, high-cardinality time-series data modeling, API design for low-latency decision paths, and operational monitoring including consistency and latency trade-offs.