How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

What difficulty level is this interview question?

This is a hard difficulty System Design question, commonly asked during Onsite rounds at Netflix.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Netflix during technical interviews.

Design an ad frequency capping system | Netflix Interview Question

Q: Design an ad frequency capping system

This question evaluates competency in distributed systems architecture, real-time stateful services, high-cardinality time-series data modeling, API design for low-latency decision paths, and operational monitoring including consistency and latency trade-offs.

Design a real-time ad frequency capping system for an ads platform.

When the ad server is deciding whether to show an ad to a user, it must enforce caps that limit how often that user sees a given ad. For example:

Per user, per campaign : at most K impressions in the last T hours (a rolling window , not a calendar window).
Optionally, the same idea on other dimensions — per user per advertiser, per user per creative, etc. These are independent caps, and a request must satisfy all applicable caps to serve.

Core requirements

Multiple time windows must be supported simultaneously on the same dimension (e.g., ≤2/1h and ≤6/24h and ≤10/7d).
Two APIs at minimum:
1. Check-and-reserve — called at decision time to ask "may I show this ad to this user right now?" and reserve the slot if so.
2. Impression logging — called to record an impression that was actually shown.
Consistency : explicitly define what consistency you need (strict vs. best-effort) and justify the tradeoffs .

What to cover

Walk through the full design, addressing each of:

Data model — how you represent and answer the rolling-window count cheaply.
Storage choices — what backs the hot serving path and why.
Write and read paths — the flow for both check-and-reserve and impression logging.
Deduplication — handling duplicate/retried events and concurrent decisions.
TTL / expiry — how counts and reservations age out.
Sharding strategy — how you partition for the scale above.
Monitoring & alerting — how you detect both correctness issues (over-/under-capping) and system health issues.

Clarifying Questions to Ask

A strong candidate scopes the problem before designing. Good questions here:

Consistency target : is a strict global "never exceed K " required, or is bounded over-delivery (occasionally K+1 / K+2 ) acceptable? Is over-delivery or under-delivery the worse business outcome?
Window semantics : rolling window from "now" vs. calendar window ("today")? How precise must the window boundary be — is minute-scale slop on a 24h cap acceptable?
Cap dimensions & windows : which dimensions are capped (campaign, advertiser, creative, ad group)? How many simultaneous windows per dimension, and are caps independent (all must pass)?
Identity : what user identity does the ad server hand us (logged-in id, device id, hashed cookie)? Is cross-device stitching in scope on the hot path?
Source of truth : is there a durable impression/billing log the cap store can be rebuilt from, or is the cap store itself authoritative?
Failure posture : when the cap store is unavailable, should we fail open (serve) or fail closed (block)? Does it differ for hard/regulated caps vs. soft pacing caps?

Constraints & Assumptions

Anchor the design with explicit numbers (use these unless the interviewer offers others):

Latency : the cap check is a slice of the auction budget — target p99 ≤ ~1–2 ms for the whole check-and-reserve.
Throughput : on the order of millions of serving decisions/sec at peak (e.g. ~2M/s), each potentially evaluating several cap dimensions × several windows.
Cardinality : hundreds of millions of active users × thousands of active campaigns. Part of the design is reasoning about how much state actually has to exist at any moment versus the full cross product.
Late/out-of-order events : impression events are emitted on every impression and may arrive late or out of order relative to the serving decision.

What a Strong Answer Covers

The interviewer is looking for these dimensions (not the answers themselves):

Back-of-the-envelope sizing that actually drives the storage choice (QPS, cardinality, per-key payload, latency budget).
A rolling-window data model with an explicit read-cost / write-cost / exactness tradeoff, and awareness that a naive per-event log doesn't scale.
API design for check-and-reserve, impression-commit, and reservation lifecycle (expiry/cancel), with idempotency.
A clear consistency stance that is stated and justified against the latency/QPS budget — strict vs. best-effort, and where the relaxation lives.
Atomicity of the reserve→commit transition (no double-count, no double-reserve) and how that maps onto the storage primitive.
Sharding/partitioning that keeps a single user's decision on a single shard, plus hot-key and resharding handling.
Failure modes (store outage, shard loss, cross-region split, lost commit, clock skew, bad spec) with concrete mitigations and a fail-open/fail-closed policy.
Monitoring that separates correctness SLOs (over-/under-capping measured against a source of truth) from system-health SLOs (latency, QPS, hot shards).
Handling of late / out-of-order impressions without corrupting past decisions, and multi-region behavior.

Follow-up Questions

Be ready for the interviewer to push deeper:

Multi-region : a user is served simultaneously in two regions — how do caps stay correct, and what's the bound on over-delivery? Why is last-write-wins the wrong merge for a counter?
Scale jump : at 10× QPS / 10× cardinality, what breaks first, and how do you find and relieve the hot shard?
Exactness : a particular cap must be exact (regulated category, no over-delivery) — how does your design change for that cap without slowing down the rest?
Recovery : a shard (or the whole hot store) is lost or flushed — how do you rebuild counts, and what's the transient correctness impact while you do?

Design a real-time ad frequency capping system for an ads platform.

When the ad server is deciding whether to show an ad to a user, it must enforce caps that limit how often that user sees a given ad. For example:

Per user, per campaign : at most K impressions in the last T hours (a rolling window , not a calendar window).
Optionally, the same idea on other dimensions — per user per advertiser, per user per creative, etc. These are independent caps, and a request must satisfy all applicable caps to serve.

Core requirements

Multiple time windows must be supported simultaneously on the same dimension (e.g., ≤2/1h and ≤6/24h and ≤10/7d).
Two APIs at minimum:
1. Check-and-reserve — called at decision time to ask "may I show this ad to this user right now?" and reserve the slot if so.
2. Impression logging — called to record an impression that was actually shown.
Consistency : explicitly define what consistency you need (strict vs. best-effort) and justify the tradeoffs .

What to cover

Walk through the full design, addressing each of:

Data model — how you represent and answer the rolling-window count cheaply.
Storage choices — what backs the hot serving path and why.
Write and read paths — the flow for both check-and-reserve and impression logging.
Deduplication — handling duplicate/retried events and concurrent decisions.
TTL / expiry — how counts and reservations age out.
Sharding strategy — how you partition for the scale above.
Monitoring & alerting — how you detect both correctness issues (over-/under-capping) and system health issues.

Clarifying Questions to Ask

A strong candidate scopes the problem before designing. Good questions here:

Consistency target : is a strict global "never exceed K " required, or is bounded over-delivery (occasionally K+1 / K+2 ) acceptable? Is over-delivery or under-delivery the worse business outcome?
Window semantics : rolling window from "now" vs. calendar window ("today")? How precise must the window boundary be — is minute-scale slop on a 24h cap acceptable?
Cap dimensions & windows : which dimensions are capped (campaign, advertiser, creative, ad group)? How many simultaneous windows per dimension, and are caps independent (all must pass)?
Identity : what user identity does the ad server hand us (logged-in id, device id, hashed cookie)? Is cross-device stitching in scope on the hot path?
Source of truth : is there a durable impression/billing log the cap store can be rebuilt from, or is the cap store itself authoritative?
Failure posture : when the cap store is unavailable, should we fail open (serve) or fail closed (block)? Does it differ for hard/regulated caps vs. soft pacing caps?

Constraints & Assumptions

Anchor the design with explicit numbers (use these unless the interviewer offers others):

Latency : the cap check is a slice of the auction budget — target p99 ≤ ~1–2 ms for the whole check-and-reserve.
Throughput : on the order of millions of serving decisions/sec at peak (e.g. ~2M/s), each potentially evaluating several cap dimensions × several windows.
Cardinality : hundreds of millions of active users × thousands of active campaigns. Part of the design is reasoning about how much state actually has to exist at any moment versus the full cross product.
Late/out-of-order events : impression events are emitted on every impression and may arrive late or out of order relative to the serving decision.

What a Strong Answer Covers

The interviewer is looking for these dimensions (not the answers themselves):

Back-of-the-envelope sizing that actually drives the storage choice (QPS, cardinality, per-key payload, latency budget).
A rolling-window data model with an explicit read-cost / write-cost / exactness tradeoff, and awareness that a naive per-event log doesn't scale.
API design for check-and-reserve, impression-commit, and reservation lifecycle (expiry/cancel), with idempotency.
A clear consistency stance that is stated and justified against the latency/QPS budget — strict vs. best-effort, and where the relaxation lives.
Atomicity of the reserve→commit transition (no double-count, no double-reserve) and how that maps onto the storage primitive.
Sharding/partitioning that keeps a single user's decision on a single shard, plus hot-key and resharding handling.
Failure modes (store outage, shard loss, cross-region split, lost commit, clock skew, bad spec) with concrete mitigations and a fail-open/fail-closed policy.
Monitoring that separates correctness SLOs (over-/under-capping measured against a source of truth) from system-health SLOs (latency, QPS, hot shards).
Handling of late / out-of-order impressions without corrupting past decisions, and multi-region behavior.

Follow-up Questions

Be ready for the interviewer to push deeper:

Multi-region : a user is served simultaneously in two regions — how do caps stay correct, and what's the bound on over-delivery? Why is last-write-wins the wrong merge for a counter?
Scale jump : at 10× QPS / 10× cardinality, what breaks first, and how do you find and relieve the hot shard?
Exactness : a particular cap must be exact (regulated category, no over-delivery) — how does your design change for that cap without slowing down the rest?
Recovery : a shard (or the whole hot store) is lost or flushed — how do you rebuild counts, and what's the transient correctness impact while you do?

Design an ad frequency capping system

Quick Overview

Core requirements

What to cover

Clarifying Questions to Ask

Constraints & Assumptions

What a Strong Answer Covers

Follow-up Questions

Solution

Submit Your Answer to Earn 20XP

Design an ad frequency capping system

Quick Overview

Core requirements

What to cover

Clarifying Questions to Ask

Constraints & Assumptions

What a Strong Answer Covers

Follow-up Questions

Solution

Submit Your Answer to Earn 20XP