PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/System Design/Netflix

Design an ad frequency capping system

Last updated: Apr 7, 2026

Quick Overview

This question evaluates competency in distributed systems architecture, real-time stateful services, high-cardinality time-series data modeling, API design for low-latency decision paths, and operational monitoring including consistency and latency trade-offs.

  • hard
  • Netflix
  • System Design
  • Software Engineer

Design an ad frequency capping system

Company: Netflix

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Onsite

Design a real-time **ad frequency capping** system for an ads platform. When the ad server is deciding whether to show an ad to a user, it must enforce **caps** that limit how often that user sees a given ad. For example: - **Per user, per campaign**: at most `K` impressions in the last `T` hours (a **rolling window**, not a calendar window). - Optionally, the same idea on other dimensions — per user per advertiser, per user per creative, etc. These are independent caps, and a request must satisfy **all** applicable caps to serve. ### Core requirements - **Multiple time windows** must be supported simultaneously on the same dimension (e.g., ≤2/1h **and** ≤6/24h **and** ≤10/7d). - **Two APIs** at minimum: 1. **Check-and-reserve** — called at decision time to ask "may I show this ad to this user right now?" and reserve the slot if so. 2. **Impression logging** — called to record an impression that was actually shown. - **Consistency**: explicitly **define what consistency you need** (strict vs. best-effort) and **justify the tradeoffs**. ### What to cover Walk through the full design, addressing each of: - **Data model** — how you represent and answer the rolling-window count cheaply. - **Storage choices** — what backs the hot serving path and why. - **Write and read paths** — the flow for both check-and-reserve and impression logging. - **Deduplication** — handling duplicate/retried events and concurrent decisions. - **TTL / expiry** — how counts and reservations age out. - **Sharding strategy** — how you partition for the scale above. - **Monitoring & alerting** — how you detect both **correctness** issues (over-/under-capping) and **system health** issues. ```hint Framing the hot-path question This is fundamentally a **rolling-window counter** problem on the serving path. A useful anchor: *"How do I answer 'count in the last `T`' for several different `T` at once, in a single round trip, within a single-digit-ms budget?"* Note that **how consistent the cap must be** — strict everywhere versus some tolerated slack — is itself a decision you have to make and justify, and it shapes almost every downstream choice. Treat it as something to resolve in your answer, not a given. ``` ```hint Counting cost Reason about what each candidate representation costs. A raw per-event record — say a deque of impression timestamps per `(user, entity)` — is exact, but think through its memory cost at this cardinality and whether exactness is actually required. How do production rate limiters answer "how many in the last `T`" *without* keeping every event, and what accuracy do they trade away to do it? Whatever scheme you choose, be explicit about its read cost, write cost, and the size and bound of any approximation error. ``` ```hint The reserve→commit race Check-and-reserve and impression-logging are two phases, so a slot that's been approved but not yet logged must not be re-approvable. Ask: what property must the "read the counts, compare to `K`, record the decision" step have when two auctions for the *same user* run concurrently — and where does your partitioning/key scheme need to place a single user's data for that property to be cheap to achieve? Separately, think about how a retried or duplicated impression is kept from being counted twice. ``` ### Clarifying Questions to Ask A strong candidate scopes the problem before designing. Good questions here: - **Consistency target**: is a *strict* global "never exceed `K`" required, or is *bounded* over-delivery (occasionally `K+1`/`K+2`) acceptable? Is over-delivery or under-delivery the worse business outcome? - **Window semantics**: rolling window from "now" vs. calendar window ("today")? How precise must the window boundary be — is minute-scale slop on a 24h cap acceptable? - **Cap dimensions & windows**: which dimensions are capped (campaign, advertiser, creative, ad group)? How many simultaneous windows per dimension, and are caps independent (all must pass)? - **Identity**: what user identity does the ad server hand us (logged-in id, device id, hashed cookie)? Is cross-device stitching in scope on the hot path? - **Source of truth**: is there a durable impression/billing log the cap store can be rebuilt from, or is the cap store itself authoritative? - **Failure posture**: when the cap store is unavailable, should we fail open (serve) or fail closed (block)? Does it differ for hard/regulated caps vs. soft pacing caps? ### Constraints & Assumptions Anchor the design with explicit numbers (use these unless the interviewer offers others): - **Latency**: the cap check is a slice of the auction budget — target **p99 ≤ ~1–2 ms** for the whole check-and-reserve. - **Throughput**: on the order of **millions of serving decisions/sec** at peak (e.g. ~2M/s), each potentially evaluating several cap dimensions × several windows. - **Cardinality**: hundreds of millions of active users × thousands of active campaigns. Part of the design is reasoning about how much state actually has to exist at any moment versus the full cross product. - **Late/out-of-order events**: impression events are emitted on every impression and may arrive **late or out of order** relative to the serving decision. ### What a Strong Answer Covers The interviewer is looking for these *dimensions* (not the answers themselves): - **Back-of-the-envelope sizing** that actually drives the storage choice (QPS, cardinality, per-key payload, latency budget). - A **rolling-window data model** with an explicit read-cost / write-cost / exactness tradeoff, and awareness that a naive per-event log doesn't scale. - **API design** for check-and-reserve, impression-commit, and reservation lifecycle (expiry/cancel), with idempotency. - A clear **consistency stance** that is *stated and justified* against the latency/QPS budget — strict vs. best-effort, and where the relaxation lives. - **Atomicity** of the reserve→commit transition (no double-count, no double-reserve) and how that maps onto the storage primitive. - **Sharding/partitioning** that keeps a single user's decision on a single shard, plus hot-key and resharding handling. - **Failure modes** (store outage, shard loss, cross-region split, lost commit, clock skew, bad spec) with concrete mitigations and a fail-open/fail-closed policy. - **Monitoring** that separates **correctness** SLOs (over-/under-capping measured against a source of truth) from **system-health** SLOs (latency, QPS, hot shards). - Handling of **late / out-of-order** impressions without corrupting past decisions, and **multi-region** behavior. ### Follow-up Questions Be ready for the interviewer to push deeper: - **Multi-region**: a user is served simultaneously in two regions — how do caps stay correct, and what's the bound on over-delivery? Why is last-write-wins the wrong merge for a counter? - **Scale jump**: at 10× QPS / 10× cardinality, what breaks first, and how do you find and relieve the hot shard? - **Exactness**: a particular cap must be exact (regulated category, no over-delivery) — how does your design change for *that* cap without slowing down the rest? - **Recovery**: a shard (or the whole hot store) is lost or flushed — how do you rebuild counts, and what's the transient correctness impact while you do?

Quick Answer: This question evaluates competency in distributed systems architecture, real-time stateful services, high-cardinality time-series data modeling, API design for low-latency decision paths, and operational monitoring including consistency and latency trade-offs.

Related Interview Questions

  • Design Ad Frequency and Order Tracking - Netflix
  • Design Rolling-Window Ad Frequency Capping - Netflix (medium)
  • Design ad frequency capping - Netflix (medium)
  • Design a File Backup System - Netflix (hard)
  • Design an Ad Pacing System - Netflix (hard)
Netflix logo
Netflix
Mar 9, 2026, 12:00 AM
Software Engineer
Onsite
System Design
183
0

Design a real-time ad frequency capping system for an ads platform.

When the ad server is deciding whether to show an ad to a user, it must enforce caps that limit how often that user sees a given ad. For example:

  • Per user, per campaign : at most K impressions in the last T hours (a rolling window , not a calendar window).
  • Optionally, the same idea on other dimensions — per user per advertiser, per user per creative, etc. These are independent caps, and a request must satisfy all applicable caps to serve.

Core requirements

  • Multiple time windows must be supported simultaneously on the same dimension (e.g., ≤2/1h and ≤6/24h and ≤10/7d).
  • Two APIs at minimum:
    1. Check-and-reserve — called at decision time to ask "may I show this ad to this user right now?" and reserve the slot if so.
    2. Impression logging — called to record an impression that was actually shown.
  • Consistency : explicitly define what consistency you need (strict vs. best-effort) and justify the tradeoffs .

What to cover

Walk through the full design, addressing each of:

  • Data model — how you represent and answer the rolling-window count cheaply.
  • Storage choices — what backs the hot serving path and why.
  • Write and read paths — the flow for both check-and-reserve and impression logging.
  • Deduplication — handling duplicate/retried events and concurrent decisions.
  • TTL / expiry — how counts and reservations age out.
  • Sharding strategy — how you partition for the scale above.
  • Monitoring & alerting — how you detect both correctness issues (over-/under-capping) and system health issues.

Clarifying Questions to Ask

A strong candidate scopes the problem before designing. Good questions here:

  • Consistency target : is a strict global "never exceed K " required, or is bounded over-delivery (occasionally K+1 / K+2 ) acceptable? Is over-delivery or under-delivery the worse business outcome?
  • Window semantics : rolling window from "now" vs. calendar window ("today")? How precise must the window boundary be — is minute-scale slop on a 24h cap acceptable?
  • Cap dimensions & windows : which dimensions are capped (campaign, advertiser, creative, ad group)? How many simultaneous windows per dimension, and are caps independent (all must pass)?
  • Identity : what user identity does the ad server hand us (logged-in id, device id, hashed cookie)? Is cross-device stitching in scope on the hot path?
  • Source of truth : is there a durable impression/billing log the cap store can be rebuilt from, or is the cap store itself authoritative?
  • Failure posture : when the cap store is unavailable, should we fail open (serve) or fail closed (block)? Does it differ for hard/regulated caps vs. soft pacing caps?

Constraints & Assumptions

Anchor the design with explicit numbers (use these unless the interviewer offers others):

  • Latency : the cap check is a slice of the auction budget — target p99 ≤ ~1–2 ms for the whole check-and-reserve.
  • Throughput : on the order of millions of serving decisions/sec at peak (e.g. ~2M/s), each potentially evaluating several cap dimensions × several windows.
  • Cardinality : hundreds of millions of active users × thousands of active campaigns. Part of the design is reasoning about how much state actually has to exist at any moment versus the full cross product.
  • Late/out-of-order events : impression events are emitted on every impression and may arrive late or out of order relative to the serving decision.

What a Strong Answer Covers

The interviewer is looking for these dimensions (not the answers themselves):

  • Back-of-the-envelope sizing that actually drives the storage choice (QPS, cardinality, per-key payload, latency budget).
  • A rolling-window data model with an explicit read-cost / write-cost / exactness tradeoff, and awareness that a naive per-event log doesn't scale.
  • API design for check-and-reserve, impression-commit, and reservation lifecycle (expiry/cancel), with idempotency.
  • A clear consistency stance that is stated and justified against the latency/QPS budget — strict vs. best-effort, and where the relaxation lives.
  • Atomicity of the reserve→commit transition (no double-count, no double-reserve) and how that maps onto the storage primitive.
  • Sharding/partitioning that keeps a single user's decision on a single shard, plus hot-key and resharding handling.
  • Failure modes (store outage, shard loss, cross-region split, lost commit, clock skew, bad spec) with concrete mitigations and a fail-open/fail-closed policy.
  • Monitoring that separates correctness SLOs (over-/under-capping measured against a source of truth) from system-health SLOs (latency, QPS, hot shards).
  • Handling of late / out-of-order impressions without corrupting past decisions, and multi-region behavior.

Follow-up Questions

Be ready for the interviewer to push deeper:

  • Multi-region : a user is served simultaneously in two regions — how do caps stay correct, and what's the bound on over-delivery? Why is last-write-wins the wrong merge for a counter?
  • Scale jump : at 10× QPS / 10× cardinality, what breaks first, and how do you find and relieve the hot shard?
  • Exactness : a particular cap must be exact (regulated category, no over-delivery) — how does your design change for that cap without slowing down the rest?
  • Recovery : a shard (or the whole hot store) is lost or flushed — how do you rebuild counts, and what's the transient correctness impact while you do?

Solution

Show

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More System Design•More Netflix•More Software Engineer•Netflix Software Engineer•Netflix System Design•Software Engineer System Design
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.