System Design: Low-Latency Targeted Ad Serving (≤100 ms E2E)
You are designing an online advertising serving system for web/app inventory with a strict end-to-end latency SLA of 100 ms (client request to ad response). Assume multi-region deployment and peak global load.
Functional Requirements
-
Real-time ad selection and auction within 100 ms end-to-end.
-
Targeting: keywords/context, user segments, and geography.
-
Campaign management: budgets (daily/total), pacing, schedules, frequency capping, creatives.
-
Ranking: combine advertiser bid with predicted CTR/CVR and quality signals.
-
Real-time auctions and pricing.
-
Near-real-time reporting for impressions, clicks, spend, basic conversions.
What to Deliver
-
APIs
-
Define an ad request API (input fields) and ad response API (output fields).
-
Core Data Models
-
Specify entities for campaign/line item, creative, targeting, budget/pacing, user segments, and event/log schemas.
-
Architecture
-
Outline components and data flow: edge gateways, ad selector, feature store, model service, auctioneer, throttling/pacing, caching layers, logging/streaming pipeline, and offline analytics/warehouse.
-
Auction Mechanics and Strategy
-
Choose an auction (e.g., first-price, second-price/GSP, VCG), explain incentives and trade-offs.
-
Additional Topics
-
Handling cold start for new ads and new users.
-
Deduplication (impressions/clicks/events; creative dedupe within response).
-
A/B testing strategy and experiment bucketing.
-
Privacy and compliance (GDPR/CCPA, consent, data retention, DSAR).
-
Non-Functional Requirements
-
Scaling approach, storage and indexing choices, consistency guarantees.
-
Fault tolerance, backpressure strategies.
-
Capacity planning with back-of-the-envelope estimates (traffic, CPU for scoring/auction, storage for logs/features, network bandwidth).
Constraints and Assumptions (you may refine)
-
100 ms total E2E budget (assume ~50–70 ms server-side budget to account for network/client).
-
Peak traffic: choose and justify a realistic peak QPS (e.g., 20k–100k QPS globally); design should scale.
-
Near-real-time reporting: ≤5 minutes latency.
-
Availability target: ≥99.9% per region with graceful degradation.