What difficulty level is this interview question?

This is a medium difficulty Analytics & Experimentation question, commonly asked during Onsite rounds at Meta.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Meta during technical interviews.

Estimate ads ranking revenue impact | Meta Interview Question

Estimate ads ranking revenue impact

Company: Meta

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: medium

Interview Round: Onsite

You are the data scientist for an ads ranking team at a large social platform. The team has built a new ranking algorithm for feed ads. The new model changes the ordering of ads by combining **bid**, **predicted click-through rate (pCTR)**, **predicted conversion rate (pCVR)**, and **ad quality** differently from the current production ranker.

A short ramp suggests that revenue per daily active user (DAU) increased, but the team is worried that the short-term lift may not represent the medium-term impact: users may adapt to the new ad mix, advertisers may change bids or budgets, and auction dynamics may shift.

Your task is to design an approach to estimate the **medium-term revenue impact** of launching the new ads ranking algorithm over a **4- to 8-week horizon**, and to make a launch recommendation.

### Constraints & Assumptions

- Platform scale: a large eligible DAU base (a specific headcount is supplied in Part 5 for the scale-up); revenue is driven by an ad auction in which advertisers set daily/lifetime budgets that are paced over time.
- The intervention is a **feed ranking change**, experienced at the user level, but it interacts with a **shared marketplace** (advertiser budgets are pooled across users).
- A short (e.g. 2-day) ramp already showed a positive revenue-per-DAU signal; the open question is whether that lift persists, decays, or reverses over weeks.
- You have access to standard experimentation infrastructure (A/B testing, geo holdouts), pre-period covariates, and advertiser-side delivery/pacing data.
- "Medium-term" specifically means capturing user adaptation, advertiser bid/budget response, and at least one full advertiser budget cycle.

### Clarifying Questions to Ask

- Is the goal to estimate revenue at **full rollout** (the launch decision) or just the effect within the experiment population? This determines whether spillover is bias or part of the estimand.
- What is the relevant **advertiser budget cycle length**, and roughly what fraction of revenue comes from budget-constrained advertisers?
- What guardrail thresholds (retention, negative feedback, advertiser ROAS) are considered launch-blocking versus monitor-only?
- Do we have a usable **geo/market holdout** and pre-period covariates for variance reduction (e.g. CUPED)?
- Is there an existing **revenue or LTV model** the business uses to value medium-term user/advertiser effects?
- What statistical power / minimum detectable effect (MDE) is achievable at each candidate randomization unit?

### Part 1 — Causal estimand

Define the precise causal quantity you want to estimate, including the population, the comparison, the time horizon, and the outcome(s) it should encompass.

```hint What "estimand" means here
Be explicit about the four pieces of any estimand: population (eligible users / markets), treatment vs. counterfactual (new ranker vs. keep the current ranker, evaluated at full rollout), horizon (4-8 weeks), and outcome. Ask whether a per-impression metric or a per-user / cumulative metric better matches the business question.
```

```hint What to encompass
The "revenue impact" of a ranking change is not just immediate auction revenue. Consider whether the estimand should also internalize medium-term marketplace effects — advertiser budget reallocation, future ad inventory from retained users, and advertiser value (conversions / ROAS) — because these feed back into revenue.
```

#### What a Strong Answer Covers

- A precise estimand naming the population, the counterfactual (full rollout vs. status quo), the 4-8 week horizon, and the outcome.
- Recognition that the launch decision implies a **full-equilibrium** estimand, not the partial-equilibrium effect on a small treated slice.
- An outcome broader than per-impression revenue, with awareness of why per-impression revenue is gameable.
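
One way to write down the full-rollout estimand described in Part 1 is sketched below. The notation is an assumption introduced here for illustration and does not come from the question: $\mathcal{U}$ is the set of eligible users, $R_{u,t}(\mathbf{z})$ is the revenue attributed to user $u$ on day $t$ when the platform-wide assignment vector is $\mathbf{z}$, $\mathbf{1}$ means every user is on the new ranker, $\mathbf{0}$ means every user stays on the current ranker, and $H$ is the horizon in days.

```latex
\tau_H \;=\; \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \sum_{t=1}^{H}
  \mathbb{E}\!\left[ R_{u,t}(\mathbf{1}) - R_{u,t}(\mathbf{0}) \right],
\qquad 28 \le H \le 56 \ \text{days}
```

Because the potential outcomes are indexed by the whole assignment vector rather than by a user's own arm, this quantity is cumulative revenue per eligible user at full equilibrium. A user-level A/B test observes outcomes under a mixed assignment, so under this notation its difference in means matches $\tau_H$ only if there is no interference through the shared auction.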
### Part 2 — Experiment or quasi-experiment design

Specify the design you would run, including the **randomization unit**, why you chose it, and how long it must run. Explain the tradeoff between user-level and market-level (geo) randomization for a shared-auction product.

```hint Randomization unit tradeoff
Start from the no-interference (SUTVA) assumption and ask whether it holds when treated and control users draw from the same advertiser budgets. Weigh the statistical power of finer units against the contamination they allow.
```

```hint Duration
The horizon must cover the feedback loops you are worried about: weekly seasonality, advertiser budget/pacing cycles, bid re-optimization, and user adaptation. A 2-day ramp captures none of these.
```

#### What a Strong Answer Covers

- A randomization choice **justified by the interference tradeoff**, not a generic "run an A/B test."
- A clear comparison of user-level (power, clean UX read, but biased by spillover) vs. geo/market-level (internalizes the marketplace, but low power).
- A duration that spans at least one full advertiser budget cycle and emphasizes later, post-transient weeks.

### Part 3 — Primary, secondary, and guardrail metrics

Propose a metric tree: one (or co-) primary metric, secondary/diagnostic metrics that explain the *mechanism*, and guardrail metrics on the user and advertiser side.

```hint Picking the primary metric
Prefer a metric robust to impression-mix shifts. Revenue per impression can rise while total revenue per user falls (fewer / worse impressions). A per-eligible-user or cumulative-over-window revenue metric is harder to game.
```

```hint Diagnostics vs. guardrails
Secondary metrics should let you *explain* a revenue move (eCPM, ad load, fill rate, win price, pCTR/pCVR calibration, budget exhaustion time). Guardrails protect against regressions you would not trade revenue for (engagement, retention, negative feedback, advertiser ROAS/CPA, latency).
```

#### What a Strong Answer Covers

- A primary metric robust to impression-mix gaming (per-eligible-user / cumulative revenue, not per-impression).
- Diagnostic metrics chosen to **explain mechanism** (price vs. volume, calibration, pacing), not just describe.
- Guardrails on **both** the user side (engagement, retention, negative feedback) and the advertiser side (ROAS, CPA, churn), plus system health.

### Part 4 — Auction interference, budgets, seasonality, and heterogeneity

Explain how you would handle each of the four threats to validity below, and how each could bias a naive user-level read:

- **Auction interference / spillover** (shared advertiser budgets across treatment and control).
- **Advertiser budget constraints and pull-forward** (faster spend now ≠ more spend over the window).
- **Seasonality** (day-of-week, holidays, campaign cycles).
- **User-level heterogeneity** (effect varies by market, tenure, engagement, ad load).

```hint Interference
Ask which advertisers actually carry the spillover, and whether a design that internalizes the marketplace (e.g. a geo holdout) could check the user-level number. Think about the *direction* of the bias.
```

```hint Pull-forward and seasonality
Watch the *trajectory* over the window, not a single day, and read what the budget-pacing diagnostics are telling you. Concurrent randomization and pre-period covariates also help.
```

```hint Heterogeneity
Decide which segments matter *before* you look, so a varying effect is a pre-registered finding rather than a fishing expedition.
```

#### What a Strong Answer Covers

- For each threat: the **mechanism**, the **direction of bias** on a naive user-level read, and a concrete mitigation.
- Spillover treated as the headline threat, with constrained vs. unconstrained advertiser slices and a geo check.
- Pull-forward distinguished from real lift via **cumulative** revenue and pacing/exhaustion diagnostics; a stated variance-reduction technique for seasonality; pre-registered segments for heterogeneity.

### Part 5 — Translating to company-level revenue impact

Given a per-user effect estimate with a confidence interval, show how you would scale it to a company-level revenue figure over the window, and what corrections/caveats you would attach. For the scale-up, take the eligible population as **200M DAU** and a **28-day window**.

```hint Scaling and caveats
Scale the per-user-per-day lift by eligible DAU and the number of days, and propagate the confidence interval through the same multiplication (round only at the end). Then discount or flag the point estimate for known biases — budget pull-forward, advertiser ROI harm, UX-driven inventory loss, and any measured experiment spillover.
```

#### What a Strong Answer Covers

- Correct scale-up arithmetic: per-user-per-day lift × eligible DAU × days.
- A confidence interval propagated through the same multiplier (not just a point estimate).
- Honest caveats that move the headline number: pull-forward, ROI harm, inventory loss, and spillover (geo vs. user-level reconciliation).

### Part 6 — Launch recommendation under a UX-guardrail conflict

State the decision logic you would use when short-term revenue is positive but some user-experience guardrails worsen. Make the tradeoff explicit rather than defaulting to "ship" or "kill".

```hint Make the tradeoff quantitative
Distinguish a cosmetic guardrail move (e.g. a tiny rise in ad hides) from a load-bearing one (a measurable 28-day retention loss). Frame the decision in terms of long-term user and advertiser lifetime value, and consider ramp / segment-targeting as middle options, not just full launch vs. no launch.
```

#### What a Strong Answer Covers

- An explicit decision rule with three outcomes (launch / ramp-or-target / do-not-launch), not a binary.
- The conflict resolved against **long-term value** (user and advertiser LTV), not this window's dollars.
- A worked contrast between an acceptable guardrail move and a launch-blocking one.

### What a Strong Answer Covers

These dimensions span all parts and should be visible across the answer as a whole:

- **Keeping interference central.** The candidate never loses sight of the fact that a per-user intervention priced in a shared, budget-constrained auction violates SUTVA, and lets that drive the design, the metrics, and the company-level correction.
- **Cumulative-over-window thinking.** Revenue is read as a trajectory across a full budget cycle, never as a single-day or per-impression snapshot.
- **Quantitative honesty.** Estimates carry intervals, caveats are subtractive (they change the number), and the final recommendation is grounded in long-term user and advertiser lifetime value rather than the immediate lift.

### Follow-up Questions

- Suppose the user-level A/B shows +2.4% revenue but a concurrent geo holdout shows roughly 0%. How do you reconcile them, and which do you trust for the launch decision?
- The week-1 lift is +5% but week-4 is +1% and still positive. How do you tell budget pull-forward apart from genuine user adaptation, and how does it change your company-level estimate?
- Advertiser ROAS drops slightly in treatment. Why might that erode the revenue lift over a horizon longer than 8 weeks, and how would you monitor it post-launch?
- How would you design a long-term holdout to keep measuring the launched change after the experiment ends?
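
The Part 5 scale-up (per-user-per-day lift × eligible DAU × days, with the interval pushed through the same multiplier) can be sketched in a few lines of Python. Only the 200M DAU and the 28-day window come from the question. The `Interval` type, the `company_level_impact` function, the `haircut` parameter, and the example lift values are hypothetical and exist purely for illustration. Applying one scalar haircut to all three numbers is a simplification; a real correction would carry its own uncertainty.

```python
from dataclasses import dataclass

ELIGIBLE_DAU = 200_000_000  # given in Part 5 of the question
WINDOW_DAYS = 28            # given in Part 5 of the question


@dataclass(frozen=True)
class Interval:
    """Point estimate with lower and upper confidence bounds (hypothetical helper)."""
    low: float
    point: float
    high: float

    def scale(self, factor: float) -> "Interval":
        # A non-negative constant multiplier keeps the bounds in order.
        return Interval(self.low * factor, self.point * factor, self.high * factor)


def company_level_impact(per_user_per_day: Interval, haircut: float = 0.0) -> Interval:
    """Scale a per-user-per-day revenue lift (in dollars) to the full window.

    haircut is an assumed fraction of the lift removed for known biases
    such as pull-forward or measured spillover; 0.0 means no correction.
    """
    if not 0.0 <= haircut <= 1.0:
        raise ValueError("haircut must be between 0 and 1")
    user_days = ELIGIBLE_DAU * WINDOW_DAYS
    return per_user_per_day.scale(user_days * (1.0 - haircut))


# Hypothetical experiment read-out: $0.0020 per user per day, CI [$0.0005, $0.0035].
lift = Interval(low=0.0005, point=0.0020, high=0.0035)
raw = company_level_impact(lift)
adjusted = company_level_impact(lift, haircut=0.25)  # assumed 25% discount

for label, est in (("raw", raw), ("adjusted", adjusted)):
    print(f"{label}: ${est.point:,.0f} (CI ${est.low:,.0f} to ${est.high:,.0f})")
```

With these made-up inputs the raw point estimate is 0.0020 × 200M × 28 = $11.2M over the window, and the interval scales by the same 5.6 billion user-days. Rounding happens only at print time, as the hint suggests.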

Quick Answer: This question evaluates competency in causal inference, experimentation design, metric construction, and marketplace economics for measuring ad ranking revenue within the Analytics & Experimentation domain.
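
As one concrete instance of the experimentation-design skills named above, the CUPED variance reduction mentioned in the clarifying questions can be sketched as follows. This is a minimal illustration, assuming a single pre-period covariate per user (for example, pre-period ad revenue) and a coefficient estimated on the pooled arms. The function names and the toy numbers are hypothetical and are not taken from the question.

```python
from statistics import fmean
from typing import Sequence


def cuped_theta(pre: Sequence[float], post: Sequence[float]) -> float:
    """Regression coefficient theta = cov(pre, post) / var(pre)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired samples with at least two users")
    x_bar, y_bar = fmean(pre), fmean(post)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pre, post))
    var_x = sum((x - x_bar) ** 2 for x in pre)
    # With a constant covariate there is nothing to adjust for.
    return cov_xy / var_x if var_x > 0 else 0.0


def cuped_lift(
    treat_pre: Sequence[float],
    treat_post: Sequence[float],
    ctrl_pre: Sequence[float],
    ctrl_post: Sequence[float],
) -> float:
    """Difference in CUPED-adjusted means between treatment and control.

    Each outcome is adjusted as y - theta * (x - pooled mean of x); the
    pooled mean cancels in the difference, leaving the expression below.
    """
    theta = cuped_theta(
        list(treat_pre) + list(ctrl_pre),
        list(treat_post) + list(ctrl_post),
    )
    raw_diff = fmean(treat_post) - fmean(ctrl_post)
    pre_imbalance = fmean(treat_pre) - fmean(ctrl_pre)
    return raw_diff - theta * pre_imbalance


# Toy, made-up per-user revenue: treatment happened to draw heavier pre-period users.
ctrl_pre, ctrl_post = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
treat_pre, treat_post = [2.0, 3.0, 4.0, 5.0], [5.0, 7.0, 9.0, 11.0]

print("raw difference:", fmean(treat_post) - fmean(ctrl_post))
print("CUPED-adjusted:", cuped_lift(treat_pre, treat_post, ctrl_pre, ctrl_post))
```

The adjustment removes the part of the raw difference that is explained by chance imbalance in the pre-period covariate. It reduces variance, but it does nothing about auction interference, which still needs the design-level handling discussed in Parts 2 and 4.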
