How do I approach Analytics & Experimentation interview questions?

Analytics & Experimentation questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master analytics & experimentation interviews.

What difficulty level is this interview question?

This is an easy-difficulty Analytics & Experimentation question, commonly asked during Technical Screen rounds at Uber.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Uber during technical interviews.

Measure feature impact with switchback, PSM, and CACE

This question evaluates proficiency in causal inference and experimental design for product metrics, specifically testing knowledge of switchback experiments, time-series adjustments (seasonality and autocorrelation), propensity score methods for observational comparisons, and complier-average causal effect (LATE/CACE) estimation; it belongs to the Analytics & Experimentation and Data Science domain. It is commonly asked because it probes handling of real-world complications—time and city heterogeneity, non-compliance, and derived-metric inference—and tests both conceptual understanding of causal assumptions and practical application of statistical adjustment and inference.

You work at a ridesharing company and want to measure the impact of a new membership feature on rides-per-user (RPU). Across the parts below you will measure this effect under three different evidentiary situations: a switchback experiment, an observational launch with no experiment, and a randomized experiment with non-compliance.

For each measurement approach, you are expected to state the key assumptions, name the likely pitfalls, and propose at least one robustness or sensitivity check.

Constraints & Assumptions

The outcome of interest is rides-per-user (RPU); in Part C you also estimate impact on profit-per-user (PPU), a derived metric (e.g. revenue per user minus cost per user).
The marketplace has two-sided interference: a user's experience (price, ETA, availability) depends on other users and on the supply of drivers, so naive user-level randomization can leak across units. Exception: in Part B you may assume supply is unlimited, so supply constraints do not confound the outcome through availability.
Treatment in Part A is the membership experience/eligibility being switched on or off at the (city × time) level; in Parts B and C treatment is an individual user actually holding membership.
You have access to standard pre-treatment user history (past rides, spend, tenure, app activity, geography) and city-day operational signals (weather, surge, marketing spend).

Clarifying Questions to Ask

What exactly is the decision the measurement will inform — a global launch/no-launch call, or sizing the expected lift for forecasting?
Is membership reversible (a per-trip benefit that can be toggled) or a sticky enrollment state that persists once a user joins? This determines whether a switchback is even valid.
What is the minimum detectable effect and budget (number of cities, weeks of runway) we are working with?
How is RPU defined — over what window, over which denominator (all eligible users, active users, or members)?
Is the estimand of interest the effect on everyone eligible (ATE) or on the users who actually take up membership (ATT / effect on members)?

Part A — Switchback experimentation

You run a switchback experiment with randomization at (day × city) granularity.

Propose a practical switchback design: what unit is randomized, the assignment scheme over time, and the duration.
Explain why you generally should not take the raw aggregated city-day results and run a vanilla two-sample t-test.
Describe an analysis approach that simultaneously accounts for time trends / seasonality, city-level heterogeneity, autocorrelation induced by switchbacks, and covariate adjustment.
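One way to make the design question concrete is a small schedule generator. The sketch below is a minimal illustration, not a prescribed design: it assumes day-level switching, an even number of weeks, and no washout days, and the function name `switchback_schedule` and the city labels are hypothetical. It balances treatment within each city and within each weekday, so weekday/weekend seasonality cannot line up with one arm by chance.

```python
import random
from collections import Counter


def switchback_schedule(cities, n_weeks=4, seed=0):
    """Return {(city, day_index): arm} with arm 1 = feature on, 0 = feature off.

    Within every city and every weekday, exactly half of the weeks are treated.
    Assumption for illustration: day-level switching with no washout period.
    """
    if n_weeks % 2:
        raise ValueError("n_weeks must be even for exact balance")
    rng = random.Random(seed)
    schedule = {}
    for city in cities:
        for weekday in range(7):
            arms = [1, 0] * (n_weeks // 2)
            rng.shuffle(arms)  # random order of on/off weeks for this weekday
            for week, arm in enumerate(arms):
                schedule[(city, week * 7 + weekday)] = arm
    return schedule


# Hypothetical city labels, used only to exercise the function.
schedule = switchback_schedule(["city_a", "city_b", "city_c"], n_weeks=4, seed=42)
treated_days = Counter(city for (city, _), arm in schedule.items() if arm == 1)
print(treated_days)
```

If carryover is a concern, the same structure could be extended by dropping or down-weighting the first day after each switch; that choice is a design assumption to state explicitly.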

What This Part Should Cover

A concrete design: randomized/balanced time sequence within each city (not a single switch), multiple alternations, duration spanning weekday/weekend seasonality, and an explicit stance on washout/carryover.
Correctly names the t-test failures: serial autocorrelation (understated SEs → false positives), city heterogeneity, and unequal exposure/weighting.
A two-way fixed-effects (or mixed-effects) model with cluster-robust SEs at the city level, plus user-volume weighting and a diagnostic (pre-trend check, holiday exclusion).
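The two-way fixed-effects analysis with city-clustered standard errors can be sketched as follows. This is a minimal illustration on synthetic data, using only the standard library: it assumes a balanced city-by-day panel, a single treatment regressor, and no user-volume weights or extra covariates. The function name `twfe_cluster_se` and all simulation parameters are made up for the example; in practice a regression library would normally do this work.

```python
import math
import random


def twfe_cluster_se(panel):
    """panel: rows of (city, day, treated, outcome) forming a balanced panel.

    Returns (beta, se): the two-way fixed-effects estimate of the treatment
    effect and a standard error clustered by city (no small-sample correction).
    """
    cities = sorted({r[0] for r in panel})
    days = sorted({r[1] for r in panel})
    if len({(r[0], r[1]) for r in panel}) != len(cities) * len(days):
        raise ValueError("this sketch assumes a balanced panel")

    def two_way_demean(col):
        val = {(r[0], r[1]): r[col] for r in panel}
        city_mean = {c: sum(val[c, t] for t in days) / len(days) for c in cities}
        day_mean = {t: sum(val[c, t] for c in cities) / len(cities) for t in days}
        grand = sum(val.values()) / len(val)
        return {k: v - city_mean[k[0]] - day_mean[k[1]] + grand for k, v in val.items()}

    d = two_way_demean(2)  # treatment net of city and day effects
    y = two_way_demean(3)  # outcome net of city and day effects
    sxx = sum(v * v for v in d.values())
    beta = sum(d[k] * y[k] for k in d) / sxx
    # Sum scores within each city first, so serial correlation inside a city
    # is reflected in the variance instead of being treated as independent noise.
    meat = sum(
        sum(d[c, t] * (y[c, t] - beta * d[c, t]) for t in days) ** 2 for c in cities
    )
    return beta, math.sqrt(meat) / sxx


# Synthetic data: city effects, a weekend bump, AR(1) noise, and a known effect.
rng = random.Random(7)
TRUE_EFFECT = 0.5
panel = []
for c in range(30):
    city_fe = rng.gauss(3.0, 1.0)
    arms = [1, 0] * 14
    rng.shuffle(arms)
    noise = 0.0
    for t in range(28):
        noise = 0.5 * noise + rng.gauss(0.0, 0.2)
        weekend_bump = 0.4 if t % 7 >= 5 else 0.0
        panel.append((f"city_{c}", t, arms[t], city_fe + weekend_bump + TRUE_EFFECT * arms[t] + noise))

beta, se = twfe_cluster_se(panel)
print(f"estimated effect {beta:.3f}, city-clustered SE {se:.3f}")
```

With only a handful of cities, cluster-robust standard errors can be unreliable, so a randomization-inference check (re-drawing the schedule and re-estimating) is a reasonable robustness test to mention.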

Part B — No A/B test available (observational measurement)

Assume the membership feature was launched without an A/B test, and assume supply is unlimited (so availability does not confound the outcome).

How would you estimate the causal impact of membership on RPU using propensity score matching (PSM), or a closely related propensity-score method?
How would you assess whether your matching/weighting is "good enough" to trust the estimate?
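To illustrate the matching step itself, here is a minimal sketch of 1:1 nearest-neighbour matching with replacement and a caliper. It assumes propensity scores have already been estimated from pre-treatment covariates; the function name `match_att` and the toy numbers are invented purely for illustration.

```python
def match_att(treated, controls, caliper=0.05):
    """1:1 nearest-neighbour matching on the propensity score, with replacement.

    treated, controls: lists of (propensity_score, outcome) pairs.
    Returns (att_estimate, n_matched, n_dropped).
    """
    diffs = []
    dropped = 0
    for score_t, y_t in treated:
        # Closest control on the propensity score.
        score_c, y_c = min(controls, key=lambda c: abs(c[0] - score_t))
        if abs(score_c - score_t) > caliper:
            dropped += 1  # no comparable control: outside common support
            continue
        diffs.append(y_t - y_c)
    if not diffs:
        raise ValueError("no treated unit has a control within the caliper")
    return sum(diffs) / len(diffs), len(diffs), dropped


# Toy, made-up (score, RPU) pairs.
treated = [(0.80, 10.0), (0.60, 8.0), (0.95, 12.0)]
controls = [(0.79, 9.0), (0.62, 7.5), (0.30, 4.0), (0.50, 6.0)]
att, n_matched, n_dropped = match_att(treated, controls, caliper=0.05)
print(att, n_matched, n_dropped)  # 0.75 2 1
```

Note that dropping treated users who have no close control changes the estimand: the result is the effect on members within the region of overlap, not on all members.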

What This Part Should Cover

A clear estimand (ATT vs ATE) and a propensity model built strictly from pre-treatment confounders, with explicit warning against post-treatment / collider features.
The choice among matching, IPW, doubly-robust, and PSM+DiD, with a reason.
A concrete validation bundle: overlap, covariate balance (SMD), placebo/negative-control tests, and an unmeasured-confounding sensitivity analysis — plus the honest caveat that observational identification rests on an untestable assumption.
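The weighting variant and the balance check can be sketched together. The example below is an illustration on synthetic data with a single confounder: it fits a logistic propensity model by Newton's method, applies ATT weights (1 for members, e/(1-e) for non-members), and compares the standardized mean difference (SMD) before and after weighting. All names and simulation parameters are assumptions made for the example, and the simulated treatment effect is homogeneous, so the ATT equals the true effect by construction.

```python
import math
import random


def fit_logit(xs, ds, iters=25):
    """Logistic regression of d on (1, x) by Newton's method; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, d in zip(xs, ds):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += d - p
            g1 += (d - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def wmean(vals, weights=None):
    weights = weights or [1.0] * len(vals)
    return sum(w * v for w, v in zip(weights, vals)) / sum(weights)


def var(vals):
    m = wmean(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)


def smd(x_t, x_c, w_c=None):
    """Standardized mean difference, with the unweighted pooled SD as denominator."""
    return (wmean(x_t) - wmean(x_c, w_c)) / math.sqrt((var(x_t) + var(x_c)) / 2)


# Synthetic users: x is a standardized pre-treatment covariate (e.g. past rides).
rng = random.Random(11)
TRUE_ATT = 1.0
users = []
for _ in range(4000):
    x = rng.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-(-0.5 + 0.8 * x)))
    d = 1 if rng.random() < p else 0
    users.append((x, d, 2.0 + 1.5 * x + TRUE_ATT * d + rng.gauss(0.0, 1.0)))

b0, b1 = fit_logit([u[0] for u in users], [u[1] for u in users])
x_t = [x for x, d, _ in users if d == 1]
y_t = [y for _, d, y in users if d == 1]
x_c = [x for x, d, _ in users if d == 0]
y_c = [y for _, d, y in users if d == 0]
w_c = [math.exp(b0 + b1 * x) for x in x_c]  # e/(1-e) equals exp(logit)

naive = wmean(y_t) - wmean(y_c)
att = wmean(y_t) - wmean(y_c, w_c)
smd_before, smd_after = smd(x_t, x_c), smd(x_t, x_c, w_c)
print(f"naive {naive:.2f}, weighted ATT {att:.2f}, SMD {smd_before:.2f} -> {smd_after:.2f}")
```

A common rule of thumb treats an absolute SMD below about 0.1 as acceptable balance. Good balance on measured covariates says nothing about unmeasured confounders, which is why the placebo and sensitivity checks are still needed.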

Part C — A/B test exists but with non-compliance

Now assume you ran a user-level randomized experiment, but not everyone assigned to treatment actually takes up membership (one-sided or two-sided non-compliance).

How would you estimate the causal effect on RPU using the Complier Average Causal Effect (CACE / LATE)?
How would you compute a confidence interval for (a) the impact on RPU and (b) the impact on profit-per-user (PPU), a derived metric?

What This Part Should Cover

The IV/LATE setup with all three assumptions (random assignment, exclusion restriction, monotonicity) and the Wald estimator $\text{ITT}_Y / \text{ITT}_D$, equivalently 2SLS.
Distinguishing the ITT estimand (effect of being offered) from CACE (effect on compliers), and noting that ITT is the right number for some launch decisions.
Correct CI machinery for a ratio estimand (2SLS robust SE / delta method / bootstrap) and a clean treatment of the derived PPU metric via per-user aggregation, including a first-stage strength check.
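The Wald estimator and a bootstrap confidence interval can be sketched as below. This is a minimal illustration on synthetic data with one-sided non-compliance. Profit is computed per user (revenue minus cost) before any averaging, so PPU is handled exactly like RPU. The column names, cost structure, and effect sizes are invented for the example, and the bootstrap resamples users, which assumes users are independent.

```python
import random


def wald(rows, outcome):
    """Return (cace, itt_y, itt_d) for the given outcome column.

    rows: dicts with keys "z" (assigned to treatment), "d" (took up membership),
    and one key per outcome.
    """
    def mean(vals):
        vals = list(vals)
        return sum(vals) / len(vals)

    assigned = [r for r in rows if r["z"] == 1]
    control = [r for r in rows if r["z"] == 0]
    itt_y = mean(r[outcome] for r in assigned) - mean(r[outcome] for r in control)
    itt_d = mean(r["d"] for r in assigned) - mean(r["d"] for r in control)
    return itt_y / itt_d, itt_y, itt_d


def bootstrap_ci(rows, outcome, reps=200, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the CACE, resampling users with replacement."""
    rng = random.Random(seed)
    stats = sorted(wald(rng.choices(rows, k=len(rows)), outcome)[0] for _ in range(reps))
    return stats[int(alpha / 2 * reps)], stats[int((1 - alpha / 2) * reps) - 1]


# Synthetic experiment: 40% of users are compliers, control cannot get membership.
rng = random.Random(3)
TRUE_CACE_RPU = 2.0
rows = []
for _ in range(2000):
    z = rng.randint(0, 1)
    d = 1 if (z == 1 and rng.random() < 0.4) else 0
    rides = 5.0 + TRUE_CACE_RPU * d + rng.gauss(0.0, 0.5)
    revenue = 8.0 * rides
    cost = 5.0 * rides + 4.0 * d  # hypothetical per-member subsidy of 4
    rows.append({"z": z, "d": d, "rpu": rides, "ppu": revenue - cost})

cace_rpu, itt_rpu, first_stage = wald(rows, "rpu")
cace_ppu, _, _ = wald(rows, "ppu")
ci_rpu = bootstrap_ci(rows, "rpu")
ci_ppu = bootstrap_ci(rows, "ppu")
print(f"first stage {first_stage:.2f}, ITT on RPU {itt_rpu:.2f}")
print(f"CACE RPU {cace_rpu:.2f}, 95% CI ({ci_rpu[0]:.2f}, {ci_rpu[1]:.2f})")
print(f"CACE PPU {cace_ppu:.2f}, 95% CI ({ci_ppu[0]:.2f}, {ci_ppu[1]:.2f})")
```

Reporting the first-stage take-up gap next to the CACE makes it easy to see when the denominator is small enough for the ratio to become unstable.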

What a Strong Answer Covers

These dimensions span all three parts.

Estimand discipline: the candidate states what causal quantity (ATE / ATT / ITT / CACE) is being estimated and matches it to the business decision, rather than reporting an undefined "lift".
Assumption honesty: each method's identifying assumption is named (interference/SUTVA in A, unconfoundedness+overlap in B, exclusion+monotonicity in C), and the candidate is explicit about which are testable vs untestable.
Inference rigor: standard errors respect the data-generating structure (clustering for serial correlation; ratio-aware inference for CACE; per-user aggregation for derived metrics).
A robustness/sensitivity check per approach, as the prompt explicitly requires.

Follow-up Questions

Suppose in Part A you find membership lift is much larger in a handful of dense cities. How would you test whether this is real treatment-effect heterogeneity versus an artifact of unbalanced switchback assignment?
In Part B, your placebo test on a pre-period outcome comes back significantly non-zero. What does that tell you, and what do you do next?
In Part C, the first-stage take-up gap $\mathbb{E}[D\mid Z=1]-\mathbb{E}[D\mid Z=0]$ is only a few percentage points. How does this affect your CACE estimate and its confidence interval, and how would you communicate the result to stakeholders?
The marketplace interference assumed away in Part B (unlimited supply) is dropped. How would the presence of a finite driver supply change your approach across all three parts?
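For the weak first-stage follow-up, a delta-method approximation for the ratio estimator shows where the instability comes from. This is the standard first-order result for a ratio of two estimated quantities, written in the notation used above, and offered as a sketch, not a full derivation:

$$
\widehat{\text{CACE}} = \frac{\widehat{\text{ITT}}_Y}{\widehat{\text{ITT}}_D},
\qquad
\operatorname{Var}\big(\widehat{\text{CACE}}\big) \approx \frac{1}{\text{ITT}_D^{2}}
\Big[\operatorname{Var}\big(\widehat{\text{ITT}}_Y\big)
- 2\,\text{CACE}\,\operatorname{Cov}\big(\widehat{\text{ITT}}_Y, \widehat{\text{ITT}}_D\big)
+ \text{CACE}^{2}\,\operatorname{Var}\big(\widehat{\text{ITT}}_D\big)\Big]
$$

The $1/\text{ITT}_D^{2}$ factor is the key term. As an illustrative number, a take-up gap of 0.03 scales the standard error of $\widehat{\text{ITT}}_Y$ by roughly $1/0.03 \approx 33$, so the CACE interval becomes very wide, and the approximation itself degrades as $\text{ITT}_D$ approaches zero.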

