PracHub

Design a switchback and choose block length

Last updated: Jun 25, 2026

Quick Overview

This question tests the ability to design a switchback (time-based A/B) experiment for a two-sided marketplace with spillovers and autocorrelated demand. It evaluates expertise in causal inference, experiment design under market-level treatment constraints, and variance reduction techniques such as CUPED — core competencies for data scientist roles in experimentation.


Company: Uber

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: hard

Interview Round: Technical Screen

Design a switchback experiment for airport pickup pricing in a marketplace with spillovers. Choose the block length and rotation schedule using empirical autocorrelation and carryover: median trip duration = 22 minutes; demand ACF falls below 0.1 at 75 minutes; strong peaks every ~4 hours. Specify: (a) block length selection to minimize carryover yet retain power, (b) diagnostics to detect residual carryover (pre/post contrasts, lag terms), and (c) variance and sample-size computation under block randomization (include formulas and required inputs). Describe how you will randomize across time-of-day and days-of-week and how you will incorporate covariate adjustment (e.g., CUPED) to reduce variance.

Oct 13, 2025, 9:49 PM

Switchback Experiment Design: Airport Pickup Pricing with Spillovers

You are a data scientist designing a switchback (time-based A/B) experiment to evaluate a new airport pickup pricing algorithm in a two-sided ride-hailing marketplace. The airport has strong within-day seasonality and significant spillovers: drivers and the dispatch queue carry state across time, so a price change in one window affects supply and rider behavior in adjacent windows. Because of these spillovers, treatment must be assigned at the airport level, and only one arm (treatment or control) can be live at a time — you alternate the whole market between arms over time rather than splitting riders into two groups.

You are given the following empirical facts measured at this airport:

  • Median trip duration $\approx 22$ minutes.
  • Demand autocorrelation function (ACF) falls below $0.1$ at $\approx 75$ minutes (i.e., demand is meaningfully self-correlated for a little over an hour).
  • Strong periodic demand peaks every $\approx 4$ hours (driven by flight banks / arrival waves).

Design the experiment end to end. The question has three core parts plus two cross-cutting requirements: how you randomize across time-of-day and day-of-week, and how you use covariate adjustment (e.g., CUPED) to reduce variance.

Constraints & Assumptions

  • Single market, mutually exclusive arms. One airport queue; at any instant the market is either fully on treatment or fully on control. This rules out a standard user-split A/B test.
  • Spillover horizon. Carryover from one block can contaminate the next; the relevant timescales are trip duration ($\approx 22$ min, how long the system takes to "flush" in-progress state) and the demand ACF horizon ($\approx 75$ min).
  • No global control group. Effects are estimated from time contrasts between treatment and control blocks, so you must account for time-of-day and day-of-week confounding by design.
  • Primary metric is a continuous business outcome (e.g., revenue per request, completion rate, pickups per minute, or driver idle time) — analyze one pre-registered primary metric at a time.
  • Assume you have historical data at this airport (per-minute / per-request demand, pricing, completions, weather, flight schedules) to estimate variance components and fit covariate models before launch.

Clarifying Questions to Ask

  • What is the primary success metric and its decision direction? (Revenue per request vs. completion rate vs. driver idle time can imply different washout and power needs.)
  • What minimum detectable effect (MDE) matters for a launch decision, in absolute or relative terms?
  • How long can the experiment run (days/weeks), and is continuous operation acceptable, or are there blackout periods?
  • Are multiple terminals served by one shared driver pool? (If so they must be treated as a single cluster, not independent markets.)
  • Are any other pricing or dispatch experiments running concurrently at this airport that would interfere with the same queue?
  • What is the operational minimum block length the pricing/dispatch system can switch on without instability?

Part 1 — Block length and rotation schedule

Choose a block length and rotation schedule that minimizes carryover yet retains statistical power. Justify your block length quantitatively from the three empirical facts (trip duration, ACF horizon, 4-hour periodicity), and explain the tension between long blocks (less carryover, cleaner measurement) and short blocks (more blocks, more power). Specify any post-switch washout/burn-in you would discard and why, and describe how the rotation sequence is generated and constrained.

What This Part Should Cover

  • A concrete block length decomposed into washout + measurement, each justified by a specific empirical fact.
  • Explicit reasoning about the 240-minute periodicity (why the block length should not divide it evenly, so that blocks do not stay locked to the same phases of the peak cycle).
  • The power vs. carryover trade-off stated quantitatively (more blocks vs. cleaner blocks).
  • A rotation rule (balance target, run-length cap, how the sequence drifts across hours over days).
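
One way to make the rotation rule concrete is sketched below. It is not a reference answer: the 90-minute block (about 25 minutes of washout plus about 65 minutes of measurement), the run-length cap of 3, and the names `make_schedule`, `longest_run`, and `peak_phases` are all assumptions chosen for illustration. The sketch draws an exactly balanced T/C sequence, refuses to extend a run past the cap, and reports where each block starts within the 240-minute peak cycle.

```python
import random

BLOCK_MIN = 90    # hypothetical block length: ~25 min washout + ~65 min measurement
PEAK_MIN = 240    # demand peak period from the prompt (~4 hours)


def longest_run(seq):
    """Length of the longest streak of identical consecutive arms."""
    best = run = 0
    prev = None
    for arm in seq:
        run = run + 1 if arm == prev else 1
        best = max(best, run)
        prev = arm
    return best


def make_schedule(n_blocks, max_run=3, seed=0):
    """Balanced random T/C sequence with a cap on consecutive identical arms."""
    if n_blocks % 2 or max_run < 1:
        raise ValueError("need an even n_blocks and max_run >= 1")
    rng = random.Random(seed)  # fixed seed so the schedule can be pre-registered
    while True:  # retry if the sequential draw paints itself into a corner
        remaining = {"T": n_blocks // 2, "C": n_blocks // 2}
        seq = []
        for _ in range(n_blocks):
            options = [a for a in "TC" if remaining[a] > 0]
            # Forbid extending a run that is already at the cap.
            if len(seq) >= max_run and len(set(seq[-max_run:])) == 1:
                options = [a for a in options if a != seq[-1]]
            if not options:
                break
            weights = [remaining[a] for a in options]  # pulls toward balance
            arm = rng.choices(options, weights=weights)[0]
            seq.append(arm)
            remaining[arm] -= 1
        if len(seq) == n_blocks:
            return seq


def peak_phases(n_blocks):
    """Start offset (minutes) of each block within the 240-minute peak cycle."""
    return [(i * BLOCK_MIN) % PEAK_MIN for i in range(n_blocks)]


# Two weeks of 90-minute blocks: 16 blocks per day * 14 days.
schedule = make_schedule(16 * 14, max_run=3, seed=1)
```

Because 90 does not divide 240, consecutive blocks start at eight different offsets within the peak cycle before the pattern repeats every 720 minutes, so neither arm is tied to one part of a flight bank. A production schedule would additionally balance arms within hour-of-week strata, which this sketch does not do.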

Part 2 — Diagnostics for residual carryover

Define the diagnostics you will run to detect residual carryover — i.e., contamination that survives your washout. Describe at least two complementary methods, the model/estimator behind each, and the specific signal that would tell you the washout is too short (and what you would do about it).

What This Part Should Cover

  • An event-time / pre-post profile around switch boundaries with the expected clean-vs-contaminated signatures.
  • A lagged-treatment regression spanning at least the dependence horizon, with an explicit null hypothesis on the lag coefficients.
  • A balance / randomization check (covariate standardized differences between T and C blocks) and a residual-autocorrelation check on block means.
  • A stated remediation when carryover is detected (increase washout/block length; switch to robust SEs).
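
A simple stand-in for the lag-1 term of a lagged-treatment regression is a stratified contrast: hold the current arm fixed and compare block outcomes by the previous block's arm. The sketch below assumes block-level means computed after discarding the washout; the function name `lag1_carryover_contrast` and the synthetic data are hypothetical, and a full analysis would use a regression with several lags and robust standard errors instead.

```python
from statistics import mean


def lag1_carryover_contrast(arms, y):
    """Average effect of the PREVIOUS block's arm on the current block's
    outcome, holding the current arm fixed. Near zero if washout is adequate."""
    cells = {}
    # Pair each block (from the second onward) with the arm of the block before it.
    for prev, cur, val in zip(arms, arms[1:], y[1:]):
        cells.setdefault((cur, prev), []).append(val)
    diffs = []
    for cur in ("T", "C"):
        if (cur, "T") in cells and (cur, "C") in cells:
            diffs.append(mean(cells[(cur, "T")]) - mean(cells[(cur, "C")]))
    if not diffs:
        raise ValueError("need both previous-arm values for at least one current arm")
    return mean(diffs)


# Synthetic example: direct effect of 2.0, carryover of 0.5 from the previous block.
arms = list("TTCCTCTTCC")
y = [10.0 + 2.0 * (arms[i] == "T") + 0.5 * (i > 0 and arms[i - 1] == "T")
     for i in range(len(arms))]
contrast = lag1_carryover_contrast(arms, y)
```

A contrast that is clearly different from zero relative to its block-level standard error is the signal that the washout is too short; the null hypothesis being tested is that the previous arm has no effect on the current block.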

Part 3 — Variance and sample-size computation

Provide the variance estimator and sample-size formula under block randomization. Give the explicit formulas, list the required inputs (and where each comes from), and show how within-block dependence and CUPED enter the calculation. A short numeric illustration of the method is welcome.
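
One common way to write the block-level calculation is sketched here using the symbols from this question. The unit-level variance $\sigma^2$, the normal quantile $z_q$, and the per-arm sample variances of block means $s_T^2$, $s_C^2$ are notation assumed for this illustration, and the formulas treat blocks as approximately independent after washout.

$$
\sigma_b^2 = \frac{\sigma^2}{m}\left[1 + (m-1)\,\bar\rho\right],
\qquad
\sigma_{b,\mathrm{adj}}^2 = \sigma_b^2\,(1 - R^2)
$$

$$
n = \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2\,\sigma_{b,\mathrm{adj}}^2}{\delta^2}
\ \text{ blocks per arm},
\qquad
\widehat{\mathrm{Var}}(\hat\tau) = \frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}
$$

Here $1-\beta$ is the target power, $R^2$ is the block-level squared correlation between the outcome and the CUPED covariate, and $\hat\tau$ is the difference in mean block outcomes between arms. If block means remain autocorrelated, the last expression understates the variance and a HAC estimator would replace it.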

What This Part Should Cover

  • The two-sample sample-size formula and the matching variance-of-the-effect estimator at the block level.
  • A derivation of between-block variance $\sigma_b^2$ from unit-level variance, observations-per-block $m$, and within-block correlation $\bar\rho$ (a design-effect argument).
  • The explicit list of required inputs and their sources (historical baseline variance, $m$, ICC/ACF, MDE $\delta$, CUPED $R^2$, $\alpha$, power).
  • Treatment of residual block-level dependence (HAC/Newey–West) and how CUPED reduces the required $n$.
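
A short numeric illustration of the method, as a sketch only. It applies a design effect of 1 + (m − 1)·ρ̄ to the unit-level variance to get the variance of a block mean, shrinks that by (1 − R²) for CUPED, and plugs the result into the standard two-sample formula. Every input value below is made up for illustration, and the function name `blocks_per_arm` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def blocks_per_arm(sigma2_unit, m, rho_bar, delta,
                   r2_cuped=0.0, alpha=0.05, power=0.80):
    """Blocks needed per arm for a two-sided test on block means."""
    z = NormalDist().inv_cdf
    # Variance of a block mean of m correlated observations (design effect).
    sigma2_block = sigma2_unit / m * (1 + (m - 1) * rho_bar)
    # CUPED removes the share of block-level variance explained by the covariate.
    sigma2_adj = sigma2_block * (1 - r2_cuped)
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma2_adj / delta ** 2
    return ceil(n)


# Hypothetical inputs: unit variance 4.0, 65 per-minute observations per block,
# average within-block correlation 0.2, MDE of 0.25 in metric units.
n_raw = blocks_per_arm(sigma2_unit=4.0, m=65, rho_bar=0.2, delta=0.25)
n_cuped = blocks_per_arm(sigma2_unit=4.0, m=65, rho_bar=0.2, delta=0.25,
                         r2_cuped=0.4)
```

Since the required number of blocks scales with (1 − R²), a covariate with R² = 0.4 cuts the requirement by roughly 40 percent. In practice the unit variance, m, and ρ̄ would come from historical data at the airport, and the calculation would be inflated further if block means remain autocorrelated.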

What a Strong Answer Covers

Across all parts, a strong candidate ties every design choice back to the three empirical facts and the spillover structure, and keeps the estimand and unit of randomization (the block) consistent from design through analysis. Beyond the per-part rubrics, look for:

  • Time confounding handled by design, not luck: an explicit randomization scheme across time-of-day and day-of-week (e.g., stratify by hour-of-week, $24 \times 7 = 168$ strata; rerandomize to balance T/C within strata) so neither arm is over-exposed to peaks or weekends.
  • Internal consistency: the washout chosen in Part 1, the lag horizon tested in Part 2, and the ICC/ACF used in Part 3 all reference the same timescales coherently.
  • CUPED integrated correctly: pre-period covariates at the same hour-of-week, $\theta$ estimated out-of-sample / via cross-fitting, variance-reduction $R^2$ flowed back into the power calculation.
  • Pragmatism and guardrails: one arm live at a time, no overlapping experiments on the same queue, terminals sharing drivers treated as one cluster, and a pre-registered analysis plan (metric, SEs, seed).
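
The CUPED step can be sketched at the block level as follows. This minimal version assumes the covariate is the same hour-of-week block mean from a pre-experiment period and that the coefficient is estimated on historical data only, so it is out-of-sample with respect to the experiment. The function names and the toy numbers are hypothetical.

```python
from statistics import mean


def cuped_theta(x_hist, y_hist):
    """Slope cov(x, y) / var(x), estimated on pre-experiment blocks."""
    mx, my = mean(x_hist), mean(y_hist)
    cov = sum((a - mx) * (b - my) for a, b in zip(x_hist, y_hist))
    var = sum((a - mx) ** 2 for a in x_hist)
    return cov / var


def cuped_adjust(y, x, theta, x_ref_mean):
    """Adjusted outcome y - theta * (x - mean(x)); the mean is left unchanged
    when x_ref_mean is the average covariate over the adjusted blocks."""
    return [yi - theta * (xi - x_ref_mean) for yi, xi in zip(y, x)]


# Toy historical blocks: covariate x and outcome y at matching hours of the week.
x_hist = [1.0, 2.0, 3.0, 4.0]
y_hist = [3.0, 5.0, 7.0, 9.0]
theta = cuped_theta(x_hist, y_hist)

# Experiment blocks are adjusted with the pre-estimated theta, never re-fit by arm.
y_adj = cuped_adjust(y=[3.0, 9.0], x=[1.0, 4.0], theta=theta, x_ref_mean=2.5)
```

The treatment effect is then the difference in mean adjusted block outcomes between arms, and the squared block-level correlation between the covariate and the outcome is the $R^2$ that feeds the power calculation.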

Follow-up Questions

  • A standard rider-split A/B test is operationally simpler than a switchback. Given the spillovers described here, explain precisely why it is biased and when (if ever) you would still prefer it. (References the spillover premise across all parts.)
  • Suppose your Part 2 diagnostics show statistically significant carryover at lag 1 but the effect is small. Would you stop, lengthen the washout mid-run, or model it explicitly — and how does each choice affect the Part 3 power calculation?
  • How would you extend this single-airport design to simultaneously test across many airports, and what new sources of interference or heterogeneity would you have to control for?
  • If the treatment effect is suspected to be heterogeneous by time-of-day (e.g., larger during peaks), how would you change the analysis to estimate that, and what does it imply for the rollout decision?