Plan and validate ranking experiment
Company: SoFi
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Technical Screen
You have a new ranking algorithm for the home page and must validate it safely. Design a three-stage evaluation plan: offline replay with IPS/DR, small-scale interleaving (team-draft), then full A/B. Be concrete:
(1) Define the exposure unit (impression-level vs. session-level) and the bucketing scheme that avoids contamination across sessions and devices.
(2) The primary metric is 30-day funded-account conversion per 1,000 impressions; baseline = 1.20% (12 per 1,000 impressions), target relative uplift = +5%, power = 0.8, alpha = 0.05. Compute the per-arm sample size assuming independent impressions, then discuss inflation for repeated exposures and cluster-robust variance.
(3) List guardrails (p95 latency, app crash rate, CS tickets, decline rate) and how you’ll set sequential boundaries (e.g., alpha spending or SPRT) to allow early stopping without inflating Type I error.
(4) Explain how to mitigate novelty effects, carryover, and seasonality; specify the ramp policy and the duration needed to capture 30-day outcomes while using proxy metrics for early reads with CUPED or covariate adjustment.
(5) Describe heterogeneous treatment effect analysis (new vs. existing users, credit tiers) and how you’ll control false discoveries with BH or Holm.
(6) Provide a plan to detect p-hacking and Simpson’s paradox, and define ship criteria for when the primary metric and guardrails disagree.
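The sample-size arithmetic in part (2) can be sketched as follows. This is a minimal illustration, assuming a two-sided two-proportion z-test with unpooled variance; the function names, the mean cluster size of 8 impressions per user, and the intra-cluster correlation of 0.02 are hypothetical values chosen for illustration, not figures from the question. Under these assumptions the independent-impression requirement comes out on the order of 530,000 impressions per arm, before any inflation.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(p_base, rel_uplift, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided two-proportion z-test (unpooled variance)."""
    p_treat = p_base * (1 + rel_uplift)            # 1.20% -> 1.26% at +5% relative
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    variance_sum = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    delta = p_treat - p_base                       # absolute effect to detect
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / delta ** 2)


def design_effect(mean_cluster_size, icc):
    """Kish design effect: variance inflation from correlated repeated exposures."""
    return 1 + (mean_cluster_size - 1) * icc


# Values from the question: baseline 1.20%, +5% relative uplift, alpha 0.05, power 0.8.
n_independent = per_arm_sample_size(0.012, 0.05)

# Hypothetical clustering inputs (assumptions): 8 impressions per user, ICC = 0.02.
deff = design_effect(mean_cluster_size=8, icc=0.02)
n_inflated = ceil(n_independent * deff)

print(f"independent impressions per arm: {n_independent:,}")
print(f"design effect: {deff:.2f}")
print(f"inflated impressions per arm:    {n_inflated:,}")
```

In practice the intra-cluster correlation would be estimated from historical logs, and the final analysis would use user-clustered standard errors rather than relying on the planning-stage inflation factor alone.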
Quick Answer: This question tests experimental design and analytics end to end: offline counterfactual replay, interleaving, and A/B testing; sample-size and power computation; sequential testing with alpha spending; guardrail monitoring and ramp policies; proxy metrics and covariate adjustment; heterogeneous treatment effect analysis; and governance concerns such as p-hacking and Simpson’s paradox. Interviewers use it to probe whether a Data Scientist can rigorously validate a ranking change while balancing statistical error, operational risk, and bias, with the emphasis on applied statistics and experiment governance rather than purely theoretical understanding.
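For the heterogeneous treatment effect step, the difference between Holm (family-wise error control) and Benjamini-Hochberg (false discovery rate control) can be made concrete with a short sketch. The segment p-values below are made up for illustration, and the helper names `holm_reject` and `bh_reject` are hypothetical; a production analysis would more likely use an established library routine.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down: controls the family-wise error rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):          # rank = 0 for the smallest p-value
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                               # stop at the first failure
    return reject


def bh_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up: controls the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank                    # keep the largest rank that passes
    reject = [False] * m
    for idx in order[:largest_k]:
        reject[idx] = True
    return reject


# Hypothetical p-values for eight segments (e.g., user tenure x credit tier).
segment_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]

holm = holm_reject(segment_p)
bh = bh_reject(segment_p)
print("Holm rejections:", sum(holm))
print("BH rejections:  ", sum(bh))
```

With these illustrative inputs Holm flags one segment and BH flags two, which reflects the usual trade-off: Holm is stricter and suits a small set of pre-registered segments, while BH tolerates a bounded share of false positives and suits wider exploratory cuts.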