Reflect on a challenging project you led
Company: TikTok
Role: Data Scientist
Category: Behavioral & Leadership
Difficulty: hard
Interview Round: Technical Screen
Describe a project you led end-to-end that materially changed a product decision. Be specific: (a) the ambiguous problem and the success criteria you set; (b) the riskiest assumption you invalidated, how, and when; (c) a concrete pivot you made based on new data; (d) a high-stakes disagreement and how you earned alignment; (e) trade-offs under time/resource constraints and their rationale; (f) what you would do differently if starting again on 2025-09-01.
Quick Answer: This question evaluates leadership, cross-functional influence, and applied data science competencies—focusing on ambiguity management, causal measurement, hypothesis testing, and driving product decisions.
Solution
# Example teaching-oriented answer (end-to-end, product-changing)
Project: Cold-start feed seeding for new users to reduce early churn.
Summary: We evaluated whether to introduce an “interest selection” onboarding step vs. silent personalization for the first-session feed. Data showed the added friction harmed activation/retention for most traffic. We pivoted to silent personalization plus lightweight exploration, and only used explicit interest selection for a narrow, high-intent segment. This changed the product plan from a global rollout of interest selection to a segmented strategy.
(a) Ambiguous problem and success criteria
- Ambiguity: New users were bouncing in their first session. PMs proposed an interest selection screen to improve personalization; Design worried about added friction; Eng warned about cold-start latency. The open question: Would explicit interests improve personalization enough to offset added steps?
- Decision options: (1) Ship interest selection to all new users; (2) Keep existing silent personalization; (3) Hybrid/segmented approach.
- Primary success metric: D7 retention uplift. Baseline ≈ 20%; Minimum detectable effect (MDE) = +0.5 percentage points (pp).
- Secondary/guardrails: D1 retention, new-user session length, hide/report rates, push opt-out, creator complaint rate, app uninstalls within 24h, and system health (latency/crashes).
- Sample sizing (illustrative): For a two-arm A/B with baseline p=0.20, MDE=0.005, α=0.05 (two-sided), power=0.80:
n_per_arm ≈ 2*(Z_{α/2} + Z_β)^2 * p(1−p) / MDE^2 ≈ 2*(1.96+0.84)^2*0.16 / 0.005^2 ≈ 100k new users per arm (see the sketch after this list).
- Experiment design: User-level randomization; pre-registered analysis; SRM check; CUPED (if available) to reduce variance; 14-day run to observe D7 and early D14 trends.
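A minimal sketch of the sizing arithmetic above in Python; the only inputs are the illustrative baseline, MDE, α, and power from this plan:

```python
from scipy.stats import norm

# Illustrative inputs from the experiment design above.
p, mde, alpha, power = 0.20, 0.005, 0.05, 0.80

z_alpha = norm.ppf(1 - alpha / 2)  # two-sided critical value ≈ 1.96
z_beta = norm.ppf(power)           # power quantile ≈ 0.84

# Normal-approximation sample size per arm for a two-proportion test.
n_per_arm = 2 * (z_alpha + z_beta) ** 2 * p * (1 - p) / mde ** 2
print(f"{n_per_arm:,.0f}")  # ≈ 100,000 new users per arm
```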
(b) Riskiest assumption invalidated (how and when)
- Riskiest assumption: “Explicit interest selection improves short- and medium-term retention net of added friction.”
- How we tested:
1) Offline replay: Used historical new-user sessions to simulate explicit interest picks by mapping early swipes to topic clusters; estimated upper-bound benefit of ‘perfect’ picks on feed relevance.
2) Rapid funnel experiment (Week 1): Randomized 20% of new installs into a prototype interest-selection step (3 picks required). Measured activation completion and time-to-first-video.
3) Full A/B (Weeks 2–3): Interest-selection vs. control. Primary: D7. Guardrails: D1, hides/reports, latency.
- Finding: The activation drop (−1.8pp completion, +6s time-to-first-video) outweighed the personalization gains for most traffic. Early D1 was −0.3pp; the D7 average treatment effect (ATE) hovered around −0.2pp, with heterogeneous treatment effects (HTE) showing improvements only in a small high-intent segment (e.g., users arriving via creator-linked acquisition). A minimal readout sketch follows this section.
- Timing: Invalidated by Day 10 (after 7 days of data accumulation and a pre-registered interim analysis).
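A minimal sketch of the pre-registered retention readout, assuming arm-level D7 counts are available; the counts below are illustrative stand-ins chosen to match the −0.2pp finding, not real data:

```python
from statsmodels.stats.proportion import (
    confint_proportions_2indep, proportions_ztest)

# Hypothetical arm-level aggregates (illustrative, not real data).
retained = [19_950, 20_150]   # D7-retained users: treatment, control
exposed = [100_000, 100_000]  # new users per arm

z, p_value = proportions_ztest(count=retained, nobs=exposed,
                               alternative="two-sided")
lo, hi = confint_proportions_2indep(retained[0], exposed[0],
                                    retained[1], exposed[1],
                                    method="wald", compare="diff")
ate = retained[0] / exposed[0] - retained[1] / exposed[1]
print(f"D7 ATE = {ate*100:+.2f}pp, z = {z:.2f}, p = {p_value:.3f}")
print(f"95% CI for the difference: [{lo:+.4f}, {hi:+.4f}]")
```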
(c) Concrete pivot based on new data
- Pivot: Abandoned global interest-selection rollout. Shipped silent personalization plus lightweight exploration for all new users, and restricted explicit interest selection to a narrow high-intent slice.
- What changed specifically:
- Feed seeding shifted to a hybrid: a trending-but-diverse starter set, boosted early exploration, and fast adaptation to the first 5–10 interactions (see the exploration sketch after this list).
- Removed mandatory ‘pick 3 interests’ for general traffic; interest selection remained as an optional interstitial only for high-intent referrals.
- Result (illustrative): +0.6% quality-adjusted watch time in the first session, +0.4pp D1, +0.5pp D7 (vs. baseline), no significant increase in hides/reports, and stable latency. This cleared our pre-set launch bar.
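For concreteness, "silent personalization plus lightweight exploration" could be seeded roughly as below; the candidate sources, ε, and feed length are hypothetical illustrations, not the production design:

```python
import random

def seed_first_session(ranked_candidates, exploration_pool, epsilon=0.15, k=10):
    """Epsilon-greedy seeding sketch: mostly top-ranked (trending-but-diverse)
    videos, with an epsilon share drawn from an exploration pool to probe
    interests. Downstream ranking would adapt after the first 5-10 swipes."""
    ranked = iter(ranked_candidates)
    explore = list(exploration_pool)
    feed = []
    for _ in range(k):
        if explore and random.random() < epsilon:
            # Explore: random pick from under-shown categories.
            feed.append(explore.pop(random.randrange(len(explore))))
        else:
            # Exploit: next-best item from the silent-personalization ranker.
            feed.append(next(ranked))
    return feed
```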
(d) High-stakes disagreement and how I earned alignment
- Disagreement: PM and Marketing favored a global interest-selection launch aligned to a brand campaign timeline. Eng and Data were concerned about friction and latency risk.
- Alignment approach:
- Pre-registered metrics and decision thresholds to avoid goalpost moving.
- Transparent interim readouts, SRM checks, and HTE by acquisition channel/locale.
- Simulations showing even ‘perfect’ interest picks could not recoup activation losses for most cohorts.
- Proposed a compromise: segment-specific rollout where the effect was positive, plus silent personalization elsewhere.
- Outcome: Cross-functional sign-off to pivot the product plan from global to segmented rollout, with updated experiments to monitor long-term retention.
(e) Trade-offs under time/resource constraints (and rationale)
- Time constraint: 3 weeks to decide before campaign creative locked.
- Trade-offs made:
- Shipped a minimal seed model (logistic regression on simple signals like locale, device language, time-of-day) rather than a deeper model, to meet the timeline and keep p95 latency under target (a sketch follows this list).
- Limited experiment scope: 20% traffic for prototype funnel test; then 50/50 split only for new installs in top 3 markets to hit sample size quickly, deferring long-tail locales.
- Reduced instrumentation to the critical events to avoid client release delays; deferred nice-to-have surveys to post-launch.
- Accepted modest infra cost for early prefetch during first session to preserve experience quality.
- Rationale: Optimize for decision quality on the primary metric (D7) while staying inside latency and infra guardrails.
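A sketch of what the minimal seed model from the list above might look like with scikit-learn; the file path, feature names, and label are hypothetical:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training frame: one row per historical new user; the label is
# the topic cluster the user engaged with most in their first session.
df = pd.read_parquet("new_user_first_sessions.parquet")  # hypothetical path
features = ["locale", "device_language", "hour_of_day"]

model = make_pipeline(
    make_column_transformer((OneHotEncoder(handle_unknown="ignore"), features)),
    LogisticRegression(max_iter=1000),
)
model.fit(df[features], df["topic_cluster"])
# At serve time, predict_proba over a handful of one-hot features is cheap
# enough to stay inside the p95 latency target.
```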
(f) What I’d do differently if starting again on 2025-09-01
- Plan a 28-day holdout for long-horizon retention and satisfaction stability, with sequential testing/alpha spending to allow earlier safe stops.
- Use off-policy evaluation (IPS/DR) from logged bandit data to down-select seeders before any user-facing test.
- Apply CUPED and other variance-reduction techniques by default to reduce sample-size and runtime needs (see the sketch after this list).
- Build a self-serve dashboard with pre-registered metrics and HTE slices to speed alignment.
- Introduce fairness/diversity guardrails earlier (e.g., content category and creator exposure diversity in cold-start).
- Start privacy/compliance review earlier for any new signals considered for personalization.
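CUPED, mentioned in the design and again here, adjusts the experiment metric with a pre-exposure covariate; a minimal sketch, assuming such a covariate is available (for genuinely new users a proxy signal would be needed):

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED adjustment: y_adj = y - theta * (x - mean(x)) with
    theta = cov(x, y) / var(x). The mean of y is preserved while its
    variance shrinks by roughly corr(x, y)^2, cutting required sample size."""
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - np.mean(x))

# Synthetic demonstration: the adjusted metric has visibly lower variance.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)            # pre-exposure covariate
y = 0.6 * x + rng.normal(size=10_000)  # experiment-period metric
print(np.var(y), np.var(cuped_adjust(y, x)))
```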
Validation and guardrails applied
- Randomization: user_id hashing; monitored for sample ratio mismatch (SRM); no leakage from acquisition channels (a minimal SRM check follows this list).
- Metrics health: Monitored D1, D7, hides/reports, app crashes, latency p95; creator-side impact (new creator exposure share).
- Heterogeneous treatment effects: Segments by acquisition source, locale, device class; only high-intent cohort benefited from explicit interest selection.
- Power analysis and MDE pre-registration to avoid overfitting to noise.
- Post-launch monitoring: 4-week KPI drift checks; rollback plan documented.
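The SRM check referenced above is typically a chi-square goodness-of-fit test on assignment counts against the intended split; a minimal sketch with illustrative counts:

```python
from scipy.stats import chisquare

# Observed assignment counts per arm vs. the intended 50/50 split (illustrative).
observed = [100_200, 99_800]
expected = [sum(observed) / 2] * 2

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
# A common convention is to flag SRM when p < 0.001 and pause the readout.
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```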
Key takeaways
- Ambiguity resolution requires pre-registered success criteria and HTE analysis, not just a topline ATE.
- The riskiest assumption was that the personalization benefit would outweigh the added friction; we invalidated it quickly with a funnel test and a short A/B.
- Data supported a pivot to a segmented strategy and silent personalization, materially changing the original product decision.