Analyze time series and design validation experiment
Company: Google
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Onsite
You are given a daily time series Y_t representing the count of user reports of policy-violating content over the last 365 days for a single market. There are missing days, weekly seasonality, occasional outlier spikes due to takedown sweeps, and a suspected structural break around 2025-06-10 when a new detection rule launched.
Tasks:
1) Characterize the series quickly: describe exactly how you would impute missing days, robustly estimate trend and weekly seasonality (e.g., STL with robust weights), and identify outliers that should be down-weighted rather than removed.
2) Test for a structural break near 2025-06-10: specify the exact change-point method (e.g., PELT with a piecewise-constant mean cost, or Bayesian Online Change Point Detection), your penalty/priors, and the decision rule for accepting a break (include thresholds you would tune and why).
3) Quantify the effect size: estimate the level shift (absolute and percent) attributable to the break after removing seasonality; report a 95% interval and explain the sources of uncertainty.
4) Forecast the next 14 days with 80% prediction intervals using a model appropriate for count data (e.g., Poisson or Negative Binomial regression with a log link plus seasonal dummies or Prophet-like components). Explain how you would check the calibration of the intervals.
5) Causality follow-up: propose a lightweight validation design to attribute the change to the rule launch (e.g., geographic or traffic-channel holdout, phased rollout, or synthetic control). Specify the unit of analysis, pre-period length, primary metric, and the exact statistical test you would run. Include how you would guard against interference and seasonality biases.
6) Communicate results: provide the two most decision-relevant plots/tables you would include and the single-sentence takeaway you would give an exec if the break is real but beneficial.
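As a rough illustration of tasks 1 and 2, the sketch below removes weekday effects with medians and then scans for a single mean shift using a piecewise-constant squared-error cost. It is not PELT or BOCPD: it is an exhaustive one-break scan, which is a simpler stand-in when only one break is suspected. The synthetic series, the launch index of 200, the penalty constant c = 3, the 14-day minimum segment, and the 7-day acceptance window are all assumptions chosen for illustration and would need tuning on real data. Spikes are left in the series here; a fuller analysis would down-weight them first.

```python
import math
import random
from statistics import median

def deseasonalize(y):
    """Subtract robust weekday offsets (weekday median minus overall median).
    Missing days (None) stay None so they never enter the cost."""
    overall = median(v for v in y if v is not None)
    offsets = [
        median(v for i, v in enumerate(y) if v is not None and i % 7 == d) - overall
        for d in range(7)
    ]
    return [None if v is None else v - offsets[i % 7] for i, v in enumerate(y)]

def sse(vals):
    """Sum of squared deviations from the segment mean (piecewise-constant cost)."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def scan_single_break(x, min_seg=14):
    """Return (day index where the second segment starts, cost reduction)."""
    obs = [(i, v) for i, v in enumerate(x) if v is not None]
    vals = [v for _, v in obs]
    total = sse(vals)
    best_idx, best_gain = None, 0.0
    for k in range(min_seg, len(vals) - min_seg + 1):
        gain = total - sse(vals[:k]) - sse(vals[k:])
        if gain > best_gain:
            best_idx, best_gain = obs[k][0], gain
    return best_idx, best_gain

def robust_sigma(x):
    """Noise scale from the MAD of first differences, robust to spikes and one shift."""
    vals = [v for v in x if v is not None]
    diffs = [b - a for a, b in zip(vals, vals[1:])]
    med = median(diffs)
    mad = median(abs(d - med) for d in diffs)
    return 1.4826 * mad / math.sqrt(2)

def accept_break(x, launch_idx, window=7, c=3.0):
    """Accept only if the gain beats a BIC-like penalty AND the break is near launch."""
    idx, gain = scan_single_break(x)
    n_obs = sum(v is not None for v in x)
    penalty = c * robust_sigma(x) ** 2 * math.log(n_obs)
    ok = idx is not None and gain > penalty and abs(idx - launch_idx) <= window
    return idx, gain, penalty, ok

# Synthetic example (assumed data): level drops from 100 to 70 at day 200.
rng = random.Random(0)
weekly = [8, 5, 3, 0, -2, -9, -5]
series = []
for t in range(365):
    level = 100 if t < 200 else 70
    series.append(max(0, round(level + weekly[t % 7] + rng.gauss(0, 5))))
for t in (50, 120, 300):
    series[t] += 150          # takedown-sweep spikes
for t in (33, 34, 180, 260):
    series[t] = None          # missing days

idx, gain, penalty, ok = accept_break(deseasonalize(series), launch_idx=200)
print(idx, round(gain), round(penalty), ok)
```

The two-part decision rule (cost reduction above the penalty, and location within a window of the launch date) keeps a large but unrelated shift elsewhere in the year from being attributed to the rule.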
Quick Answer: This question tests time series analysis, change-point detection, count-data forecasting, causal inference, and experiment design. It also tests handling of missing observations, multiplicative weekly seasonality, outliers, and structural breaks, along with effect sizing and concise executive communication.
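For the interval-calibration check in task 4, one minimal approach is to compute 80% Negative Binomial intervals from forecast means and compare empirical coverage on held-out days against the nominal level. The sketch below assumes a mean/dispersion parameterization (variance = mu + mu^2/k); the forecast means, the dispersion k = 20, and the held-out actuals are hypothetical values, not results from any real series.

```python
import math

def nb_log_pmf(y, mu, k):
    """Log PMF of a Negative Binomial with mean mu and dispersion k."""
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))

def nb_quantile(mu, k, q):
    """Smallest count y whose cumulative probability reaches q."""
    cdf, y = 0.0, 0
    while True:
        cdf += math.exp(nb_log_pmf(y, mu, k))
        if cdf >= q:
            return y
        y += 1

def nb_interval(mu, k, level=0.8):
    """Central prediction interval, e.g. 10th to 90th percentile for level=0.8."""
    tail = (1 - level) / 2
    return nb_quantile(mu, k, tail), nb_quantile(mu, k, 1 - tail)

def coverage(actuals, intervals):
    """Share of held-out actuals that fall inside their interval."""
    hits = sum(lo <= a <= hi for a, (lo, hi) in zip(actuals, intervals))
    return hits / len(actuals)

# Hypothetical 14-day holdout: constant forecast mean of 70, dispersion 20.
mus = [70] * 14
actuals = [72, 65, 80, 58, 77, 110, 69, 70, 66, 82, 38, 74, 71, 35]
intervals = [nb_interval(mu, 20) for mu in mus]

cov = coverage(actuals, intervals)
# Rough binomial tolerance: two standard errors around the nominal 80%.
tol = 2 * math.sqrt(0.8 * 0.2 / len(actuals))
calibrated = abs(cov - 0.8) <= tol
print(round(cov, 3), calibrated)
```

With only 14 days the tolerance is wide, so in practice the same check would be repeated over many rolling forecast origins. Discrete intervals also tend to over-cover slightly, which is worth keeping in mind when reading the coverage number.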