Google Statistics & Math Interview Questions
Master your tech interview with our curated database of real questions from top companies.
Analyze Linear Regression Changes with Duplicated Observations
Linear Regression, p-values, and Chi-square with Large Samples. Context: You are analyzing regression and goodness-of-fit results. Consider what happens...
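A minimal sketch of the large-sample effect on goodness-of-fit tests. It assumes a two-category Pearson chi-square test with hypothetical shares (observed 50.5% vs 50% under H0). For a fixed deviation the statistic grows linearly in n, so the p-value heads toward zero even though the effect size never changes.

```python
import math

def chi_square_two_category(n, p_obs, p_null):
    """Pearson chi-square for a two-category goodness-of-fit test (df = 1).

    n: sample size; p_obs: observed share in category 1;
    p_null: share in category 1 under H0. Returns (statistic, p-value).
    """
    observed = [n * p_obs, n * (1 - p_obs)]
    expected = [n * p_null, n * (1 - p_null)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 1, P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Same hypothetical 0.5-point deviation at growing sample sizes.
for n in (1_000, 100_000, 10_000_000):
    stat, p = chi_square_two_category(n, p_obs=0.505, p_null=0.50)
    print(f"n={n:>10,}  chi2={stat:10.2f}  p={p:.3g}")
```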
Estimate Population Mean and Conversion Rate Accurately
Statistical Inference: Hypothesis Tests, Confidence Intervals, Sampling Design, and Truncated Normal Estimation. Context: You are evaluating a set of pr...
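Normal-approximation intervals for a mean and for a conversion rate, plus the sample size needed for a target margin, sketched with the standard library only. The Wald interval and the `p_guess` planning value are assumptions chosen for brevity; with small n or rates near 0 or 1, a t-interval or a Wilson interval is the safer choice.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci(sample, level=0.95):
    """Normal-approximation CI for a population mean (large n assumed)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * stdev(sample) / math.sqrt(len(sample))
    m = mean(sample)
    return m - half, m + half

def proportion_ci(successes, n, level=0.95):
    """Wald CI for a conversion rate; unreliable when n*p or n*(1-p) is small."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def n_for_margin(p_guess, margin, level=0.95):
    """Sample size so that the Wald half-width is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)

print(proportion_ci(50, 1000))   # hypothetical: 50 conversions out of 1,000
print(n_for_margin(0.5, 0.01))   # worst-case p for a +/-1 point margin
```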
Generate Samples from Truncated Normal Distribution
Scenario: You draw from a normal distribution but keep only observations greater than 1 (values below 1 are discarded). Assume the origi...
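One way to generate such draws without discarding anything is inverse-CDF sampling restricted to the kept region. The sketch assumes the parent distribution is N(mu, sigma²) with hypothetical defaults mu = 0 and sigma = 1 (the scenario's exact assumption is cut off above), and includes the closed-form truncated mean as a sanity check.

```python
import random
from statistics import NormalDist

def sample_above(lower, mu=0.0, sigma=1.0, size=1, rng=random):
    """Draw from N(mu, sigma^2) conditioned on X > lower via the inverse CDF.

    Maps U ~ Uniform(F(lower), 1) through F^{-1}, so every draw is kept.
    Loses accuracy when F(lower) is extremely close to 1 (far tail).
    """
    dist = NormalDist(mu, sigma)
    lo = dist.cdf(lower)
    out = []
    while len(out) < size:
        u = lo + (1.0 - lo) * rng.random()
        if 0.0 < u < 1.0:  # inv_cdf requires 0 < p < 1
            out.append(dist.inv_cdf(u))
    return out

def truncated_mean(lower, mu=0.0, sigma=1.0):
    """E[X | X > lower] = mu + sigma * phi(alpha) / (1 - Phi(alpha))."""
    alpha = (lower - mu) / sigma
    std = NormalDist()
    return mu + sigma * std.pdf(alpha) / (1.0 - std.cdf(alpha))

draws = sample_above(1.0, size=10_000, rng=random.Random(42))
print(sum(draws) / len(draws), truncated_mean(1.0))
```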
Determine Impact of New Chat-Notification on User Engagement
Scenario: A product team wants to determine whether a new chat-notification design increases daily active users (DAU) in Google Workspace Chat. Task: De...
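A sketch of the core significance test and power calculation. It assumes user-level randomization and treats "active on a given day" as a binary outcome per user, which simplifies DAU: repeated days from the same user are correlated and would need a per-user average or clustered errors. The baseline and lift in the example call are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_ztest(x_t, n_t, x_c, n_c):
    """Two-sided pooled z-test on the share of active users, treatment vs control."""
    p_t, p_c = x_t / n_t, x_c / n_c
    pooled = (x_t + x_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

def n_per_arm(p_c, p_t, alpha=0.05, power=0.80):
    """Users per arm to detect a shift from p_c to p_t with a two-sided test."""
    nd = NormalDist()
    z_a, z_b = nd.inv_cdf(1 - alpha / 2), nd.inv_cdf(power)
    var = p_c * (1 - p_c) + p_t * (1 - p_t)
    return math.ceil((z_a + z_b) ** 2 * var / (p_t - p_c) ** 2)

print(n_per_arm(0.50, 0.55))  # hypothetical 50% -> 55% daily active share
```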
Determine Normality of Single Observation with Z-Test
Hypothesis Test for a Single Observation Against a Standard Normal. Context: You observe a single numeric value, x, and want to decide whether it could ...
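Under H0 the single value is one draw from N(0, 1), so the z-statistic is x itself and the two-sided p-value is erfc(|x|/√2). A minimal sketch; note that one observation can only show that x is implausibly far out under N(0, 1), not whether the data-generating distribution is normal.

```python
import math

def single_obs_ztest(x, mu=0.0, sigma=1.0):
    """Two-sided p-value for H0: x was drawn from N(mu, sigma^2).

    With one observation, z = (x - mu) / sigma and
    p = P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    """
    z = (x - mu) / sigma
    return z, math.erfc(abs(z) / math.sqrt(2))

print(single_obs_ztest(2.5))  # reject at the 5% level if p < 0.05
```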
Explain Simpson’s Paradox and Its Causes with Example
Simpson’s Paradox: Definition, Cause, and Example. Task: You are asked to demonstrate your understanding of Simpson’s paradox in a statistics/analytics ...
Assess Fundamental Statistics Knowledge in Data-Science Interviews
Fundamental Statistics (Technical Phone Screen). Context: You are given standard statistics tasks commonly used in a data-science interview. Assume all ...
Compute precision under noisy annotators
Two-Annotator Labeling Policy: Precision, Recall, F1, and Generalization. You have two independent annotators who review videos and label them as "ille...
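A sketch of how AND/OR combination rules trade precision for recall. It assumes both annotators share the same true-positive rate (TPR) and false-positive rate (FPR) and err independently given the true label. The prevalence 0.10, TPR 0.90 and FPR 0.05 are hypothetical inputs, not values from the question.

```python
def policy_metrics(prevalence, tpr, fpr, rule="single"):
    """Precision, recall and F1 for a labeling rule built from two annotators.

    Assumes identical annotators with conditionally independent errors.
    rule: "single" (one annotator), "and" (both must flag), "or" (either flags).
    """
    if rule == "single":
        t, f = tpr, fpr
    elif rule == "and":
        t, f = tpr ** 2, fpr ** 2
    elif rule == "or":
        t, f = 1 - (1 - tpr) ** 2, 1 - (1 - fpr) ** 2
    else:
        raise ValueError(f"unknown rule: {rule}")
    tp = prevalence * t          # P(flagged and truly positive)
    fp = (1 - prevalence) * f    # P(flagged but truly negative)
    precision = tp / (tp + fp)
    recall = t
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for rule in ("single", "and", "or"):
    print(rule, [round(v, 3) for v in policy_metrics(0.10, 0.90, 0.05, rule)])
```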
Estimate population singletons from a 10% log
A daily search log has one row per query string. You draw a 10% simple random sample of rows without replacement. Define a “unique query” (singleton) ...
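Before estimating anything, it helps to write down the forward model linking population frequencies to sample frequencies. This sketch approximates the 10% sample without replacement by independent Bernoulli(0.1) thinning of rows (close when the log is large) and uses a hypothetical population frequency table. It shows why sample singletons are not simply 10% of population singletons: a query with k ≥ 2 rows can also appear exactly once in the sample.

```python
from math import comb

def expected_sample_freq(pop_freq_counts, j, rate=0.10):
    """Expected number of distinct queries seen exactly j times in the sample.

    pop_freq_counts: {k: number of distinct queries with k rows in the full log}.
    Under Bernoulli(rate) thinning, a query with k rows is seen j times with
    probability C(k, j) * rate**j * (1 - rate)**(k - j).
    """
    return sum(n_k * comb(k, j) * rate ** j * (1 - rate) ** (k - j)
               for k, n_k in pop_freq_counts.items() if k >= j)

pop = {1: 5000, 2: 2000, 3: 800}  # hypothetical population frequency-of-frequencies
print(expected_sample_freq(pop, 1))  # sample singletons mix k = 1, 2, 3 queries
```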
Prove OLS invariance to linear transforms
You fit Model 1: y ~ X1 + X2. You also fit Model 2 using Z = [X1 − X2, X1 + X2] = X T where T = [[1,1], [−1,1]] (2×2, invertible). a) Prove that OLS p...
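A numerical check of the invariance claim on hypothetical simulated data, keeping an intercept in both models (the `y ~ X1 + X2` formula notation usually implies one), so the full transform is T_full = diag(1, T). Because Z = X T_full with T_full invertible spans the same column space, the fitted values agree and the coefficients satisfy β̂_X = T_full β̂_Z.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.5 * x1 - 0.7 * x2 + rng.normal(scale=0.5, size=n)  # hypothetical data

X = np.column_stack([np.ones(n), x1, x2])   # Model 1: intercept, X1, X2
T = np.array([[1.0, 1.0], [-1.0, 1.0]])
T_full = np.eye(3)
T_full[1:, 1:] = T                          # leave the intercept column untouched
Z = X @ T_full                              # Model 2: intercept, X1 - X2, X1 + X2

beta_x, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_z, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Same column space: identical fitted values, and beta_z = T_full^{-1} beta_x.
print(np.allclose(X @ beta_x, Z @ beta_z))
print(np.allclose(beta_z, np.linalg.solve(T_full, beta_x)))
```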
Test a coefficient and explain t-distribution
In OLS, test whether feature j is relevant. a) State H0: β_j = 0 versus H1: β_j ≠ 0 and construct the t‑statistic t_j = β̂_j / se(β̂_j), giving the ex...
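The t-statistic and its degrees of freedom can be computed directly from the OLS formulas, as sketched below with NumPy on hypothetical data. Explicit inversion of X'X keeps the code close to the textbook expression se(β̂_j) = sqrt(σ̂² [(X'X)⁻¹]_jj) with σ̂² = RSS/(n − p); for ill-conditioned designs a QR-based solver is preferable.

```python
import numpy as np

def ols_t_stats(X, y):
    """Coefficients, standard errors, t-statistics and residual df for OLS.

    X must already include an intercept column if one is wanted.
    Under H0: beta_j = 0 with Gaussian errors, t_j follows t_{n - p}.
    """
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # unbiased noise-variance estimate
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, se, beta / se, n - p

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)  # hypothetical: column 2 is irrelevant
beta, se, t, df = ols_t_stats(X, y)
print(np.round(t, 2), df)
```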
Narrow a confidence interval for a mean
You have a simple random sample with n = 100 and sample mean 100. The current 95% CI for the population mean is 100 ± 10, which a PM says is too wide....
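The arithmetic behind the PM's request, assuming the interval is the normal-approximation x̄ ± 1.96·s/√n. The stated CI implies s ≈ 10·√100/1.96 ≈ 51, and because the half-width scales as 1/√n, halving it to ±5 needs about 4× the sample (n ≈ 400). Lowering the confidence level is the other lever that needs no new data, at the cost of coverage.

```python
import math
from statistics import NormalDist

def implied_sd(half_width, n, z=1.96):
    """Sample SD implied by a normal-approximation CI: half_width = z * s / sqrt(n)."""
    return half_width * math.sqrt(n) / z

def n_for_half_width(n_old, half_old, half_new):
    """Half-width scales as 1 / sqrt(n), so n must grow by (half_old / half_new)^2."""
    return math.ceil(n_old * (half_old / half_new) ** 2)

def half_width_at_level(half_95, level):
    """Same data, different confidence level: rescale by the ratio of z-quantiles."""
    nd = NormalDist()
    return half_95 * nd.inv_cdf(0.5 + level / 2) / nd.inv_cdf(0.975)

print(implied_sd(10, 100))            # SD implied by 100 +/- 10 with n = 100
print(n_for_half_width(100, 10, 5))   # halving the width needs 4x the sample
print(half_width_at_level(10, 0.90))  # a 90% interval from the same data
```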
Estimate unbiased ad scores with many reviewers
You have 1,000 ads and 100 reviewers; each reviewer rates 100 ads on a 1–10 scale with incomplete overlap. Specify a mixed-effects model to estimate l...
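In mixed-model notation one option is score_ar = μ + α_a + b_r + ε_ar, with fixed ad effects α_a, random reviewer effects b_r ~ N(0, τ²), and noise ε ~ N(0, σ²). A full answer would fit this with a mixed-model library and estimate τ² and σ²; as a dependency-light stand-in, the sketch below uses backfitting with a ridge penalty λ = σ²/τ² on reviewer effects, which gives the BLUP when that ratio is known. The simulated data, the value λ = 0.25, and treating 1–10 ratings as continuous are all assumptions.

```python
import numpy as np

def fit_ad_scores(ad_idx, rev_idx, y, n_ads, n_revs, lam, n_iter=50):
    """Backfitting for y = mu + ad_a + bias_r + noise.

    Ad effects are unpenalized (fixed); reviewer biases get ridge penalty lam.
    lam is supplied here, not estimated.
    """
    mu = y.mean()
    ad_n = np.bincount(ad_idx, minlength=n_ads)
    rev_n = np.bincount(rev_idx, minlength=n_revs)
    ad = np.zeros(n_ads)
    bias = np.zeros(n_revs)
    for _ in range(n_iter):
        r = y - mu - bias[rev_idx]                 # remove current reviewer biases
        ad = np.bincount(ad_idx, weights=r, minlength=n_ads) / np.maximum(ad_n, 1)
        r = y - mu - ad[ad_idx]                    # remove current ad effects
        bias = np.bincount(rev_idx, weights=r, minlength=n_revs) / (rev_n + lam)
    return mu + ad, bias                           # adjusted ad scores, reviewer biases

# Hypothetical simulation of the prompt's design: 100 reviewers x 100 ads each.
rng = np.random.default_rng(0)
n_ads, n_revs, per_rev = 1000, 100, 100
quality = rng.normal(5.5, 1.0, n_ads)
bias_true = rng.normal(0.0, 1.0, n_revs)
rev_idx = np.repeat(np.arange(n_revs), per_rev)
ad_idx = np.concatenate([rng.choice(n_ads, per_rev, replace=False) for _ in range(n_revs)])
y = quality[ad_idx] + bias_true[rev_idx] + rng.normal(0.0, 0.5, ad_idx.size)

scores, bias_hat = fit_ad_scores(ad_idx, rev_idx, y, n_ads, n_revs, lam=0.25)
counts = np.bincount(ad_idx, minlength=n_ads)
seen = counts > 0
naive = np.bincount(ad_idx, weights=y, minlength=n_ads) / np.maximum(counts, 1)
r_naive = np.corrcoef(naive[seen], quality[seen])[0, 1]
r_adj = np.corrcoef(scores[seen], quality[seen])[0, 1]
print(f"corr(naive, truth) = {r_naive:.3f}; corr(adjusted, truth) = {r_adj:.3f}")
```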
Explain and resolve Simpson’s paradox
Define Simpson’s paradox and construct a concrete numeric example where group-wise success rates favor treatment in each subgroup but the aggregate ra...
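A concrete numeric instance, using the widely cited kidney-stone counts as illustrative data (treatment A versus B, stratified by stone size). A wins in each stratum, yet B wins on the pooled rate because A was given mostly to the harder, large-stone cases. Comparing within strata, or directly standardizing both arms to a common stratum mix, resolves the reversal.

```python
# Illustrative counts: (successes, trials) per arm and stratum.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

def pooled(arm):
    """Aggregate rate that ignores the stratum mix (where the paradox appears)."""
    s = sum(v[0] for v in data[arm].values())
    n = sum(v[1] for v in data[arm].values())
    return s / n

def standardized(arm):
    """Direct standardization: weight stratum rates by the combined stratum sizes."""
    sizes = {g: data["A"][g][1] + data["B"][g][1] for g in ("small", "large")}
    total = sum(sizes.values())
    return sum(sizes[g] / total * rate(*data[arm][g]) for g in sizes)

for g in ("small", "large"):
    print(f"{g:>5}: A={rate(*data['A'][g]):.1%}  B={rate(*data['B'][g]):.1%}")
print(f"pooled: A={pooled('A'):.1%}  B={pooled('B'):.1%}")
print(f"standardized: A={standardized('A'):.1%}  B={standardized('B'):.1%}")
```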
Infer distribution and choose robust statistics
A dataset of n=10,000 session revenues (USD) has: 65% zeros; mean=8.5; median=0; p90=30; p95=120; p99=620. (a) Propose a plausible generative model (e...
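The summary statistics suggest a hurdle (zero-inflated) view: a session is zero with probability about 0.65, and otherwise draws from a right-skewed positive distribution, for which a lognormal is one plausible but untested choice. Under that decomposition the mean among non-zero sessions is roughly 8.5 / 0.35 ≈ 24.3. The helper below is a sketch to run on whatever revenue array you have; it reports that decomposition alongside robust alternatives, and the 1% trim level is an arbitrary assumption.

```python
import numpy as np

def revenue_summary(x, trim=0.01):
    """Summaries for a zero-heavy, right-skewed revenue column.

    Decomposes the mean as P(x > 0) * E[x | x > 0] and adds robust
    alternatives: the median among spenders and a mean that drops the
    top `trim` fraction of all sessions.
    """
    x = np.asarray(x, dtype=float)
    pos = x[x > 0]
    cutoff = np.quantile(x, 1 - trim)
    return {
        "mean": x.mean(),
        "share_nonzero": pos.size / x.size,
        "mean_if_nonzero": pos.mean() if pos.size else 0.0,
        "median_if_nonzero": float(np.median(pos)) if pos.size else 0.0,
        "top_trimmed_mean": x[x <= cutoff].mean(),
    }
```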
Infer causal impact without an A/B test
Evaluate Impact of a Shipped Version on Disconnections (No A/B Holdout). Context: A new client version was shipped system-wide with the goal of reducing...
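With a system-wide rollout there is no concurrent control, so one common fallback is an interrupted time series: fit the pre-launch trend and estimate the level and slope change at the ship date. The sketch below uses plain OLS on a hypothetical daily disconnection rate; it ignores autocorrelation, seasonality, and concurrent changes, all of which a real analysis would need to address (if some cohort adopted later, difference-in-differences is usually stronger).

```python
import numpy as np

def segmented_regression(y, ship_idx):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*(t - ship)*post_t by OLS.

    b2 estimates the level change at the ship date, b3 the slope change.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float)
    post = (t >= ship_idx).astype(float)
    X = np.column_stack([np.ones_like(t), t, post, (t - ship_idx) * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "trend", "level_change", "slope_change"], coef))

rng = np.random.default_rng(0)
t = np.arange(120)
y = 10 + 0.01 * t - 2.0 * (t >= 60) + rng.normal(0, 0.1, t.size)  # hypothetical series
fit = segmented_regression(y, ship_idx=60)
print({k: round(float(v), 3) for k, v in fit.items()})
```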
Compute p-values, probabilities, and regularization choices
Answer all parts. A) Hand‑compute a two‑sided p‑value comparing two means using Welch’s t‑test. Sample A: n1=20, mean1=5.2, sd1=1.1. Sample B: n2=24, ...
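Part A can be checked with a dependency-free sketch: Welch's statistic, the Welch-Satterthwaite degrees of freedom, and a two-sided p-value from the Student t distribution, evaluated through the regularized incomplete beta function (the continued-fraction method described in Numerical Recipes).

```python
import math

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))          # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def welch_t_test(m1, s1, n1, m2, s2, n2):
    """Welch's t, Welch-Satterthwaite df, and two-sided p-value from summary stats."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # For T ~ t_df: P(|T| >= |t|) = I_{df / (df + t^2)}(df / 2, 1 / 2).
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p
```

Call `welch_t_test(5.2, 1.1, 20, mean2, sd2, 24)` with Sample B's mean and SD from the full question (they are cut off in this listing). If SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` should give the same t and p.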
Analyze data duplication effects in linear regression
OLS With Duplicated Observations: Estimator, Variance, and Inference Pitfalls. Context: You have the linear model y = Xβ + ε with full-rank X ∈ ℝ^{n×p}...
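A quick simulation of the main pitfall, on hypothetical data. Stacking the data twice leaves β̂ unchanged (X'X and X'y both double), but the naive standard errors shrink by exactly √((n − p)/(2n − p)) ≈ 1/√2: RSS doubles, the residual degrees of freedom go from n − p to 2n − p, and (X'X)⁻¹ halves. The t-statistics inflate even though no new information was added.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients and naive (homoskedastic) standard errors."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    return beta, np.sqrt(sigma2 * np.diag(xtx_inv))

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 2.0 + 0.3 * X[:, 1] + rng.normal(size=n)   # hypothetical data

beta, se = ols_fit(X, y)
beta_dup, se_dup = ols_fit(np.vstack([X, X]), np.concatenate([y, y]))

p = X.shape[1]
print(np.allclose(beta, beta_dup))                   # estimator unchanged
print(se_dup / se, np.sqrt((n - p) / (2 * n - p)))   # SEs shrink by this factor
```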
Understand Simpson's Paradox with Simple Examples
Scenario: You are a data scientist advising a product team on statistical analysis and experimental design. Tasks: 1) Simpson’s paradox: Explain Simpso...
Define and sample a truncated normal
Define the truncated normal Z | a < Z < b for Z ~ N(0,1): write the normalized pdf and cdf. Then design efficient samplers for three cases: (i) a = 1,...
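The pdf and cdf follow from renormalizing the standard normal over (a, b): f(z) = φ(z)/(Φ(b) − Φ(a)) and F(z) = (Φ(z) − Φ(a))/(Φ(b) − Φ(a)) for a < z < b. For sampling, the sketch covers only the far-upper-tail regime (Z > a with large a), using Robert's (1995) exponential-proposal rejection method, where naive rejection from N(0, 1) would almost never accept. The question's specific cases are cut off above, so they are not matched one-to-one; inverse-CDF sampling is usually adequate for moderate truncation, and uniform-proposal rejection for short bounded intervals.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()

def tn_pdf(z, a, b):
    """Density of Z | a < Z < b for Z ~ N(0, 1)."""
    if not a < z < b:
        return 0.0
    return _STD.pdf(z) / (_STD.cdf(b) - _STD.cdf(a))

def tn_cdf(z, a, b):
    """CDF of Z | a < Z < b: (Phi(z) - Phi(a)) / (Phi(b) - Phi(a)) on [a, b]."""
    if z <= a:
        return 0.0
    if z >= b:
        return 1.0
    return (_STD.cdf(z) - _STD.cdf(a)) / (_STD.cdf(b) - _STD.cdf(a))

def tail_sample(a, rng=random):
    """One draw from Z | Z > a (intended for a >= 0) by exponential rejection.

    Proposal: a + Exponential(lam) with lam = (a + sqrt(a^2 + 4)) / 2;
    accept with probability exp(-(z - lam)^2 / 2).
    """
    lam = (a + math.sqrt(a * a + 4.0)) / 2.0
    while True:
        z = a + rng.expovariate(lam)
        if rng.random() <= math.exp(-0.5 * (z - lam) ** 2):
            return z

print([round(tail_sample(5.0, random.Random(7)), 3) for _ in range(3)])
```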