Google Data Scientist Interview Questions
Google Data Scientist interview questions focus on rigorous statistical thinking, product-driven analysis, and practical data engineering skills. What distinguishes interviewing for a Data Scientist role at Google is the combination of deep quantitative evaluation (hypothesis testing, causal inference, model evaluation), hands-on SQL/Python problem solving, and product intuition tied to measurable business metrics. Interviewers typically evaluate statistical rigor, experimental design, coding clarity, the ability to translate analysis into product decisions, and “Googleyness”: collaboration, ownership, and clear communication.
Expect a short recruiter screen, one or more technical screens (SQL, statistics, coding), then a multi-interview loop of 3–5 sessions that mix statistics, applied analysis/product case work, coding/SQL tasks, and behavioral questions. Successful candidates then go through a hiring-committee review and team matching.
To prepare, practice timed SQL and Python exercises, refresh core statistical concepts and A/B testing design, rehearse product-metrics case studies, and develop crisp STAR-style stories that quantify impact. Mock interviews and explaining your reasoning aloud often yield the best gains.
Estimate population singletons from a 10% log
A daily search log has one row per query string. You draw a 10% simple random sample of rows without replacement. Define a “unique query” (singleton) ...
Measure causal impact of YouTube ads
Estimate the incremental effect of a 6‑week YouTube campaign on weekly online sales. - Explain why naive OLS of sales on ad spend is biased; list at l...
Assess Fundamental Statistics Knowledge in Data-Science Interviews
Fundamental Statistics (Technical Phone Screen) Context You are given standard statistics tasks commonly used in a data-science interview. Assume all ...
Compute monthly CRR with merges and gaps
You are given PostgreSQL tables user_profile(user_id, signup_ts, country, is_employee, is_test), user_events(user_id, event_ts, event_type, revenue, p...
Prove OLS invariance to linear transforms
You fit Model 1: y ~ X1 + X2. You also fit Model 2 using Z = [X1 − X2, X1 + X2] = X T where T = [[1,1], [−1,1]] (2×2, invertible). a) Prove that OLS p...
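Before attempting a proof, it can help to check the claim numerically. The sketch below is not a proof and is not taken from the question. It fits both models on simulated data (the data-generating coefficients, the sample size of 50, and the helper names `solve`, `ols_fit`, and `predict` are all illustrative assumptions). It then compares fitted values and checks the coefficient mapping b = T c implied by Z = X T.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols_fit(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(rows, coef):
    return [sum(c * v for c, v in zip(coef, r)) for r in rows]

# Simulated data; the true coefficients here are arbitrary illustration values.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(50)]
x2 = [rng.gauss(0, 1) for _ in range(50)]
y = [1.0 + 2.0 * a - 0.5 * b + rng.gauss(0, 0.3) for a, b in zip(x1, x2)]

# Model 1 uses [1, X1, X2]; Model 2 uses [1, X1 - X2, X1 + X2].
model1 = [[1.0, a, b] for a, b in zip(x1, x2)]
model2 = [[1.0, a - b, a + b] for a, b in zip(x1, x2)]

b1 = ols_fit(model1, y)
b2 = ols_fit(model2, y)

# Largest absolute difference between the two sets of fitted values.
max_gap = max(abs(p - q) for p, q in zip(predict(model1, b1), predict(model2, b2)))
print(max_gap)
```

The fitted values agree up to floating-point error because both design matrices span the same column space. The Model 1 slopes can be recovered from the Model 2 slopes as `b2[1] + b2[2]` for X1 and `b2[2] - b2[1]` for X2.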
Analyze video flags and reviews with SQL
You are designing SQL queries for YouTube Trust & Safety. Use the schema and sample data below. Unless stated otherwise, treat a flag as reviewed if t...
Compute p-values, probabilities, and regularization choices
Answer all parts. A) Hand‑compute a two‑sided p‑value comparing two means using Welch’s t‑test. Sample A: n1=20, mean1=5.2, sd1=1.1. Sample B: n2=24, ...
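For part A, the hand computation has two ingredients: the Welch t statistic and the Welch–Satterthwaite degrees of freedom. The sketch below computes both from summary statistics. The function name `welch_t_and_df` is an assumption. The example inputs are hypothetical values chosen to be easy to verify, not the Sample B figures from the question, which are truncated above.

```python
import math

def welch_t_and_df(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error contribution of sample 1
    v2 = sd2 ** 2 / n2  # squared standard error contribution of sample 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical inputs for illustration only (NOT the question's values).
t_stat, df = welch_t_and_df(10, 1.0, 1.0, 10, 0.0, 1.0)
print(t_stat, df)
```

The two-sided p-value is then 2 · P(T > |t|) for a t distribution with `df` degrees of freedom, read from a t table or a statistics library. With equal sample sizes and equal standard deviations, as in the illustrative inputs, the Welch degrees of freedom reduce to n1 + n2 − 2.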
Select MOST/LEAST appropriate actions (SJT)
Situational Judgment Test (SJT): Choose MOST/LEAST likely actions For each situation below, pick: - MOST likely action you would take - LEAST likely a...
Estimate b when features exceed samples
Consider the linear model y = Xb + ε with X ∈ R^{n×(m+1)} including an intercept. a) Derive the OLS estimator b̂ = (XᵀX)^{-1}Xᵀy, stating the rank con...
Handle p≈n linear regression with L1
You must fit linear regression with p = 500 predictors and n = 600 observations. What failure modes do you expect and why does OLS overfit when p is c...
Handle highly imbalanced classification data
You must build a binary classifier for fraud with a 0.2% positive rate and 10M rows × 500 features. Propose an end-to-end plan that covers: 1) data sp...
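A quick piece of arithmetic shows why accuracy is a poor headline metric at this base rate. The sketch below uses only the two figures given in the question (10M rows, 0.2% positives) and evaluates a trivial baseline that labels every row as non-fraud. The variable names are illustrative.

```python
n_rows = 10_000_000
positive_rate = 0.002

n_pos = round(n_rows * positive_rate)  # expected number of fraud rows
n_neg = n_rows - n_pos

# Trivial baseline: predict "not fraud" for every row.
tp, fp, fn, tn = 0, 0, n_pos, n_neg

accuracy = (tp + tn) / n_rows
recall = tp / (tp + fn)

print(n_pos, accuracy, recall)
```

The baseline scores 99.8% accuracy while catching none of the 20,000 fraud cases. This is why plans for problems like this usually lean on precision, recall, and PR-AUC, and on stratified splits that keep enough positives in every fold.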
Build and evaluate a full ML pipeline
You must predict both (1) probability that a user will spend >$0 in the next 7 days (classification) and (2) expected spend in the next 7 days (regres...
Diagnose and reverse an adoption-rate decline
Problem: Investigating a 7pp Drop in Google Meet Enterprise Adoption Rate Context Over the last 4 calendar weeks, enterprise adoption rate has fallen ...
Explain mixed models and fixed vs random effects
In an applied DS setting, you are modeling an outcome (e.g., watch time per session, conversion, or rating) across multiple entities (e.g., users, cre...
Build Model to Predict Customer Contract Renewal
Predicting Enterprise Customer Renewal for Google Meet You are tasked with designing a model to predict whether an enterprise customer will renew thei...
Diagnose YouTube Usage Decline: Key Metrics and Segmentation
Scenario YouTube observes a sudden decline in daily active users (DAU) and total watch time across the platform. Task Design a systematic diagnosis pl...
Compute precision under noisy annotators
Two-Annotator Labeling Policy: Precision, Recall, F1, and Generalization You have two independent annotators who review videos and label them as "ille...
Find most co‑purchased product pairs in SQL
Given the schema and sample data below, write ANSI-SQL to return the top 5 unordered product pairs most frequently purchased together across distinct ...
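The question asks for SQL, and its schema is not shown here. Prototyping the counting logic in Python first can help pin down the edge cases: unordered pairs, duplicate line items, and tie-breaking. The sketch below assumes hypothetical input rows of `(order_id, product_id)` and counts each pair at most once per order. Both of these are assumptions, since the question text is truncated.

```python
from collections import Counter
from itertools import combinations

def top_pairs(order_items, k=5):
    """Return the k most frequent unordered product pairs, counted once per order."""
    baskets = {}
    for order_id, product_id in order_items:
        # A set removes duplicate line items for the same product in one order.
        baskets.setdefault(order_id, set()).add(product_id)

    counts = Counter()
    for products in baskets.values():
        # Sorting makes (A, B) and (B, A) the same key.
        for pair in combinations(sorted(products), 2):
            counts[pair] += 1

    # Break ties deterministically by pair name.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Hypothetical sample rows: (order_id, product_id).
rows = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "B"),
    (3, "B"), (3, "C"),
    (4, "A"),
]
result = top_pairs(rows)
print(result)
```

In SQL, the same idea is typically expressed as a self-join on the order key with a condition such as `a.product_id < b.product_id`. That condition plays the role of `sorted(...)` plus `combinations` here.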
Diagnose and fix flawed model fit
Fixing a Churn Classifier: Encoding, Imbalance, Evaluation, and Fairness Context You inherit a binary classifier that predicts churn=1. The current im...
Identify and Fix Predictive Model Performance Gaps
Model Review: Month Encoding, Feature Scaling, and Imbalanced Data Context You are auditing an existing predictive model for operational performance. ...
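The question text is truncated, so the specific month-encoding problem under review is not known. One issue often discussed under this heading is that integer codes 1 to 12 place December and January far apart. The sketch below shows one common alternative, a sine/cosine encoding. The helper names `encode_month` and `month_distance` are illustrative, and this is an option to weigh rather than the expected answer.

```python
import math

def encode_month(month):
    """Map month 1..12 to a point on the unit circle."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)

def month_distance(a, b):
    """Euclidean distance between two months in the encoded space."""
    return math.dist(encode_month(a), encode_month(b))

print(month_distance(12, 1))  # December to January
print(month_distance(1, 2))   # January to February
print(month_distance(1, 7))   # January to July
```

Under this encoding every pair of adjacent months, including December and January, is the same distance apart, and months half a year apart are farthest. One-hot encoding is the usual alternative when the model should not assume any ordering between months.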