What does the Google Data Scientist interview process look like?

Based on candidate reports compiled in this guide, the Google Data Scientist loop typically includes three stages: Technical Screen, Onsite, and Take-home Project. Each stage covers a distinct set of topics, which this guide walks through in detail.

What topics does Google focus on in Data Scientist interviews?

Google Data Scientist interviews cover five topic areas: Analytics & Experimentation, Data Manipulation (SQL/Python), Statistics & Math, Machine Learning, and Behavioral & Leadership. This guide breaks each topic down into core concepts, worked examples, and the real questions candidates were asked.

How many real Google Data Scientist interview questions are in this guide?

This guide is anchored to 27 real Google Data Scientist interview questions sourced from candidate reports, each linked to a full practice page with starter code, solution discussion, and community comments.

Google Data Scientist Interview Prep Guide

Everything Google actually asks Data Scientist candidates — concept walkthroughs, worked examples, and the real interview questions, drawn from candidate reports. Free to read.


Technical Screen

Analytics & Experimentation

A/B Testing And Product Metric Diagnostics — covered in depth under Onsite below.
Propensity Score Matching And Observational Causal Inference — covered in depth under Onsite below.

Data Manipulation (SQL/Python)

SQL Analytics And Event Data Manipulation — covered in depth under Take-home Project below.

Python, Pandas, NumPy, And R Data Manipulation

Figure: top-to-bottom decision flowchart for choosing vectorized pandas/NumPy/dplyr patterns. Steps: start, is it vectorizable, group-aggregate, join, conditional, simulation, missing values and normalization, validate complexity.

What's being tested

This tests vectorized tabular manipulation in pandas, NumPy, and dplyr: create derived columns, join lookup tables, compute group aggregates, and run small simulations without row-by-row loops. Interviewers are probing whether you can write correct, scalable analysis code while handling missing values, type coercion, random sampling, and edge cases.

Patterns & templates

Vectorized conditionals — use np.select, np.where, case_when, or boolean masks; encode precedence explicitly from most-specific to least-specific condition.
Group-wise transforms — use df.groupby(keys)[col].transform('mean') to broadcast aggregates back to rows; in dplyr, use group_by() plus mutate().
Join then mutate — use left_join() / merge(..., how='left') to attach treatment parameters, then compute adjusted values; validate row counts after joins.
Random simulation — use sample_n, slice_sample, np.random.binomial, or np.random.default_rng; set seeds for reproducibility and avoid repeated loops when vectorization works.
Column normalization — compute column sums with axis=0, divide via broadcasting, and define behavior for zero-sum columns before coding.
Missing-value semantics — ==, <, and > against NaN evaluate to False (and != to True) in pandas, while nullable dtypes propagate <NA>; use isna(), notna(), fillna(), and nullable dtypes deliberately.
Complexity expectations — most solutions should run in O(n) or O(n + k) time (n rows, k groups or lookup keys) with linear memory; avoid apply(axis=1) unless the data is tiny or the logic cannot be vectorized.
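Three of the patterns above (vectorized conditionals, group-wise transforms, and seeded simulation) can be sketched together in pandas and NumPy. The table, column names, thresholds, and the 0.3 conversion probability are all hypothetical, chosen only to make the example small enough to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per user.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "messages": [10, 30, 5, 15, 40],
})

# Vectorized conditional: np.select checks conditions in order, so the
# most specific rule goes first; rows matching none get the default.
conditions = [df["messages"] >= 30, df["messages"] >= 10]
df["tier"] = np.select(conditions, ["high", "medium"], default="low")

# Group-wise transform: broadcast each team's mean back onto its rows,
# then compute each user's deviation from that mean.
df["team_mean"] = df.groupby("team")["messages"].transform("mean")
df["deviation"] = df["messages"] - df["team_mean"]

# Seeded, vectorized simulation: one Bernoulli draw per row, no Python loop.
rng = np.random.default_rng(seed=0)
df["converted"] = rng.binomial(n=1, p=0.3, size=len(df))
```

Every step is a single pass over the rows, which is the kind of linear-time behavior interviewers tend to look for in place of apply(axis=1).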

Common pitfalls

Pitfall: Treating NaN == NaN as true or using normal comparisons on missing numeric fields; use isna() / notna() instead.
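A small sketch of how this pitfall shows up in practice, using a made-up three-element Series. It contrasts comparisons against NaN with isna(), and shows that a nullable Int64 Series returns <NA> rather than False for the missing row.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Equality against NaN is False for every row, including the NaN row itself.
eq_nan = (s == np.nan)

# Ordering comparisons are False for missing rows, so a filter drops them silently.
gt_two = s > 2

# isna() is the reliable way to locate missing values.
missing = s.isna()

# Decide explicitly what a missing value should mean before comparing.
filled_gt_two = s.fillna(0) > 2

# With a nullable dtype the comparison result is <NA> for the missing row.
nullable_gt_two = pd.Series([1, None, 3], dtype="Int64") > 2
```

Filling with 0 is only one possible policy; whether a missing value should count as zero, be excluded, or be flagged depends on the question being asked.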

Pitfall: Creating many-to-many joins accidentally and inflating rows; check key uniqueness and compare pre/post row counts.
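One way to guard against row inflation is to check key uniqueness, ask pandas to validate the join cardinality, and compare row counts before and after. The sketch below uses hypothetical `users` and `params` tables; the column names are illustrative only.

```python
import pandas as pd

# Hypothetical tables: users on the left, per-group treatment parameters on the right.
users = pd.DataFrame({"user_id": [1, 2, 3, 4], "group": ["a", "b", "b", "c"]})
params = pd.DataFrame({"group": ["a", "b"], "lift": [0.10, 0.25]})

# Check that the lookup key is unique before joining.
assert params["group"].is_unique

rows_before = len(users)

# validate="many_to_one" makes pandas raise MergeError if right-hand keys repeat.
joined = users.merge(params, on="group", how="left", validate="many_to_one")

# A left join onto a unique key must preserve the left table's row count.
assert len(joined) == rows_before

# Users in group "c" have no matching parameters, so their lift is NaN.
unmatched = joined["lift"].isna().sum()
```

Counting unmatched rows after the join is worth stating aloud in an interview, since a left join never drops rows but can quietly introduce missing values.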

Pitfall: Normalizing by a zero column sum and returning inf or NaN unintentionally; specify whether to keep zeros, return NaN, or skip the column.
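A minimal NumPy sketch of column normalization with an explicit zero-sum policy. The matrix is made up, and the choice to leave zero-sum columns as zeros is an assumption for illustration; returning NaN or skipping the column are equally valid policies if stated up front.

```python
import numpy as np

# Hypothetical count matrix; the middle column sums to zero.
counts = np.array([[1.0, 0.0, 2.0],
                   [3.0, 0.0, 2.0]])

col_sums = counts.sum(axis=0)  # shape (3,), one total per column

# Policy chosen here (an assumption): zero-sum columns stay all zeros.
# np.divide with `where` skips the division for those columns and keeps the
# values pre-filled in `out`, so no inf/NaN or runtime warning appears.
normalized = np.divide(
    counts,
    col_sums,
    out=np.zeros_like(counts),
    where=col_sums != 0,
)
```

The row vector of column sums broadcasts across the rows of `counts`, so each nonzero column ends up summing to 1.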

Practice these

The practice cards below cover the canonical variants — solve all of them and time yourself.

Implement R dplyr simulation and left join (Google, Data Scientist, Medium)

Evaluates proficiency with data manipulation and simulation in R using dplyr, covering randomized sampling, vectorized transformations, left joins...
