Data Scientist Interview Questions
Compute ads revenue by geography in SQL
You have ad delivery logs for a shop-ads system. Tables ad_impressions - impression_id STRING (PK) - ts TIMESTAMP (UTC) - user_id STRING - shop_id STR...
How to evaluate a similar-listing notifications feature
Case study (Marketplace product analytics) Context: Circle is a US marketplace app for buying and selling second‑hand products. On a product listing p...
How would you predict vehicles’ turn direction at an intersection?
At an intersection, there are N vehicles stopped or moving slowly. For each vehicle you have historical time-series data up to the current time: - Pos...
Compute pirated-theme usage and revenue loss
You work on a theme marketplace. Some shops install pirated themes instead of paying for official themes. Assume all timestamps are in UTC. Tables sho...
Compute the probability that an account is fake
A platform uses an automated classifier to flag potentially fake accounts. Assume: - Base rate: 2% of accounts are fake. - The classifier flags a fake...
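A minimal Bayes' rule sketch for this setup. Only the 2% base rate comes from the question. The classifier's detection rate and false-positive rate are truncated above, so the values 0.95 and 0.05 below are hypothetical placeholders, and `posterior_fake` is an illustrative name.

```python
def posterior_fake(base_rate: float, sensitivity: float, false_positive_rate: float) -> float:
    """Return P(fake | flagged) using Bayes' rule."""
    # P(flagged AND fake) = P(flagged | fake) * P(fake)
    flagged_and_fake = sensitivity * base_rate
    # P(flagged) = P(flagged | fake) P(fake) + P(flagged | real) P(real)
    p_flagged = flagged_and_fake + false_positive_rate * (1 - base_rate)
    return flagged_and_fake / p_flagged


# 2% base rate is from the question; 0.95 and 0.05 are HYPOTHETICAL inputs.
print(round(posterior_fake(0.02, 0.95, 0.05), 3))  # prints 0.279
```

With these placeholder rates only about 28% of flagged accounts would actually be fake: when the base rate is low, false positives from the large pool of real accounts dominate the flags.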
Evaluate new shop-ads ranking algorithm
You work on a marketplace with shop ads. A new ranking/recommendation algorithm is proposed to promote shop ads more aggressively, but stakeholders ar...
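One input to designing this experiment is how many users each arm needs. The sketch below uses the standard normal-approximation formula for comparing two proportions (unpooled variance). The baseline and target rates are hypothetical examples, not values from the question, and `sample_size_per_arm` is an illustrative helper.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Users per arm to detect p_control -> p_treatment with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# HYPOTHETICAL: baseline 2.0% ad click-through, smallest effect of interest 2.2%.
print(sample_size_per_arm(0.020, 0.022))
```

Guardrail metrics (for example, organic engagement or retention) may need their own power calculation, since the smallest harm worth detecting on them can be smaller than the primary effect.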
Handle conflict and urgent shifting priorities
Answer the following behavioral questions with concrete examples from your experience: 1. Describe a conflict you had with a partner or teammate. What...
Design and evaluate a fraud detection strategy
Context: You are interviewing for a Fraud Data Scientist role at a payments company. The company has a fraud model and some operational constraints. Pa...
Design a lead-scoring model
Context: You are interviewing for a Data Scientist role on a marketing/growth team. The business wants lead scoring: ranking or scoring incoming leads ...
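A common check on whether a lead-scoring model is useful is lift in the top-ranked slice: the conversion rate among the highest-scored leads divided by the overall conversion rate. This is a minimal sketch assuming binary conversion labels; the function name and toy data are hypothetical.

```python
def lift_at_top(scores, converted, fraction=0.10):
    """Conversion rate in the top `fraction` of leads by score, divided by the overall rate."""
    ranked = sorted(zip(scores, converted), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(c for _, c in ranked[:k]) / k
    base_rate = sum(converted) / len(converted)  # assumes at least one conversion
    return top_rate / base_rate


# HYPOTHETICAL scores and 0/1 conversion labels for 10 leads.
scores = [0.91, 0.85, 0.40, 0.77, 0.12, 0.33, 0.05, 0.60, 0.25, 0.18]
converted = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
print(lift_at_top(scores, converted, fraction=0.2))
```

Choosing `fraction` to match how many leads the sales team can actually contact ties the metric to the business constraint rather than to a global score like AUC.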
Determine earliest collision among moving cars
You are given n cars moving over time. Each car has known initial state at time \(t=0\): - 1D case (straight road): - initial position \(x_i\) (mete...
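A brute-force sketch for the 1D case. It assumes each car moves at a constant velocity \(v_i\) (the per-car fields are truncated above) and treats cars as points. Because nothing collides before the first collision, the earliest collision time is the minimum pairwise meeting time. `earliest_collision_1d` is an illustrative name.

```python
import math
from itertools import combinations


def earliest_collision_1d(x, v):
    """Earliest t >= 0 at which any two point-cars share a position, or math.inf."""
    best = math.inf
    for i, j in combinations(range(len(x)), 2):
        gap = x[j] - x[i]          # initial separation x[j] - x[i]
        closing = v[i] - v[j]      # rate at which x[j] - x[i] decreases
        if gap == 0:
            return 0.0             # already at the same position at t = 0
        if closing == 0:
            continue               # equal speeds: the gap never changes
        t = gap / closing          # solves x[i] + v[i] t == x[j] + v[j] t
        if t > 0:
            best = min(best, t)
    return best


# HYPOTHETICAL example: positions in meters, constant velocities in meters per second.
print(earliest_collision_1d([0.0, 5.0, 20.0], [3.0, 1.0, 0.0]))  # prints 2.5
```

This is O(n^2). Since cars cannot pass each other before the first collision, only pairs adjacent in the initial position order can be the first to meet, so sorting by position and checking neighbors reduces the cost to O(n log n).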
Retrieve first active and last inactive dates per user
Given a table activity that tracks user activities, write a SQL query to retrieve the first active date and last inactive date for each user. Table Sc...
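The table schema is truncated above, so this sketch assumes a hypothetical layout `activity(user_id, activity_date, status)` with `status` in {'active', 'inactive'}. It runs a conditional-aggregation query against an in-memory SQLite database so the SQL can be checked end to end.

```python
import sqlite3

# HYPOTHETICAL schema: the real table definition is truncated in the question.
SCHEMA = "CREATE TABLE activity (user_id TEXT, activity_date TEXT, status TEXT)"

# Conditional aggregation: earliest 'active' date and latest 'inactive' date per user.
QUERY = """
SELECT user_id,
       MIN(CASE WHEN status = 'active' THEN activity_date END)   AS first_active_date,
       MAX(CASE WHEN status = 'inactive' THEN activity_date END) AS last_inactive_date
FROM activity
GROUP BY user_id
ORDER BY user_id
"""


def first_active_last_inactive(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


sample = [
    ("u1", "2024-01-01", "active"),
    ("u1", "2024-01-05", "inactive"),
    ("u1", "2024-01-03", "active"),
    ("u1", "2024-01-09", "inactive"),
    ("u2", "2024-02-01", "active"),
]
print(first_active_last_inactive(sample))
```

Dates are stored as ISO-8601 text, so string MIN/MAX matches chronological order. A CASE without ELSE yields NULL and aggregates skip NULLs, so a user with no inactive rows gets NULL for `last_inactive_date`.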
Answer ancestor-walk queries on a rooted tree
Problem (Tree / Parent Array Queries) You are given a rooted tree with n nodes labeled 1..n, represented by a parent array par[1..n]: - par[i] is the ...
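The query type is truncated above, so this sketch assumes the common variant "return the k-th ancestor of node v" and answers it with binary lifting (O(n log n) preprocessing, O(log n) per query). It also assumes the root's parent is stored as 0, a hypothetical convention; the helper names are illustrative.

```python
def build_lifting(par):
    """up[j][v] = 2**j-th ancestor of v (0 if none). par[0] is unused; par[root] == 0 (ASSUMED)."""
    n = len(par) - 1
    levels = max(1, n.bit_length())  # 2**levels > n - 1 >= any node depth
    base = list(par)
    base[0] = 0                      # sentinel: the "ancestor" of 0 is 0
    up = [base]
    for j in range(1, levels):
        prev = up[j - 1]
        up.append([prev[prev[v]] for v in range(n + 1)])
    return up


def kth_ancestor(up, v, k):
    """Walk k steps toward the root using the binary digits of k; 0 means no such ancestor."""
    j = 0
    while k and v:
        if j >= len(up):
            return 0                 # k exceeds any possible depth
        if k & 1:
            v = up[j][v]
        k >>= 1
        j += 1
    return v


# HYPOTHETICAL tree: 1 is the root; 2 and 3 are children of 1; 4 -> 2; 5 -> 4.
par = [0, 0, 1, 1, 2, 4]
up = build_lifting(par)
print([kth_ancestor(up, 5, k) for k in range(5)])  # prints [5, 4, 2, 1, 0]
```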
Explain list vs tuple in Python
Question In Python: 1. What are the key differences between a list and a tuple? 2. When would you prefer using a tuple over a list? 3. What are the pe...
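A short sketch of the two behavioral differences that matter most in practice: lists are mutable, tuples are immutable, and (as a consequence) a tuple of hashable items can be a dict key while a list cannot. The variable names are illustrative.

```python
import sys

coords_list = [1.0, 2.0]
coords_tuple = (1.0, 2.0)

# Lists are mutable: in-place item assignment works.
coords_list[0] = 10.0

# Tuples are immutable: item assignment raises TypeError.
try:
    coords_tuple[0] = 10.0
except TypeError as exc:
    tuple_error = str(exc)

# A tuple of hashable items is hashable, so it can be a dict key or set member.
lookup = {coords_tuple: "point A"}

# A list is unhashable, so using it as a key raises TypeError.
try:
    lookup[coords_list] = "point B"
except TypeError:
    list_is_hashable = False
else:
    list_is_hashable = True

print(coords_list, tuple_error, list_is_hashable)
# Memory footprint is a CPython implementation detail; tuples are usually smaller.
print(sys.getsizeof(coords_list), sys.getsizeof(coords_tuple))
```

Tuples fit fixed-size records and lookup keys (for example, `(user_id, date)` pairs); lists fit collections that grow or change.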
Explain and interpret p-values correctly
Context: You are evaluating a change to a fraud decision rule (e.g., a new threshold or step-up authentication rule). You run an experiment comparing C...
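A minimal sketch of where a p-value comes from in a rate comparison like this one: a two-sided, two-proportion z-test with a pooled standard error. The counts are hypothetical, and the real experiment's metric and arms are truncated above.

```python
import math


def two_proportion_z_test(x_control, n_control, x_treatment, n_treatment):
    """Two-sided z-test for a difference in proportions (pooled standard error)."""
    p_control = x_control / n_control
    p_treatment = x_treatment / n_treatment
    pooled = (x_control + x_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL counts: fraud cases per 10,000 transactions in each arm.
z, p = two_proportion_z_test(120, 10_000, 90, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The p-value is the probability of a difference at least this large if the rule had no effect. It is not the probability that the rule has no effect, and a small p-value says nothing about whether the effect is large enough to matter.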
Explain KNN and how to tune it
K-Nearest Neighbors (KNN) fundamentals You are interviewing for a Data Scientist role. 1. Explain how the KNN algorithm works for both classification ...
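A brute-force sketch of KNN for both tasks: classification takes a majority vote over the k nearest training points, and regression averages their targets. It uses Euclidean distance, so features should be on comparable scales. Leave-one-out accuracy is shown as one simple way to compare values of k; the helper names and toy data are hypothetical.

```python
import math
from collections import Counter


def nearest(train_X, targets, query, k):
    """Return the targets of the k training points closest to query (Euclidean)."""
    ranked = sorted(zip(train_X, targets), key=lambda pair: math.dist(pair[0], query))
    return [t for _, t in ranked[:k]]


def knn_classify(train_X, train_y, query, k=3):
    # Majority vote; Counter keeps first-seen order, so ties go to the label seen nearest first.
    return Counter(nearest(train_X, train_y, query, k)).most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    # Regression: average the neighbors' numeric targets.
    neighbors = nearest(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy, one simple way to compare candidate values of k."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_classify(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


# HYPOTHETICAL toy data: two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(X, y, (0.5, 0.5)), knn_regress(X, [1, 1, 1, 10, 10, 10], (5.2, 5.2)))
print({k: loo_accuracy(X, y, k) for k in (1, 3, 5)})
```

On this toy data k = 5 fails completely because each left-out point is outvoted by the other cluster, which illustrates why k is tuned rather than fixed: too large a k oversmooths, too small a k chases noise.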
How would you validate that a driving simulation is realistic?
You work on evaluating Waymo’s driving simulation. You have: - Real-world (logged) driving data collected on-road. - Simulated driving data generated ...
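One concrete check is whether the distribution of a behavior metric in simulation matches the logged real-world distribution. This sketch computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs). The metric and values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(real, simulated):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two empirical CDFs."""
    a, b = sorted(real), sorted(simulated)
    gap = 0.0
    # The ECDF difference only changes at observed values, so checking those suffices.
    for v in a + b:
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# HYPOTHETICAL per-scenario metric, e.g. minimum following distance in meters.
real_logs = [12.1, 15.4, 9.8, 20.3, 14.7, 11.2]
sim_runs = [13.0, 16.2, 10.5, 19.8, 15.1, 12.0]
print(ks_statistic(real_logs, sim_runs))
```

In practice this would be computed per metric and per scenario type, with a p-value from `scipy.stats.ks_2samp` if SciPy is available. Matching marginal distributions is necessary but not sufficient evidence of realism, since interactions between agents can still differ.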
Return the largest file size in a directory
Problem (Bash / File System) Implement a Bash function (or shell snippet) that returns the maximum file size (in bytes) under a given directory. Requi...
Compare two rare-event detection models statistically
You are evaluating two models (Model A and Model B) for rare-event detection (e.g., fraud, abuse, medical adverse event). Positives are extremely rare...
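Because positives are rare, one useful comparison restricts to the true positives and asks whether the two models disagree in a lopsided way on the same examples. This sketch swaps in the exact McNemar test for paired outcomes as one reasonable option; bootstrap confidence intervals on recall or PR-AUC are a common alternative. The data are hypothetical.

```python
from math import comb


def mcnemar_exact(a_correct, b_correct):
    """Exact (binomial) McNemar test on paired per-example correctness flags."""
    # Only discordant pairs carry information about which model is better.
    only_a = sum(1 for a, b in zip(a_correct, b_correct) if a and not b)
    only_b = sum(1 for a, b in zip(a_correct, b_correct) if b and not a)
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0
    # Under H0 each discordant pair favors A or B with probability 1/2.
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)


# HYPOTHETICAL: did each model catch each of 12 true positives?
model_a = [True] * 9 + [False] * 3
model_b = [True] * 5 + [False] * 4 + [True] * 1 + [False] * 2
print(mcnemar_exact(model_a, model_b))  # prints (4, 1, 0.375)
```

With only 12 positives, even a 4-to-1 split in discordant pairs is far from significant, which is the core difficulty with rare events: the effective sample size is the number of positives, not the number of examples.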
Use Bayes to interpret a broken radar alarm
A “radar” system (or anomaly alarm) is suspected to be unreliable. You are asked to interpret its alerts and recommend how to operate it. Given Define...
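The question's definitions are truncated above, so this is a minimal sketch in odds form: posterior odds equal prior odds times the likelihood ratio P(alarm | event) / P(alarm | no event). It assumes repeated alarms are conditionally independent given the true state. All numbers are hypothetical.

```python
def posterior_after_alarms(prior, p_alarm_if_event, p_alarm_if_no_event, n_alarms=1):
    """P(event | n alarms), ASSUMING alarms are conditionally independent given the state."""
    likelihood_ratio = p_alarm_if_event / p_alarm_if_no_event
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** n_alarms
    return posterior_odds / (1 + posterior_odds)


# HYPOTHETICAL numbers: 10% prior, alarm fires 90% of the time on real events, 10% otherwise.
print(round(posterior_after_alarms(0.10, 0.90, 0.10), 3))
# A "broken" alarm that fires equally often either way (likelihood ratio 1) leaves the prior unchanged.
print(round(posterior_after_alarms(0.10, 0.30, 0.30, n_alarms=5), 3))
```

The second case is the key operational point: an alarm whose firing rate does not depend on the true state carries no information, however often it fires.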
Design metrics and experiment for stolen-post detection
You work on Stolen Post Detection for a social platform (detecting content that is copied/reposted without permission). A new detection algorithm is p...