Data Scientist Interview Questions
Practice the exact questions companies are asking right now.
Compute ads revenue by geography in SQL
You have ad delivery logs for a shop-ads system. Tables: ad_impressions - impression_id STRING (PK) - ts TIMESTAMP (UTC) - user_id STRING - shop_id STR...
How to evaluate similar-listing notifications feature
Case study (Marketplace product analytics). Context: Circle is a US marketplace app for buying and selling second‑hand products. On a product listing p...
Compute probability an account is fake
A platform uses an automated classifier to flag potentially fake accounts. Assume: - Base rate: 2% of accounts are fake. - The classifier flags a fake...
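The teaser is cut off before the classifier's detection and false-positive rates, so the sketch below keeps them as parameters. It applies Bayes' rule, P(fake | flagged) = P(flagged | fake) · P(fake) / P(flagged). Only the 2% base rate comes from the prompt; `posterior_fake`, `sensitivity`, and `false_positive_rate` are hypothetical names for illustration.

```python
def posterior_fake(base_rate: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(fake | flagged) via Bayes' rule.

    base_rate:           P(fake), e.g. 0.02 from the prompt
    sensitivity:         P(flagged | fake)  (hypothetical input)
    false_positive_rate: P(flagged | real)  (hypothetical input)
    """
    # Law of total probability: overall chance that an account gets flagged.
    p_flagged = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
    # Fraction of flagged accounts that are actually fake.
    return sensitivity * base_rate / p_flagged
```

For example, with hypothetical values sensitivity = 0.9 and false-positive rate = 0.05, the posterior is 0.018 / 0.067 ≈ 0.27: with a 2% base rate, most flagged accounts are still real.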
Evaluate new shop-ads ranking algorithm
You work on a marketplace with shop ads. A new ranking/recommendation algorithm is proposed to promote shop ads more aggressively, but stakeholders ar...
How to predict vehicles’ turn direction at an intersection
At an intersection, there are N vehicles stopped or moving slowly. For each vehicle you have historical time-series data up to the current time: - Pos...
How to estimate feature impact on usage time
Problem: A product team believes a new feature (or a variable you can influence, e.g., enabling notifications, new feed ranking, new UI) changes user t...
Debug and fix a PyTorch Transformer training loop
Minimal Causal LM Debugging and Optimization. Context: You are given a tiny causal decoder-only language model implemented in PyTorch. It appears to "tr...
Compute pirated-theme usage and revenue loss
You work on a theme marketplace. Some shops install pirated themes instead of paying for official themes. Assume all timestamps are in UTC. Tables: sho...
Design and evaluate a fraud detection strategy
Context: You are interviewing for a Fraud Data Scientist role at a payments company. The company has a fraud model and some operational constraints. Pa...
Handle conflict and urgent shifting priorities
Answer the following behavioral questions with concrete examples from your experience: 1. Describe a conflict you had with a partner or teammate. What...
Maximize profit from one stock trade
Problem: You are given an integer array prices where prices[i] is the price of a stock on day i. You may complete at most one transaction: choose a day...
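A common single-pass approach tracks the lowest price seen so far and the best profit from selling on the current day. The sketch assumes that "at most one transaction" allows doing nothing, so it returns 0 when no profitable trade exists; `max_profit` is a hypothetical name.

```python
from typing import List


def max_profit(prices: List[int]) -> int:
    """Best profit from buying on one day and selling on a later day (0 if none)."""
    min_price = float("inf")  # cheapest buy price seen so far
    best = 0                  # best profit found so far
    for p in prices:
        if p < min_price:
            # A new minimum can only help future sells, not today's.
            min_price = p
        elif p - min_price > best:
            # Selling today after buying at the running minimum.
            best = p - min_price
    return best
```

This runs in O(n) time and O(1) extra space, versus O(n²) for checking every buy/sell pair.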
Determine earliest collision among moving cars
You are given n cars moving over time. Each car has known initial state at time \(t=0\): - 1D case (straight road): - initial position \(x_i\) (mete...
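The teaser is truncated after the initial position, so this sketch assumes the 1D case with constant velocities `v[i]` and point-sized cars; both are assumptions, not stated in the visible text. Cars i and j meet when x_i + v_i t = x_j + v_j t, i.e. t = (x_j - x_i) / (v_i - v_j), and the earliest collision is the smallest non-negative such t over all pairs. Because every car moves freely until the first collision, post-collision rules do not affect this minimum. `earliest_collision_1d` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def earliest_collision_1d(x: List[float], v: List[float]) -> Optional[Tuple[float, int, int]]:
    """Return (time, i, j) of the first collision, or None if no pair ever meets.

    Assumes constant velocities and point-sized cars (illustrative assumptions).
    """
    best: Optional[Tuple[float, int, int]] = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[j] - x[i]  # initial gap
            dv = v[i] - v[j]  # closing speed
            if dv == 0:
                if dx != 0:
                    continue  # parallel motion, never meet
                t = 0.0       # same position and speed: together from t = 0
            else:
                t = dx / dv
                if t < 0:
                    continue  # they would have met in the past
            if best is None or t < best[0]:
                best = (t, i, j)
    return best
```

The pairwise scan is O(n²); it is the simplest correct baseline before reaching for sorting-based speedups.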
Retrieve First Active and Last Inactive Dates per User
Given a table activity that tracks user activities, write a SQL query to retrieve the first active date and last inactive date for each user. Table Sc...
Design features for house price prediction
Scenario: You are building a model to predict house sale price from a tabular dataset (similar to typical real-estate datasets). The interviewer expect...
Compare two rare-event detection models statistically
You are evaluating two models (Model A and Model B) for rare-event detection (e.g., fraud, abuse, medical adverse event). Positives are extremely rare...
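One common option for comparing two models on the same cases (not necessarily what this question expects) is McNemar's test, which looks only at discordant cases: b = A correct and B wrong, c = A wrong and B correct. With rare positives the counts are small, so the exact binomial version is used here. Restricting the inputs to positive cases turns it into a paired comparison of recall. `discordant_counts` and `mcnemar_exact` are hypothetical helper names.

```python
from math import comb
from typing import Sequence, Tuple


def discordant_counts(y_true: Sequence[int], pred_a: Sequence[int],
                      pred_b: Sequence[int]) -> Tuple[int, int]:
    """b = cases only model A gets right, c = cases only model B gets right."""
    b = sum(1 for t, a, bb in zip(y_true, pred_a, pred_b) if a == t and bb != t)
    c = sum(1 for t, a, bb in zip(y_true, pred_a, pred_b) if a != t and bb == t)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value: under H0, b ~ Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, no evidence of a difference
    k = min(b, c)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)
```

Because concordant cases are ignored, thousands of easy negatives that both models get right do not dilute the comparison.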
Design a lead-scoring model
Context: You are interviewing for a Data Scientist role on a marketing/growth team. The business wants lead scoring: ranking or scoring incoming leads ...
Estimate ATE, ITT, and TOT from experiment
You are given a single dataset (CSV) from an A/B experiment on a streaming product. The goal is to estimate the causal effect of a personalization fea...
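The dataset's columns are cut off in the teaser, so this sketch assumes three hypothetical per-user fields: `z` (randomized assignment, 0/1), `d` (treatment actually received, 0/1), and `y` (outcome). ITT is the difference in mean outcome by assignment; dividing it by the difference in treatment uptake (the first stage) gives the Wald/IV estimate. Under one-sided noncompliance (control users cannot get the feature) that ratio is the effect on the treated (TOT); with perfect compliance it equals ITT and the ATE.

```python
from statistics import mean
from typing import Dict, Sequence


def itt_and_tot(z: Sequence[int], d: Sequence[int], y: Sequence[float]) -> Dict[str, float]:
    """ITT, first-stage uptake difference, and Wald (IV) TOT estimate."""
    y1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y0 = [yi for zi, yi in zip(z, y) if zi == 0]
    d1 = [di for zi, di in zip(z, d) if zi == 1]
    d0 = [di for zi, di in zip(z, d) if zi == 0]

    itt = mean(y1) - mean(y0)          # effect of being *assigned* the feature
    first_stage = mean(d1) - mean(d0)  # how much assignment raised actual uptake
    if first_stage == 0:
        raise ValueError("assignment did not change uptake; TOT is not identified")
    tot = itt / first_stage            # effect of actually *receiving* the feature
    return {"itt": float(itt), "first_stage": float(first_stage), "tot": float(tot)}
```

In practice you would also report standard errors (e.g. via bootstrap); the point estimate alone says nothing about uncertainty.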
Explain KNN and how to tune it
K-Nearest Neighbors (KNN) fundamentals. You are interviewing for a Data Scientist role. 1. Explain how the KNN algorithm works for both classification ...
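A minimal sketch of KNN classification, assuming numeric feature tuples and Euclidean distance: find the k closest training points and take a majority vote (for regression you would average their targets instead). Tuning k is shown with leave-one-out accuracy, one simple form of cross-validation. `knn_predict` and `loo_accuracy` are hypothetical names.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Majority label among the k training points nearest to `query`."""
    # Sort by distance only (key=...) so labels never need to be comparable.
    ranked = sorted(zip(train_X, train_y), key=lambda xy: math.dist(xy[0], query))
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


def loo_accuracy(X: List[Sequence[float]], y: List[Hashable], k: int) -> float:
    """Leave-one-out accuracy: predict each point from all the others."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        correct += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return correct / len(X)
```

To tune, evaluate `loo_accuracy` over candidate k values (odd k avoids ties in binary problems) and pick the best; scaling features first matters because distances mix units.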
Compute and plot a precision–recall curve
Precision–Recall (PR) curve coding / evaluation. You are given a binary classifier’s outputs on a dataset: - y_true: array of true labels in \(\{0,1\}\...
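The rest of the input spec is cut off, so this sketch assumes a hypothetical `y_score` list of predicted scores alongside `y_true`. Sorting by score descending and sweeping the threshold gives one (threshold, precision, recall) point per distinct score; tied scores are consumed together so they share a threshold. Average precision is then the step-wise sum of precision times the recall increment. Plotting is left out to keep the block dependency-free; the returned points can be passed to any plotting library with recall on the x-axis.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(y_true: Sequence[int],
                            y_score: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, precision, recall) for each distinct score, highest first."""
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("recall is undefined without positive labels")
    points = []
    tp = fp = 0
    i, n = 0, len(pairs)
    while i < n:
        thr = pairs[i][0]
        # Predict positive for every item with score >= thr (ties included).
        while i < n and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((thr, tp / (tp + fp), tp / total_pos))
    return points


def average_precision(points: List[Tuple[float, float, float]]) -> float:
    """Step-wise AP: sum of precision * (increase in recall) at each threshold."""
    ap, prev_recall = 0.0, 0.0
    for _, precision, recall in points:
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

Unlike ROC AUC, precision here depends directly on the positive rate, which is why PR curves are preferred for imbalanced problems.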
Explain list vs tuple in Python
Question (Python): 1. What are the key differences between a list and a tuple? 2. When would you prefer using a tuple over a list? 3. What are the pe...
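A short demonstration of the two differences that most often decide the choice: lists are mutable, while tuples are immutable and (when their items are hashable) hashable, so they can serve as dict keys or set members. The variable names are illustrative only.

```python
import sys

point_list = [1, 2]
point_tuple = (1, 2)

# Lists are mutable: items can be appended or reassigned in place.
point_list.append(3)
point_list[0] = 10

# Tuples are immutable: item assignment raises TypeError.
try:
    point_tuple[0] = 10  # type: ignore[index]
    tuple_assignment_ok = True
except TypeError:
    tuple_assignment_ok = False

# A tuple of hashable items is hashable, so it can be a dict key.
visited = {point_tuple: "start"}

# A list is unhashable and cannot be a dict key or set member.
try:
    hash(point_list)
    list_hashable = True
except TypeError:
    list_hashable = False

# CPython detail (not a language guarantee): a tuple usually takes less memory
# than a list with the same items, since a list carries extra bookkeeping for growth.
tuple_size = sys.getsizeof((1, 2, 3))
list_size = sys.getsizeof([1, 2, 3])
```

A tuple is the natural fit for fixed-shape records such as coordinates or composite keys; a list fits collections that grow or change.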