Other Data Scientist Interview Questions
Master your tech interview with our curated database of real questions from top companies.
Derive and regularize logistic regression
Churn Propensity with Logistic Regression: Theory, Validation, and Decisions. Context: You are building a churn propensity model (y ∈ {0,1}) using logi...
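For the derivation part, one standard setup uses an L2 penalty with strength λ ≥ 0 and, for brevity, leaves out a separate unpenalized intercept. It writes the regularized negative log-likelihood and its gradient as follows. This is a textbook formulation and not necessarily the answer the truncated prompt expects.

```latex
\begin{aligned}
p_i &= \sigma(\mathbf{w}^\top \mathbf{x}_i) = \frac{1}{1 + e^{-\mathbf{w}^\top \mathbf{x}_i}} \\
J(\mathbf{w}) &= -\sum_{i=1}^{n} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big] + \frac{\lambda}{2} \lVert \mathbf{w} \rVert_2^2 \\
\nabla_{\mathbf{w}} J(\mathbf{w}) &= \sum_{i=1}^{n} (p_i - y_i)\, \mathbf{x}_i + \lambda \mathbf{w}
\end{aligned}
```

The gradient is simple because σ'(z) = σ(z)(1 - σ(z)), which reduces each example's log-loss term to (p_i - y_i) x_i. The penalty adds λw, and this shrinks the weights toward zero as λ grows.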
Design metrics resilient to data quality
Design a robust metric and compute it using only window functions (no JOINs) to show how data-quality issues change conclusions. Schema: payments_raw(...
Handle unprofessional, prescriptive interviewers
You are mid-interview on a time-boxed SQL screen where the interviewer forbids JOINs and insists on window functions only, interrupts with a personal ...
Solve window-function SQL without joins
You must use only window functions (no JOINs). CTEs are allowed. Given the schemas and tiny samples below, write SQL for each sub-question and explain...
Build SQL pivot with lookups and currency conversion
You are given the following schema and sample data. Use SQL (or Python with SQL-like transforms) to answer the tasks below. Treat amounts as gross rev...
Defend fit and handle pressure in finance interview
A panel of three senior managers challenges you: "Your background is data science, not finance. Why are you the right hire for a Finance Analyst role ...
Implement multiplication without using the multiplication operator
Implement int multiply(int a, int b) without using * or /. You may use +, −, bitwise operators, and shifts. Requirements: - Handle negatives, zero, an...
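Below is a minimal shift-and-add sketch in Python. It assumes that unary minus counts as an allowed use of "−". Python integers are unbounded, so the sketch does not model fixed-width overflow, which the truncated requirements may cover. A C or Java version would need explicit handling for cases such as the most negative 32-bit value.

```python
def multiply(a: int, b: int) -> int:
    """Shift-and-add multiplication using only +, unary -, comparisons, and bit operations."""
    negative = (a < 0) != (b < 0)   # result is negative iff exactly one operand is
    x = -a if a < 0 else a
    y = -b if b < 0 else b
    if y > x:
        x, y = y, x                  # loop over the bits of the smaller magnitude
    result = 0
    while y:
        if y & 1:                    # current bit of y is set: add this shifted copy of x
            result += x
        x <<= 1                      # x doubles for the next bit position
        y >>= 1                      # consume the lowest bit of y
    return -result if negative else result
```

The loop runs once per bit of the smaller magnitude, so it takes O(log min(|a|, |b|)) iterations.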
Design anomaly detection and handle imbalanced logistic regression
You receive a time‑stamped transactions dataset: columns [event_time (UTC), customer_id, merchant_id, amount, country, device_type, features...], labe...
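For the imbalance part, a common remedy is to weight each class inversely to its frequency. The sketch below applies the n / (k · n_c) heuristic, which is the one scikit-learn uses for `class_weight="balanced"`, inside a weighted binary cross-entropy. The function names are illustrative. With time-stamped data, the weights would typically be computed only on a training window that comes before the evaluation window.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n / (k * n_c): rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def weighted_log_loss(y_true: Sequence[int], p_pred: Sequence[float],
                      weights: Dict[int, float], eps: float = 1e-12) -> float:
    """Class-weighted binary cross-entropy, averaged over the total weight."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum
```

With these weights, each class contributes the same total weight (here n / k) to the loss, so the rare positive class is not swamped during fitting.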
Explain motivations, customer ownership, mentoring, and culture fit
1) Why are you leaving your current company? Answer in <90 seconds, avoid negativity, and tie your reason to specific growth you seek (e.g., owning an...
Write SQL to analyze response accuracy and speed
You are given response-level data for an online assessment with sections verbal/design/analytics and verbal subtypes grammar/vocab/tense/other. Using ...
Design and power an A/B on question mix
Experiment Design: Replacing 10% "Other" Verbal With Grammar in a 15-Min, 19-Question Section. You need to test whether replacing the 10% "other" verba...
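The sample size depends on the primary metric, and the preview does not state it. Suppose the outcome is binary per response (for example, correct vs. incorrect) and responses are independent. Under those assumptions, the usual normal-approximation sample size for a two-sided two-proportion z-test can be sketched as below. The rates passed in any call are hypothetical inputs. If responses are clustered within test-takers, the result would need a design-effect adjustment.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, p_treatment: float,
              alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-arm n for a two-sided two-proportion z-test (normal approximation, unpooled variance)."""
    delta = p_treatment - p_control
    if delta == 0:
        raise ValueError("the two rates must differ")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z(power)            # quantile matching the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)
```

For example, `n_per_arm(0.60, 0.63)` returns the number of responses per arm needed to detect a hypothetical three-point lift at α = 0.05 with 80% power.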
Compute counts and pacing for verbal section
Verbal Section Allocation and Time Optimization. You are designing a 15-minute verbal section (900 seconds total) with 19 questions across four subtype...
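Uniform pacing gives a baseline before any per-subtype allocation, as shown below. Note also that a 10% share of 19 questions is not a whole number. Subtype shares therefore have to be rounded to integer counts that still sum to 19.

```latex
\bar{t} = \frac{900\ \text{s}}{19\ \text{questions}} \approx 47.4\ \text{s per question},
\qquad 0.10 \times 19 = 1.9\ \text{questions}
```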
Demonstrate behavioral problem-solving with STAR
Data Scientist Onsite: Behavioral & Leadership (Use STAR). Answer concisely using the STAR framework (Situation, Task, Action, Result). Prepare brief,...
Design MapReduce and Spark jobs
Big data systems: (a) Explain Hadoop’s fault tolerance (HDFS replication, task re-execution) and why MapReduce includes shuffling and sorting; in a wo...
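The following single-process simulation of the map, shuffle/sort, and reduce phases shows why shuffling and sorting are needed. It uses word count purely as an illustrative job. This is a teaching sketch, not Hadoop or Spark code. `itertools.groupby` only merges adjacent equal keys, much as a reducer can only aggregate a key correctly once all of that key's pairs reach it together.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def map_phase(lines: Iterable[str]) -> Iterator[Pair]:
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_sort(pairs: Iterable[Pair]) -> List[Pair]:
    """Shuffle/sort stand-in: order all pairs by key so equal keys become adjacent."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Reducer: sum the values of each key; groupby only merges adjacent equal keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return dict(reduce_phase(shuffle_sort(map_phase(lines))))
```

Suppose the sort is skipped and mapper output goes straight into `reduce_phase`. Any key whose occurrences are not contiguous is then emitted several times with partial counts.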
Manipulate data efficiently in Python
Answer the following: (a) Contrast list comprehensions and generators with respect to memory and evaluation; write a generator that yields rolling win...
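Here is a minimal sketch of the generator half. It assumes "rolling windows" means each full run of `size` consecutive items. A bounded `deque` keeps memory at O(size) however long the input is. A list comprehension over all windows, by contrast, would build every window before the caller sees the first one.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def rolling_windows(items: Iterable[T], size: int) -> Iterator[Tuple[T, ...]]:
    """Lazily yield each full window of `size` consecutive items."""
    if size <= 0:
        raise ValueError("size must be positive")
    window: Deque[T] = deque(maxlen=size)  # appending beyond maxlen drops the oldest item
    for item in items:
        window.append(item)
        if len(window) == size:
            yield tuple(window)
```

Because evaluation is lazy, this also works on unbounded iterators when paired with `itertools.islice`.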
Query conversion and retention with SQL windows
Schema and sample data (PostgreSQL): users(id, signup_date, country) 1 | 2025-08-20 | US 2 | 2025-08-25 | US 3 | 2025-08-27 | CA 4 | 2025-08-30 | US 5...
Prove reservoir sampling correctness
Design an algorithm to sample k items uniformly at random from a stream of unknown and potentially massive length N, using O(k) memory and one pass. (...
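One standard answer is Algorithm R, sketched below. The correctness argument runs as follows. The t-th item (1-based, t > k) enters the reservoir with probability k/t. At each later step s, it is evicted only if the new item is admitted (probability k/s) and lands in its slot (probability 1/k). It therefore survives step s with probability (s - 1)/s. The product then telescopes: (k/t) × t/(t+1) × (t+1)/(t+2) × ... × (N-1)/N = k/N. An item with t ≤ k starts in the reservoir and survives steps k+1 through N with probability k/(k+1) × ... × (N-1)/N = k/N as well.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Algorithm R: one pass, O(k) memory, each item kept with probability k/N."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):   # i is 0-based, so this is item number i + 1
        if i < k:
            reservoir.append(item)      # the first k items fill the reservoir
        else:
            j = rng.randint(0, i)       # uniform over the i + 1 positions 0..i
            if j < k:                   # admitted with probability k / (i + 1)
                reservoir[j] = item     # overwrite a uniformly chosen slot
    return reservoir
```

If the stream has fewer than k items, the whole stream is returned.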
Write mini-batch gradient descent
Implement a generic mini-batch gradient descent routine: inputs are differentiable loss L(θ; x), initial θ0, batch size b, steps T, and learning-rate ...
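Below is a minimal pure-Python sketch. It rests on several assumptions that the truncated prompt does not state:

- The caller supplies a per-example gradient `grad_fn(theta, x)` of L(θ; x). This is a hypothetical interface, since no autodiff is assumed.
- θ is a flat list of floats.
- The learning-rate schedule is a function of the step index.
- Each epoch is reshuffled, and any leftover partial batch is dropped.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def minibatch_gd(
    grad_fn: Callable[[Vector, object], Vector],  # hypothetical: gradient of L(theta; x) for one example
    theta0: Sequence[float],
    data: Sequence[object],
    batch_size: int,
    steps: int,
    lr: Callable[[int], float],                   # learning-rate schedule: step index -> step size
    seed: int = 0,
) -> Vector:
    if not data or batch_size <= 0:
        raise ValueError("need non-empty data and a positive batch size")
    rng = random.Random(seed)
    theta = list(theta0)
    order = list(range(len(data)))
    pos = len(order)                              # forces a shuffle before the first step
    for t in range(steps):
        if pos + batch_size > len(order):         # new epoch; a leftover partial batch is dropped
            rng.shuffle(order)
            pos = 0
        batch = [data[i] for i in order[pos:pos + batch_size]]
        pos += batch_size
        # Average the per-example gradients over the mini-batch.
        grad = [0.0] * len(theta)
        for x in batch:
            for d, g in enumerate(grad_fn(theta, x)):
                grad[d] += g / len(batch)
        eta = lr(t)
        theta = [th - eta * g for th, g in zip(theta, grad)]
    return theta
```

As an example, take `grad_fn = lambda th, x: [th[0] - x]` (the gradient of (θ - x)²/2) with `lr = lambda t: 1 / (t + 1)`. The iterate then becomes a running average of mini-batch means and moves toward the data mean.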
Implement KNN from scratch
Without using ML libraries, implement k-Nearest Neighbors for classification. Requirements: (a) Support Euclidean and cosine distances; (b) Allow tie-...
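This minimal from-scratch sketch covers both Euclidean and cosine distance. The tie-breaking requirement is truncated, so the policy here is an assumption:

- Equal distances are ordered by training index.
- A tied vote goes to the tied label whose member ranks nearest to the query.

Cosine distance is taken as 1 minus cosine similarity. A zero vector is treated as orthogonal to everything, giving distance 1.0.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumed convention: zero vectors count as orthogonal (similarity 0)
    return 1.0 - dot / (norm_a * norm_b)


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": euclidean,
    "cosine": cosine_distance,
}


def knn_predict(train_X: List[Vector], train_y: List[Hashable], query: Vector,
                k: int = 3, metric: str = "euclidean") -> Hashable:
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    dist = METRICS[metric]  # KeyError for an unsupported metric name
    # Sort by (distance, training index); the unique index breaks distance ties.
    ranked = sorted((dist(x, query), i, label)
                    for i, (x, label) in enumerate(zip(train_X, train_y)))
    neighbors = ranked[:k]
    votes = Counter(label for _, _, label in neighbors)
    top = max(votes.values())
    # Vote tie: pick the tied label that appears first among the sorted neighbors.
    return next(label for _, _, label in neighbors if votes[label] == top)
```

Sorting all n distances costs O(n log n) per query. `heapq.nsmallest` would cut this to O(n log k) without changing the result.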
Extract companies from noisy text
Extracting Company Names from Noisy Resumes and Web Snippets. Context: You receive messy resume text (PDF-to-text/OCR, varying casing) and scraped web s...
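One simple baseline is gazetteer matching. The text and a list of known company names are normalized the same way, then searched for whole-word matches, longest alias first. The sketch below assumes such a list is available, and the names used in any call are hypothetical. It uses ASCII-only normalization and does not handle OCR character substitutions, which would need fuzzy matching on top.

```python
from __future__ import annotations

import re
import unicodedata
from typing import Dict, Iterable, List, Tuple


def normalize(text: str) -> str:
    """Strip accents, lowercase, and collapse every non-alphanumeric run to one space."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^0-9a-z]+", " ", text.lower()).strip()


def build_matcher(companies: Iterable[str]) -> Tuple[Dict[str, str], re.Pattern[str]]:
    """Map normalized aliases to canonical names and compile one alternation regex."""
    alias_to_name: Dict[str, str] = {}
    for name in companies:
        alias = normalize(name)
        if alias:                                  # skip names that normalize to nothing
            alias_to_name.setdefault(alias, name)  # first name seen wins for duplicate aliases
    if not alias_to_name:
        raise ValueError("need at least one non-empty company name")
    # Longest aliases first, so "acme labs" is tried before "acme".
    aliases = sorted(alias_to_name, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, aliases)) + r")\b")
    return alias_to_name, pattern


def extract_companies(text: str, alias_to_name: Dict[str, str],
                      pattern: re.Pattern[str]) -> List[str]:
    """Return canonical names in order of first appearance, without duplicates."""
    found: List[str] = []
    for m in pattern.finditer(normalize(text)):
        name = alias_to_name[m.group(0)]
        if name not in found:
            found.append(name)
    return found
```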