Data Scientist Data Manipulation (SQL/Python) Interview Questions
Practice 530 real Data Manipulation (SQL/Python) interview questions for Data Scientist roles, from companies including Meta, Amazon, TikTok, Capital One, and DoorDash.

"I got asked a hardcore MCM DP question and I saw it on PracHub as well. Solved that question in 5 minutes. Without PracHub I doubt I could solve it in 5 hours. Though somehow didn't get hired, perhaps I guess I solved it too fast? /s"

"Believe me i'm a student here jn US. Recently interviewed for MSFT. They asked me exact question from PracHub. I saw it the night before and ignored it cause why waste time on random sites. I legit wanna go back and redo this whole thing if I had chance. Not saying will work for everyone but there is certainly some merit to that website. And i'm gonna use it in future prep from now on like lc tagged"

"10 years of experience but never worked at a top company. PracHub's senior-level questions helped me break into FAANG at 35. Age is just a number."

"I was skeptical about the 'real questions' claim, so I put it to the test. I searched for the exact question I got grilled on at my last Meta onsite... and it was right there. Word for word."

"Got a Google recruiter call on Monday, interview on Friday. Crammed PracHub for 4 days. Passed every round. This platform is a miracle worker."

"I've used LC, Glassdoor, and random Discords. Nothing comes close to the accuracy here. The questions are actually current — that's what got me. Felt like I had a cheat sheet during the interview."

"The solution quality is insane. It covers approach, edge cases, time complexity, follow-ups. Nothing else comes close."

"Legit the only resource you need. TC went from 180k -> 350k. Just memorize the top 50 for your target company and you're golden."

"PracHub Premium for one month cost me the price of two coffees a week. It landed me a $280K+ starting offer."

"Literally just signed a $600k offer. I only had 2 weeks to prep, so I focused entirely on the company-tagged lists here. If you're targeting L5+, don't overthink it."

"Coaches and bootcamp prep courses cost around $200-300 but PracHub Premium is actually less than a Netflix subscription. And it landed me a $178K offer."

"I honestly don't know how you guys gather so many real interview questions. It's almost scary. I walked into my Amazon loop and recognized 3 out of 4 problems from your database."

"Discovered PracHub 10 days before my interview. By day 5, I stopped being nervous. By interview day, I was actually excited to show what I knew."

"I recently cleared Uber interviews (strong hire in the design round) and all the questions were present in prachub."
"The search is what sold me. I typed in a really niche DP problem I got asked last year and it actually came up, full breakdown and everything. These guys are clearly updating it constantly."
Write SQL for cuisine median delivery times
Use SQL to answer the following. Assume ANSI SQL with window functions and percentile functions available. Treat “today” as 2025-09-01 (inclusive). Co...
Detect sessions and gaps using SQL LEAD
Write a single ANSI-SQL query that (a) assigns per-user session_ids when the gap between consecutive events exceeds 30 minutes, (b) computes session_s...
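Part (a) of this prompt can be sketched as follows. This is only an illustration under stated assumptions: the table name `events` and its columns `user_id` and `ts` are hypothetical (the full schema is not shown here), the query runs on SQLite through Python's `sqlite3` module rather than on a strict ANSI engine, and it flags session starts with `LAG` instead of `LEAD`. A running sum of the start flags then yields a per-user session number.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("u1", "2025-09-01 10:00:00"),
        ("u1", "2025-09-01 10:20:00"),
        ("u1", "2025-09-01 11:00:00"),  # 40-minute gap: starts a new session
        ("u2", "2025-09-01 09:00:00"),
        ("u2", "2025-09-01 09:30:00"),  # exactly 30 minutes: same session
    ],
)

SESSION_SQL = """
WITH flagged AS (
    SELECT
        user_id,
        ts,
        CASE
            -- First event of a user always opens a session.
            WHEN LAG(ts) OVER w IS NULL THEN 1
            -- A gap strictly greater than 1800 seconds opens a new session.
            WHEN CAST(strftime('%s', ts) AS INTEGER)
               - CAST(strftime('%s', LAG(ts) OVER w) AS INTEGER) > 1800 THEN 1
            ELSE 0
        END AS is_new_session
    FROM events
    WINDOW w AS (PARTITION BY user_id ORDER BY ts)
)
SELECT
    user_id,
    ts,
    -- Running count of session starts gives a per-user session number.
    SUM(is_new_session) OVER (
        PARTITION BY user_id ORDER BY ts
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS session_id
FROM flagged
ORDER BY user_id, ts
"""

rows = conn.execute(SESSION_SQL).fetchall()
for row in rows:
    print(row)
```

The timestamp arithmetic (`strftime('%s', ...)`) is SQLite-specific; other engines would use their own interval or timestamp-difference functions.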
Process real-time enter/exit events and actives
You receive a real-time stream of events with schema: user_id (str), channel (str), event_type ("enter"|"exit"), ts (UTC ISO timestamp). A user can ‘e...
Convert Dictionary to DataFrame
Using Python and pandas, convert the following dictionary into a DataFrame. Each top-level key is a target column name, and each value is a list of [r...
Implement robust word counts and min/max
You receive a 50GB UTF-8 text corpus on disk. Implement a Python solution that:
- Streams the file without loading it fully into memory.
- Counts ca...
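The streaming requirement can be sketched with the standard library alone. The details below are assumptions, since the prompt is truncated: words are lowercased and tokenized with the regex `\w+`, undecodable bytes are replaced rather than raising, and `count_words` is a hypothetical helper name. The file is read one line at a time, so memory grows with the vocabulary size, not the file size (this also assumes no single line is itself enormous).

```python
import os
import re
import tempfile
from collections import Counter

WORD_RE = re.compile(r"\w+")  # assumed tokenization rule


def count_words(path):
    """Count lowercased words, reading the file one line at a time."""
    counts = Counter()
    # errors="replace" keeps the scan going if a byte sequence is invalid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:  # the file object yields lines lazily
            counts.update(WORD_RE.findall(line.lower()))
    return counts


# Tiny stand-in for the real corpus, written to a temporary file.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("The cat sat.\nthe CAT ran!\n")
    sample_path = tmp.name

word_counts = count_words(sample_path)
os.remove(sample_path)
print(word_counts.most_common(2))
```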
Write SQL for DAU and first-purchase conversion
Today is 2025-09-01. Using the schema and sample data below, write a single ANSI-SQL query that returns one row per day for the last 7 days (2025-08-2...
Compute violation rate and flag precision in SQL
You are analyzing a Trust & Safety product in BigQuery. Assume 'today' is 2025-09-01 (UTC). Define precise metrics and write SQL to compute them, bein...
Handle repeated churn in SQL
As part of analyzing the same promotion experiment, you need SQL that handles users who churn and later resubscribe. Assume the following tables: 1. e...
Find most co‑purchased product pairs in SQL
Given the schema and sample data below, write ANSI-SQL to return the top 5 unordered product pairs most frequently purchased together across distinct ...
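One common way to count unordered pairs is a self-join that keeps only rows where the first product sorts before the second. The sketch below assumes a hypothetical table `order_items(order_id, product_id)` and assumes pairs are counted once per distinct order (the prompt is cut off at that point); it runs on SQLite via Python's `sqlite3` module.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id INTEGER, product_id TEXT)")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [
        (1, "A"), (1, "B"), (1, "C"),
        (2, "A"), (2, "B"), (2, "B"),  # duplicate line item in order 2
        (3, "B"), (3, "C"),
    ],
)

PAIR_SQL = """
WITH items AS (
    -- Deduplicate so a repeated line item does not inflate pair counts.
    SELECT DISTINCT order_id, product_id FROM order_items
)
SELECT
    a.product_id AS product_1,
    b.product_id AS product_2,
    COUNT(*)     AS orders_together
FROM items AS a
JOIN items AS b
  ON a.order_id = b.order_id
 AND a.product_id < b.product_id   -- keeps each unordered pair once
GROUP BY a.product_id, b.product_id
ORDER BY orders_together DESC, product_1, product_2
LIMIT 5
"""

pairs = conn.execute(PAIR_SQL).fetchall()
for pair in pairs:
    print(pair)
```

The extra `ORDER BY` columns are a tie-breaking choice made here so the top 5 is deterministic; the original prompt may specify a different rule.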
Analyze time-zoned events with pandas
You are given two pandas DataFrames. events columns: user_id:int, ts:str ISO-8601 with timezone (e.g., '2025-08-31T23:58:43-07:00'), event:str in {'si...
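A frequent pitfall with offset-bearing timestamps is that the local calendar date and the UTC calendar date can differ. The sketch below shows this using the standard library's `datetime` instead of pandas, so it is a stand-in for the technique, not a solution to the full prompt. The first timestamp is the one quoted in the prompt; the other two values and the helper name `to_utc` are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

raw_ts = [
    "2025-08-31T23:58:43-07:00",  # example from the prompt
    "2025-08-31T20:15:00+00:00",  # illustrative value
    "2025-09-01T01:30:00+09:00",  # illustrative value
]


def to_utc(ts):
    """Parse an ISO-8601 string with a UTC offset and convert it to UTC."""
    return datetime.fromisoformat(ts).astimezone(timezone.utc)


# Bucket events by UTC calendar date, not by the local date in the string.
utc_dates = [to_utc(ts).date().isoformat() for ts in raw_ts]
events_per_utc_day = Counter(utc_dates)

for ts, day in zip(raw_ts, utc_dates):
    print(ts, "->", day)
```

Here the first timestamp is written with an August 31 local date but falls on September 1 in UTC, while the third is written as September 1 locally but falls on August 31 in UTC.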
Calculate Cohort Retention
You are given two tables: users - user_id BIGINT PRIMARY KEY - signup_ts TIMESTAMP user_events - user_id BIGINT - event_ts TIMESTAMP - event_name VARC...
Transform clickstream with pandas sessionization
Given a pandas DataFrame events with columns [user_id:int, ts:str ISO8601 or NaT, url:str, server_log_ts:datetime], build 30-minute inactivity session...
Compute cohort GMV and payer rate with edge cases
You are given the following schema (timestamps are UTC): users(user_id INT, country STRING, created_at TIMESTAMP) events(user_id INT, event_ts TIMESTA...
Compute video-call SQL metrics with edge cases
Use 'today' = 2025-09-01. Assume UTC timestamps. Write SQL to answer both parts below and call out how your queries handle edge cases (duplicates, fai...
Write SQL for seller and category metrics
Assume the following marketplace tables. Table: listing_interactions - buyer_id STRING - seller_id STRING - product_id STRING - interaction_date DATE ...
Write SQL for reply-based recipient metrics
You work on a social product and are given two tables. Assumptions (use these unless you state otherwise): - All timestamps are in UTC. - A “reply” is...
Write Queries for Pinterest Engagement Tasks
You are given several data-manipulation tasks based on Pinterest-style product data. Use UTC for all timestamp-to-date conversions unless stated other...
Identify country with highest sunny-day probability
Write SQL to find the country with the highest probability that a day is sunny. Use the schema and sample data below. Rules: consider a day sunny for ...
Design a scalable video platform database
Design the relational database for a YouTube-like video company. Deliverables: 1) list the core tables with key columns, types, and constraints (users...
Calculate valid daily usage with gap constraints
Write Standard SQL to compute, for a given date (use 2025-09-01), each user's total valid usage minutes. Schema and rules: Schema (timestamps are UTC)...