Capital One Data Manipulation (SQL/Python) Interview Questions
Master your tech interview with our curated database of real questions from top companies.
Compute Customer Spend and Engineer Features for 2023
transactions +-----------+-------------+--------+------------+--------------+ | tran_id | customer_id | amount | tran_date | merchant_cat | +------...
Determine Country with Most 'Sunny' Days
Weather +------------+------------+---------+ | country | date | weather | +------------+------------+---------+ | Spain | 2023-07-01 | ...
Write SQL to compute campaign net revenue
Using the schema and sample data below, write SQL to produce, for each campaign_id and segment, the following metrics for August 2025: total_reached, ...
Merge CSVs and build revenue pivot with pandas
You receive four CSVs and must replicate an Excel VLOOKUP + PivotTable workflow using Python/pandas. CSV samples: customers.csv customer_id,signup_dat...
Find top category per region in Aug 2025
You are given the following schema and sample data. Schema: - customers(customer_id INT, name TEXT, region TEXT) - orders(order_id INT, customer_id IN...
Reconcile ledgers with SQL/Python and late events
You own a daily ETL + reconciliation job between two financial ledgers. Late postings (“delay time”) up to 48 hours are common. Schema: - payments_raw...
Write SQL for theme-park revenue and visits
You are given theme-park ticketing and visits data. Write SQL to answer the following, using the sample schema and tables below. Return both the query...
Identify country with highest sunny-day probability
Write SQL to find the country with the highest probability that a day is sunny. Use the schema and sample data below. Rules: consider a day sunny for ...
Merge four CSVs locally, robustly and efficiently
You receive four CSV files that must be merged locally on a laptop with 8 GB RAM, without relying on cloud services: - products.csv: product_id, categ...
Write SQL for lowest price with ratings
You have two tables. Schema: - products(product_id INT PRIMARY KEY, product_name TEXT, category TEXT) - purchase(purchase_id INT PRIMARY KEY, product_...
Write SQL to find top net-revenue products
Using the sample schema and data below, write a single SQL query that returns, for the last 7 days relative to today (use today = 2025-09-01, so the w...
Audit flight data quality from metadata
You’re given an airline on‑time dataset and a one‑page “Metadata” slide that claims: flight_date (string, local time), dep_time/arr_time (HHMM local),...
Impute missing values without leakage
Given a DataFrame df with columns: user_id, event_date (datetime), country (categorical), device_type (categorical), age (numeric), income (numeric), ...
Merge seven tables into one clean DataFrame
Using pandas only (no loops over rows), write a function build_facts(customers, orders, order_items, products, payments, shipments, refunds) -> pd.Dat...
Write SQL to quantify outage revenue loss
A database outage blocked premium membership registrations from 2025-01-01 to 2025-03-31 (inclusive). Members intend to start on intended_start_date, ...
Design a reproducible data pipeline for modeling
You receive raw clickstream events and a user table. Build a reproducible daily pipeline that outputs user-day features for modeling. It must be idemp...
Impute, join, and upsert using SQL and Python
Write both SQL and Python (pandas) to complete the following data-manipulation tasks. Assume today is 2025-09-01 for any time filters. Schema: custome...
Fix dash dates and aggregate watch time
You receive a CSV of ad viewing logs where the date column repeats only on the first row of each block and subsequent rows use a single dash '-' to in...
Write one SQL for exam scores aggregation
You are given an exam database. Write a single SQL statement (CTEs allowed; one final statement only) that satisfies all three requirements below. You...
Merge ad CSVs and compute CTR
Using SQL, clean and merge four CSVs and answer all parts exactly. Schema and sample rows (assume types: date is DATE, others INT/VARCHAR): platforms(...