This question evaluates SQL and pandas data-manipulation skills, including joined aggregations over date windows, handling of missing values with median imputation, and multi-criteria sorting and tie-breaking.
Complete the following two online-assessment tasks.
You are given two tables:
customer
customer_id
INT
customer_name
VARCHAR
purchase_id
VARCHAR
purchase
id
VARCHAR
purchase_price
DECIMAL(10,2)
purchase_date
DATE
customer.purchase_id joins to purchase.id.
Based on the sample in the original prompt, interpret "the earliest 10 years" as the historical 10-year window that starts on the minimum purchase_date in the purchase table and ends just before MIN(purchase_date) + INTERVAL '10 years'. Assume purchase_date is stored as a DATE in UTC.
Write a SQL query to return the customer(s) whose purchase_price is the highest within that earliest 10-year window. If multiple customers tie for the highest qualifying price, return all of them.
Required output columns:
customer_name
highest_purchase_price
You are given a pandas DataFrame students with schema:
student_id
INT
math_score
FLOAT NULL
english_score
FLOAT NULL
physics_score
FLOAT NULL
Write a Python function that returns the top students after the following processing steps:
math_score
descending, then
physics_score
descending. If additional tie-breaking is needed, sort by
student_id
ascending.
Return the full cleaned rows for the selected students.