How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

What difficulty level is this interview question?

This is a easy difficulty Data Manipulation (SQL/Python) question, commonly asked during Take-home Project rounds at Wayfair.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Wayfair during technical interviews.

Solve SQL and Python OA tasks | Wayfair Interview Question

Q: Solve SQL and Python OA tasks

This question evaluates SQL and pandas data-manipulation skills, including joined aggregations over date windows, handling of missing values with median imputation, and multi-criteria sorting and tie-breaking.

Complete the following two online-assessment tasks.

Task A — SQL

You are given two tables:

customer

customer_id INT
customer_name VARCHAR
purchase_id VARCHAR

purchase

id VARCHAR
purchase_price DECIMAL(10,2)
purchase_date DATE

customer.purchase_id joins to purchase.id.

Based on the sample in the original prompt, interpret "the earliest 10 years" as the historical 10-year window that starts on the minimum purchase_date in the purchase table and ends just before MIN(purchase_date) + INTERVAL '10 years'. Assume purchase_date is stored as a DATE in UTC.

Write a SQL query to return the customer(s) whose purchase_price is the highest within that earliest 10-year window. If multiple customers tie for the highest qualifying price, return all of them.

Required output columns:

customer_name
highest_purchase_price

Task B — Python

You are given a pandas DataFrame students with schema:

student_id INT
math_score FLOAT NULL
english_score FLOAT NULL
physics_score FLOAT NULL

Write a Python function that returns the top students after the following processing steps:

Remove any student who is missing scores in at least two subjects.
For the remaining students, fill each missing subject score with the median of that subject, computed over the remaining non-null rows.
Sort the cleaned table by math_score descending, then physics_score descending. If additional tie-breaking is needed, sort by student_id ascending.
Return the top 5 rows. If fewer than 5 students remain, return all remaining rows.

Return the full cleaned rows for the selected students.

Complete the following two online-assessment tasks.

Task A — SQL

You are given two tables:

customer

customer_id INT
customer_name VARCHAR
purchase_id VARCHAR

purchase

id VARCHAR
purchase_price DECIMAL(10,2)
purchase_date DATE

customer.purchase_id joins to purchase.id.

Write a SQL query to return the customer(s) whose purchase_price is the highest within that earliest 10-year window. If multiple customers tie for the highest qualifying price, return all of them.

Required output columns:

customer_name
highest_purchase_price

Task B — Python

You are given a pandas DataFrame students with schema:

student_id INT
math_score FLOAT NULL
english_score FLOAT NULL
physics_score FLOAT NULL

Write a Python function that returns the top students after the following processing steps:

Remove any student who is missing scores in at least two subjects.
For the remaining students, fill each missing subject score with the median of that subject, computed over the remaining non-null rows.
Sort the cleaned table by math_score descending, then physics_score descending. If additional tie-breaking is needed, sort by student_id ascending.
Return the top 5 rows. If fewer than 5 students remain, return all remaining rows.

Return the full cleaned rows for the selected students.

Solve SQL and Python OA tasks

Quick Overview

Task A — SQL

Task B — Python

Submit Your Answer

Solve SQL and Python OA tasks

Quick Overview

Task A — SQL

Task B — Python

Submit Your Answer