This question evaluates a data scientist's competency in customer experience analytics, causal inference, experimentation design, and multi-table data analysis by requiring synthesis of complaints, resolutions, and service ratings to generate insights and prioritized actions.
You work on customer experience analytics for an e-commerce furniture retailer. You have 3 data tables related to customer complaints and how they were resolved.
complaintsEach row is a complaint/contact.
complaint_id
(string, PK)
customer_id
(string)
order_id
(string)
product_category
(string)
issue_type
(string; e.g., damaged item, late delivery, missing parts)
channel
(string; phone/chat/email)
created_at
(timestamp)
region
(string)
resolutionsOne row per complaint resolution attempt (may be multiple per complaint).
resolution_id
(string, PK)
complaint_id
(string, FK -> complaints)
resolution_type
(string; e.g., refund, replacement, coupon, escalate-to-carrier)
resolved_at
(timestamp)
sla_met
(boolean)
refund_amount
(numeric)
service_ratingsCustomer satisfaction after the resolution.
complaint_id
(string, FK -> complaints)
rating
(int 1–10; 10 = most satisfied)
rated_at
(timestamp)
You may assume standard data quality issues (missing ratings, multiple resolutions per complaint, backfilled timestamps) and should call out how you’d handle them.