Compute weighted response rates by job category
Company: Thumbtack
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
You are given a CSV with one row per job posting and the following columns: job_id, job_category, invitations_sent (integer >= 0), provider_responses (integer >= 0), region, created_at (ISO date). Write pandas code to:
(1) compute a per-job response_rate = provider_responses / invitations_sent, treating invitations_sent = 0 as missing and excluding those rows;
(2) produce an invitation-weighted response rate by job_category with 95% Wilson score confidence intervals;
(3) return the top 5 job_category values ranked by the weighted response rate, breaking ties by the lower bound of the CI;
(4) robustly handle outliers where provider_responses > invitations_sent, negative values, or impossible dates by logging and dropping them; and
(5) verify that the job-level weighted average equals the overall response rate computed from aggregated numerators/denominators (within 1e-9).
Quick Answer: This question evaluates a candidate's data manipulation and statistical estimation skills, including pandas-based aggregation, invitation-weighted rate calculation, confidence-interval computation, and robust handling of data-quality issues, in the Data Manipulation (SQL/Python) domain.
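One possible way to approach the five steps is sketched below. It is a minimal illustration rather than a reference solution, and it rests on several assumptions that the prompt does not settle: the helper names (clean_jobs, wilson_interval, category_summary, top_categories, check_weighted_identity) are hypothetical; an "impossible" date is taken to mean one that fails to parse or lies in the future; ties are broken in favor of the higher CI lower bound; and the Wilson interval treats each invitation as an independent Bernoulli trial.

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger("response_rates")
Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def clean_jobs(df):
    """Log and drop invalid rows, exclude zero-invitation rows, add response_rate."""
    # Unparseable dates (e.g. 2023-02-30) become NaT under errors="coerce".
    created = pd.to_datetime(df["created_at"], errors="coerce", utc=True)
    now = pd.Timestamp.now(tz="UTC")
    bad = (
        (df["invitations_sent"] < 0)
        | (df["provider_responses"] < 0)
        | (df["provider_responses"] > df["invitations_sent"])
        | created.isna()
        | (created > now)  # assumption: future dates count as impossible
    )
    if bad.any():
        logger.warning(
            "Dropping %d invalid rows, job_ids=%s",
            int(bad.sum()),
            df.loc[bad, "job_id"].tolist(),
        )
    out = df.loc[~bad].assign(created_at=created[~bad])
    # invitations_sent == 0 leaves the rate undefined, so those rows are excluded.
    out = out.loc[out["invitations_sent"] > 0].copy()
    out["response_rate"] = out["provider_responses"] / out["invitations_sent"]
    return out


def wilson_interval(successes, trials, z=Z_95):
    """Wilson score interval for a binomial proportion (scalars or Series)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * np.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


def category_summary(df):
    """Invitation-weighted rate per category: sum(responses) / sum(invitations)."""
    agg = df.groupby("job_category").agg(
        invitations=("invitations_sent", "sum"),
        responses=("provider_responses", "sum"),
    )
    agg["weighted_rate"] = agg["responses"] / agg["invitations"]
    low, high = wilson_interval(agg["responses"], agg["invitations"])
    agg["ci_low"] = low
    agg["ci_high"] = high
    return agg


def top_categories(summary, n=5):
    """Rank by weighted rate; ties go to the higher CI lower bound (assumption)."""
    ranked = summary.sort_values(
        ["weighted_rate", "ci_low"], ascending=[False, False]
    )
    return ranked.head(n)


def check_weighted_identity(df, tol=1e-9):
    """Weighted mean of job-level rates should match the pooled overall rate."""
    weighted = np.average(df["response_rate"], weights=df["invitations_sent"])
    overall = df["provider_responses"].sum() / df["invitations_sent"].sum()
    assert abs(weighted - overall) <= tol, (weighted, overall)
    return overall
```

A typical call sequence would be to load the file with pd.read_csv (the file name is up to the caller), pass the frame through clean_jobs, then feed the result to category_summary, top_categories, and check_weighted_identity. Aggregating numerators and denominators per category, instead of averaging job-level rates, is what makes the category rate invitation-weighted; the final check confirms that both routes agree on the cleaned data.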