Compute survey rates and bias-correct ratings

Q: Compute survey rates and bias-correct ratings

This question evaluates proficiency in data manipulation and analytics, specifically SQL window functions, joins, time-windowed de-duplication, aggregation of response and completion rates, and statistical reweighting to correct non-response bias.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Question

Today is 2025-09-01. Use the schema and sample data below to answer A and B with SQL (standard SQL; you may use CTEs and window functions). Assume timestamps are UTC and exclude test users. Schema:

users(user_id INT, country STRING, is_test BOOLEAN)
survey_invites(invite_id INT, user_id INT, survey_id INT, sent_at TIMESTAMP)
survey_responses(response_id INT, user_id INT, survey_id INT, answered_at TIMESTAMP, is_complete BOOLEAN, rating INT)

survey_invites invite_id | user_id | survey_id | sent_at 100 | 1 | 10 | 2025-08-26 10:00:00 101 | 2 | 10 | 2025-08-27 09:00:00 102 | 3 | 10 | 2025-08-28 12:00:00 103 | 1 | 11 | 2025-08-20 08:00:00 104 | 5 | 10 | 2025-08-29 14:00:00 105 | 4 | 10 | 2025-08-28 15:00:00 106 | 2 | 11 | 2025-08-30 11:00:00 107 | 2 | 10 | 2025-08-27 10:00:00

survey_responses response_id | user_id | survey_id | answered_at | is_complete | rating 200 | 1 | 10 | 2025-08-26 10:05:00 | true | 4 201 | 2 | 10 | 2025-08-30 09:20:00 | false | 3 202 | 2 | 10 | 2025-09-02 08:00:00 | true | 5 203 | 3 | 10 | 2025-08-28 13:00:00 | true | 2 204 | 5 | 10 | 2025-09-01 15:00:00 | true | 4 205 | 2 | 11 | 2025-08-30 12:00:00 | true | 5

A) For invites sent in the last 7 days (2025-08-25 through 2025-09-01 inclusive), compute per survey_id: response_rate and completion_rate. Rules: (i) Only non-test users; (ii) If a user receives multiple invites to the same survey, use the earliest invite in-window; (iii) Count a user as 'responded' if they have any response within 7 days after their earliest in-window invite and with answered_at <= 2025-09-01 23:59:59; (iv) 'Completed' requires is_complete = true under the same window. Return survey_id, invited_users, responded_users, completed_users, response_rate, completion_rate.

B) For survey_id = 10 over the same 7-day invite window, compute a country-reweighted average rating that matches the invite country distribution to correct non-response bias. Use only complete responses within 7 days after the user's earliest in-window invite and with answered_at <= 2025-09-01 23:59:59. Let w_country = (invite_share_country) / (response_share_country). Output weighted_avg_rating rounded to 3 decimals and a breakdown table with country, invites, responses, invite_share, response_share, weight, and weighted contribution.

Compute survey rates and bias-correct ratings

Overview

Comments (0)