Transform DataFrame and compute diff-in-diff

Q: Transform DataFrame and compute diff-in-diff

This question evaluates proficiency in data cleaning (type conversion), missing-data handling, group-period aggregation, and estimating treatment effects via difference-in-differences.


Question

You are given a pandas DataFrame df with the following columns:

unit_id (string): entity identifier (e.g., user, city, driver)
group (string): either 'treatment' or 'control'
period (string): either 'pre' or 'post'
y (string): outcome stored as a string (should be numeric), with exactly one missing value (NaN)

Tasks:

1. Convert y from string to integer (assume all non-missing values are valid integer strings, e.g. '12').
2. Impute the missing value in y using the simple (unconditional) average of the non-missing y values.
3. After steps (1)–(2), compute the difference-in-differences (DiD) estimate of the treatment effect on y:

$\text{DiD} = (\overline{y}_{\text{treat, post}} - \overline{y}_{\text{treat, pre}}) - (\overline{y}_{\text{ctrl, post}} - \overline{y}_{\text{ctrl, pre}})$

Return the scalar DiD estimate (and optionally the intermediate group-period means used).
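
One possible way to carry out the three tasks is sketched below. The function name did_estimate and the sample rows are made up for illustration and are not part of the original question. The sketch keeps y as a float column: pandas' default integer dtype cannot hold NaN, and the imputed mean is generally not a whole number.

```python
import numpy as np
import pandas as pd


def did_estimate(df):
    """Return (DiD estimate, table of group-period means). Illustrative helper."""
    out = df.copy()

    # Step 1: parse the string column to numbers. The NaN stays NaN,
    # so the result is float64 rather than a plain integer dtype.
    y = pd.to_numeric(out["y"])

    # Step 2: fill the missing value with the unconditional mean of the
    # non-missing values (Series.mean skips NaN by default).
    out["y"] = y.fillna(y.mean())

    # Step 3: mean of y per (group, period), reshaped so that rows are
    # groups and columns are periods.
    means = out.groupby(["group", "period"])["y"].mean().unstack("period")

    # Post-minus-pre change within each group, then treatment minus control.
    change = means["post"] - means["pre"]
    did = change["treatment"] - change["control"]
    return float(did), means


# Hypothetical sample data: four units, one missing outcome.
sample = pd.DataFrame(
    {
        "unit_id": ["a", "a", "b", "b", "c", "c", "d", "d"],
        "group": ["treatment"] * 4 + ["control"] * 4,
        "period": ["pre", "post"] * 4,
        "y": ["10", "20", "12", np.nan, "8", "11", "10", "13"],
    }
)

did, means = did_estimate(sample)
print(means)
print(did)  # 2.0
```

In this sample the seven non-missing values average 12, so the missing treatment/post value becomes 12. The treatment means are 11 (pre) and 16 (post), the control means are 9 (pre) and 12 (post), and the estimate is (16 - 11) - (12 - 9) = 2.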
