This question evaluates a data scientist's competency in causal inference, experimental and quasi-experimental design, incremental profit measurement from transaction-level data, statistical power and sample-size estimation, and validation and guardrail metric planning.

A national grocery chain launched a free loyalty card on January 1. You have 18 months of household-level transactions (item price, retailer cost, product, store, timestamp), enrollment dates, coupon redemptions (including the funding source, if available), and per-household acquisition and servicing costs. Your task is to estimate the 6‑month incremental profit attributable to enrollment and design a rigorous causal evaluation.
Address each of the following parts:
(a) Define the causal estimand (incremental profit per enrolled household over 6 months) and provide the exact profit formula with all components: incremental gross margin, discount cost/cannibalization on baseline spend, coupon funding, acquisition cost, servicing cost, fraud/breakage.
(b) Propose the primary identification strategy (e.g., randomized holdout vs. observational Difference‑in‑Differences with matched controls). Write the DiD specification (outcome, treatment, fixed effects), and list the assumptions you will test (parallel trends, composition stability, seasonality, event timing).
(c) Specify guardrail metrics (e.g., margin rate, substitution, unit economics), and explain how you would detect and mitigate selection bias (eligibility rules, instrumental variables, propensity-score methods).
(d) Power/sample size: state your minimum detectable effect (MDE, in profit dollars per household) and derive the inputs needed (variance of margin dollars, enrollment rate, intraclass correlation), explaining how you would estimate each from the data.
(e) Validation: design an A/A test and a pre‑launch placebo DiD. Be precise about time windows, cohorts, and the decision rule to ship or roll back.
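
For part (d), one way the listed inputs could be combined into a required sample size is sketched below in Python. This is a minimal sketch, not a prescribed answer. It assumes a two-sided, two-sample z-test on mean 6‑month profit per household, equal-sized arms, a design effect of 1 + (m - 1) × ICC for households clustered within stores, and, if the card offer (rather than enrollment itself) is randomized, that the effect among enrollees is diluted by the enrollment rate. The function name, its parameters, and the numbers in the example call are hypothetical.

```python
import math
from statistics import NormalDist


def households_per_arm(mde, sd, icc=0.0, cluster_size=1.0,
                       enrollment_rate=1.0, alpha=0.05, power=0.80):
    """Approximate households needed per arm (hypothetical helper).

    mde             -- minimum detectable effect per ENROLLED household ($)
    sd              -- std. dev. of 6-month profit per household ($)
    icc             -- intraclass correlation within a cluster (e.g. a store)
    cluster_size    -- average number of households per cluster
    enrollment_rate -- share of offered households that enroll; the
                       detectable intent-to-treat effect is mde * rate
                       (assumes no effect on households that do not enroll)
    """
    if not 0.0 < enrollment_rate <= 1.0:
        raise ValueError("enrollment_rate must be in (0, 1]")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power

    itt_effect = mde * enrollment_rate             # diluted effect size
    n_simple = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / itt_effect ** 2

    design_effect = 1 + (cluster_size - 1) * icc   # inflation from clustering
    return math.ceil(n_simple * design_effect)


# Illustrative inputs only; real values would be estimated from the
# pre-launch transaction history.
example_n = households_per_arm(mde=5.0, sd=120.0, icc=0.02, cluster_size=50)
```

In this sketch, `sd` would come from the pre-period variance of household margin dollars and `icc` from a between-store versus within-store variance decomposition. Because the required sample scales with 1 / `enrollment_rate`², a low take-up rate dominates the other inputs.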