Amazon is considering adding live broadcasts of selected sports events to Prime Video. Using observational data, estimate the causal impact on Prime membership subscriptions and engagement. Precisely specify: (1) treatment and control at the user level (e.g., users exposed to/actually watching live sports vs not), (2) primary outcomes (e.g., subscription starts, retention, upgrades, watch‑time), and (3) key covariates for selection on observables. Start with matching: choose a method (k‑NN, caliper, Mahalanobis, or propensity‑score matching), define distance or the propensity model class, explain overfitting controls (regularization, cross‑fitting), and provide balance diagnostics you will require (SMD thresholds, variance ratios, overlap plots). State assumptions (unconfoundedness, overlap, SUTVA) and how you will test/justify them. Then propose an instrument to address unobservables via randomized variation in promotion prominence for the live stream (e.g., hero banner vs standard tile). Write the 2SLS explicitly: First stage Z→LiveWatch with controls and fixed effects; second stage LiveWatch_hat→Outcome with the same controls. Discuss IV validity (relevance, exclusion, independence, monotonicity), weak‑IV checks (first‑stage F), and over‑identification tests if multiple instruments. Finally, outline a DID/event‑study alternative using staggered rollout, detail the regression, fixed effects, and heterogeneity, and list key threats (interference, spillovers, time‑varying confounding) with mitigation.

This question evaluates a data scientist's competence in causal inference and observational study design. It covers defining treatment and control, selecting outcome metrics and covariates, and choosing identification strategies such as matching, instrumental variables (IV), and difference-in-differences (DID). It is commonly asked to assess whether a candidate can produce credible causal estimates when randomized experiments are unavailable, and it tests both conceptual understanding of identification assumptions (e.g., unconfoundedness, overlap, SUTVA, IV validity) and practical application of matching algorithms, propensity modeling, regression specifications, diagnostics, and staggered-rollout/event-study designs for real-world program evaluation.

What difficulty level is this interview question?

This is a hard Analytics & Experimentation question, commonly asked during onsite rounds at Amazon.

What role is this question designed for?

This question is commonly asked of Data Scientist candidates at Amazon during technical interviews.

Estimate live sports impact on subscriptions

Estimate the Causal Impact of Live Sports on Prime Subscriptions and Engagement

Context

Amazon is considering adding live broadcasts of selected sports events to Prime Video. Using observational data, design an analysis to estimate the causal impact on Prime membership subscriptions and user engagement.

Tasks

Define treatment and control at the user level
- Specify both an exposure-based definition and an uptake-based definition (e.g., users exposed to or actually watching live sports vs not).
Define primary outcome metrics
- Examples: subscription starts, retention, upgrades, watch-time.
List key covariates to justify selection on observables.
Matching design
- Choose a method (k-NN, caliper, Mahalanobis, or propensity-score matching).
- Define the distance metric or the propensity model class.
- Explain overfitting controls (regularization, cross-fitting).
- Specify balance diagnostics you require (SMD thresholds, variance ratios, overlap plots).
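A minimal sketch of the propensity-score branch of this design, under stated assumptions: the propensity model is an L2-penalized logistic regression fit by Newton's method, scores are cross-fitted over five folds so that no user is scored by a model trained on their own row, and each treated user is matched 1:1 with replacement to the nearest control on the logit of the score within a caliper of 0.2 standard deviations (a common rule of thumb). Balance is then summarized per covariate by the standardized mean difference (SMD) and the variance ratio, checked against the conventional thresholds |SMD| < 0.1 and 0.5 < variance ratio < 2. The data are simulated, and the function names (fit_logit_l2, cross_fitted_pscore, caliper_match, balance) are hypothetical.

```python
import numpy as np


def fit_logit_l2(X, t, lam=1.0, iters=25):
    """L2-penalized logistic regression fit by Newton's method (intercept not penalized)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    penalty = lam * np.eye(Xd.shape[1])
    penalty[0, 0] = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (t - p) - penalty @ beta                  # gradient of penalized log-likelihood
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None]) + penalty   # negative Hessian
        beta += np.linalg.solve(hess, grad)
    return beta


def cross_fitted_pscore(X, t, n_folds=5, lam=1.0, seed=0):
    """Out-of-fold propensity scores: each user is scored by a model that never saw them."""
    folds = np.random.default_rng(seed).integers(0, n_folds, size=len(t))
    e = np.empty(len(t))
    for k in range(n_folds):
        held_out = folds == k
        beta = fit_logit_l2(X[~held_out], t[~held_out], lam)
        e[held_out] = 1.0 / (1.0 + np.exp(-(beta[0] + X[held_out] @ beta[1:])))
    return e


def caliper_match(e, t, caliper_sd=0.2):
    """1:1 nearest-neighbour matching with replacement on logit(e), within a caliper."""
    logit = np.log(e / (1 - e))
    caliper = caliper_sd * logit.std()
    controls = np.flatnonzero(t == 0)
    pairs = []
    for i in np.flatnonzero(t == 1):
        dist = np.abs(logit[controls] - logit[i])
        j = np.argmin(dist)
        if dist[j] <= caliper:  # treated users with no close control are dropped
            pairs.append((i, controls[j]))
    return np.array(pairs)


def balance(X, idx_t, idx_c):
    """Per-covariate standardized mean difference and variance ratio (treated / control)."""
    xt, xc = X[idx_t], X[idx_c]
    vt, vc = xt.var(axis=0, ddof=1), xc.var(axis=0, ddof=1)
    smd = (xt.mean(axis=0) - xc.mean(axis=0)) / np.sqrt((vt + vc) / 2)
    return smd, vt / vc


# Simulated users; the three covariates stand in for e.g. tenure, prior watch-time, sports affinity.
rng = np.random.default_rng(42)
n = 4000
X = rng.normal(size=(n, 3))
t = (rng.random(n) < 1 / (1 + np.exp(-(-0.5 + 0.8 * X[:, 0] + 0.5 * X[:, 2])))).astype(float)

e = cross_fitted_pscore(X, t)
pairs = caliper_match(e, t)
smd_before, vr_before = balance(X, t == 1, t == 0)
smd_after, vr_after = balance(X, pairs[:, 0], pairs[:, 1])
print("SMD before:", smd_before.round(3), "after:", smd_after.round(3))
print("variance ratio after:", vr_after.round(2))
print("passes |SMD| < 0.1 and 0.5 < VR < 2:",
      bool(np.all(np.abs(smd_after) < 0.1) and np.all((vr_after > 0.5) & (vr_after < 2))))
```

Matching on the logit rather than the raw score spreads out values near 0 and 1, where small probability differences correspond to large differences in covariates.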
Assumptions
- State unconfoundedness, overlap, SUTVA, and how you will test or justify them.
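Of these assumptions, overlap is the one that can be checked directly in the data; unconfoundedness and SUTVA have to be argued from the design and domain knowledge. A minimal sketch of an overlap check, assuming propensity scores e have already been estimated: it reports each arm's score range, flags users outside [0.1, 0.9] (a commonly used trimming rule of thumb, not a requirement of the question), and returns the per-bin counts that an overlap plot would display. The thresholds and function names are illustrative.

```python
import numpy as np


def overlap_report(e, t, lo=0.1, hi=0.9):
    """Propensity-score range by arm, plus the share of users outside [lo, hi]."""
    keep = (e >= lo) & (e <= hi)
    return {
        "treated_range": (float(e[t == 1].min()), float(e[t == 1].max())),
        "control_range": (float(e[t == 0].min()), float(e[t == 0].max())),
        "share_trimmed": float(1 - keep.mean()),
        "keep": keep,
    }


def overlap_histogram(e, t, bins=10):
    """Counts per propensity bin for each arm: the numbers behind an overlap plot."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    h_treated, _ = np.histogram(e[t == 1], bins=edges)
    h_control, _ = np.histogram(e[t == 0], bins=edges)
    return edges, h_treated, h_control


# Toy scores: two users fall outside [0.1, 0.9] and would be trimmed.
e = np.array([0.05, 0.2, 0.5, 0.95])
t = np.array([1, 0, 1, 0])
report = overlap_report(e, t)
edges, h_treated, h_control = overlap_histogram(e, t)
print(report["share_trimmed"])  # 0.5
```

Bins where one arm has users and the other has none mark regions without common support, where effects can only be extrapolated.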
Instrumental variables
- Propose an instrument based on randomized variation in promotion prominence for the live stream (e.g., hero banner vs standard tile).
- Write the 2SLS explicitly: First stage Z → LiveWatch with controls and fixed effects; second stage LiveWatch_hat → Outcome with the same controls.
- Discuss IV validity (relevance, exclusion, independence, monotonicity), weak-IV checks (first-stage F), and over-identification tests if multiple instruments are used.
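As a sketch of how the two stages fit together, the code below assumes a single randomized binary instrument Z (hero banner = 1, standard tile = 0), a binary LiveWatch indicator, and controls W consisting of an intercept plus region fixed-effect dummies; all data are simulated and the names are illustrative. First stage: LiveWatch_i = pi * Z_i + W_i'gamma + v_i. Second stage: Outcome_i = tau * LiveWatch_hat_i + W_i'delta + u_i. The function also reports the first-stage F for the excluded instrument (with one instrument, F equals the squared t-statistic on pi) and recomputes the residuals with the observed LiveWatch, because standard errors taken directly from the second-stage OLS are incorrect. The standard errors here are homoskedastic; a real analysis would typically use robust or clustered errors, for example via the IV2SLS class in the linearmodels package.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_sls(y, d, z, W):
    """Just-identified 2SLS with one endogenous regressor d, one instrument z, controls W.

    Returns (tau, homoskedastic SE of tau, first-stage F for z).
    """
    n = len(y)
    X1 = np.column_stack([z, W])                    # first stage: d on z and controls
    pi = ols(d, X1)
    d_hat = X1 @ pi
    v = d - d_hat
    se_pi = np.sqrt(v @ v / (n - X1.shape[1]) * np.linalg.inv(X1.T @ X1)[0, 0])
    first_stage_F = (pi[0] / se_pi) ** 2            # one instrument: F = t^2
    X2 = np.column_stack([d_hat, W])                # second stage: y on d_hat and the same controls
    coef = ols(y, X2)
    u = y - np.column_stack([d, W]) @ coef          # residuals use observed d, not d_hat
    se_tau = np.sqrt(u @ u / (n - X2.shape[1]) * np.linalg.inv(X2.T @ X2)[0, 0])
    return coef[0], se_tau, first_stage_F


# Simulated users: an unobserved sports affinity drives both live viewing and the outcome.
rng = np.random.default_rng(7)
n = 50_000
region = rng.integers(0, 5, size=n)
W = np.column_stack([np.ones(n), np.eye(5)[region][:, 1:]])               # intercept + region fixed effects
affinity = rng.normal(size=n)                                             # unobserved confounder
z = rng.integers(0, 2, size=n).astype(float)                              # 1 = hero banner, 0 = standard tile
d = (0.3 * z + 0.5 * affinity + rng.normal(size=n) > 0.5).astype(float)   # LiveWatch
y = 2.0 * d + 1.5 * affinity + 0.2 * region + rng.normal(size=n)          # outcome, true effect 2.0

tau, se, F = two_sls(y, d, z, W)
tau_ols = ols(y, np.column_stack([d, W]))[0]
print(f"first-stage F = {F:.0f}, 2SLS tau = {tau:.2f} (SE {se:.2f}), naive OLS = {tau_ols:.2f}")
```

With a binary instrument and a binary treatment, tau is interpreted as a local average treatment effect for compliers (users whose live viewing responds to banner prominence), which relies on monotonicity. In this simulation the confounder pushes naive OLS above the true effect of 2.0, while 2SLS should land near it; a commonly cited rule of thumb treats a first-stage F below 10 as a sign of a weak instrument.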
DID and event study alternative
- Outline a staggered rollout design.
- Provide the regression specification, fixed effects, and heterogeneity analysis, and list key threats (interference, spillovers, time-varying confounding) with mitigations.
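One way to write the staggered-rollout event study, assuming live sports launches market by market at different dates and some markets never receive it: Y_it = alpha_i + lambda_t + sum over k != -1 of beta_k * 1[t - G_i = k] + e_it, where alpha_i are market fixed effects, lambda_t are period fixed effects, G_i is market i's launch period, and k = -1 is the omitted reference period. Pre-launch beta_k near zero are consistent with parallel trends; post-launch beta_k trace the dynamic effect. The sketch below estimates this by least squares on a simulated panel in which the effect is the same across rollout cohorts; the cohort sizes, launch periods, and effect path are hypothetical. When effects differ by cohort, this two-way fixed-effects regression can be biased, and heterogeneity-robust estimators such as those of Sun and Abraham or Callaway and Sant'Anna are the usual alternatives.

```python
import numpy as np


def event_study(y, unit, time, adopt, ref=-1):
    """Two-way fixed-effects event study with relative-time dummies.

    adopt[i] is unit i's launch period (np.nan for never-launched units).
    Returns {k: beta_k} for every observed relative period k except `ref`.
    """
    rel = time - adopt[unit]                              # nan for never-launched units
    ks = sorted({int(k) for k in rel[~np.isnan(rel)]} - {ref})
    D = np.column_stack([(rel == k).astype(float) for k in ks])
    unit_fe = np.eye(unit.max() + 1)[unit]                # one dummy per unit, no intercept
    time_fe = np.eye(time.max() + 1)[time][:, 1:]         # first period is the baseline
    X = np.column_stack([D, unit_fe, time_fe])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return dict(zip(ks, beta[: len(ks)]))


# Simulated panel: 30 markets x 12 periods, three launch cohorts plus never-launched markets.
rng = np.random.default_rng(3)
n_units, n_periods = 30, 12
adopt = np.array([4.0] * 8 + [6.0] * 8 + [8.0] * 8 + [np.nan] * 6)
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
rel_filled = np.nan_to_num(time - adopt[unit], nan=-1.0)           # never-launched treated as pre-period
true_effect = np.where(rel_filled >= 0, 1.0 + 0.1 * rel_filled, 0.0)  # grows 0.1 per period after launch
y = (rng.normal(size=n_units)[unit]       # market fixed effects
     + 0.05 * time                        # common time trend
     + true_effect
     + rng.normal(scale=0.1, size=unit.size))

est = event_study(y, unit, time, adopt)
for k, b in est.items():
    print(f"k={k:+d}  beta={b:+.2f}")
```

Heterogeneity can be explored by interacting the relative-time dummies with user or market segments (for example prior sports viewing), at the cost of thinner cells per coefficient.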
