Estimate delayed CVR nonparametrically with censored data

This question evaluates competence in handling right-censored time-to-event data, nonparametric estimation and inference for delayed conversions, construction of confidence intervals and distribution-free bounds, diagnostic checks for nonstationarity, and reasoning about identifiability under aggregated data.

Question

Today is 2025-09-01. We need the 14-day conversion rate (CVR14) for impressions served between 2025-08-18 and 2025-09-01, but many conversions occur with unknown delays up to 14 days, so recent impressions are right-censored. You cannot assume any parametric delay distribution. Tasks:
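The sketch below makes the censoring concrete by computing each daily cohort's observed age on 2025-09-01. It assumes ages are counted in whole calendar days and that a cohort is complete once it is at least 14 days old. The real cut-off depends on event timestamps and data latency, which the question does not specify. `cohort_ages` is a hypothetical helper.

```python
from datetime import date, timedelta

HORIZON_DAYS = 14  # the CVR14 conversion window

def cohort_ages(start, end, today, horizon=HORIZON_DAYS):
    """Hypothetical helper: {serve_date: observed age in whole days, capped at horizon}.

    Assumption: a cohort counts as fully observed once today - serve_date >= horizon.
    """
    ages = {}
    day = start
    while day <= end:
        ages[day] = min((today - day).days, horizon)
        day += timedelta(days=1)
    return ages

ages = cohort_ages(date(2025, 8, 18), date(2025, 9, 1), date(2025, 9, 1))
censored = [d for d, a in ages.items() if a < HORIZON_DAYS]
print(len(ages), len(censored))  # 15 14
```

Under this assumption only the 2025-08-18 cohort has a full window. The other 14 daily cohorts are right-censored at ages 0 through 13.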

Propose a nonparametric estimator for CVR14 that uses historical cohorts to learn the time-to-convert survival function and applies it to the current, partially observed cohort (e.g., Kaplan–Meier for conversion delay with right-censoring, then inverse-probability weighting to debias the observed-to-date converts). Write formulas for the estimator and indicate the data each term uses.
Construct a 95% confidence interval using Greenwood’s formula for the KM variance and the delta method for the transformed CVR, stating assumptions. Explain how you would widen intervals if you suspect non-stationarity of delays.
Provide a distribution-free conservative bound for CVR14 that makes minimal assumptions (e.g., DKW inequality on the empirical CDF of delays or Clopper–Pearson on observed conversions plus a worst-case bound for yet-unfinished impressions). Show how to compute it from raw counts available today.
Describe diagnostics to check whether the historical delay distribution still applies now (e.g., test for covariate shift in the traffic mix with PSI or KS tests, and check day-of-week effects or device splits), and how to stratify or reweight if a shift is detected.
If you can observe only aggregated daily counts of impressions and same-day conversions (no user-level data), outline an identifiable approach and the additional assumptions required to estimate or bound CVR14.
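For the Kaplan–Meier/IPW estimator task, here is a minimal sketch. It assumes whole-day delays, independent impressions, and the same delay distribution in the historical and current cohorts. The formulas are as follows. Ŝ(t) = Π_{u ≤ t} (1 − d_u / n_u) is fitted on historical impressions, where d_u is the number of conversions at delay u and n_u is the number of impressions still at risk. F̂(d) = (1 − Ŝ(d)) / (1 − Ŝ(14)) is the delay CDF among 14-day converters. The current-cohort estimate is p̂ = (1/N) Σ_{i converted by a_i} 1 / F̂(min(a_i, 14)), where N and the ages a_i come from today's cohort. An impression of age a has converted so far with probability p·F(a), so each observed convert is weighted by 1/F(a). All function names are hypothetical.

```python
from collections import Counter

HORIZON = 14  # days; the CVR14 window

# Hypothetical helper names; a sketch assuming stationary, whole-day delays.

def km_survival(times, events, horizon=HORIZON):
    """Kaplan-Meier S(t) = P(no conversion by day t) for t = 0..horizon.

    times[i]: conversion delay in days if events[i] is True, otherwise the last
              day impression i was observed without converting (right-censored).
    """
    conversions = Counter(t for t, e in zip(times, events) if e)
    surv, s = [], 1.0
    for u in range(horizon + 1):
        at_risk = sum(1 for t in times if t >= u)  # censored at u still at risk on day u
        if at_risk and conversions[u]:
            s *= 1.0 - conversions[u] / at_risk
        surv.append(s)
    return surv

def delay_cdf(surv, horizon=HORIZON):
    """F(d) = P(delay <= d | converted within horizon) = (1 - S(d)) / (1 - S(horizon))."""
    cvr = 1.0 - surv[horizon]
    if cvr <= 0:
        raise ValueError("no historical conversions within the horizon")
    return [(1.0 - s) / cvr for s in surv]

def ipw_cvr14(ages, converted, cdf, horizon=HORIZON):
    """(1/N) * sum over observed converts of 1 / F(age): inverse-probability weighting."""
    total = 0.0
    for age, y in zip(ages, converted):
        if y:
            f = cdf[min(age, horizon)]
            if f <= 0:
                raise ValueError("F(age) = 0 for an observed convert; history lacks support")
            total += 1.0 / f
    return total / len(ages)

hist_times = [0, 1, 1, 3] + [14] * 6       # 4 historical converts, 6 matured non-converts
hist_events = [True] * 4 + [False] * 6
S = km_survival(hist_times, hist_events)
F = delay_cdf(S)
est = ipw_cvr14(ages=[0, 1, 2, 14], converted=[False, True, False, True], cdf=F)
print(round(1 - S[14], 3), round(est, 4))  # 0.4 0.5833
```

A common alternative is the ratio form Σ converts / Σ F̂(a_i). It is also consistent under the same assumptions and is less sensitive to very small F̂ values for the youngest impressions.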
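For the confidence-interval task, this sketch applies Greenwood's formula, Var(Ŝ(t)) ≈ Ŝ(t)² Σ_{u ≤ t} d_u / (n_u (n_u − d_u)), together with the delta method on the logit scale, Var(logit p̂) ≈ Var(p̂) / (p̂(1 − p̂))², to p̂ = 1 − Ŝ(14). It covers only the KM part, for example when matured and censored impressions are pooled under stationarity and independent censoring. A full interval for the IPW estimate would also need the current cohort's sampling variance. The logit transform is one choice among several; a log-log transform on Ŝ is another. The function name is hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical helper name; assumes whole-day delays and independent censoring.

def _expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def cvr14_greenwood_ci(times, events, horizon=14, level=0.95):
    """Return (p_hat, Var(p_hat), CI) for p_hat = 1 - S_KM(horizon).

    times/events use the Kaplan-Meier convention: the delay if converted, otherwise
    the last day observed without converting (right-censored).
    """
    s, greenwood_sum = 1.0, 0.0
    for u in range(horizon + 1):
        n_u = sum(1 for t in times if t >= u)                       # at risk on day u
        d_u = sum(1 for t, e in zip(times, events) if e and t == u)  # conversions on day u
        if d_u:
            s *= 1.0 - d_u / n_u
            if n_u > d_u:
                greenwood_sum += d_u / (n_u * (n_u - d_u))
    p = 1.0 - s
    var = s * s * greenwood_sum          # Greenwood Var(S_hat), which equals Var(1 - S_hat)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    if not 0.0 < p < 1.0:
        return p, var, (p, p)            # degenerate case: an exact method is needed instead
    se_logit = math.sqrt(var) / (p * (1.0 - p))   # delta method: d logit(p)/dp = 1/(p(1-p))
    centre = math.log(p / (1.0 - p))
    return p, var, (_expit(centre - z * se_logit), _expit(centre + z * se_logit))

p, var, (lo, hi) = cvr14_greenwood_ci([0, 1, 1, 3] + [14] * 6, [True] * 4 + [False] * 6)
print(round(p, 3), round(var, 4), round(lo, 3), round(hi, 3))
```

With no censoring, Greenwood's variance reduces to the binomial p(1 − p)/n (here 0.4 · 0.6 / 10 = 0.024), which makes a useful sanity check. If delays may be drifting, there are two conservative ways to widen the interval. One is to compute it separately on several historical windows, such as each of the last few weeks, and report the hull of those intervals. The other is to use a block bootstrap over cohort days so that between-week variation enters the standard error.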
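For the distribution-free bound, this sketch uses only counts available today. Those are N, the number of impressions served in the window; the conversions observed so far; and the "open" impressions, meaning those younger than 14 days that have not converted yet. Clopper–Pearson is applied with α/2 in each tail. The lower end uses the observed converts, as if no open impression converts. The upper end uses observed plus open, as if every open impression converts. An impression's chance of counting as converted today is at most CVR14 in the first case and at least CVR14 in the second. The pair therefore covers CVR14 with probability at least 1 − α, assuming independent impressions, and no delay model is needed. Helper names are hypothetical. The exact interval is found by bisection with the standard library rather than a beta quantile function.

```python
import math

# Hypothetical helper names; exact Clopper-Pearson via bisection, standard library only.

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space with lgamma."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q, log_nf = math.log(p), math.log1p(-p), math.lgamma(n + 1)
    total = sum(math.exp(log_nf - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                         + i * log_p + (n - i) * log_q) for i in range(k + 1))
    return min(total, 1.0)

def _solve_decreasing(f, target, iters=60):
    """Bisection for p in [0, 1] with f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion."""
    lower = 0.0 if k == 0 else _solve_decreasing(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve_decreasing(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

def worst_case_cvr14_bounds(n_impressions, observed_converts, open_unconverted, alpha=0.05):
    """Lower: no open impression converts. Upper: every open impression converts."""
    lower, _ = clopper_pearson(observed_converts, n_impressions, alpha)
    _, upper = clopper_pearson(observed_converts + open_unconverted, n_impressions, alpha)
    return lower, upper

print(worst_case_cvr14_bounds(n_impressions=100, observed_converts=10, open_unconverted=5))
```

The DKW route from the task would instead widen the historical delay CDF by ε = sqrt(ln(2/α) / (2n)) for n historical delays before plugging it into the IPW estimator.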
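For the diagnostics task, this sketch shows three checks and one remedy. The first check is PSI between historical and current segment shares (traffic source, device, day of week). The second is a two-sample KS statistic that compares the conversion delays of a recently matured cohort against older ones. Both cohorts must have complete 14-day windows, or censoring will bias the comparison. The third is the large-sample KS critical value. The remedy is post-stratification: re-estimate CVR14 per stratum and reweight by today's mix. The PSI thresholds (0.1 to investigate, 0.25 for a major shift) are a common industry rule of thumb, not a formal test. Helper names are hypothetical.

```python
import math
from bisect import bisect_right

# Hypothetical helper names; PSI thresholds are conventions, not from the question.

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two {segment: share} dicts."""
    total = 0.0
    for seg in set(expected) | set(actual):
        e = max(expected.get(seg, 0.0), eps)  # floor avoids log(0) for empty segments
        a = max(actual.get(seg, 0.0), eps)
        total += (a - e) * math.log(a / e)
    return total

def ks_statistic(x, y):
    """Two-sample KS statistic: max |F_x(v) - F_y(v)| over the pooled sample points."""
    xs, ys = sorted(x), sorted(y)
    return max(abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
               for v in set(xs) | set(ys))

def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value; conservative when delays are tied whole days."""
    return math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))

def post_stratified_cvr14(current_mix, stratum_cvr14):
    """Reweight per-stratum CVR14 estimates to the current traffic mix."""
    return sum(share * stratum_cvr14[seg] for seg, share in current_mix.items())

shift = psi({"mobile": 0.5, "desktop": 0.5}, {"mobile": 0.7, "desktop": 0.3})
print(round(shift, 4))  # 0.1695
```

A PSI of about 0.17 would fall in the "investigate" band. In that case, the delay CDF and CVR14 can be estimated per device and recombined with `post_stratified_cvr14` using today's device shares.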
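For the aggregated-data task, same-day counts alone do not identify CVR14. One workable extra assumption is that the same-day share r = P(delay = 0 | converted within 14 days) is stable over time and can be measured in an earlier period whose 14-day outcomes are complete. The sketch returns three numbers. The first is a lower bound that needs no delay assumption: same-day converts over impressions. The second is a point estimate that rescales by an assumed r̂. The third is an upper bound from an assumed floor r_min. The helper name, `r_hat`, and `r_min` are hypothetical inputs. The function ignores sampling noise, which could be added with an exact binomial interval.

```python
# Hypothetical helper; r_hat and r_min are assumed inputs, not derivable from today's aggregates.

def cvr14_from_same_day(impressions, same_day_converts, r_hat, r_min):
    """Bound and estimate CVR14 from daily totals of impressions and same-day conversions.

    r = P(delay = 0 | converted within 14 days) is assumed stable over time and
    measured in a past period whose 14-day outcomes are fully observed.
    """
    n = sum(impressions)
    c0 = sum(same_day_converts)
    lower = c0 / n                       # same-day converts are a subset of 14-day converts
    point = min(1.0, c0 / (n * r_hat))   # scale up by the assumed same-day share
    upper = min(1.0, c0 / (n * r_min))   # worst case: smallest plausible same-day share
    return lower, point, upper

print(cvr14_from_same_day([1000, 1000], [20, 30], r_hat=0.5, r_min=0.25))  # (0.025, 0.05, 0.1)
```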
