Correlation-Focused Analysis: Outreach Channels vs. Deal Win Rate
You support a sales team and are asked to find which outreach channels correlate with higher deal win rate, without building predictive models. You have two datasets:
- Deals: deal_id, account_id, rep_id, created_at, closed_at, is_won, amount_usd, product_line, region
- Touches: account_id, rep_id, touch_date, channel (email/call/demo/webinar), is_primary_contact
Assume you have a frozen data snapshot date T0 (the last day touches and deals are observed). Design a decision-ready, correlation-focused analysis that avoids causal claims:
(a) Define a defensible exposure window (e.g., touches within the first 14 days after created_at) and justify how you’ll handle right-censoring for open deals and late touches.
(b) Specify stratifications and/or matching (e.g., region, segment, deal size buckets, rep tenure) to control confounding without modeling.
(c) Show exactly how you’d compute within-rep, within-segment correlations to avoid between-rep composition bias. Outline a de-meaning or fixed-effects-style differencing before correlating.
(d) List bias risks (reverse causality when hot deals drive more touches, missing-not-at-random touches on lost deals, seasonality) and propose sensitivity checks (pre-registration of windows, placebo windows before deal creation, leave-one-rep-out analysis, randomization inference) to assess robustness.
(e) Describe two plots that can reveal Simpson’s paradox across regions or segments and how you’d detect and communicate it.
(f) Write the exact decision guardrails you’ll present to sales leadership to prevent causal overreach and how you’d phrase them.
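
As a starting point for (a) and (c), here is a minimal sketch, not a reference solution. It assumes a 14-day window (the example in (a)), that created_at, closed_at, and touch_date are plain dates, that open deals have closed_at set to None, and that region stands in for "segment", since the schema has no segment column. Touches join to deals by account_id only, so an account with several deals shares its touches across them. The function names (is_eligible, exposed, within_group_corr) are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from math import sqrt

WINDOW_DAYS = 14  # assumption: the example window from item (a)


def is_eligible(deal, t0, window_days=WINDOW_DAYS):
    """Keep closed deals whose whole exposure window falls on or before T0.

    Open deals (closed_at is None) are right-censored and excluded here.
    """
    window_end = deal["created_at"] + timedelta(days=window_days)
    return deal["closed_at"] is not None and window_end <= t0


def exposed(deal, touches, channel, window_days=WINDOW_DAYS):
    """Return 1 if the deal's account had a touch on `channel` in the window.

    Call only on eligible (closed) deals. The window stops at closed_at so
    touches logged after the outcome never count as exposure.
    """
    start = deal["created_at"]
    end = min(start + timedelta(days=window_days), deal["closed_at"])
    return int(any(
        t["account_id"] == deal["account_id"]
        and t["channel"] == channel
        and start <= t["touch_date"] <= end
        for t in touches
    ))


def within_group_corr(rows, group_key):
    """Pearson correlation of x and y after subtracting each group's means.

    `rows` are dicts with numeric "x" (exposure) and "y" (is_won).
    Returns None when the de-meaned x or y has no variation.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[group_key(r)].append(r)

    x_res, y_res = [], []
    for members in groups.values():
        if len(members) < 2:
            continue  # a singleton group has no within-group variation
        mx = sum(m["x"] for m in members) / len(members)
        my = sum(m["y"] for m in members) / len(members)
        x_res.extend(m["x"] - mx for m in members)
        y_res.extend(m["y"] - my for m in members)

    # Residuals sum to zero within each group, so no further centering is needed.
    sxx = sum(v * v for v in x_res)
    syy = sum(v * v for v in y_res)
    sxy = sum(a * b for a, b in zip(x_res, y_res))
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)
```

To use it, build one row per eligible deal with x = exposed(deal, touches, "demo") and y = is_won, then call within_group_corr(rows, lambda r: (r["rep_id"], r["region"])). Grouping on (rep_id, region) removes differences in each rep's baseline win rate and touch volume, so the resulting correlation reflects only variation inside a rep-by-region cell. It remains an association, not a causal effect.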