You’re given a 6‑hour take‑home to “predict a product’s target users,” but you estimate a thorough solution needs 24+ hours. Produce a written plan (≤1 page) that you would send to the hiring manager and business partner before starting. Include: (1) crisp problem statement, success criteria, and explicit non‑goals; (2) a prioritized work breakdown with specific deliverables you will complete in 6 hours and what you will defer; (3) the minimum data you need and the clarifying questions you would ask up front; (4) risks (e.g., label ambiguity, leakage, class imbalance) and your mitigations; (5) the exact storyboard of your presentation (titles of slides and key figures); and (6) how you will defend trade‑offs in Q&A if challenged on not exploring an attractive but out‑of‑scope idea (e.g., feature learning or causal inference). Be specific about how you will demonstrate impact with limited time, and how you will measure whether your chosen scope was correct after the fact.
Quick Answer: This question evaluates a candidate's competency in scoping and prioritizing a data‑science take‑home, stakeholder communication, risk identification, and trade‑off defense under strict time constraints.
Solution
# 6‑Hour Plan: Predict Product Target Users (for pre‑alignment)
1) Problem statement, success criteria, non‑goals
- Problem: Predict which accounts are likely to become “target users” of Product P in the next 90 days to prioritize GTM outreach. Label = account activates Product P by meeting a clear threshold (e.g., feature flag ON + 3 successful events) within 90 days of a reference date.
- Objective: Produce a baseline, interpretable model and a top‑K ranked list with measurable lift over a simple heuristic.
- Success criteria (6‑hr scope):
  - Offline: ROC‑AUC ≥ 0.70 and Lift@Top10% ≥ 2.0 (precision in the top decile ÷ overall base rate), with the model also beating a naive baseline ranking (e.g., recent activity rank) on the same test window. Also report PR‑AUC and calibration.
  - Business proxy: For the Top 1,000 accounts, Precision@K ≥ 3× the base rate (i.e., 3× random selection); estimate incremental conversions and revenue using historical base rates.
- Deliverables: Notebook, 8–10 slide deck, top‑K list with rationale, risk checks, backlog.
- Non‑goals (explicitly out of scope in 6 hours): Deep feature learning, causal inference/uplift modeling, extensive hyperparameter tuning, productionization, AB test launch, complex segmentation research.
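As a hedged illustration of the business proxy above, the sketch below estimates expected conversions and incremental revenue for a top‑K list by summing predicted probabilities. It assumes the scores are calibrated; the function name `expected_topk_impact` and every number in the example are hypothetical placeholders, not real data.

```python
from typing import Sequence, Tuple


def expected_topk_impact(
    probabilities: Sequence[float],
    k: int,
    base_rate: float,
    value_per_conversion: float,
) -> Tuple[float, float, float]:
    """Return (expected conversions in top-K, expected conversions from a
    random list of size K, incremental revenue of the model over random)."""
    if not 0 < k <= len(probabilities):
        raise ValueError("k must be between 1 and the number of scored accounts")
    top_k = sorted(probabilities, reverse=True)[:k]
    expected_model = sum(top_k)        # valid only if probabilities are calibrated
    expected_random = base_rate * k    # yield of a random list of the same size
    incremental_revenue = (expected_model - expected_random) * value_per_conversion
    return expected_model, expected_random, incremental_revenue


# Toy example with made-up scores, base rate, and value per conversion.
scores = [0.9, 0.5, 0.1, 0.6, 0.2, 0.1]
model, random_pick, revenue = expected_topk_impact(
    scores, k=2, base_rate=0.25, value_per_conversion=1000.0
)
print(round(model, 2), round(random_pick, 2), round(revenue, 2))
```

Summing probabilities rather than multiplying a single p̂ by K keeps the estimate honest when scores vary widely within the list.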
2) Prioritized work breakdown (6 hours) and deferments
- 0:00–0:30 Align + data access check
  - Deliverable: 5‑line written alignment on label, horizon, action owner, top‑K size, baseline comparator.
- 0:30–1:30 Data audit + label construction
  - Deliverable: Reproducible labeling logic; temporal split (train: older periods; test: most recent window).
- 1:30–3:00 Baseline model(s)
  - Approach: Logistic regression and shallow gradient‑boosted trees. Features: recency/frequency, account age, size, prior product engagements, vertical, simple aggregates over 7/30/90 days. Class weighting for imbalance.
  - Deliverable: Trained models; time‑ordered cross‑validation metrics (validation folds always later than training folds); guardrail checks.
- 3:00–4:00 Evaluation + interpretability
  - Deliverable: ROC/PR curves, Lift@K, calibration plot, top feature importances/coefficients, partial dependence on top drivers.
- 4:00–5:00 Business translation
  - Deliverable: Top‑K list with expected positives = sum of calibrated p̂ over the top K (equivalently, mean p̂ × K); back‑of‑envelope revenue; comparison to heuristic; sensitivity by segment.
- 5:00–6:00 Slides + handoff
  - Deliverable: Slide deck, risks/mitigations, next‑step backlog with ROI; reproducibility notes.
- Defer to 24+ hours: Rich feature engineering (embeddings, sequences), hyperparameter sweeps, leakage‑robust feature store, uplift/causal analysis, production scoring pipeline, monitoring, fairness deep‑dive, multi‑model ensembling, segmentation discovery.
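The labeling and temporal‑split deliverable from the 0:30–1:30 block could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the record layout (`reference_date`, `flag_on_date`, success event dates) and the function names are hypothetical, the "flag ON + 3 successful events within 90 days" threshold is the example threshold from the problem statement, and dropping rows whose label window straddles the test start is one possible guard against look‑ahead, not the only one.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional, Tuple

HORIZON = timedelta(days=90)  # label horizon taken from the plan


def activation_label(
    reference_date: date,
    flag_on_date: Optional[date],
    success_event_dates: Iterable[date],
    min_events: int = 3,
    horizon: timedelta = HORIZON,
) -> int:
    """1 if the flag is ON by the window end and at least min_events
    successes fall in (reference_date, reference_date + horizon]."""
    window_end = reference_date + horizon
    if flag_on_date is None or flag_on_date > window_end:
        return 0
    in_window = sum(1 for d in success_event_dates if reference_date < d <= window_end)
    return int(in_window >= min_events)


def temporal_split(
    rows: List[Dict], test_start: date, horizon: timedelta = HORIZON
) -> Tuple[List[Dict], List[Dict]]:
    """Train on rows whose label window closes by test_start;
    test on rows with a reference date on or after test_start."""
    train, test = [], []
    for row in rows:
        ref = row["reference_date"]
        if ref >= test_start:
            test.append(row)
        elif ref + horizon <= test_start:
            train.append(row)
        # Rows whose label window straddles test_start are dropped (look-ahead guard).
    return train, test


# Toy usage with made-up dates.
label = activation_label(
    date(2023, 1, 1), date(2023, 1, 10),
    [date(2023, 1, 15), date(2023, 2, 1), date(2023, 3, 1)],
)
rows = [{"reference_date": d} for d in (date(2023, 1, 1), date(2023, 5, 1), date(2023, 7, 1))]
train_rows, test_rows = temporal_split(rows, test_start=date(2023, 7, 1))
print(label, len(train_rows), len(test_rows))
```

Keeping the threshold and horizon as parameters makes the label‑sensitivity check in the risk section a matter of re‑running the same function.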
3) Minimum data + clarifying questions
- Minimum data
  - Accounts: id, create_date, segment/vertical, size proxies, region.
  - Product P events/flags: activation flag with timestamp, usage counts by day, key milestones.
  - Activity: core platform events aggregated by day (7/30/90‑day windows).
  - Outcomes: target label within 90 days from reference date.
  - Leakage controls: last‑updated timestamps on fields so that post‑label values can be excluded; marketing/sales touches with timestamps so they can optionally be excluded.
  - Period coverage: ≥ 12 months to allow temporal split and seasonality.
- Clarifying questions
  - Define “target user”: exact activation threshold; edge cases (trial, partial enablement).
  - Horizon and cadence: Is 90 days OK? How often will scores be used? Who acts on them, and how many accounts can they handle weekly?
  - Baselines: What heuristic do you use today? What are the current conversion rate and revenue per conversion?
  - Constraints: Any legal/privacy limits? Any must‑include or must‑exclude fields?
  - Success: Preferred decision threshold or list size? Any critical segments to highlight (e.g., enterprise vs SMB)?
4) Key risks and mitigations
- Label ambiguity: Co‑define activation threshold; run sensitivity on thresholds; document final choice.
- Leakage: Strict temporal split; drop features updated after reference date; exclude post‑contact marketing/sales touches.
- Class imbalance: Class weights or focal loss; evaluate PR‑AUC and Lift@K; calibrate probabilities (Platt/isotonic).
- Selection bias: Ensure negatives include both active and dormant accounts; compare segment‑wise metrics.
- Drift/seasonality: Use most recent window as test; report by month; plan for periodic retrain.
- ID issues: Deduplicate accounts; consistent ID mapping if multiple identifiers exist.
- Missingness: Simple imputation with missingness indicators; sanity checks on extreme values.
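The leakage mitigation above depends on features being computed strictly as of the reference date. A minimal sketch of point‑in‑time 7/30/90‑day aggregates follows; the function name `trailing_counts` and the feature naming scheme are hypothetical, and the example dates are made up.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Sequence


def trailing_counts(
    event_dates: Iterable[date],
    reference_date: date,
    windows: Sequence[int] = (7, 30, 90),
) -> Dict[str, int]:
    """Count events in (reference_date - w days, reference_date] for each window w.
    Events dated after reference_date are never counted, which keeps
    future information out of the features."""
    events = list(event_dates)
    counts = {}
    for w in windows:
        start = reference_date - timedelta(days=w)
        counts[f"events_{w}d"] = sum(1 for d in events if start < d <= reference_date)
    return counts


# Toy usage: one event lies in the future and one is older than 90 days.
events = [
    date(2023, 6, 29),  # 1 day before the reference date
    date(2023, 6, 10),  # 20 days before
    date(2023, 4, 15),  # 76 days before
    date(2023, 7, 5),   # after the reference date, must be ignored
    date(2023, 1, 1),   # older than 90 days
]
print(trailing_counts(events, reference_date=date(2023, 6, 30)))
```

A quick guardrail in the same spirit is to recompute features with all post‑reference events removed and confirm the values do not change.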
5) Presentation storyboard (titles + key figures)
- Slide 1: Problem, scope, and decision we enable (one‑pager)
- Slide 2: Definitions, label, and timeline (label schematic)
- Slide 3: Data and guardrails (temporal split diagram; leakage controls)
- Slide 4: Modeling approach (feature groups; simple model stack)
- Slide 5: Performance summary (ROC/PR; Lift@Top10%)
- Slide 6: Business impact (Top‑K precision; expected wins; revenue estimate)
- Slide 7: What drives predictions (top features; partial dependence)
- Slide 8: Segment sensitivity and calibration (by SMB/ENT; calibration curve)
- Slide 9: Risks, mitigations, and monitoring plan
- Slide 10: Next steps and ROI of deeper work (backlog with estimated gains)
6) Trade‑off defense and post‑hoc validation
- Defense framework: Maximize decision readiness in 6 hours. Chose interpretable baseline to deliver immediately usable Top‑K with quantified lift and risk controls. Out‑of‑scope ideas (feature learning, causal uplift) move to backlog with explicit triggers (e.g., if Lift@10% < 1.5 or opportunity ≥ $X/week).
- If challenged on feature learning: Show the measured Lift@K against the heuristic (target: 2–3× baseline); explain that the marginal value of complex features is uncertain, cannot be validated within 6 hours, and adds leakage risk; propose a 24‑hour follow‑up experiment with pre‑registered success criteria.
- If challenged on causal inference: Clarify that the model ranks accounts by predicted likelihood to activate, not by the incremental effect of outreach; causal validation requires experiment design and instrumentation; propose an A/B holdout with a power calculation and a success metric (incremental activations per contact).
- Demonstrating impact now: Provide Top‑1,000 list, predicted probabilities, and expected conversions; compare to legacy heuristic on the same test window; estimate revenue using historical value per activation; include a short pilot plan (shadow scoring or small outreach test).
- Measuring if scope was correct after the fact:
  - Compare realized conversions of Top‑K vs heuristic over the next cycle.
  - Track calibration and stability across months/segments.
  - Compute ROI as net return = (incremental conversions × value per conversion) − (hours spent × loaded hourly cost), so both terms are in currency; reassess the backlog if the gap suggests more complexity is warranted.
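The ROI check can be made concrete by converting hours into a cost so that both terms are in the same currency. The sketch below is illustrative only: the function name `scope_net_return`, the hourly cost, and all input figures are hypothetical assumptions.

```python
def scope_net_return(
    model_conversions: int,
    heuristic_conversions: int,
    value_per_conversion: float,
    hours_spent: float,
    hourly_cost: float,
) -> float:
    """Net return of the scoped work: value of conversions gained over the
    heuristic, minus the cost of the time invested."""
    incremental = model_conversions - heuristic_conversions
    return incremental * value_per_conversion - hours_spent * hourly_cost


# Toy comparison of the 6-hour scope with a hypothetical 24-hour scope.
six_hour = scope_net_return(150, 100, value_per_conversion=500.0, hours_spent=6, hourly_cost=100.0)
full_day = scope_net_return(160, 100, value_per_conversion=500.0, hours_spent=24, hourly_cost=100.0)
print(six_hour, full_day)
```

Comparing the two figures over the next cycle indicates whether the extra 18 hours would have paid for themselves.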
Appendix: Metric definitions
- Lift@K = Precision@K ÷ base rate. Example: if base rate = 5% and Precision@10% = 15%, Lift@10% = 3.0.
- PR‑AUC emphasizes performance under imbalance; calibration ensures p̂ aligns with observed rates for thresholding and headcount planning.
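The Lift@K definition above can be sketched in a few lines of plain Python. The function names are hypothetical, and the synthetic data is constructed only to reproduce the worked example (base rate 5%, 15% precision in the top 10%).

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Share of positives among the k highest-scored accounts."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of accounts")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def lift_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Precision@K divided by the overall base rate."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    return precision_at_k(scores, labels, k) / base_rate


# Synthetic data: 1,000 accounts, 50 positives (5% base rate),
# 15 of which sit in the 100 highest-scored accounts.
labels = [1 if i < 15 else 0 for i in range(100)] + [1 if i < 35 else 0 for i in range(900)]
scores = [1000 - i for i in range(1000)]  # strictly decreasing, so rank equals index
print(round(precision_at_k(scores, labels, 100), 2), round(lift_at_k(scores, labels, 100), 2))
```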