Design a lead-scoring model

This question evaluates skills in predictive modeling, feature engineering and selection, handling data leakage and multicollinearity, model interpretability, metric selection and diagnostics, and production deployment and monitoring within a marketing/growth data science role.

Question

Context

You are interviewing for a Data Scientist role on a marketing/growth team. The business wants lead scoring: ranking or scoring incoming leads so Sales/Marketing can prioritize outreach.

Data

Assume you have a historical dataset of leads with:

- lead_id (string/int)
- created_at (timestamp)
- Features available at scoring time, e.g.
  - acquisition channel, campaign, geography, device
  - firmographics (company size, industry)
  - behavioral signals (pages viewed, demo request, email opens)
  - any other engineered features available at lead creation time
- Outcome label(s), e.g.
  - converted (boolean): whether the lead converted within a defined window
  - time_to_convert_days (numeric, optional)
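
The converted label depends on a "defined window" that the prompt leaves open. As a minimal sketch of one way to make that definition concrete, the snippet below assumes a 30-day window and a snapshot date `as_of`. Both are illustrative assumptions, as are the function and constant names. Leads whose window has not yet closed are returned as `None` so they can be excluded from training rather than counted as non-converters.

```python
from datetime import datetime, timedelta

# Assumption: the prompt does not fix the window; 30 days is illustrative only.
CONVERSION_WINDOW_DAYS = 30


def label_lead(created_at, time_to_convert_days, as_of,
               window_days=CONVERSION_WINDOW_DAYS):
    """Return True/False for a mature lead, or None if its window is still open.

    created_at: datetime the lead was created
    time_to_convert_days: days until conversion, or None if never converted
    as_of: snapshot datetime at which the training data was extracted
    """
    # Immature leads are dropped entirely. Keeping only their early
    # converters would inflate the positive rate for recent cohorts.
    if as_of - created_at < timedelta(days=window_days):
        return None
    # Conversions that happen after the window count as negatives.
    return time_to_convert_days is not None and time_to_convert_days <= window_days


as_of = datetime(2024, 3, 1)
examples = [
    label_lead(datetime(2024, 1, 1), 10, as_of),    # converted inside the window
    label_lead(datetime(2024, 1, 1), 45, as_of),    # converted too late
    label_lead(datetime(2024, 1, 1), None, as_of),  # never converted
    label_lead(datetime(2024, 2, 20), 3, as_of),    # window still open
]
```

Whichever window is chosen, stating it up front keeps the label, the training cutoff, and the evaluation period consistent with each other.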

Task

- Propose an end-to-end approach to build a statistical model and a machine learning model for lead scoring.
- Discuss what kinds of variables/features you would use and how you would handle feature availability and leakage.
- The stakeholder may either:
  - only care about predictive performance, or
  - require understanding which features are important and why.
  Explain what you would deliver in each scenario.
- Explain what multicollinearity is, why it matters (or doesn’t) for different model families, how you would detect it, and how you would mitigate it.
- Define how you would evaluate the model, including:
  - a primary metric (and why)
  - diagnostic metrics/plots
  - guardrails (fairness, stability, or operational constraints)
- Describe how you would deploy and monitor the lead score in production and how you would update it over time.
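
For the multicollinearity item, a first-pass detection step could look like the sketch below: it flags feature pairs whose absolute Pearson correlation exceeds a threshold. The feature names, the data, and the 0.9 threshold are hypothetical. A pairwise check only catches two-feature redundancy; dependence spread across several features is more commonly checked with variance inflation factors, which this sketch does not compute.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sd_x * sd_y)


def correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical behavioral features for five leads.
features = {
    "pages_viewed": [1, 2, 3, 4, 5],
    "sessions": [2, 4, 6, 8, 10],   # exactly 2 * pages_viewed
    "email_opens": [5, 1, 4, 2, 3],
}
flagged = correlated_pairs(features)
```

Here only the `pages_viewed` / `sessions` pair is flagged. Whether to drop one of the two, combine them, or regularize depends on whether the stakeholder needs interpretable coefficients or only ranking quality.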

Be explicit about assumptions (conversion window, label definition, scoring cadence) and call out key pitfalls/edge cases.
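
Because the score is used to prioritize outreach, one candidate for the primary metric is lift among the top-scored leads: how much higher the conversion rate is in the top fraction than in the population overall. The sketch below is one possible way to compute it, not a prescribed answer. The function name, the 20% cutoff, and the toy data are assumptions for illustration.

```python
def lift_at_k(scores, labels, k_fraction=0.1):
    """Conversion rate in the top k_fraction of scores divided by the base rate."""
    n_top = max(1, int(len(scores) * k_fraction))
    # Rank leads from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        return float("nan")  # lift is undefined without any conversions
    return top_rate / base_rate


# Toy holdout set: 10 leads, 3 conversions (labels are 1 = converted).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift = lift_at_k(scores, labels, k_fraction=0.2)
```

In practice `k_fraction` would be tied to an operational constraint such as how many leads Sales can contact per scoring cycle, and the holdout set would come from a later time period than the training data to mirror production use.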
