System Design: End-to-End ML for Customer Lifetime Value (LTV)
Context
You are designing an end-to-end machine learning system to estimate customer lifetime value (LTV) for a large two-sided marketplace platform. Assume the focus is the demand side (guest/customer LTV) unless you prefer to discuss both sides; state your scope explicitly.
Requirements
Define and design the full stack from business definition and labels through modeling, evaluation, and serving. Cover the following:
- Business Definition
  - Precisely define LTV for this business (e.g., revenue, gross margin, or contribution after variable costs). Specify which costs are included and excluded.
  - Specify the prediction horizon (e.g., 6, 12, or 24 months) and whether to discount future cash flows. State the discount rate if used (one possible formulation is sketched below).
  - Clarify scope (e.g., guest LTV only) and any exclusions (e.g., fraudulent activity, chargebacks).
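If discounted contribution margin is the chosen definition, a minimal sketch of the resulting formula follows; the cost breakdown, 12-month horizon, and 10% annual discount rate are assumptions for illustration, not requirements.

```python
from dataclasses import dataclass

@dataclass
class MonthlyContribution:
    """One user-month of contribution: net revenue minus variable costs."""
    net_revenue: float      # completed bookings minus cancellations/refunds
    variable_costs: float   # payment fees, incentives, variable support costs

def discounted_ltv(months: list[MonthlyContribution],
                   horizon_months: int = 12,
                   annual_discount_rate: float = 0.10) -> float:
    """Discounted contribution-margin LTV over a fixed horizon.

    Month 0 is the first month after the scoring date; cash flows are
    discounted at the monthly equivalent of the annual rate.
    """
    monthly_rate = (1 + annual_discount_rate) ** (1 / 12) - 1
    return sum(
        (m.net_revenue - m.variable_costs) / (1 + monthly_rate) ** t
        for t, m in enumerate(months[:horizon_months])
    )

# Example: a guest active for two months, then inactive for the rest of the horizon.
print(discounted_ltv([MonthlyContribution(120.0, 30.0),
                      MonthlyContribution(80.0, 20.0)]))
```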
- Data and Features
  - Enumerate data sources: bookings/transactions, cancellations/refunds, payments/fees, marketing touchpoints, user profiles/consents, search/browse events, messaging/funnel, support interactions, risk decisions, incentives, and cost tables.
  - Describe feature pipelines: aggregation windows (e.g., 7/30/90/365 days), RFM-style features, recency of activity, seasonality, geo/device, marketing channel, quality signals, and marketplace context (e.g., supply-demand balance).
  - Ensure point-in-time correctness and leakage prevention (e.g., event-time joins, freeze windows), as illustrated in the sketch below. Address identity resolution and PII handling.
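To make the point-in-time requirement concrete, here is a pandas sketch of windowed RFM-style features that only use events strictly before a scoring cutoff. The column names (`user_id`, `event_ts`, `amount`) and the window choices are assumptions for illustration.

```python
import pandas as pd

def rfm_features(transactions: pd.DataFrame,
                 cutoff: pd.Timestamp,
                 windows_days: tuple[int, ...] = (30, 90, 365)) -> pd.DataFrame:
    """Per-user RFM-style features using only events strictly before `cutoff`.

    Filtering on event time before aggregating is what enforces point-in-time
    correctness here; no post-cutoff information can leak into the features.
    """
    past = transactions[transactions["event_ts"] < cutoff]
    out = []
    for w in windows_days:
        window = past[past["event_ts"] >= cutoff - pd.Timedelta(days=w)]
        agg = window.groupby("user_id").agg(
            **{f"freq_{w}d": ("amount", "size"),
               f"monetary_{w}d": ("amount", "sum")})
        out.append(agg)
    feats = pd.concat(out, axis=1).fillna(0.0)
    # Recency: days since the last transaction before the cutoff.
    last_ts = past.groupby("user_id")["event_ts"].max()
    feats["recency_days"] = (cutoff - last_ts).dt.days
    return feats

tx = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_ts": pd.to_datetime(["2024-01-05", "2024-03-01", "2023-06-10"]),
    "amount": [120.0, 80.0, 45.0],
})
print(rfm_features(tx, pd.Timestamp("2024-04-01")))
```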
- Cold-Start Strategy
  - Describe how to score new or nearly-new users (no bookings or very sparse history). Consider priors, hierarchical grouping, and context-based features (a shrinkage-prior sketch follows this section).
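One concrete form a prior can take, sketched under the assumption that a brand-new user can at least be assigned to a coarse segment (e.g., acquisition channel by geo): shrink each segment's observed mean LTV toward the global mean, with sparse segments pulled hardest. The `prior_strength` of 50 is arbitrary.

```python
import pandas as pd

def shrunken_segment_ltv(observed: pd.DataFrame,
                         prior_strength: float = 50.0) -> pd.Series:
    """Empirical-Bayes-style shrinkage of per-segment mean LTV toward the global mean.

    Expects columns: segment, ltv. The shrunken segment mean becomes the
    de facto prior score for new users who land in that segment.
    """
    global_mean = observed["ltv"].mean()
    grp = observed.groupby("segment")["ltv"].agg(["mean", "count"])
    n = grp["count"]
    return (n * grp["mean"] + prior_strength * global_mean) / (n + prior_strength)

df = pd.DataFrame({"segment": ["paid_us", "paid_us", "organic_br"],
                   "ltv": [200.0, 150.0, 40.0]})
print(shrunken_segment_ltv(df))
```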
- Label Construction
  - Define the target formula precisely, including how to handle cancellations, refunds, incentives, and payment processing costs.
  - Discuss horizon alignment, censoring (users without full observation windows), and maturity/freeze windows for late-arriving data (a label-building sketch follows this section).
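A minimal label-building sketch, assuming a signed ledger where refunds, cancellations, incentives, and payment costs appear as negative amounts, and assuming censoring is handled at cohort level by refusing to label a scoring date whose horizon plus maturity buffer has not fully elapsed. Column names are illustrative.

```python
import pandas as pd

def build_labels(ledger: pd.DataFrame,
                 scoring_date: pd.Timestamp,
                 horizon_days: int = 365,
                 maturity_days: int = 30,
                 as_of: pd.Timestamp | None = None) -> pd.Series:
    """Per-user LTV label over [scoring_date, scoring_date + horizon).

    Expects columns user_id, event_ts, amount (signed). The maturity buffer
    leaves room for late-arriving refunds/chargebacks before labels freeze.
    """
    as_of = as_of or pd.Timestamp.now()
    horizon_end = scoring_date + pd.Timedelta(days=horizon_days)
    if as_of < horizon_end + pd.Timedelta(days=maturity_days):
        raise ValueError("Horizon not yet mature; labels would be censored.")
    in_horizon = ledger[(ledger["event_ts"] >= scoring_date) &
                        (ledger["event_ts"] < horizon_end)]
    return in_horizon.groupby("user_id")["amount"].sum()

ledger = pd.DataFrame({
    "user_id": [7, 7, 7],
    "event_ts": pd.to_datetime(["2023-02-03", "2023-02-10", "2023-06-01"]),
    "amount": [150.0, -20.0, 90.0],   # booking, partial refund, booking
})
print(build_labels(ledger, pd.Timestamp("2023-01-01")))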
- Modeling Approach
  - Propose and justify a modeling strategy (e.g., survival/retention modeling, purchase frequency and monetary value decomposition, count models, direct regression, or a mixture of these); one such option is sketched below.
  - Note uncertainty estimation and calibration where applicable.
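One way to realize the mixture option is a two-stage ("hurdle") model: a classifier for whether the user transacts within the horizon and a regressor for spend given activity, with expected LTV as the product. A sketch using scikit-learn gradient boosting; the synthetic data and model choices are placeholders, not recommendations.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier, HistGradientBoostingRegressor

def fit_hurdle(X: np.ndarray, y_ltv: np.ndarray):
    """Fit P(spend > 0) and E[spend | spend > 0] separately."""
    active = y_ltv > 0
    clf = HistGradientBoostingClassifier().fit(X, active)
    reg = HistGradientBoostingRegressor().fit(X[active], y_ltv[active])
    return clf, reg

def predict_ltv(clf, reg, X: np.ndarray) -> np.ndarray:
    """Expected LTV = P(active) * E[spend | active], floored at zero."""
    p_active = clf.predict_proba(X)[:, 1]
    return p_active * np.clip(reg.predict(X), 0.0, None)

# Synthetic illustration: roughly 30% of users transact; spend is noisy when they do.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = np.where(rng.random(500) < 0.3, np.abs(rng.normal(100.0, 50.0, 500)), 0.0)
clf, reg = fit_hurdle(X, y)
print(predict_ltv(clf, reg, X[:3]))
```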
- Training/Validation
  - Specify temporal train/validation/test splits (rolling windows/backtesting); a rolling-origin split sketch follows below. Address class/label imbalance and non-stationarity.
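A minimal rolling-origin backtest sketch over monthly scoring cohorts. The monthly cadence is an assumption; in practice each training cohort's label horizon should also have matured before the validation scoring date, which this sketch does not enforce.

```python
import pandas as pd

def rolling_origin_folds(cohort_dates: list, n_val: int = 1):
    """Yield (train_cohorts, val_cohorts) pairs for rolling-origin backtesting.

    Each fold trains on cohorts scored strictly before the validation cohorts,
    so the model never sees information from the period it is evaluated on.
    """
    dates = sorted(cohort_dates)
    for i in range(1, len(dates) - n_val + 1):
        yield dates[:i], dates[i:i + n_val]

cohorts = pd.date_range("2023-01-01", periods=6, freq="MS").tolist()
for train, val in rolling_origin_folds(cohorts):
    print([d.date() for d in train], "->", [d.date() for d in val])
```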
- Evaluation Metrics
  - Include regression error (e.g., MAE/RMSE/sMAPE), ranking/segment metrics (e.g., decile lift, top-k capture), calibration, and business metrics (e.g., profit under the chosen targeting policy); two of these are sketched below.
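Two of the listed metrics made concrete: sMAPE (one common variant of the formula) and top-decile capture, the share of realized value held by the 10% of users the model ranks highest. The numbers below are synthetic, for illustration only.

```python
import numpy as np

def smape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Symmetric MAPE in percent; pairs where both values are zero count as zero error."""
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    ratio = np.abs(y_true - y_pred) / np.maximum(denom, 1e-12)
    return float(100.0 * np.mean(np.where(denom == 0, 0.0, ratio)))

def top_decile_capture(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of total realized value captured by the top 10% of users by predicted LTV."""
    k = max(1, len(y_pred) // 10)
    top = np.argsort(-y_pred)[:k]
    return float(y_true[top].sum() / y_true.sum())

rng = np.random.default_rng(1)
y_true = np.abs(rng.normal(100.0, 80.0, 1000))
y_pred = y_true * rng.uniform(0.5, 1.5, 1000)   # a deliberately imperfect model
print(round(smape(y_true, y_pred), 1), round(top_decile_capture(y_true, y_pred), 3))
```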
- Serving Architecture
  - Propose an offline/online architecture for batch scoring and near-real-time updates.
  - Cover data freshness SLAs, snapshotting/backfills, point-in-time correctness, and monitoring/alerting (data quality, drift, performance, business KPIs); a drift-check sketch follows this section.
  - If time is limited, you may skip detailed online serving.
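As one example of the drift monitoring called out above, a population stability index (PSI) check comparing current scores against a reference snapshot. The 0.1/0.25 reading guide in the docstring is a common rule of thumb, not a standard.

```python
import numpy as np

def population_stability_index(reference: np.ndarray,
                               current: np.ndarray,
                               n_bins: int = 10) -> float:
    """PSI between two score distributions, using quantile bins of the reference.

    Rule-of-thumb reading: < 0.1 stable, 0.1-0.25 watch, > 0.25 investigate
    and consider retraining.
    """
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # catch out-of-range scores
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(2)
print(population_stability_index(rng.normal(100, 30, 10_000),
                                 rng.normal(115, 30, 10_000)))
```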
- Downstream Use Cases and Experimentation
  - Explain how scores feed decisions (e.g., marketing budget/CPA bidding, incentives, recommendations/ranking, CRM); a bid-cap example follows this section.
  - Outline experimentation to measure impact, including interference/marketplace considerations.
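For the bidding use case, the simplest translation of a score into a decision is a bid cap derived from a target LTV:CAC ratio. The 3:1 target and the dollar figures below are purely illustrative.

```python
def max_cpa(predicted_ltv: float, target_ltv_to_cac: float = 3.0) -> float:
    """Maximum acquisition cost a user segment supports under a target LTV:CAC ratio."""
    return predicted_ltv / target_ltv_to_cac

# A segment with predicted 12-month contribution of $240 supports a CPA cap of $80.
print(max_cpa(240.0))
```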
- Risk, Bias, Privacy, and Compliance
  - Discuss how you would address model bias/fairness, privacy (consent, minimization, deletion), and regulatory requirements (e.g., GDPR/CCPA).