PracHub

Construct a Churn-Prediction Pipeline Using Scikit-Learn

Last updated: Jun 25, 2026

Quick Overview

This question tests practical machine learning engineering skills, specifically the ability to construct an end-to-end classification pipeline for imbalanced tabular data. It evaluates knowledge of preprocessing, model validation, probability calibration, and production packaging — core competencies assessed in data scientist interviews.

  • medium
  • Apple
  • Machine Learning
  • Data Scientist

Company: Apple

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

Scenario: Building a churn-prediction pipeline for a subscription business using scikit-learn.

Question: Describe, step by step, how you would construct, train, validate, and evaluate a churn-prediction model in scikit-learn, including preprocessing, model choice, hyperparameter tuning, and packaging the final pipeline for production.

Hints: Mention Pipeline, ColumnTransformer, GridSearchCV, cross-validation, joblib.

Related Interview Questions

  • Implement Masked Multi-Head Self-Attention - Apple (easy)
  • Compare DCN v1 vs v2 and A/B test - Apple (medium)
  • Explain dataset size, generalization, and U-Net skips - Apple (medium)
  • Analyze vision model failures - Apple (medium)
  • Compare audio preprocessing and training - Apple (medium)

Apple
Jul 12, 2025, 6:59 PM

Construct a Churn-Prediction Pipeline in scikit-learn

Scenario

You are a data scientist at a subscription business. You need to build a model that predicts customer churn, defined as: will a currently active customer cancel or go inactive within the next 30 days?

The training data is tabular with a mix of numeric and categorical features (e.g., tenure, monthly spend, plan type, region, support-ticket counts). The positive class (churners) is imbalanced — far fewer churners than non-churners. Some features change over time, and the company wants to act on the model's output by sending retention offers, so the cost of a false positive (a wasted offer) differs from the cost of a false negative (a lost customer).

Walk through, end to end, how you would construct, train, validate, evaluate, and ship this churn model in scikit-learn. Your answer should make concrete use of scikit-learn's Pipeline, ColumnTransformer, GridSearchCV, cross-validation utilities, and joblib. The problem is broken into parts below; treat them as one coherent pipeline rather than seven disconnected answers.

Constraints & Assumptions

  • Library: scikit-learn (pandas and numpy may also be used), with the model packaged for a Python serving environment.
  • Label horizon: churn within a fixed 30-day prediction window following a snapshot (the "as-of" date).
  • Imbalance: churners are roughly 5–20% of rows; treat the exact rate as data-dependent, not a fixed number.
  • Features: mixed numeric + categorical; some categoricals are high-cardinality, and unseen categories can appear at inference time.
  • Acting on predictions: downstream, a churn score triggers a retention action, so calibrated probabilities and a business-driven decision threshold matter, not just a ranking.
  • Reproducibility: fix random seeds; the same preprocessing must run identically in training and production.

Clarifying Questions to Ask

  • How is churn operationally defined — hard cancellation only, or also "inactive" (no logins/usage), and over what exact window relative to the as-of date?
  • Is the data time-dependent (do we have multiple monthly snapshots per customer), or is it a single cross-sectional snapshot? This decides whether splits and CV must be chronological.
  • What is the downstream action and its economics — what does a retention offer cost, and what is the value of a saved customer? This sets the operating threshold and the metric we optimize.
  • What is the prediction cadence and latency budget (nightly batch scoring vs. real-time), and what infrastructure will load the model?
  • Are there fairness, regulatory, or feature-availability constraints (e.g., a feature that exists in training but is not reliably available at serving time)?
  • How will we measure success after deployment — is there a holdout / A-B test for the retention campaign, not just offline AUC?

Part 1 — Problem framing, label construction, and leakage prevention

Define the target precisely and lay out how you build the labeled dataset so that no information from the future (after the as-of date) leaks into the features. Describe what split strategy you use and why.

What This Part Should Cover

  • A precise, time-anchored definition of y and the feature cutoff.
  • Identification of concrete leakage risks (target, identity, window overlap) and how each is closed.
  • A justified choice between stratified vs. out-of-time splitting tied to whether the data is time-dependent.
  • Holding out a final test set that is never touched during model selection.
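A minimal sketch of label construction and an out-of-time split, assuming a hypothetical snapshots table (one row per customer per as-of date, with features already computed only from data on or before that date) and a hypothetical cancels table with at most one cancellation date per customer. The column names and the helpers build_labels and out_of_time_split are illustrative, not a fixed schema.

```python
import pandas as pd

HORIZON = pd.Timedelta(days=30)  # label window from the problem statement


def build_labels(snapshots: pd.DataFrame, cancels: pd.DataFrame) -> pd.DataFrame:
    """Label each (customer_id, as_of) row: y = 1 if the customer cancels
    in (as_of, as_of + 30 days]. Assumes at most one cancel row per customer."""
    df = snapshots.merge(cancels, on="customer_id", how="left")
    # Keep only customers still active on the as-of date (NaT = never cancelled).
    active = df["cancel_date"].isna() | (df["cancel_date"] > df["as_of"])
    df = df[active].copy()
    in_window = (df["cancel_date"] > df["as_of"]) & (
        df["cancel_date"] <= df["as_of"] + HORIZON
    )
    df["y"] = in_window.astype(int)
    # The raw cancel date is the source of the target; keeping it as a feature would leak.
    return df.drop(columns=["cancel_date"])


def out_of_time_split(df: pd.DataFrame, cutoff: pd.Timestamp):
    """Train on snapshots whose 30-day label window closes by the cutoff and
    test on snapshots taken after it, so no label window straddles the boundary."""
    train = df[df["as_of"] + HORIZON <= cutoff]
    test = df[df["as_of"] > cutoff]
    return train, test


# Tiny illustrative data: three customers, two monthly snapshots.
snapshots = pd.DataFrame({
    "customer_id": [1, 2, 3, 1, 2, 3],
    "as_of": pd.to_datetime(["2024-01-01"] * 3 + ["2024-03-01"] * 3),
    "tenure_months": [5, 12, 3, 7, 14, 5],
})
cancels = pd.DataFrame({
    "customer_id": [2, 3],
    "cancel_date": pd.to_datetime(["2024-01-15", "2024-03-20"]),
})
labeled = build_labels(snapshots, cancels)
train, test = out_of_time_split(labeled, cutoff=pd.Timestamp("2024-02-15"))
print(labeled[["customer_id", "as_of", "y"]])
```

Customer 2's March snapshot is dropped because that customer had already cancelled. Snapshots taken within 30 days before the cutoff fall into neither set; that gap is the price of keeping training label windows from overlapping the test period.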

Part 2 — Preprocessing numeric and categorical features

Build the feature-preparation layer. Explain how you handle missing values, scaling, and encoding for the two column types, and how this is wired so it cannot leak.

What This Part Should Cover

  • Numeric path: imputation strategy and scaling (and when scaling matters — linear/distance models vs. trees).
  • Categorical path: imputation + an encoding that tolerates unseen categories at serve time.
  • Containing all transforms inside the pipeline to prevent train/validation leakage.
  • A plan for high-cardinality categoricals (rare-level grouping, min_frequency, hashing, or target/ordinal encoding with care).

Part 3 — Baselines and model choice

Choose what to model with. Start from a defensible baseline before reaching for something heavier, and say how each option deals with class imbalance.

What This Part Should Cover

  • A true baseline (majority/prior) to contextualize any "good" AUC.
  • An interpretable linear model and a non-linear ensemble, with the trade-offs (interpretability/calibration vs. raw performance).
  • Per-model handling of imbalance (class_weight, sample_weight, resampling, or thresholding) and awareness of which estimators support which knob.

Part 4 — Hyperparameter tuning with cross-validation

Tune the chosen pipeline with cross-validation. Specify the CV scheme, the search, and the scoring metric, and explain why the search must wrap the whole pipeline.

What This Part Should Cover

  • CV scheme consistent with Part 1's split decision (stratified vs. temporal).
  • A search over both preprocessing and model parameters, addressed with double-underscore step__parameter names (e.g., model__C).
  • A scoring metric appropriate to imbalance (PR-AUC / ROC-AUC, not accuracy), and how refit selects the final estimator.
  • Avoiding overfitting the search itself (reasonable grid size, nested CV awareness).
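A minimal sketch of a search that wraps the whole pipeline, assuming a small synthetic frame with hypothetical columns and stratified 5-fold CV (the cross-sectional case from Part 1). Because GridSearchCV refits the entire Pipeline on each training fold, the imputer and encoder never see the validation fold. The prep__num__impute__strategy key shows how a preprocessing parameter is addressed through nested step names. scoring="average_precision" is average precision, a step-wise summary of the precision-recall curve. For time-dependent snapshots, one option is to pass a time-ordered splitter (for example TimeSeriesSplit on rows sorted by as-of date, or precomputed date-based folds) as cv instead.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

SEED = 42
rng = np.random.default_rng(SEED)
n = 1500
X = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, n).astype(float),
    "monthly_spend": rng.gamma(2.0, 10.0, n),
    "plan_type": rng.choice(["basic", "pro", "family"], n),
})
# Synthetic label (assumption): short tenure and the basic plan raise churn odds.
logit = (-1.0 - 0.06 * X["tenure_months"] + 0.8 * (X["plan_type"] == "basic")).to_numpy()
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())]),
         ["tenure_months", "monthly_spend"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan_type"]),
    ])),
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Keys follow step__substep__parameter; a small grid limits overfitting the search.
param_grid = {
    "prep__num__impute__strategy": ["median", "mean"],
    "model__C": [0.01, 0.1, 1.0, 10.0],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
search = GridSearchCV(pipe, param_grid, scoring="average_precision", cv=cv, refit=True)
search.fit(X, y)
# refit=True retrains the best configuration on all of X, y as search.best_estimator_.
print(search.best_params_, round(search.best_score_, 3))
```

search.best_score_ is a model-selection score and is optimistically biased; the honest estimate still comes from the untouched test set (or from nested CV if the search itself is large).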

Part 5 — Evaluation and threshold selection

Evaluate the tuned model on the untouched test set and turn probabilities into decisions. Explain which metrics you report and how you pick the operating threshold.

What This Part Should Cover

  • Final evaluation on the held-out test set (not CV scores) with imbalance-aware metrics.
  • A principled, cost-aware threshold choice rather than a hard-coded 0.5.
  • Connecting the threshold back to the retention economics from the clarifying questions.
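A minimal sketch of a cost-aware threshold search in plain numpy. The economics are hypothetical placeholders: each retention offer costs offer_cost, and contacting a customer who would otherwise churn is worth value_if_retained in expectation (saved customer value already multiplied by the offer's success rate). The cutoff should be chosen on validation or out-of-fold probabilities and then frozen; the test set is used only to report precision, recall, and profit at that frozen cutoff.

```python
import numpy as np


def pick_threshold(y_true, p, offer_cost, value_if_retained):
    """Return (threshold, profit) maximizing expected campaign profit.

    Every flagged customer costs offer_cost; every flagged true churner
    returns value_if_retained. Flagging nobody (threshold = inf) earns 0."""
    y_true = np.asarray(y_true)
    p = np.asarray(p, dtype=float)
    best_t, best_profit = float("inf"), 0.0
    for t in np.unique(p):  # every distinct score is a candidate cutoff
        flagged = p >= t
        tp = int(np.sum(flagged & (y_true == 1)))
        profit = tp * value_if_retained - int(flagged.sum()) * offer_cost
        if profit > best_profit:
            best_t, best_profit = float(t), float(profit)
    return best_t, best_profit


# Hypothetical validation labels/scores and economics (offer = 10, saved churner = 50).
y_val = np.array([1, 0, 1, 0, 0, 1, 0, 0])
p_val = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1])
t, profit = pick_threshold(y_val, p_val, offer_cost=10, value_if_retained=50)
print(t, profit)  # → 0.6 110.0
```

If the probabilities are calibrated, the same economics give a closed-form rule: contacting a customer with churn probability p is worth p * value_if_retained - offer_cost, which is positive when p > offer_cost / value_if_retained (0.2 with these numbers). The empirical search does not assume calibration, which is why it can land elsewhere on uncalibrated scores such as the toy ones above.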

Part 6 — Probability calibration

The downstream retention logic uses the probability, so explain when and how you calibrate, and how you verify calibration.

What This Part Should Cover

  • Recognizing which models need calibration and why it matters when probabilities drive decisions.
  • Correct calibration mechanics (held-out / cross-fit data, sigmoid vs. isotonic trade-off).
  • A way to measure calibration quality (reliability diagram, Brier score), not just assert it.
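A sketch of cross-fitted calibration and how to verify it, on synthetic data (an assumption for illustration). The model is chosen deliberately: a LogisticRegression trained with class_weight="balanced" ranks well but systematically over-predicts churn, which makes the effect of calibration easy to see. CalibratedClassifierCV with cv=5 fits the model on four folds and the calibrator on the fifth, so the calibrator never sees rows the model was trained on. method="sigmoid" (Platt scaling) is the safer choice when positives are scarce; method="isotonic" is more flexible but needs more data and can overfit. Quality is checked on held-out data with the Brier score and a reliability table from calibration_curve.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=6000, n_features=12, n_informative=6,
                           weights=[0.9, 0.1], random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=7)


def make_model():
    # Balanced class weights help ranking on imbalanced data but push
    # predicted probabilities well above the true churn rate.
    return make_pipeline(StandardScaler(),
                         LogisticRegression(class_weight="balanced", max_iter=1000))


raw = make_model().fit(X_tr, y_tr)
# For each of 5 folds: fit the pipeline on 4/5 of X_tr, fit the sigmoid on the
# held-out 1/5; predict_proba averages the 5 calibrated models (ensemble=True default).
calibrated = CalibratedClassifierCV(make_model(), method="sigmoid", cv=5).fit(X_tr, y_tr)

p_raw = raw.predict_proba(X_te)[:, 1]
p_cal = calibrated.predict_proba(X_te)[:, 1]
brier_raw = brier_score_loss(y_te, p_raw)
brier_cal = brier_score_loss(y_te, p_cal)

# Reliability table: observed churn rate vs. mean predicted probability per bin.
observed, predicted = calibration_curve(y_te, p_cal, n_bins=10, strategy="quantile")
for obs, pred in zip(observed, predicted):
    print(f"predicted {pred:.3f}  observed {obs:.3f}")
print(f"Brier raw={brier_raw:.4f}  calibrated={brier_cal:.4f}")
```

The same check should be repeated after deployment, since a reliability table that drifts away from the diagonal is an early sign that the threshold from Part 5 no longer reflects the economics.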

Part 7 — Packaging the final pipeline for production

Ship the artifact. Describe what you persist, how scoring works at serve time, and what schema/robustness concerns you handle.

What This Part Should Cover

  • Serializing the complete fitted pipeline + threshold + schema/metadata as one bundle.
  • A clean scoring path that aligns input columns to the training schema and emits both probability and decision.
  • Operational concerns: version pinning, input validation, and reproducibility.
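A minimal sketch of a single persisted bundle and a scoring function, assuming joblib for serialization (it is installed as a scikit-learn dependency). The bundle layout, the FEATURES list, the 0.35 placeholder threshold, and the helpers save_bundle and score are hypothetical; the tiny training step exists only so the example runs end to end. joblib files are pickles, so they should only be loaded from trusted storage, and the serving environment should pin the same scikit-learn version used in training.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
import sklearn
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

FEATURES = ["tenure_months", "monthly_spend", "plan_type"]  # hypothetical training schema


def train_demo_pipeline() -> Pipeline:
    X = pd.DataFrame({
        "tenure_months": [1, 3, 24, 36, 2, 48, 5, 60],
        "monthly_spend": [9.99, 9.99, 19.99, 29.99, 9.99, 19.99, 14.99, 29.99],
        "plan_type": ["basic", "basic", "pro", "family", "basic", "pro", "basic", "family"],
    })
    y = np.array([1, 1, 0, 0, 1, 0, 1, 0])
    pipe = Pipeline([
        ("prep", ColumnTransformer([
            ("num", StandardScaler(), ["tenure_months", "monthly_spend"]),
            ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan_type"]),
        ])),
        ("model", LogisticRegression(max_iter=1000)),
    ])
    return pipe.fit(X[FEATURES], y)


def save_bundle(pipeline: Pipeline, threshold: float, path: str) -> None:
    bundle = {
        "pipeline": pipeline,                    # fitted preprocessing + model together
        "threshold": float(threshold),           # business cutoff chosen in Part 5
        "features": list(FEATURES),              # expected input columns, in order
        "sklearn_version": sklearn.__version__,  # checked again at load time
    }
    joblib.dump(bundle, path)


def score(bundle: dict, raw: pd.DataFrame) -> pd.DataFrame:
    """Validate inputs, align columns to the training schema, emit proba + decision."""
    if bundle["sklearn_version"] != sklearn.__version__:
        raise RuntimeError("scikit-learn version differs from the training environment")
    missing = [c for c in bundle["features"] if c not in raw.columns]
    if missing:
        raise ValueError(f"missing input columns: {missing}")
    X = raw[bundle["features"]]  # drops extra columns and fixes the column order
    proba = bundle["pipeline"].predict_proba(X)[:, 1]
    return pd.DataFrame({"churn_proba": proba, "flag": proba >= bundle["threshold"]},
                        index=raw.index)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "churn_bundle.joblib")
    save_bundle(train_demo_pipeline(), threshold=0.35, path=path)  # placeholder cutoff
    loaded = joblib.load(path)

# Serving rows arrive with an extra id column, a different order, and an unseen plan.
new_rows = pd.DataFrame({
    "plan_type": ["enterprise", "basic"],
    "customer_id": [101, 102],
    "monthly_spend": [49.99, 9.99],
    "tenure_months": [12, 2],
})
print(score(loaded, new_rows))
```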

What a Strong Answer Covers

Across all parts, a strong answer treats this as one leakage-safe, reproducible pipeline rather than seven isolated snippets, and keeps coming back to the cross-cutting threads:

  • Leakage discipline end to end — the as-of/label boundary in Part 1 is honored by fitting all preprocessing inside the CV folds in Parts 2 and 4.
  • Imbalance and business cost as a through-line — the same retention economics inform metric choice (Part 4), threshold (Part 5), and the case for calibration (Part 6).
  • Reproducibility & production-readiness — seeds, version pinning, schema validation, and a single serializable artifact.
  • Post-deployment thinking — monitoring for data/concept drift, score-distribution and calibration stability, and measuring real retention lift, not just offline AUC.

Follow-up Questions

  • Suppose churn rate drifts over time (seasonality, a pricing change). How would you detect this in production and decide when to retrain, and would you change your validation scheme?
  • A stakeholder asks why a specific customer was flagged. How would you produce per-prediction explanations (e.g., coefficients for the linear model, or SHAP for the tree model) without breaking the pipeline abstraction?
  • The retention team can only contact the top 2% of customers per day (a capacity constraint). How does that change which metric you optimize and how you set the threshold (think ranking / precision@k rather than a fixed cutoff)?
  • If you replaced GridSearchCV with a more expensive search and a larger model, how would you guard against overfitting the model-selection process itself (e.g., nested cross-validation)?
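For the capacity-constrained follow-up, a minimal sketch (plain numpy, hypothetical helper names) of ranking-based selection: instead of a fixed probability cutoff, each day's scores are sorted and the top capacity_frac of customers are contacted, so the effective threshold is whatever score the k-th customer has that day. precision_at_k is the share of actual churners among the contacted customers, the offline metric this setup rewards.

```python
import numpy as np


def top_k_selection(p, capacity_frac=0.02):
    """Indices of the highest-scoring customers that fit the daily capacity."""
    p = np.asarray(p, dtype=float)
    k = max(1, int(np.floor(capacity_frac * len(p))))
    return np.argsort(-p, kind="stable")[:k]


def precision_at_k(y_true, p, capacity_frac=0.02):
    """Fraction of contacted customers who actually churn."""
    idx = top_k_selection(p, capacity_frac)
    return float(np.mean(np.asarray(y_true)[idx]))


# Hypothetical day of 100 scored customers; three churners, two of them ranked highest.
scores_today = np.arange(100) / 100.0
churned = np.zeros(100, dtype=int)
churned[[99, 98, 50]] = 1
print(top_k_selection(scores_today), precision_at_k(churned, scores_today))
```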

© 2026 PracHub. All rights reserved.