PracHub

Design a Churn Model: Handle Missing Data and Justify

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's ability to work with messy, time-dependent subscription data. It covers missing-value strategies, class imbalance, temporal validation, feature engineering, and model selection, along with ensemble methods and model generalization (Random Forest internals, overfitting, and underfitting).

  • medium
  • Amazon
  • Machine Learning
  • Data Scientist

Company: Amazon

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

##### Scenario

Designing a churn-prediction model for a subscription product with messy real-world data.

##### Question

  • How would you handle missing values in the training data and justify your approach?
  • Given this churn-prediction problem, which ML algorithm would you choose and why?
  • Explain how Random Forest works, including voting, feature bagging, and depth control.
  • Define overfitting vs. underfitting and describe techniques to detect and mitigate each.

##### Hints

Discuss imputation, ensemble strengths, cross-validation, regularisation, bias-variance trade-off.

Aug 4, 2025, 10:55 AM

Churn Prediction on Messy Subscription Data

Context

You are building a binary churn-prediction model for a subscription product. Historical customer-level data contains usage/activity, billing/payments, support interactions, demographics, and plan details. The data is messy: many fields have missing values, there is class imbalance (churn is rarer than non-churn), and features are time-dependent. We aim to predict whether a customer will churn in the next period (e.g., next 30 days) using only information available up to a cutoff date.

Assumptions:

  • Binary target: churn = 1 if a customer cancels or fails to renew in the next period; 0 otherwise.
  • Temporal validation is required (train on earlier periods, validate on later periods).
  • Some missingness is likely not at random (e.g., missing usage could reflect inactivity).
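The assumptions above can be sketched in code. This is a minimal illustration rather than a prescribed solution: the field names (`snapshot`, `days_active`, `churn`), the toy rows, and the helper functions are all hypothetical. It splits rows by a cutoff date, learns the median fill value from the training period only, and adds a missing-indicator column so that a model can still use the fact that a value was absent.

```python
from datetime import date
from statistics import median

# Hypothetical monthly customer snapshots; None marks a missing usage value.
rows = [
    {"snapshot": date(2025, 1, 1), "days_active": 12, "churn": 0},
    {"snapshot": date(2025, 2, 1), "days_active": None, "churn": 1},
    {"snapshot": date(2025, 3, 1), "days_active": 20, "churn": 0},
    {"snapshot": date(2025, 4, 1), "days_active": None, "churn": 1},
    {"snapshot": date(2025, 5, 1), "days_active": 3, "churn": 0},
]

def temporal_split(rows, cutoff):
    """Train on snapshots before the cutoff, validate on the rest."""
    train = [r for r in rows if r["snapshot"] < cutoff]
    valid = [r for r in rows if r["snapshot"] >= cutoff]
    return train, valid

def fit_median(train_rows, col):
    """Learn the fill value from observed training values only."""
    return median(r[col] for r in train_rows if r[col] is not None)

def impute_with_indicator(rows, col, fill):
    """Return copies with `col` filled and a 0/1 `<col>_missing` flag added."""
    out = []
    for r in rows:
        missing = r[col] is None
        new = dict(r)  # copy so the raw rows stay untouched
        new[col] = fill if missing else r[col]
        new[col + "_missing"] = int(missing)
        out.append(new)
    return out

train, valid = temporal_split(rows, cutoff=date(2025, 4, 1))
fill = fit_median(train, "days_active")  # uses the training period only
train_x = impute_with_indicator(train, "days_active", fill)
valid_x = impute_with_indicator(valid, "days_active", fill)
```

Fitting the fill value on the training period and reusing it for validation keeps later-period information out of the features, which is the point of the temporal-validation assumption. The indicator column is one way to preserve a signal when missingness is not at random.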

Tasks

  1. How would you handle missing values in the training data and justify your approach?
  2. Given this churn-prediction problem, which ML algorithm would you choose and why?
  3. Explain how Random Forest works, including voting, feature bagging, and depth control.
  4. Define overfitting vs. underfitting and describe techniques to detect and mitigate each.
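As background for task 3, the three Random Forest mechanisms named there can be sketched in plain Python. This is a toy illustration under stated assumptions, not a production implementation or any library's API: the function names, the two features (`days_active`, `failed_payments`), and the data are hypothetical. Each tree is trained on a bootstrap sample, each split considers only a random subset of features (feature bagging), tree growth stops at `max_depth` (depth control), and the forest predicts by majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, max_depth, n_features, rng, depth=0):
    # Depth control: stop at the cap or when the node is already pure.
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)
    # Feature bagging: only a random subset of features is tried at this split.
    best = None
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted(set(row[f] for row in X)):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority(y)
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          max_depth, n_features, rng, depth + 1)

    return (f, t, grow(left), grow(right))

def predict_tree(node, row):
    # Internal nodes are (feature, threshold, left, right); leaves are labels.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, max_depth=3, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample rows with replacement for each tree.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, n_features, rng))
    return forest

def predict_forest(forest, row):
    # Voting: each tree casts one vote and the majority class wins.
    return majority([predict_tree(tree, row) for tree in forest])

# Toy data: [days_active, failed_payments]; 1 = churned.
X = [[1, 3], [2, 4], [3, 5], [20, 0], [22, 1], [25, 0]]
y = [1, 1, 1, 0, 0, 0]
forest = fit_forest(X, y)
inactive_customer = predict_forest(forest, [0, 9])
active_customer = predict_forest(forest, [30, 0])
```

For task 4, one way to explore the trade-off with a sketch like this is to vary `max_depth` and compare accuracy on the training period against a later validation period: a large gap suggests overfitting, while low accuracy on both suggests underfitting.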
