PracHub

Implement random forest with OOB and imbalance

Last updated: Mar 29, 2026

Quick Overview

This question evaluates expertise in implementing and engineering a memory- and compute-efficient Random Forest for binary classification, covering ensemble methods, CART/Gini impurity, OOB evaluation and calibration, class-imbalance handling, reproducibility, parallel and thread-safe system design, and streaming/warm-start model updates.


Company: Apple

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

Implement a binary‑classification Random Forest from scratch for N=200,000, d=100 (mixed numerical and high‑cardinality categorical), with a 2 GB memory budget. Requirements: (a) Trees: CART with Gini impurity, max_depth, min_samples_leaf; handle missing values via surrogate splits or median/mode imputation; support categorical splits without one‑hot. (b) Bagging and feature bagging: bootstrap per tree; m_try=⌊√d⌋ at each split; ensure deterministic reproducibility with per‑tree seeds. (c) Out‑of‑bag (OOB) evaluation: compute OOB ROC‑AUC, PR‑AUC, and a reliability diagram; derive class‑probability calibration using OOB Platt scaling vs isotonic—justify when each is preferable. (d) Severe class imbalance: positive rate ≈1%; incorporate a 10× cost for FN vs FP into split selection and sampling (class_weight or stratified bootstrap). (e) Parallelization: design thread‑safe data structures and quantify training/inference complexity and memory; estimate wall‑clock time on 8 cores. (f) Streaming/warm‑start: add trees over time without retraining existing ones; discuss concept‑drift detection with OOB metrics. Provide clear pseudocode for train(), predict_proba(), and OOB evaluation, and justify all design choices.
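One way to read the "10× cost for FN vs FP into split selection" requirement is to weight each positive row 10 times as heavily as a negative row when computing Gini impurity. The sketch below illustrates that reading only; the constants `POS_WEIGHT` and `NEG_WEIGHT` and the helper names `weighted_gini` and `split_gain` are assumptions made for illustration, not part of the question.

```python
from typing import Tuple

# Assumption: the 10x FN cost is encoded as a class weight on positives.
POS_WEIGHT = 10.0
NEG_WEIGHT = 1.0

Counts = Tuple[int, int]  # (n_pos, n_neg) in a node


def weighted_mass(counts: Counts) -> float:
    """Total class-weighted mass of a node."""
    n_pos, n_neg = counts
    return POS_WEIGHT * n_pos + NEG_WEIGHT * n_neg


def weighted_gini(counts: Counts) -> float:
    """Binary Gini impurity 2p(1-p), where p is the weighted positive share."""
    total = weighted_mass(counts)
    if total == 0:
        return 0.0
    p = POS_WEIGHT * counts[0] / total
    return 2.0 * p * (1.0 - p)


def split_gain(left: Counts, right: Counts) -> float:
    """Weighted impurity decrease of a candidate split."""
    parent = (left[0] + right[0], left[1] + right[1])
    m_parent = weighted_mass(parent)
    if m_parent == 0:
        return 0.0
    m_left, m_right = weighted_mass(left), weighted_mass(right)
    children = (m_left / m_parent) * weighted_gini(left) \
        + (m_right / m_parent) * weighted_gini(right)
    return weighted_gini(parent) - children


# A node with a 1% positive rate: 10 positives, 990 negatives.
perfect = split_gain((10, 0), (0, 990))      # separates the classes completely
useless = split_gain((5, 495), (5, 495))     # children mirror the parent
```

With these weights, the 1% positive class carries roughly 9% of the node's mass, so splits that isolate positives earn a larger gain than they would under unweighted Gini. If a stratified bootstrap is also used, the two mechanisms compound, so the effective cost ratio should be checked rather than assumed to remain 10×.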

Oct 13, 2025, 9:49 PM

Implement a Memory-Efficient Random Forest (Binary Classification) Under Constraints

You are asked to design and implement a Random Forest for binary classification under the following constraints. Assume a dataset with N = 200,000 rows and d = 100 features (a mix of numerical and high-cardinality categorical), and a total memory budget of 2 GB. Your design should be robust enough for a production environment and suitable for an onsite interview discussion.
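To put the 2 GB budget in perspective, the arithmetic below estimates the size of the feature matrix under a few storage choices. It is a rough sketch: the one-byte "binned" layout assumes every feature can be coded in at most 256 bins, which may not hold for the high-cardinality categorical columns (those could need 2 or 4 bytes per value). The helper name `matrix_bytes` is hypothetical.

```python
# Problem size and budget taken from the problem statement.
N_ROWS = 200_000
N_FEATURES = 100
BUDGET_BYTES = 2 * 1024**3  # 2 GB


def matrix_bytes(n_rows: int, n_features: int, bytes_per_value: int) -> int:
    """Size in bytes of a dense matrix with a fixed-width element type."""
    return n_rows * n_features * bytes_per_value


dense_f64 = matrix_bytes(N_ROWS, N_FEATURES, 8)  # 64-bit floats
dense_f32 = matrix_bytes(N_ROWS, N_FEATURES, 4)  # 32-bit floats
binned_u8 = matrix_bytes(N_ROWS, N_FEATURES, 1)  # assumes <= 256 bins per feature

for name, size in [("float64", dense_f64),
                   ("float32", dense_f32),
                   ("uint8 bins", binned_u8)]:
    share = size / BUDGET_BYTES
    print(f"{name}: {size / 1e6:.0f} MB ({share:.1%} of budget)")
```

Even the float64 layout is 160 MB, so the raw data fits comfortably. The budget pressure is more likely to come from per-thread working buffers, per-tree bootstrap index arrays, and the stored trees themselves.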

Requirements

  1. Trees
  • Use CART with Gini impurity.
  • Hyperparameters: max_depth, min_samples_leaf.
  • Missing values: either surrogate splits or median/mode imputation.
  • Categorical splits: support directly (no one-hot encoding).
  2. Bagging and Feature Bagging
  • Bootstrap sampling per tree (bagging).
  • Feature subsampling per node with m_try = ⌊√d⌋.
  • Deterministic reproducibility with per-tree seeds.
  3. Out-of-Bag (OOB) Evaluation
  • Compute OOB ROC-AUC, PR-AUC, and a reliability diagram.
  • Calibrate class probabilities using OOB predictions: compare Platt scaling vs isotonic regression; justify when each is preferable.
  4. Severe Class Imbalance
  • Positive rate ≈ 1%.
  • Incorporate a 10× misclassification cost for FN vs FP into both split selection and sampling (e.g., class_weight or stratified bootstrap).
  5. Parallelization and Systems Aspects
  • Design thread-safe data structures.
  • Quantify training and inference time complexity and memory usage.
  • Estimate wall-clock time on 8 cores.
  6. Streaming / Warm-Start
  • Support adding trees over time without retraining existing trees.
  • Discuss concept-drift detection using OOB metrics.
  7. Deliverables
  • Clear pseudocode for train(), predict_proba(), and OOB evaluation.
  • Justify all design choices.
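The bagging and OOB requirements above can be sketched roughly as follows. This is a minimal illustration, not a full answer: `fit_tree` is a hypothetical callback that trains one tree on the in-bag row indices and returns a function mapping a row index to P(y = 1), and the seed-mixing rule in `bootstrap_indices` is an arbitrary assumption chosen only so that each tree gets its own reproducible stream.

```python
import random
from typing import Callable, List, Optional, Sequence


def bootstrap_indices(n_rows: int, base_seed: int, tree_id: int) -> List[int]:
    """Bootstrap sample drawn from a generator seeded per tree."""
    rng = random.Random(base_seed * 1_000_003 + tree_id)  # hypothetical mixing rule
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def oob_scores(
    n_rows: int,
    n_trees: int,
    base_seed: int,
    fit_tree: Callable[[List[int]], Callable[[int], float]],
) -> List[Optional[float]]:
    """Average each row's predictions over the trees that never sampled it."""
    sums = [0.0] * n_rows
    counts = [0] * n_rows
    for tree_id in range(n_trees):
        in_bag = bootstrap_indices(n_rows, base_seed, tree_id)
        predict = fit_tree(in_bag)
        in_bag_set = set(in_bag)
        for i in range(n_rows):
            if i not in in_bag_set:  # row i is out-of-bag for this tree
                sums[i] += predict(i)
                counts[i] += 1
    # Rows that were in-bag for every tree have no OOB score.
    return [s / c if c else None for s, c in zip(sums, counts)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum identity, using average ranks for ties."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Before calling `roc_auc`, rows whose OOB score is `None` need to be filtered out along with their labels. Because each tree's sample depends only on `base_seed` and `tree_id`, trees can be built in any order or on any thread and still reproduce the same forest, and trees added later under warm-start simply continue the `tree_id` sequence.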

Assume m_try = ⌊√100⌋ = 10. If information is missing (e.g., how many of the 100 features are numerical versus categorical), make minimal, explicit assumptions to complete the design.
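As a small illustration of the m_try rule, the sketch below computes ⌊√d⌋ with integer arithmetic and draws the candidate features for a single node from a per-tree generator. The function names `m_try` and `candidate_features`, and the seed value 7, are assumptions made for the example.

```python
import math
import random
from typing import List


def m_try(n_features: int) -> int:
    """floor(sqrt(d)) via integer square root, with a floor of 1 feature."""
    return max(1, math.isqrt(n_features))


def candidate_features(n_features: int, rng: random.Random) -> List[int]:
    """Sample m_try distinct feature indices for one node, without replacement."""
    return rng.sample(range(n_features), m_try(n_features))


rng = random.Random(7)  # per-tree generator; the seed value here is arbitrary
features = candidate_features(100, rng)
print(len(features), sorted(features))
```

Drawing from the tree's own generator at every node keeps the feature subsets reproducible for a given seed, provided nodes within a tree are always expanded in the same order.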
