PracHub

Discuss ML infrastructure fundamentals

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of ML infrastructure fundamentals, including the end-to-end ML stack, scalable feature store design, reproducibility and versioning practices, and production monitoring and troubleshooting for low-latency, high-availability systems.


Company: DoorDash

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

Question

What are the key components of a modern machine-learning infrastructure stack and how do they interact? Describe how you would design a scalable feature store to support both offline training and real-time inference. Explain strategies to ensure reproducibility and versioning of data, code, and models in an ML pipeline. How would you monitor and troubleshoot production ML services for latency, drift, and model degradation?


Posted: Jul 29, 2025

ML System Design: Infra Stack, Feature Store, Reproducibility, and Monitoring

Context: You are designing and operating a machine learning platform that powers real-time, high-traffic use cases (for example: delivery ETA, dispatch/matching, ranking, fraud prevention). The system must support batch training, real-time inference, and stringent latency/availability SLAs.

1) Modern ML Infrastructure Stack

Describe the key components of a modern ML infrastructure stack and how they interact end-to-end from data generation to model impact in production.
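A minimal sketch of how the stages hand artifacts to one another, from ingestion through a versioned registry to serving. All function and variable names are illustrative, not any specific platform's API; the "model" is a trivial mean baseline so the wiring stays visible.

```python
def ingest_events(raw):
    """Data layer: validate and land raw events (e.g., order logs)."""
    return [e for e in raw if "order_id" in e]

def compute_features(events):
    """Feature layer: derive model inputs from landed events."""
    return {e["order_id"]: {"distance_km": e.get("distance_km", 0.0)} for e in events}

def train_model(features, labels):
    """Training layer: fit a model; here, a trivial mean-based baseline."""
    ys = [labels[k] for k in features]
    return {"type": "baseline", "prediction": sum(ys) / len(ys)}

def register_model(registry, model, version):
    """Registry: version the artifact so serving and rollback can reference it."""
    registry[version] = model
    return version

def serve(registry, version, feature_row):
    """Serving layer: fetch the versioned model and score a feature row."""
    return registry[version]["prediction"]

# Wire the stages together the way an orchestrator (Airflow, Dagster, ...) would.
raw = [{"order_id": "a1", "distance_km": 3.2}, {"order_id": "a2", "distance_km": 5.0}]
labels = {"a1": 12.0, "a2": 18.0}

events = ingest_events(raw)
features = compute_features(events)
model = train_model(features, labels)
registry = {}
v = register_model(registry, model, "v1")
eta = serve(registry, v, features["a1"])
print(eta)  # 15.0
```

In a real platform each hand-off would be a durable artifact (a table, a feature view, a model file) rather than an in-memory object, which is what makes lineage and rollback possible.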

2) Scalable Feature Store

Design a feature store that supports both:

  • Offline training (historical, point-in-time correct feature computation and backfills).
  • Online inference (low-latency feature retrieval, high freshness, and consistency with offline definitions).

Explain the architecture, data model, consistency model, and pipelines required.
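Point-in-time correctness is the crux of the offline side: each training label must be joined to the latest feature value available *before* its timestamp, never after. A minimal sketch using `pandas.merge_asof` with illustrative entity and feature names:

```python
import pandas as pd

# Feature log: timestamped values per entity (e.g., a restaurant's rolling prep time).
features = pd.DataFrame({
    "entity_id": ["d1", "d1", "d1"],
    "ts": pd.to_datetime(["2025-01-01 10:00", "2025-01-01 11:00", "2025-01-01 12:00"]),
    "avg_prep_time": [9.0, 11.0, 14.0],
}).sort_values("ts")

# Labels: timestamped outcomes to train against.
labels = pd.DataFrame({
    "entity_id": ["d1", "d1"],
    "ts": pd.to_datetime(["2025-01-01 10:30", "2025-01-01 12:30"]),
    "actual_eta": [13.0, 16.0],
}).sort_values("ts")

# direction="backward" picks the most recent feature row at or before each
# label timestamp, per entity — avoiding label leakage from future features.
training_set = pd.merge_asof(
    labels, features, on="ts", by="entity_id", direction="backward"
)
print(training_set[["ts", "avg_prep_time", "actual_eta"]])
```

The 10:30 label picks up the 10:00 feature value (9.0), not the 11:00 one; a naive latest-value join would silently leak future data into training and inflate offline metrics.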

3) Reproducibility and Versioning

Explain strategies to ensure reproducibility and versioning of data, code, configurations, features, and models throughout the ML pipeline.
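One common tactic is to fingerprint every input to a training run (data snapshot, config, code revision) and store the combined hash alongside the model artifact, so any model can be traced back to exactly what produced it. A hedged sketch using only the standard library; the manifest fields are illustrative:

```python
import hashlib
import json

def fingerprint(obj) -> str:
    """Stable SHA-256 over a JSON-serializable object (sorted keys)."""
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

run_manifest = {
    "data_snapshot": fingerprint([{"order_id": "a1", "eta": 12.0}]),
    "config": fingerprint({"lr": 0.01, "epochs": 10, "features": ["distance_km"]}),
    "code_rev": "git:abc1234",  # normally taken from the VCS; hardcoded here
}
run_id = fingerprint(run_manifest)

# Reordering config keys does not change the hash; changing a value does.
same = fingerprint({"epochs": 10, "lr": 0.01, "features": ["distance_km"]})
changed = fingerprint({"lr": 0.02, "epochs": 10, "features": ["distance_km"]})
print(same == run_manifest["config"], changed == run_manifest["config"])  # True False
```

In practice tools like DVC, MLflow, or a model registry carry this bookkeeping, but the underlying idea is the same: content-address every input so a run is re-derivable, not just re-runnable.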

4) Monitoring and Troubleshooting in Production

Describe how you would monitor and troubleshoot production ML services for:

  • Latency and availability (P50/P95/P99, error rates),
  • Data/feature drift and concept drift,
  • Model degradation (online metrics and delayed labels).

Include alerting, debugging playbooks, and safeguard strategies.
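For drift specifically, one widely used score is the Population Stability Index (PSI), which compares a feature's binned distribution in a reference window against a live window. A minimal sketch with illustrative histogram counts; the usual rule of thumb is PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 worth an alert:

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index over aligned histogram bins."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # clamp empty bins to avoid log(0)
        a_pct = max(a / a_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

# Bin counts for, say, delivery distance over two time windows.
reference = [100, 300, 400, 150, 50]   # training-time distribution
stable    = [ 95, 310, 390, 155, 50]   # similar live traffic -> low PSI
shifted   = [ 40, 150, 350, 300, 160]  # distribution moved right -> high PSI

print(round(psi(reference, stable), 4))
print(round(psi(reference, shifted), 4))
```

A monitor would compute this per feature on a schedule and page only when the score crosses the alert threshold for several consecutive windows, to avoid flapping on transient traffic shifts.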


© 2026 PracHub. All rights reserved.