You are designing the homepage store recommendation system for a food-delivery app similar to DoorDash. When a user opens the app, the online request contains very little context: primarily user_id and the user's current latitude/longitude.
The system must return a ranked list of stores for the homepage feed under the following hard constraints:
- Every recommended store must be within the user's delivery range.
- Every recommended store must be currently open.
- The system serves high traffic, so online latency and reliability are critical.
- Assume each retrieval source has an aggressive timeout budget of about 15 ms.
Design the end-to-end ML system, and address the following:
- Overall architecture
  - How would you structure candidate retrieval, filtering, ranking, and serving?
  - What are the main online and offline components?
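As a concrete reference point, the online path is usually a funnel: retrieval → hard filtering → ranking → top-k truncation. The sketch below is illustrative only; the `retrieve` and `score` callables and the simplified `Store` record are assumptions, not a serving implementation.

```python
from dataclasses import dataclass

@dataclass
class Store:
    store_id: str
    is_open: bool
    distance_km: float
    delivery_radius_km: float

def homepage_feed(user_id, lat, lng, retrieve, score, k=20):
    """Retrieval -> hard filtering -> ranking -> top-k truncation."""
    candidates = retrieve(user_id, lat, lng)  # multi-channel candidate generation
    eligible = [
        s for s in candidates
        if s.is_open and s.distance_km <= s.delivery_radius_km  # hard constraints
    ]
    eligible.sort(key=lambda s: score(user_id, s), reverse=True)  # ranking model
    return eligible[:k]
```

The offline side (feature pipelines, model training, index builds) feeds the `retrieve` and `score` stages but never sits on the request path.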
- Candidate retrieval
  - How would you generate candidates given only `user_id` and location?
  - What retrieval channels would you include (for example: nearby popular stores, user affinity, cuisine/category similarity, embedding-based retrieval, cold-start fallbacks)?
  - How would you enforce the hard constraints on delivery eligibility and store open status?
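One common pattern is to union the retrieval channels in priority order and de-duplicate before filtering. A minimal sketch, with hypothetical channel contents:

```python
def merge_channels(channels, limit=200):
    """Union candidate lists from multiple retrieval channels (e.g. user
    affinity, nearby popular, cold-start fallback), de-duplicating by
    store id while preserving channel priority order."""
    seen, merged = set(), []
    for channel in channels:
        for store_id in channel:
            if store_id in seen:
                continue
            seen.add(store_id)
            merged.append(store_id)
            if len(merged) >= limit:
                return merged
    return merged
```

Hard constraints (delivery range, open status) are then applied as a strict post-retrieval filter, so no ineligible store can survive into ranking even if a cached channel is stale.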
- Geospatial caching
  - How would you use a geospatial index such as Geohash, H3, or a grid system for caching or precomputing location-based candidate sets?
  - What would the cache key look like?
  - How would you handle cache invalidation when stores open/close or delivery eligibility changes?
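To make the cache-key question concrete, here is a minimal sketch using a naive lat/lng grid as a stand-in for a real Geohash/H3 encoder, with an hour bucket folded into the key so routine open/close churn expires on its own. The cell size and key format are illustrative assumptions.

```python
import time

def grid_cell(lat, lng, cell_deg=0.01):
    """Quantize lat/lng into a grid-cell id (a stand-in for a Geohash/H3 cell)."""
    return f"{int(lat // cell_deg)}:{int(lng // cell_deg)}"

def candidate_cache_key(lat, lng, hour_bucket=None):
    """Cell id + hour bucket: entries roll over hourly, so scheduled
    open/close transitions age out naturally; mid-hour eligibility
    changes still require explicit invalidation of the affected cells."""
    if hour_bucket is None:
        hour_bucket = int(time.time() // 3600)
    return f"cand:{grid_cell(lat, lng)}:{hour_bucket}"
```

Keying by cell rather than by user means nearby users share cache entries, which is what makes precomputation affordable at high traffic.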
- Extreme latency constraints
  - If each retrieval path must finish within about 15 ms, how would you optimize fan-out and parallel fetching?
  - How would you degrade gracefully when one or more retrieval sources time out?
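A minimal asyncio sketch of the fan-out pattern: every source is queried in parallel under its own ~15 ms deadline, and a source that times out or errors contributes an empty channel instead of failing the whole request. Source names here are hypothetical.

```python
import asyncio

async def fan_out(sources, timeout_s=0.015):
    """Query all retrieval sources concurrently; degrade per source."""
    async def guarded(name, make_coro):
        try:
            result = await asyncio.wait_for(make_coro(), timeout=timeout_s)
            return name, result
        except Exception:          # timeout or source failure
            return name, []        # graceful degradation: empty channel
    pairs = await asyncio.gather(*(guarded(n, c) for n, c in sources.items()))
    return dict(pairs)
```

As long as at least one channel (typically the cached "nearby popular" set) survives, the homepage still renders; monitoring should alert on per-channel drop rates rather than request failures.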
- Ranking and feature platform
  - How would you build the ranking layer?
  - What objective would you optimize: click-through rate, order conversion, GMV, long-term retention, delivery quality, or some weighted combination?
  - How would you avoid feedback loops, popularity bias, and over-optimization for short-term clicks?
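In practice the final ranking score is often a weighted blend of several model heads rather than a single objective. A toy sketch, assuming each input is already normalized to [0, 1] and with purely illustrative weights:

```python
def blended_score(p_click, p_convert, gmv_norm, weights=(0.2, 0.6, 0.2)):
    """Blend CTR, conversion, and normalized-GMV predictions into one
    ranking score; weights are tuned via experimentation, not hand-picked."""
    w_ctr, w_cvr, w_gmv = weights
    return w_ctr * p_click + w_cvr * p_convert + w_gmv * gmv_norm
```

Down-weighting the click head relative to conversion is one simple lever against over-optimizing short-term clicks; popularity bias and feedback loops typically need separate treatment (exploration traffic, position debiasing).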
- Feature store design
  - Different feature types exist: dense embeddings, numeric features, and categorical features. How would you store them differently at the database layer?
  - How would you key features using `user_id`, `store_id`, and possibly `user_id + store_id`?
  - How would you support hourly offline refreshes while preserving high-concurrency, low-latency online reads?
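One possible keying convention: entity-prefixed keys for user, store, and user-store interaction features, plus a two-slot version suffix so hourly batch refreshes never collide with online reads. The naming scheme below is an assumption for illustration, not a standard.

```python
def feature_key(entity, *ids):
    """Compose a feature-store key: 'user:u1', 'store:s9', or the
    interaction key 'user_store:u1:s9'."""
    return ":".join((entity,) + ids)

def versioned_key(base_key, refresh_epoch):
    """Two-slot double buffering: the hourly batch job writes slot
    (epoch + 1) % 2 while online readers follow an alias to the current
    slot, so reads never observe a half-written refresh."""
    return f"{base_key}:v{refresh_epoch % 2}"
```

The swap is then a single atomic alias flip once the new slot is fully loaded and validated.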
- Model iteration and experimentation
  - Suppose model version V2.0 adds several new features relative to V1.1. How should the infrastructure support multiple model versions at once?
  - How would different A/B test treatments fetch different feature sets or feature configurations safely?
  - What offline and online metrics would you use to evaluate the change?
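Safe multi-version serving is often handled by deterministic bucketing plus a per-arm config that pins both the model version and its feature set, so a V2.0 request never silently reads V1.1 features. A hypothetical sketch (the registry contents are made up):

```python
import hashlib

MODEL_CONFIGS = {
    "control":   {"model": "v1.1", "features": ["f1", "f2"]},
    "treatment": {"model": "v2.0", "features": ["f1", "f2", "f3", "f4"]},
}

def assign_arm(user_id, experiment="ranker_v2", treatment_pct=50):
    """Deterministic hash bucketing: a given user always lands in the
    same arm, and each arm resolves its own model + feature config."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "treatment" if bucket < treatment_pct else "control"
```

Because assignment is a pure function of (experiment, user), offline replay and online serving agree on who saw which model, which is what makes the metric comparison trustworthy.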
- Real-time versus batch features
  - What are the tradeoffs between real-time features and offline batch-computed features in this system?
  - What failure modes appear when you add real-time features under strict latency requirements, such as timeouts, missing values, training-serving skew, and stability issues?
  - How would you decide which features must be real-time versus batch?
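One pragmatic serving-time safeguard: every real-time read gets a batch fallback, then a default, so a timed-out stream degrades the feature rather than failing the request. The fetchers below are hypothetical; the key skew-avoidance point is that training must replay the same fallback order.

```python
def read_feature(name, fetch_realtime, batch_snapshot, default=0.0):
    """Prefer the real-time value; fall back to the hourly batch snapshot,
    then to a default, so a timed-out stream never blocks ranking."""
    try:
        value = fetch_realtime(name)
        if value is not None:
            return value
    except TimeoutError:
        pass
    return batch_snapshot.get(name, default)
```

A feature earns the real-time path only if its signal decays within minutes (e.g. store busyness) and demonstrably moves online metrics; anything slower-moving stays batch.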
Your answer should include system architecture, storage choices, ML tradeoffs, experimentation strategy, and operational safeguards.