Design a Low-Latency Store Recommender
Company: DoorDash
Role: Data Scientist
Category: Machine Learning
Difficulty: Hard
Interview Round: Onsite
You are designing the home-page store recommendation system for a food delivery app such as DoorDash.
A request contains very little context: primarily **user_id** and the user's **current latitude/longitude**. The system must return a ranked list of stores for the app home page.
## Hard constraints
- Recommended stores must be **within the deliverable area** for the user.
- Recommended stores must be **open at request time**.
- The system is latency-sensitive and powers the home page.
## Product goal
Design a recommendation system that maximizes long-term business value, such as orders or contribution profit, while balancing user engagement, relevance, freshness, and system latency. Discuss what primary metric you would optimize for and what guardrail metrics you would monitor.
## What to cover
1. **End-to-end architecture**
- Describe the online request flow from request intake to final ranked list.
- Explain how you would structure retrieval, pre-ranking, ranking, and post-processing.
- Discuss how you would handle cold-start users, sparse geographies, and new stores.
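A strong answer usually makes the retrieve → filter → pre-rank → rank → post-process funnel concrete. A minimal sketch of that flow is below; the scoring logic and field names (`popularity`, `distance_km`, `max_delivery_km`) are illustrative stand-ins, not a real model:

```python
def recommend(user_id, lat, lng, stores, top_k=20):
    """Toy end-to-end funnel: retrieval -> hard filters -> pre-rank -> rank."""
    # Retrieval + hard filters: in delivery range and open at request time.
    candidates = [
        s for s in stores
        if s["distance_km"] <= s["max_delivery_km"] and s["open_now"]
    ]
    # Pre-rank: a cheap heuristic cuts thousands of candidates to a shortlist.
    shortlist = sorted(candidates, key=lambda s: -s["popularity"])[:200]
    # Rank: a heavier score (stubbed as popularity minus a distance penalty).
    ranked = sorted(shortlist, key=lambda s: -(s["popularity"] - 0.1 * s["distance_km"]))
    # Post-process: diversity rules, business constraints, etc. would go here.
    return [s["store_id"] for s in ranked[:top_k]]
```

In a real system each stage is a separate service with its own latency budget; the point of the sketch is the shape of the funnel, not the scoring.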
2. **Retrieval design**
- Propose multiple candidate-generation strategies, given that the online inputs are limited.
- Explain how you would ensure all candidates satisfy delivery-range and open-now constraints.
- Discuss how you would merge, deduplicate, and budget candidates across retrieval channels.
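Merging across channels is often done with per-channel budgets and first-seen deduplication, so that no single channel crowds out the others. A small sketch (channel names and budget numbers are arbitrary; candidates are assumed to already satisfy the open-now and delivery-range filters):

```python
def merge_candidates(channels, budgets, total_cap=300):
    """Merge ranked candidate lists from several retrieval channels.

    channels: ordered dict of channel_name -> ranked list of store_ids
    budgets:  per-channel cap, so one channel cannot dominate the pool
    """
    seen, merged = set(), []
    for name, cands in channels.items():
        for store_id in cands[: budgets.get(name, 50)]:
            if store_id not in seen:       # first channel to surface a store wins
                seen.add(store_id)
                merged.append(store_id)
            if len(merged) >= total_cap:   # global budget for the ranker
                return merged
    return merged
```

A usage note: channel order encodes priority here; an alternative is round-robin interleaving, which trades priority for diversity.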
3. **Location-aware caching**
- How would you use a geospatial indexing scheme such as Geohash or H3 to support caching?
- Would you precompute popular stores per grid cell offline?
- What cache key, TTL, and invalidation strategy would you use, especially when store availability and open status change frequently?
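One common scheme is to key the cache on a fixed-precision grid cell rather than raw coordinates, so nearby users share entries. Production systems often use H3; the sketch below uses a minimal Geohash encoder (the standard interleaved-bit algorithm) purely to show how a cell becomes a cache key. The key prefix and precision choice are illustrative:

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=6):
    """Standard Geohash: interleave lon/lat bisection bits, base32-encode."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    bits, even = [], True                       # encoding starts with longitude
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            bits.append(1 if lon >= mid else 0)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat >= mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
    return "".join(
        _BASE32[int("".join(map(str, bits[i : i + 5])), 2)]
        for i in range(0, len(bits), 5)
    )

def cache_key(lat, lon, precision=6):
    # Precision 6 cells are roughly 1.2 km x 0.6 km: coarse enough to share
    # across users, fine enough that delivery ranges stay mostly accurate.
    # Pair a short TTL (e.g. 60 s) with event-driven invalidation for
    # open/closed flips, since store status changes faster than popularity.
    return f"geo_popular:{geohash(lat, lon, precision)}"
```

The prefix property of Geohash (truncating the string coarsens the cell) also allows fallback lookups at lower precision when a cell's cache entry is cold.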
4. **Extreme latency constraint**
- Suppose each retrieval route has a very strict timeout budget, for example **15 ms**.
- How would you optimize parallel fan-out, partial results, fallback behavior, and service-level reliability under such a tight budget?
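The usual pattern under a hard per-route budget is to fan out to all retrieval routes in parallel, collect whatever finishes in time, cancel the rest, and serve partial results. A sketch with `asyncio` (route names and the 15 ms figure are taken from the prompt; the retriever coroutines are hypothetical):

```python
import asyncio

async def fan_out(retrievers, timeout_s=0.015):
    """Run all retrieval routes concurrently; keep whatever beats the budget."""
    tasks = {name: asyncio.create_task(coro()) for name, coro in retrievers.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=timeout_s)
    for task in pending:
        task.cancel()                      # late routes are dropped, not awaited
    results = {}
    for name, task in tasks.items():
        if task in done and task.exception() is None:
            results[name] = task.result()  # failed routes are silently skipped
    return results                         # possibly partial; caller falls back
                                           # to a cached/default list if empty
```

The design choice worth calling out: returning partial results keeps p99 latency bounded, at the cost of occasionally serving a less diverse candidate pool, so the empty-result fallback (e.g. cached popular stores for the cell) matters.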
5. **Ranking and feature platform**
- Design the feature-serving infrastructure for different feature types: **embeddings**, **numeric features**, and **categorical features**.
- Explain how you would store and serve features keyed by **store_id**, **user_id**, and possibly **user-store pairs**.
- Assume offline feature pipelines refresh hourly. How would you support high-concurrency online inference while keeping features reasonably fresh and point-in-time correct?
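A common serving pattern for hourly-refreshed features is to publish each pipeline run as a full snapshot per entity type and swap it in atomically, so every request reads one consistent snapshot. A toy in-memory sketch (production would sit on Redis, RocksDB, or a managed feature store; the entity and feature names are made up):

```python
class FeatureStore:
    """Toy snapshot-swapping feature store keyed by entity type + id."""

    def __init__(self):
        # entity_type -> (snapshot_ts, {entity_id: feature dict})
        self._tables = {}

    def publish(self, entity_type, rows, snapshot_ts):
        # Offline pipeline publishes a whole hourly snapshot; replacing the
        # tuple is atomic, so readers never see a half-written table.
        self._tables[entity_type] = (snapshot_ts, dict(rows))

    def get(self, entity_type, entity_id, defaults=None):
        _, rows = self._tables.get(entity_type, (None, {}))
        # Missing entities fall back to defaults, which keeps inference
        # total and makes training-serving skew easier to reason about.
        return {**(defaults or {}), **rows.get(entity_id, {})}
```

The snapshot timestamp is what enables point-in-time correctness: training joins should use the snapshot that was live at each request's timestamp, not the latest one.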
6. **Model versioning and experimentation**
- Model version V2.0 adds new features relative to V1.1.
- How would your infrastructure support multiple model versions without breaking online serving?
- How would you configure different treatment groups in an A/B test to fetch different feature sets or model artifacts?
- What experiment design choices would you make, including randomization unit, success metrics, guardrails, and failure detection?
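Treatment assignment is typically deterministic and salted by experiment, so a user always lands in the same arm and different experiments are independent; a config layer then maps the arm to a model artifact and feature set. A sketch (the variant config values are placeholders, not real artifact names):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Salted, deterministic bucketing: same (experiment, user) -> same arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # approximately uniform on [0, 1]
    return "treatment" if bucket < treatment_share else "control"

# The serving layer resolves the arm to a model + feature set, so V1.1 and
# V2.0 (with its extra features) can run side by side without code branches.
VARIANT_CONFIG = {
    "control":   {"model": "ranker-v1.1", "feature_set": "fs_v1"},
    "treatment": {"model": "ranker-v2.0", "feature_set": "fs_v2"},
}
```

Randomizing by user (rather than request) keeps each user's experience stable and matches the unit over which long-term metrics like retention are measured.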
7. **Real-time versus batch features**
- Discuss the trade-offs between adding real-time features and relying on offline batch features.
- Under strict latency requirements, what can go wrong if you overuse real-time features?
- How would you design graceful degradation for feature timeouts, missing values, or upstream instability?
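Graceful degradation usually means a hard deadline on the real-time fetch plus sensible defaults, with the degradation logged so its frequency (and the resulting training-serving skew) is measurable. A sketch; the feature names and default values are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Defaults should be "neutral" values from the training distribution,
# not zeros, so a degraded request does not get a distorted score.
RT_DEFAULTS = {"rt_eta_minutes": 35.0, "rt_store_load": 0.5}
_pool = ThreadPoolExecutor(max_workers=8)

def realtime_features(fetch_fn, timeout_s=0.01):
    """Fetch real-time features under a hard deadline; degrade to defaults."""
    future = _pool.submit(fetch_fn)
    try:
        fetched = future.result(timeout=timeout_s) or {}
        degraded = False
    except Exception:                    # timeout, upstream error, bad payload
        fetched, degraded = {}, True
    merged = {**RT_DEFAULTS, **{k: v for k, v in fetched.items() if v is not None}}
    return merged, degraded              # emit `degraded` as a metric/log field
```

Filling `None` values per-feature (rather than all-or-nothing) lets one flaky upstream degrade only its own features instead of the whole request.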
Your answer should explicitly address modeling trade-offs, latency and reliability constraints, experimentation, and common production pitfalls such as training-serving skew, missing features, and marketplace side effects (for example, demand shifting between stores because of ranking changes).