You are designing the homepage store recommendation system for a food-delivery app similar to DoorDash. When a user opens the app, the online request contains very little context: primarily user_id and the user's current latitude/longitude.
The system must return a ranked list of stores for the homepage feed under the following hard constraints:
- Every recommended store must be within the user's delivery range.
- Every recommended store must be currently open.
- The system serves high traffic, so online latency and reliability are critical.
- Assume each retrieval source has an aggressive timeout budget of about 15 ms.
Design the end-to-end ML system, and address the following:
- Overall architecture
  - How would you structure candidate retrieval, filtering, ranking, and serving?
  - What are the main online and offline components?
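As a concrete reference point, the online path is usually a funnel: retrieval → hard filtering → ranking → top-k truncation. The sketch below is illustrative only; the `retrieve` and `score` callables and the simplified `Store` record are assumptions, not a serving implementation.

```python
from dataclasses import dataclass

@dataclass
class Store:
    store_id: str
    is_open: bool
    distance_km: float
    delivery_radius_km: float

def homepage_feed(user_id, lat, lng, retrieve, score, k=20):
    """Retrieval -> hard filtering -> ranking -> top-k truncation."""
    candidates = retrieve(user_id, lat, lng)  # multi-channel candidate generation
    eligible = [
        s for s in candidates
        if s.is_open and s.distance_km <= s.delivery_radius_km  # hard constraints
    ]
    eligible.sort(key=lambda s: score(user_id, s), reverse=True)  # ranking model
    return eligible[:k]
```

The offline side (feature pipelines, model training, index builds) feeds the `retrieve` and `score` stages but never sits on the request path.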
- Candidate retrieval
  - How would you generate candidates given only `user_id` and location?
  - What retrieval channels would you include (for example: nearby popular stores, user affinity, cuisine/category similarity, embedding-based retrieval, cold-start fallbacks)?
  - How would you enforce the hard constraints on delivery eligibility and store open status?
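One common pattern is to union the retrieval channels in priority order and de-duplicate before filtering. A minimal sketch, with hypothetical channel contents:

```python
def merge_channels(channels, limit=200):
    """Union candidate lists from multiple retrieval channels (e.g. user
    affinity, nearby popular, cold-start fallback), de-duplicating by
    store id while preserving channel priority order."""
    seen, merged = set(), []
    for channel in channels:
        for store_id in channel:
            if store_id in seen:
                continue
            seen.add(store_id)
            merged.append(store_id)
            if len(merged) >= limit:
                return merged
    return merged
```

Hard constraints (delivery range, open status) are then applied as a strict post-retrieval filter, so no ineligible store can survive into ranking even if a cached channel is stale.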
- Geospatial caching
  - How would you use a geospatial index such as Geohash, H3, or a grid system for caching or precomputing location-based candidate sets?
  - What would the cache key look like?
  - How would you handle cache invalidation when stores open/close or delivery eligibility changes?
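To make the cache-key question concrete, here is a minimal sketch using a naive lat/lng grid as a stand-in for a real Geohash/H3 encoder, with an hour bucket folded into the key so routine open/close churn expires on its own. The cell size and key format are illustrative assumptions.

```python
import time

def grid_cell(lat, lng, cell_deg=0.01):
    """Quantize lat/lng into a grid-cell id (a stand-in for a Geohash/H3 cell)."""
    return f"{int(lat // cell_deg)}:{int(lng // cell_deg)}"

def candidate_cache_key(lat, lng, hour_bucket=None):
    """Cell id + hour bucket: entries roll over hourly, so scheduled
    open/close transitions age out naturally; mid-hour eligibility
    changes still require explicit invalidation of the affected cells."""
    if hour_bucket is None:
        hour_bucket = int(time.time() // 3600)
    return f"cand:{grid_cell(lat, lng)}:{hour_bucket}"
```

Keying by cell rather than by user means nearby users share cache entries, which is what makes precomputation affordable at high traffic.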
- Extreme latency constraints
  - If each retrieval path must finish within about 15 ms, how would you optimize fan-out and parallel fetching?
  - How would you degrade gracefully when one or more retrieval sources time out?
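A minimal asyncio sketch of the fan-out pattern: every source is queried in parallel under its own ~15 ms deadline, and a source that times out or errors contributes an empty channel instead of failing the whole request. Source names here are hypothetical.

```python
import asyncio

async def fan_out(sources, timeout_s=0.015):
    """Query all retrieval sources concurrently; degrade per source."""
    async def guarded(name, make_coro):
        try:
            result = await asyncio.wait_for(make_coro(), timeout=timeout_s)
            return name, result
        except Exception:          # timeout or source failure
            return name, []        # graceful degradation: empty channel
    pairs = await asyncio.gather(*(guarded(n, c) for n, c in sources.items()))
    return dict(pairs)
```

As long as at least one channel (typically the cached "nearby popular" set) survives, the homepage still renders; monitoring should alert on per-channel drop rates rather than request failures.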
- Ranking and feature platform
  - How would you build the ranking layer?
  - What objective would you optimize: click-through rate, order conversion, GMV, long-term retention, delivery quality, or some weighted combination?
  - How would you avoid feedback loops, popularity bias, and over-optimization for short-term clicks?
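In practice the final ranking score is often a weighted blend of several model heads rather than a single objective. A toy sketch, assuming each input is already normalized to [0, 1] and with purely illustrative weights:

```python
def blended_score(p_click, p_convert, gmv_norm, weights=(0.2, 0.6, 0.2)):
    """Blend CTR, conversion, and normalized-GMV predictions into one
    ranking score; weights are tuned via experimentation, not hand-picked."""
    w_ctr, w_cvr, w_gmv = weights
    return w_ctr * p_click + w_cvr * p_convert + w_gmv * gmv_norm
```

Down-weighting the click head relative to conversion is one simple lever against over-optimizing short-term clicks; popularity bias and feedback loops typically need separate treatment (exploration traffic, position debiasing).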
- Feature store design
  - Different feature types exist: dense embeddings, numeric features, and categorical features. How would you store them differently at the database layer?
  - How would you key features using `user_id`, `store_id`, and possibly `user_id + store_id`?
  - How would you support hourly offline refreshes while preserving high-concurrency, low-latency online reads?
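One possible keying convention: entity-prefixed keys for user, store, and user-store interaction features, plus a two-slot version suffix so hourly batch refreshes never collide with online reads. The naming scheme below is an assumption for illustration, not a standard.

```python
def feature_key(entity, *ids):
    """Compose a feature-store key: 'user:u1', 'store:s9', or the
    interaction key 'user_store:u1:s9'."""
    return ":".join((entity,) + ids)

def versioned_key(base_key, refresh_epoch):
    """Two-slot double buffering: the hourly batch job writes slot
    (epoch + 1) % 2 while online readers follow an alias to the current
    slot, so reads never observe a half-written refresh."""
    return f"{base_key}:v{refresh_epoch % 2}"
```

The swap is then a single atomic alias flip once the new slot is fully loaded and validated.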
- Model iteration and experimentation
  - Suppose model version V2.0 adds several new features relative to V1.1. How should the infrastructure support multiple model versions at once?
  - How would different A/B test treatments fetch different feature sets or feature configurations safely?
  - What offline and online metrics would you use to evaluate the change?
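Safe multi-version serving is often handled by deterministic bucketing plus a per-arm config that pins both the model version and its feature set, so a V2.0 request never silently reads V1.1 features. A hypothetical sketch (the registry contents are made up):

```python
import hashlib

MODEL_CONFIGS = {
    "control":   {"model": "v1.1", "features": ["f1", "f2"]},
    "treatment": {"model": "v2.0", "features": ["f1", "f2", "f3", "f4"]},
}

def assign_arm(user_id, experiment="ranker_v2", treatment_pct=50):
    """Deterministic hash bucketing: a given user always lands in the
    same arm, and each arm resolves its own model + feature config."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "treatment" if bucket < treatment_pct else "control"
```

Because assignment is a pure function of (experiment, user), offline replay and online serving agree on who saw which model, which is what makes the metric comparison trustworthy.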
- Real-time versus batch features
  - What are the tradeoffs between real-time features and offline batch-computed features in this system?
  - What failure modes appear when you add real-time features under strict latency requirements, such as timeouts, missing values, training-serving skew, and stability issues?
  - How would you decide which features must be real-time versus batch?
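One pragmatic serving-time safeguard: every real-time read gets a batch fallback, then a default, so a timed-out stream degrades the feature rather than failing the request. The fetchers below are hypothetical; the key skew-avoidance point is that training must replay the same fallback order.

```python
def read_feature(name, fetch_realtime, batch_snapshot, default=0.0):
    """Prefer the real-time value; fall back to the hourly batch snapshot,
    then to a default, so a timed-out stream never blocks ranking."""
    try:
        value = fetch_realtime(name)
        if value is not None:
            return value
    except TimeoutError:
        pass
    return batch_snapshot.get(name, default)
```

A feature earns the real-time path only if its signal decays within minutes (e.g. store busyness) and demonstrably moves online metrics; anything slower-moving stays batch.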
Your answer should include system architecture, storage choices, ML tradeoffs, experimentation strategy, and operational safeguards.