How do I approach ML System Design interview questions?

ML System Design questions require an understanding of core concepts, plus practice. PracHub provides solutions with explanations to help you master ML System Design interviews.

What difficulty level is this interview question?

This is a medium difficulty ML System Design question, commonly asked during Technical Screen rounds at Intuit.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Intuit during technical interviews.

Design a "Future Best-Sellers" Prediction and Recommendation System

This question evaluates a candidate's ability to design a large-scale ML/forecasting system that combines predictive modeling with low-latency serving. It tests skills in system architecture, data pipeline design, and reasoning about latency and evaluation trade-offs under real-time constraints, commonly assessed in ML system design interviews at the practical application level.

An e-commerce platform wants a new feature on its category pages. A user selects a product category (for example, running shoes or coffee makers) and types in a future time window — anything from "the next 24 hours" to a custom range up to roughly 90 days out. The system returns a ranked list of the items predicted to be the best sellers in that category during that window, so shoppers can buy what is about to be popular and the business can surface high-intent inventory.

Design this prediction-and-recommendation system end to end: the modeling approach, the data and serving architecture, how you meet an interactive latency budget, and how you evaluate it. The interviewer is explicitly open to either a classical forecasting/ML approach or an LLM-/agent-based approach — part of the exercise is to choose deliberately and justify the trade-off.

Constraints & Assumptions

Catalog scale: tens of millions of active items spread across thousands of categories; the long tail of items has sparse sales.
Traffic: the feature is surfaced on high-traffic category landing pages, so expect heavy, bursty read load.
Window: user-specified, from the next 24 hours up to the next 90 days; the same (category, window) pair is requested by many users.
Output: a ranked top-K (e.g., top 20) items per request, with enough signal to render cards (predicted rank, optional confidence).
Latency: interactive feature — target p95 end-to-end under ~500 ms for the served response.
Data available: historical order/transaction logs, clickstream (views, add-to-cart), per-item attributes, inventory, price/promotion calendars, and seasonality signals.
Freshness: "best seller" should reflect recent momentum, so the underlying signals must update at least daily, ideally hourly.

Clarifying Questions to Ask

How is "best seller" defined for ranking: units sold, revenue/GMV, or a blended demand score? Does the business want absolute volume or rising momentum?
Is the ranking global (same list for everyone viewing the category) or personalized per user? This dramatically changes the precompute strategy.
How is the category defined — a fixed taxonomy node, or can users pick arbitrary filters (brand + price band + category)? Arbitrary filters explode the key space.
What is the freshness SLA — can yesterday's precomputed forecast be served, or must it react to an item going viral in the last hour?
How do we handle cold-start items (new SKUs with little history) and brand-new categories?
What is the cost ceiling? An on-demand LLM/agent call per request has very different economics than a nightly batch forecast served from cache.

Part 1 — Problem framing and modeling approach

Decide what the model actually predicts and how. Translate "best sellers in category C over window W" into a concrete learning/forecasting target, choose the modeling approach (classical demand forecasting + ranking vs. an LLM/agent pipeline), and specify the labels, features, and how arbitrary user windows are handled. Make an explicit recommendation between the traditional and LLM approaches and defend it against the latency and scale constraints above.
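
One way to make "arbitrary user windows" concrete is to forecast expected units per item per day and sum over the requested window at ranking time. The sketch below illustrates only that aggregation step. The `daily_forecast` layout and the function name are assumptions made for illustration, not part of the question, and the forecasts themselves would come from whatever model is chosen.

```python
from datetime import date, timedelta
import heapq


def top_k_for_window(daily_forecast, start, end, k):
    """Rank items by total forecast units over [start, end], inclusive.

    daily_forecast: dict item_id -> dict date -> expected units
    (a hypothetical layout; days with no forecast count as 0).
    """
    day_count = (end - start).days + 1
    totals = {}
    for item_id, per_day in daily_forecast.items():
        totals[item_id] = sum(
            per_day.get(start + timedelta(days=offset), 0.0)
            for offset in range(day_count)
        )
    # Highest total first; ties broken by item id so output is deterministic.
    return heapq.nsmallest(k, totals.items(), key=lambda kv: (-kv[1], kv[0]))


d0 = date(2025, 1, 1)
d1 = d0 + timedelta(days=1)
forecast = {
    "a": {d0: 5.0, d1: 1.0},
    "b": {d0: 2.0, d1: 6.0},
    "c": {d0: 1.0},
}
one_day = top_k_for_window(forecast, d0, d0, 2)
two_days = top_k_for_window(forecast, d0, d1, 2)
```

In this toy data the leader changes from "a" to "b" when the window grows from one day to two, which is why the window has to be part of the prediction target rather than an afterthought.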

Part 2 — Data flow and system architecture

Lay out the architecture for both the write/ingestion path (turning raw orders and clickstream into model-ready features and forecasts) and the read/serve path (turning a (category, window) request into a ranked list). Cover the data model, where forecasts are computed and stored, and the components in between.
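
A minimal sketch of how the two paths might meet, assuming forecasts are materialized per (category, horizon) key by a batch job and the serve path is a single key lookup. `RankedItem`, `materialize`, `serve`, and the in-memory dict standing in for a key-value store are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RankedItem:
    item_id: str
    rank: int
    predicted_units: float


def materialize(forecasts, item_category, k):
    """Write path: group per-item forecasts by (category, horizon), keep top K.

    forecasts: dict (item_id, horizon_days) -> predicted units (assumed shape).
    item_category: dict item_id -> category.
    """
    grouped = defaultdict(list)
    for (item_id, horizon_days), units in forecasts.items():
        grouped[(item_category[item_id], horizon_days)].append((item_id, units))
    store = {}
    for key, rows in grouped.items():
        rows.sort(key=lambda row: (-row[1], row[0]))
        store[key] = [
            RankedItem(item_id, position + 1, units)
            for position, (item_id, units) in enumerate(rows[:k])
        ]
    return store


def serve(store, category, horizon_days):
    """Read path: one lookup; an unknown key yields an empty list."""
    return store.get((category, horizon_days), [])


forecasts = {("s1", 7): 10.0, ("s2", 7): 30.0, ("c1", 7): 5.0}
item_category = {"s1": "shoes", "s2": "shoes", "c1": "coffee"}
store = materialize(forecasts, item_category, k=1)
```

Keeping the heavy work on the write path means read cost does not grow with catalog size, at the price of serving results that are only as fresh as the last materialization.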

Clarifying Questions for this Part

Is there an existing feature store / candidate retrieval service (category → item set) we can reuse, or do we build the category-to-items index ourselves?
What is the acceptable staleness of the materialized forecast (hourly vs. daily refresh)? The answer sets the streaming-vs-batch boundary.

Part 3 — Latency, caching, and cost

The feature is interactive (p95 < ~500 ms), yet a naive design — especially an LLM/agent pipeline that reasons per request — can take 20+ seconds and cost a lot per call. Show how you hit the budget. Address what is precomputed vs computed on demand, the caching strategy and its keys, cache invalidation/freshness, and graceful degradation under load.
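
The sketch below shows one possible shape for the caching layer: requested window lengths are snapped to a small set of precomputed horizons so that many users share a key, entries expire after a TTL, and a stale entry is served if the backing lookup fails. The bucket values, the `TopKCache` class, and the `loader` callable are assumptions for illustration. The sketch also assumes windows that start now; custom ranges with a future start date would need a richer key.

```python
import time

HORIZON_BUCKETS_DAYS = (1, 7, 30, 90)  # assumed precomputed horizons


def snap_horizon(days):
    """Round a requested window length up to the nearest precomputed bucket."""
    for bucket in HORIZON_BUCKETS_DAYS:
        if days <= bucket:
            return bucket
    return HORIZON_BUCKETS_DAYS[-1]


class TopKCache:
    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # callable (category, horizon_days) -> list
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (stored_at, value)

    def get(self, category, window_days):
        key = (category, snap_horizon(window_days))
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        try:
            value = self._loader(*key)
        except Exception:
            if hit is not None:
                return hit[1]      # graceful degradation: serve the stale list
            raise
        self._entries[key] = (now, value)
        return value


calls = []
fake_now = [0.0]


def flaky_loader(category, horizon_days):
    calls.append((category, horizon_days))
    if len(calls) > 1:
        raise RuntimeError("forecast store unavailable")
    return ["item-1", "item-2"]


cache = TopKCache(flaky_loader, ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get("shoes", 5)    # miss, loads the 7-day bucket
second = cache.get("shoes", 7)   # hit on the same key
fake_now[0] = 100.0
third = cache.get("shoes", 6)    # expired, loader fails, stale list served
```

Snapping trades some ranking precision for a bounded key space: with thousands of categories and a handful of horizons, the full set of keys is small enough to precompute.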

Part 4 — Evaluation and monitoring

Define how you measure whether the system is good, both before launch and in production, and what you monitor to catch regressions.
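
For offline evaluation, one common option is to backtest: freeze the data at a past date, predict the top K for a window that has since elapsed, and compare against what actually sold. The sketch below computes precision@K and NDCG@K for a single (category, window) backtest. The function names and the choice of realized units as the gain are assumptions, not metrics prescribed by the question.

```python
import math


def precision_at_k(predicted, actual_units, k):
    """Fraction of the predicted top K that are in the realized top K."""
    actual_top = sorted(actual_units, key=lambda i: (-actual_units[i], i))[:k]
    return len(set(predicted[:k]) & set(actual_top)) / k


def ndcg_at_k(predicted, actual_units, k):
    """NDCG@K using realized units sold as the gain of each position."""
    def dcg(gains):
        return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))

    gains = [actual_units.get(item, 0.0) for item in predicted[:k]]
    ideal = sorted(actual_units.values(), reverse=True)[:k]
    denominator = dcg(ideal)
    return dcg(gains) / denominator if denominator > 0 else 0.0


actual = {"a": 10.0, "b": 5.0, "c": 1.0}
p_at_2 = precision_at_k(["a", "c", "b"], actual, 2)
perfect = ndcg_at_k(["a", "b", "c"], actual, 3)
imperfect = ndcg_at_k(["a", "c", "b"], actual, 2)
```

Precision@K treats every miss equally, while NDCG@K penalizes misplacing a high-volume item more than a low-volume one, so reporting both gives a fuller picture. Online, the same lists would typically be judged by an A/B test on engagement and conversion.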

Follow-up Questions

The forecast refresh runs nightly, but an item goes viral at 2 p.m. and starts selling out. How does your design surface it before the next batch — and what would you change to make "rising momentum" first-class?
A user types a 90-day window. Forecast error compounds far into the future. How do you communicate or bound uncertainty in the ranking, and would you cap the window?
The product team wants the list personalized per user instead of global. Quantify the impact on your caching/precompute strategy and propose how to keep latency under budget.
Suppose you must incorporate an LLM/agent somewhere because reviews and external trend signals genuinely improve cold-start accuracy. Exactly where in the pipeline does it go, and how do you stop it from leaking onto the latency-critical path?
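
As a rough illustration of the first follow-up, the sketch below re-ranks a nightly forecast by blending each item's forecast daily rate with its sales velocity so far today. The blending formula, the `weight` parameter, and the function names are hypothetical. It also assumes sales are spread evenly across the day, which real traffic is not, so a production version would normalize by an hour-of-day profile.

```python
def blended_daily_rate(nightly_rate, units_sold_today, hours_elapsed, weight):
    """Mix the nightly forecast with today's observed pace (units per day)."""
    if hours_elapsed <= 0:
        return nightly_rate
    intraday_rate = units_sold_today / hours_elapsed * 24.0
    return (1.0 - weight) * nightly_rate + weight * intraday_rate


def rerank(nightly_rates, intraday_units, hours_elapsed, weight=0.5):
    """Return item ids ordered by blended rate, highest first."""
    scores = {
        item: blended_daily_rate(
            rate, intraday_units.get(item, 0.0), hours_elapsed, weight
        )
        for item, rate in nightly_rates.items()
    }
    return sorted(scores, key=lambda item: (-scores[item], item))


nightly = {"a": 100.0, "b": 20.0}
sold_by_2pm = {"a": 40.0, "b": 140.0}
at_midnight = rerank(nightly, sold_by_2pm, hours_elapsed=0)
at_2pm = rerank(nightly, sold_by_2pm, hours_elapsed=14)
```

Because the blend only touches items already in the nightly list, a brand-new viral item would still need a separate candidate source, such as a streaming counter of top movers per category.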
