Design feedback-driven recommender
Design: Contextual Bandit Recommendation with Online Learning
You are designing an online learning recommendation system. At each user interaction:
- You receive exactly 4 candidate items from an upstream candidate generator.
- You must choose exactly 1 item to show the user.
- You receive immediate feedback (e.g., click or dwell time).
- The model must update online so that future selections improve over time.
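To make the interaction loop above concrete, here is a minimal sketch of one possible baseline, not a prescribed answer. It assumes each candidate arrives as a joint (user, item, context) feature vector and that the reward is a binary click. The class name `EpsilonGreedyLinearBandit`, its parameters, and the choice of epsilon-greedy over a shared logistic model are all illustrative assumptions. A real design might prefer LinUCB or Thompson sampling.

```python
import math
import random

NUM_CANDIDATES = 4  # the prompt fixes the candidate set size at 4


class EpsilonGreedyLinearBandit:
    """Hypothetical baseline: one shared logistic model, epsilon-greedy choice."""

    def __init__(self, dim, epsilon=0.1, lr=0.05, seed=0):
        self.w = [0.0] * dim          # model weights, start at zero
        self.epsilon = epsilon        # probability of a uniform random pick
        self.lr = lr                  # SGD step size
        self.rng = random.Random(seed)

    def predict(self, x):
        """Estimated click probability for one feature vector."""
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def select(self, candidates):
        """Return (chosen index, propensity of that choice)."""
        assert len(candidates) == NUM_CANDIDATES
        scores = [self.predict(x) for x in candidates]
        greedy = max(range(NUM_CANDIDATES), key=scores.__getitem__)
        if self.rng.random() < self.epsilon:
            chosen = self.rng.randrange(NUM_CANDIDATES)
        else:
            chosen = greedy
        # Every arm gets epsilon/K; the greedy arm also gets (1 - epsilon).
        propensity = self.epsilon / NUM_CANDIDATES
        if chosen == greedy:
            propensity += 1.0 - self.epsilon
        return chosen, propensity

    def update(self, x, reward):
        """One SGD step on log loss for the shown item only."""
        err = reward - self.predict(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]


# One round of the loop: receive 4 candidates, show 1, observe feedback, update.
bandit = EpsilonGreedyLinearBandit(dim=2, epsilon=0.2)
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
shown, propensity = bandit.select(candidates)
clicked = 1.0  # placeholder feedback; in production this comes from the user
bandit.update(candidates[shown], clicked)
```

Returning the propensity alongside the choice is deliberate: logging it with every decision is what later allows counterfactual (off-policy) evaluation of other policies from the same traffic.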
Provide a design that covers:
- Model choice (with justification) for a contextual bandit setup.
- Feature engineering for users, items, and context, including handling cold start.
- Feedback handling and reward definition, including delayed/implicit signals and logging for learning.
- Exploration–exploitation strategy and the selection algorithm.
- Offline evaluation methodology and online experimentation/monitoring.
State any minimal assumptions you need (e.g., feedback semantics, latency constraints), and make your design robust to non-stationarity and scale.
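For the offline evaluation and logging items above, one common approach is inverse propensity scoring (IPS) over logged decisions. The sketch below is illustrative only. It assumes each log record stores the candidates, the index shown, the logging policy's propensity for that choice, and the observed reward, and that the policy under evaluation is deterministic. The function name `ips_estimates`, the record layout, and the clipping threshold are hypothetical.

```python
def ips_estimates(logs, target_policy, clip=10.0):
    """Estimate a target policy's average reward from logged bandit data.

    logs: iterable of (candidates, chosen_index, propensity, reward) tuples.
    target_policy: function mapping candidates to the index it would show.
    clip: cap on importance weights, trading a little bias for less variance.
    Returns (ips, snips), where snips is the self-normalized variant.
    """
    n = 0
    total_weight = 0.0
    total_weighted_reward = 0.0
    for candidates, chosen, propensity, reward in logs:
        n += 1
        # Only records where the target policy agrees with the logged
        # action carry information about the target policy's reward.
        if target_policy(candidates) != chosen:
            continue
        weight = min(1.0 / propensity, clip)
        total_weight += weight
        total_weighted_reward += weight * reward
    ips = total_weighted_reward / n if n else 0.0
    snips = total_weighted_reward / total_weight if total_weight else 0.0
    return ips, snips


# Toy log collected under uniform random logging (propensity 1/4 per item).
slate = ["a", "b", "c", "d"]
logs = [
    (slate, 0, 0.25, 1.0),
    (slate, 1, 0.25, 0.0),
    (slate, 0, 0.25, 0.0),
    (slate, 2, 0.25, 1.0),
]
ips, snips = ips_estimates(logs, target_policy=lambda candidates: 0)
```

On a log this small the two estimators disagree sharply, which illustrates why plain IPS is usually paired with clipping, self-normalization, or a doubly robust estimator before it is trusted for launch decisions.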
Constraints & Assumptions
- Preserve the scope, facts, inputs, and requested outputs from the prompt above.
- If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
- Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.
Clarifying Questions to Ask
- Clarify users, core use cases, read/write patterns, scale, latency, availability, and data retention.
- State explicit assumptions before making sizing or architecture decisions.
- Prioritize the functional path first, then address reliability, security, observability, and rollout.
What a Strong Answer Covers
- A scoped requirements summary with concrete non-goals and success metrics.
- ML-specific data, model, evaluation, serving, and monitoring choices.
- Reasoned trade-offs among simple and scalable designs, including bottlenecks and failure modes.
- A validation, monitoring, migration, and launch plan appropriate for the risk level.
Follow-up Questions
- What breaks first at 10x traffic or data volume?
- How would you degrade gracefully during dependency failures?
- What metrics and alerts would prove the design is healthy after launch?