Design: Contextual Bandit Recommendation with Online Learning
You are designing an online learning recommendation system. At each user interaction:
- You receive exactly 4 candidate items from an upstream candidate generator.
- You must choose exactly 1 item to show the user.
- You receive immediate feedback (e.g., a click or dwell time).
- The model must update online so that future selections improve over time.
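The interaction loop above can be sketched as a Thompson-sampling contextual bandit with a shared Bayesian linear reward model. This is a minimal illustration, not a prescribed solution: the feature dimension `d`, the ridge prior `lam`, and the binary click reward are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8      # assumed joint user/item/context feature dimension
lam = 1.0  # ridge prior precision

# Shared linear model: posterior over reward weights w ~ N(A^{-1} b, A^{-1})
A = lam * np.eye(d)   # precision matrix
b = np.zeros(d)       # running sum of reward-weighted features

def select(candidate_features):
    """Thompson sampling: draw one weight sample, show the best-scoring candidate."""
    mu = np.linalg.solve(A, b)
    w = rng.multivariate_normal(mu, np.linalg.inv(A))
    scores = candidate_features @ w
    return int(np.argmax(scores))

def update(x, reward):
    """Online Bayesian update with the observed (features, reward) pair."""
    global A, b
    A += np.outer(x, x)
    b += reward * x

# One interaction: 4 candidates arrive, one is shown, feedback updates the model.
candidates = rng.normal(size=(4, d))
chosen = select(candidates)
update(candidates[chosen], reward=1.0)  # e.g., the user clicked
```

Sampling from the posterior (rather than scoring with the mean) is what provides exploration here; plausible-but-uncertain candidates occasionally win the argmax.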
Provide a design that covers:
- Model choice (with justification) for a contextual bandit setup.
- Feature engineering for users, items, and context, including handling cold start.
- Feedback handling and reward definition, including delayed/implicit signals and logging for learning.
- Exploration–exploitation strategy and the selection algorithm.
- Offline evaluation methodology and online experimentation/monitoring.
State any minimal assumptions you need (e.g., feedback semantics, latency constraints), and make your design robust to non-stationarity and scale.
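For the offline-evaluation requirement, one standard approach on logged bandit data is inverse propensity scoring (IPS). A minimal sketch, assuming each log record carries the propensity the logging policy assigned to the shown item; the log schema and `target_policy` interface here are hypothetical:

```python
import numpy as np

def ips_value(logs, target_policy):
    """IPS estimate of a candidate policy's average reward from logged data.

    logs: iterable of (candidate_features, shown_arm, reward, logging_propensity),
          an assumed schema for what the production logger records.
    target_policy: maps a (4, d) candidate feature matrix to a probability
          distribution over the 4 arms.
    """
    total = 0.0
    n = 0
    for X, a, r, p_log in logs:
        p_new = target_policy(X)[a]
        total += r * p_new / p_log  # reweight logged reward by the propensity ratio
        n += 1
    return total / n

# Sanity check: when the target policy equals the logging policy (uniform here),
# the IPS estimate reduces to the plain average of logged rewards.
rng = np.random.default_rng(1)
logs = [(rng.normal(size=(4, 3)), int(rng.integers(4)),
         float(rng.integers(2)), 0.25) for _ in range(1000)]
uniform = lambda X: np.full(len(X), 0.25)
est = ips_value(logs, uniform)
```

Logging propensities at serving time is what makes this estimator (and lower-variance variants such as doubly robust estimation) possible later.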