PracHub

Design an ML keyword recommendation pipeline

Last updated: Mar 29, 2026

Quick Overview

This question evaluates how a candidate handles ML product requirements, data and labeling, modeling, serving architecture, evaluation, monitoring, and trade-offs in a realistic interview setting. A strong answer states its assumptions, handles edge cases, explains trade-offs, and shows clearly how to validate the result.

Company: Apple

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

Design an ML pipeline that generates search keyword recommendations for an app marketplace. Given a query like "games," produce diverse, typed suggestions (e.g., genres such as puzzle, RPG, racing) with high relevance and coverage. Specify objectives and constraints (relevance, diversity, freshness, latency, privacy). Detail data sources (query/search logs, clicks, installs, uninstalls, app metadata and taxonomy, reviews, co-search/co-click graphs, embeddings, locale signals) and labeling/feedback strategies. Propose the system architecture: candidate generation and ranking stages, feature store, offline training, online serving, cache, and retrieval. Describe features (text/semantic embeddings, popularity/recency, user/context signals, co-occurrence, graph features, quality/spam signals). Compare model options (BM25/ANN retrieval, two-tower retrieval, gradient-boosted trees, pairwise/listwise rankers, sequence models, graph models) and justify choices. Define evaluation metrics and experimentation (CTR, install rate, coverage, diversity, precision/recall, latency/errors; A/B testing and guardrails). Explain online/continual training after launch (streaming feedback ingestion, feature freshness, update cadence, warm-starting, drift detection, rollback). Discuss handling cold start, multilingual/locale variants, spam/abuse, and fairness.

Jul 26, 2025, 12:00 AM

ML System Design: Typed Search Keyword Recommendations for an App Marketplace

Goal

Design an end-to-end ML pipeline that, given a user query (e.g., "games"), generates diverse, typed keyword suggestions (e.g., "puzzle games", "RPG games", "racing games") with high relevance and coverage.

Assume you are designing for a large-scale app marketplace with millions of users and tens of thousands of queries per second during peak. Typed suggestions are grounded in a controlled taxonomy (e.g., Genre, Feature, Price, Age, Mode) and must be compliant with marketplace policies.
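
One way to picture "typed suggestions grounded in a controlled taxonomy" is a small data model that rejects any suggestion whose type and value are not in the taxonomy. The sketch below is illustrative only: the facet names come from the prompt, while `TAXONOMY`, its values, `TypedSuggestion`, and `make_suggestion` are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical controlled taxonomy: facet type -> allowed canonical values.
# Facet names follow the prompt; the values are made up for illustration.
TAXONOMY = {
    "Genre": {"puzzle", "rpg", "racing"},
    "Price": {"free", "paid"},
    "Mode": {"multiplayer", "offline"},
}


@dataclass(frozen=True)
class TypedSuggestion:
    text: str   # display string, e.g. "puzzle games"
    facet: str  # taxonomy type, e.g. "Genre"
    value: str  # canonical taxonomy value, e.g. "puzzle"


def make_suggestion(query: str, facet: str, value: str) -> TypedSuggestion:
    """Build a suggestion only if (facet, value) exists in the taxonomy."""
    canonical = value.strip().lower()
    if canonical not in TAXONOMY.get(facet, set()):
        raise ValueError(f"{facet}={value!r} is not in the taxonomy")
    return TypedSuggestion(text=f"{canonical} {query}", facet=facet, value=canonical)


suggestion = make_suggestion("games", "Genre", "Puzzle")
```

Keeping the facet on every suggestion is what later lets a re-ranker enforce diversity across types and lets policy checks run per facet.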

Requirements

  1. Objectives and constraints
  • Relevance, diversity, coverage
  • Freshness/trends, multilingual/locale correctness
  • Latency and availability SLOs
  • Privacy and policy compliance
  2. Data sources and labeling/feedback
  • Query/search logs, clicks, installs, uninstalls
  • App metadata and taxonomy
  • Review text, co-search/co-click graphs
  • Embeddings, locale signals
  • Labeling: implicit feedback (CTR/installs), counterfactual debiasing, editorial seeds
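
Counterfactual debiasing of implicit feedback is often illustrated with inverse propensity scoring (IPS), where a click at a rarely examined position counts for more. The following is a minimal sketch under assumed inputs: the `PROPENSITY` table and the log tuple layout are hypothetical, and real propensities would have to be estimated (for example from randomized-position traffic).

```python
from collections import defaultdict

# Hypothetical examination propensities per display position (1-indexed).
PROPENSITY = {1: 1.0, 2: 0.6, 3: 0.4}


def ips_click_rate(impressions):
    """impressions: iterable of (query, suggestion, position, clicked).

    Returns {(query, suggestion): propensity-weighted click rate}.
    """
    weighted_clicks = defaultdict(float)
    shown = defaultdict(int)
    for query, suggestion, position, clicked in impressions:
        key = (query, suggestion)
        shown[key] += 1
        if clicked:
            # Up-weight clicks from positions users rarely examine.
            weighted_clicks[key] += 1.0 / PROPENSITY[position]
    return {key: weighted_clicks[key] / shown[key] for key in shown}


logs = [
    ("games", "puzzle games", 1, True),
    ("games", "puzzle games", 1, False),
    ("games", "rpg games", 3, True),
    ("games", "rpg games", 3, False),
]
rates = ips_click_rate(logs)
```

Both suggestions have the same raw CTR in this toy log, but the one shown at position 3 gets a higher debiased rate. IPS estimates can exceed 1, so weights are commonly clipped before being used as labels.
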
  3. System architecture
  • Candidate generation: lexical, semantic (ANN), taxonomy, graph/co-occurrence, trending
  • Ranking: multi-stage (learning-to-rank (LTR) plus neural models), with a diversity-aware re-rank
  • Feature store (offline/online), offline training, online serving, cache, retrieval indices
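
The diversity-aware re-rank stage can be sketched with maximal marginal relevance (MMR), one common choice among several; the prompt does not prescribe a specific method. In this sketch the `FACET` map, the `same_facet` similarity, and the relevance scores are hypothetical stand-ins for the ranker output and taxonomy types.

```python
def mmr_rerank(candidates, similarity, k, lam=0.7):
    """Greedy MMR over candidates = [(suggestion, relevance), ...].

    Each step picks the item maximizing
    lam * relevance - (1 - lam) * max similarity to already selected items.
    """
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(item):
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * remaining[item] - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected


# Hypothetical facet assignment; similarity is 1.0 when two suggestions share a type.
FACET = {"puzzle games": "Genre", "rpg games": "Genre", "free games": "Price"}


def same_facet(a, b):
    return 1.0 if FACET[a] == FACET[b] else 0.0


CANDIDATES = [("puzzle games", 0.9), ("rpg games", 0.85), ("free games", 0.6)]
diverse = mmr_rerank(CANDIDATES, same_facet, k=2, lam=0.5)
# -> ['puzzle games', 'free games']
```

With `lam=1.0` the function reduces to a plain relevance sort, so the parameter gives a single knob for trading relevance against type diversity in experiments.
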
  4. Features
  • Text/semantic embeddings, lexical features
  • Popularity/recency/trending signals
  • User/context signals (locale, device)
  • Co-occurrence/graph features (pointwise mutual information (PMI); P(s|q), the probability of suggestion s given query q)
  • Quality/spam trust signals
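
The two co-occurrence features named above can be computed from counted (query, follow-up) pairs, using PMI(q, s) = log(P(s|q) / P(s)). This is a minimal sketch: the pair-list input format and the `cooccurrence_features` name are assumptions, and a production version would add smoothing and minimum-count thresholds.

```python
import math
from collections import Counter


def cooccurrence_features(pairs):
    """pairs: list of (query, suggestion) co-occurrences from logs.

    Returns {(q, s): {"p_s_given_q": ..., "pmi": ...}}.
    """
    pair_counts = Counter(pairs)
    q_counts = Counter(q for q, _ in pairs)
    s_counts = Counter(s for _, s in pairs)
    total = len(pairs)
    features = {}
    for (q, s), n_qs in pair_counts.items():
        p_s_given_q = n_qs / q_counts[q]
        p_s = s_counts[s] / total
        # PMI = log(P(q, s) / (P(q) * P(s))) = log(P(s|q) / P(s))
        features[(q, s)] = {
            "p_s_given_q": p_s_given_q,
            "pmi": math.log(p_s_given_q / p_s),
        }
    return features


pairs = (
    [("games", "puzzle games")] * 3
    + [("games", "rpg games")]
    + [("music", "puzzle games")]
)
feats = cooccurrence_features(pairs)
```

In this toy log "puzzle games" has the higher P(s|q) for "games" but a negative PMI, because it also follows other queries; that is the usual argument for feeding both features to the ranker.
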
  5. Models and choices
  • Retrieval: BM25, two-tower ANN, graph-based expansion
  • Ranking: GBDT, pairwise/listwise LTR, cross-encoder re-ranker, optional sequence/graph models
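
For the two-tower retrieval option, serving reduces to a nearest-neighbor lookup between a query embedding and precomputed suggestion embeddings. The sketch below uses a brute-force cosine scan as a stand-in for an ANN index; the embeddings are made-up two-dimensional vectors and `retrieve_top_k` is a hypothetical helper, not a library API.

```python
import heapq
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def retrieve_top_k(query_vec, suggestion_vecs, k):
    """Brute-force cosine retrieval; an ANN index would replace this scan at scale."""
    q = l2_normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, l2_normalize(vec))), name)
        for name, vec in suggestion_vecs.items()
    )
    return [name for _, name in heapq.nlargest(k, scored)]


# Made-up embeddings standing in for the outputs of the two towers.
SUGGESTION_VECS = {
    "puzzle games": [0.9, 0.1],
    "racing games": [0.7, 0.7],
    "weather apps": [0.0, 1.0],
}
top = retrieve_top_k([1.0, 0.0], SUGGESTION_VECS, k=2)
```

Because suggestion embeddings are computed offline, only the query tower runs at request time, which is the main latency argument for two-tower retrieval over a cross-encoder at the candidate stage.
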
  6. Evaluation and experimentation
  • Metrics: CTR, install rate, NDCG, recall@K, coverage/diversity, latency/errors, calibration
  • A/B testing with guardrails and statistical rigor
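
Two of the offline metrics listed above, NDCG and recall@K, can be sketched in a few lines. This version assumes linear gains and a log2 position discount, which is one common convention; the function names and the toy relevance grades are illustrative.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain with linear gains and a log2 discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked, relevance, k):
    """ranked: list of suggestions; relevance: {suggestion: graded gain}."""
    gains = [relevance.get(s, 0.0) for s in ranked]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant set that appears in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


relevance = {"puzzle games": 3.0, "rpg games": 2.0, "racing games": 1.0}
perfect = ndcg_at_k(["puzzle games", "rpg games", "racing games"], relevance, k=3)
reversed_order = ndcg_at_k(["racing games", "rpg games", "puzzle games"], relevance, k=3)
```

NDCG is typically reported for the ranking stage and recall@K for candidate generation, since a ranker cannot recover suggestions that retrieval never surfaced.
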
  7. Continual training/ops
  • Streaming feedback ingestion, feature freshness, update cadence
  • Warm-starting, drift detection, rollback
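
Drift detection is frequently implemented by comparing a feature's serving-time distribution with its training-time distribution. The sketch below uses the population stability index (PSI) as one possible statistic; the bin counts are invented and `DRIFT_THRESHOLD` is an assumed rule-of-thumb value that would need tuning per feature.

```python
import math

# Assumed alerting threshold; a rule of thumb, not a value from the prompt.
DRIFT_THRESHOLD = 0.2


def population_stability_index(expected, actual, eps=1e-6):
    """expected/actual: counts over the same bins (training vs. serving).

    PSI = sum((a_i - e_i) * ln(a_i / e_i)) over bin proportions.
    """
    e_total, a_total = sum(expected), sum(actual)
    psi = 0.0
    for e_count, a_count in zip(expected, actual):
        e = max(e_count / e_total, eps)  # floor avoids log(0) on empty bins
        a = max(a_count / a_total, eps)
        psi += (a - e) * math.log(a / e)
    return psi


training_bins = [50, 30, 20]
serving_bins = [20, 30, 50]
psi = population_stability_index(training_bins, serving_bins)
drifted = psi > DRIFT_THRESHOLD
```

A drift alert of this kind would usually trigger investigation or a retrain rather than an automatic rollback, which is better keyed to online guardrail metrics.
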
  8. Special cases
  • Cold start, multilingual/locale variants, spam/abuse, fairness and policy
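
For the cold-start case listed under special cases, one simple mitigation is to shrink a new suggestion's CTR toward a prior until it has enough impressions. This is a minimal sketch assuming a Beta-style prior; `prior_ctr` and `prior_strength` are hypothetical defaults that could instead come from the suggestion's taxonomy type or locale.

```python
def smoothed_ctr(clicks, impressions, prior_ctr=0.05, prior_strength=100):
    """Shrink observed CTR toward prior_ctr.

    prior_strength acts like a count of pseudo-impressions: with few real
    impressions the prior dominates, with many the observed rate dominates.
    """
    return (clicks + prior_ctr * prior_strength) / (impressions + prior_strength)


brand_new = smoothed_ctr(clicks=0, impressions=0)
one_lucky_click = smoothed_ctr(clicks=1, impressions=1)
established = smoothed_ctr(clicks=2000, impressions=10000)
```

Smoothing keeps a suggestion with one click on one impression from outranking established suggestions, while still letting it climb as evidence accumulates.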

Constraints & Assumptions

  • Preserve the scope, facts, inputs, and requested outputs from the prompt above.
  • If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
  • Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.

Clarifying Questions to Ask

  • Clarify users, core use cases, read/write patterns, scale, latency, availability, and data retention.
  • State explicit assumptions before making sizing or architecture decisions.
  • Prioritize the functional path first, then address reliability, security, observability, and rollout.

What a Strong Answer Covers

  • A scoped requirements summary with concrete non-goals and success metrics.
  • ML-specific data, model, evaluation, serving, and monitoring choices.
  • Reasoned trade-offs among simple and scalable designs, including bottlenecks and failure modes.
  • A validation, monitoring, migration, and launch plan appropriate for the risk level.

Follow-up Questions

  • What breaks first at 10x traffic or data volume?
  • How would you degrade gracefully during dependency failures?
  • What metrics and alerts would prove the design is healthy after launch?
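
As one possible shape for an answer to the graceful-degradation question, the serving path can fall back from the live ranker to cached results and then to static popular suggestions. The sketch below is an assumption-laden illustration: `suggest_with_fallback`, the dict-based cache, and the default list are hypothetical, and a real service would bound the ranker call with a timeout and record each fallback.

```python
def suggest_with_fallback(query, ranker, cache, popular_defaults, k=5):
    """Try the live ranker, then cached results, then static popular suggestions."""
    try:
        return ranker(query)[:k]
    except Exception:
        # Broad catch is deliberate in this sketch; a real service would
        # log the failure and emit a fallback metric here.
        pass
    cached = cache.get(query)
    if cached:
        return cached[:k]
    return popular_defaults[:k]


def unavailable_ranker(query):
    """Stand-in for a ranking dependency that is down."""
    raise TimeoutError("ranker unavailable")


cache = {"games": ["puzzle games", "rpg games"]}
defaults = ["top charts", "new releases"]
from_cache = suggest_with_fallback("games", unavailable_ranker, cache, defaults)
from_defaults = suggest_with_fallback("music", unavailable_ranker, cache, defaults)
```

Each fallback tier trades personalization and freshness for availability, so the share of traffic served by each tier is a natural health metric to alert on.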
