Design a feature store with CI/CD and reliability

Last updated: Apr 28, 2026

Quick Overview

This question evaluates a candidate's ability to design scalable, low-latency feature stores and end-to-end ML infrastructure. It covers feature ingestion, point-in-time correctness, serving APIs, CI/CD for feature pipelines, metadata/lineage, and reliability engineering.

Company: Reddit

Role: Software Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

Design a feature store that serves ML models for both offline training and low-latency online inference. Specify requirements, feature ingestion (batch and streaming), computation and storage layers, feature serving APIs, point-in-time correctness, offline–online consistency, freshness and backfills, lineage/metadata, and governance and access control. Detail the deployment architecture and environments, CI/CD for feature definitions and pipelines, testing strategies (unit/integration/e2e, data validation), caching layers and eviction policies, reliability/SLOs and fault tolerance, monitoring/alerting and rollback, and cost/rate-limiting considerations. Provide component diagrams and justify technology choices.

Sep 6, 2025, 12:00 AM
System Design: Feature Store for Offline Training and Low‑Latency Online Inference

Context

You are designing a feature store to support machine learning models that require:

  • Offline training on historical data.
  • Low-latency online inference at high QPS.

Assume an internet-scale social platform with entities such as user, item/content, and community. Models include recommendations, feed ranking, and abuse/spam detection.

Requirements

Specify and design the following:

  1. Functional and non-functional requirements (SLOs, scale assumptions).
  2. Feature ingestion for batch and streaming data.
  3. Computation and storage layers (offline/online) with point-in-time correctness.
  4. Feature serving APIs for training data generation and online inference.
  5. Offline–online consistency strategy.
  6. Freshness SLAs, backfills, and bootstrap flows.
  7. Lineage/metadata, governance, and access control.
  8. Deployment architecture and environments.
  9. CI/CD for feature definitions and pipelines.
  10. Testing strategies (unit, integration, end-to-end, data validation).
  11. Caching layers and eviction policies.
  12. Reliability/SLOs and fault tolerance.
  13. Monitoring/alerting and rollback processes.
  14. Cost and rate-limiting considerations.
  15. Component diagrams and justified technology choices.
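
To make requirement 3 concrete, here is a minimal sketch of point-in-time correctness: for each training label, use only the latest feature value observed at or before the label's timestamp, so no future information leaks into training data. The function names (`build_feature_index`, `point_in_time_lookup`), the row layout, and the `max_age` staleness bound are hypothetical and chosen for illustration; a real system would typically run this as an as-of join in a batch engine rather than in memory.

```python
from bisect import bisect_right
from collections import defaultdict


def build_feature_index(feature_rows):
    """Group (entity_id, event_ts, value) rows per entity, sorted by event_ts."""
    index = defaultdict(list)
    for entity_id, event_ts, value in feature_rows:
        index[entity_id].append((event_ts, value))
    for rows in index.values():
        rows.sort(key=lambda row: row[0])
    return index


def point_in_time_lookup(index, entity_id, label_ts, max_age=None):
    """Return the latest feature value with event_ts <= label_ts, else None.

    Including values whose timestamp equals label_ts is an assumption here;
    some designs use a strict inequality instead.
    max_age (hypothetical) rejects values older than the allowed staleness.
    """
    rows = index.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    pos = bisect_right(timestamps, label_ts)
    if pos == 0:
        return None  # no feature value existed yet at label time
    ts, value = rows[pos - 1]
    if max_age is not None and label_ts - ts > max_age:
        return None  # value exists but is too stale to use
    return value


# Illustrative data: (entity_id, event_ts, value), timestamps as integers.
feature_rows = [("u1", 100, 3), ("u1", 200, 5), ("u2", 150, 1)]
index = build_feature_index(feature_rows)

labels = [("u1", 150), ("u1", 250), ("u2", 100)]
training_rows = [
    (entity_id, label_ts, point_in_time_lookup(index, entity_id, label_ts))
    for entity_id, label_ts in labels
]
```

In this sketch, the label for `u1` at time 150 sees the value written at time 100, not the later one written at time 200, and the label for `u2` at time 100 gets no value because none existed yet. The online store would serve the same "latest value as of now" semantics, which is one way to reason about offline–online consistency.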
