Design an ML Platform Portal
Company: Netflix
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: easy
Interview Round: Onsite
Design a web-based internal platform that gives machine learning engineers and data scientists a unified workflow for the full model lifecycle.
The portal should support:
- tracking experiments and their metadata,
- registering versioned models,
- deploying models to staging or production,
- promoting models across environments,
- monitoring production models.
Describe a complete design for this system, including:
1. the main user workflows and major backend components,
2. REST APIs for experiment tracking, model registry, deployment, promotion, and monitoring,
3. a database schema for experiments, models, versions, deployments, and audit history,
4. whether you would use REST, GraphQL, or both for the product,
5. how to monitor models in production, including data quality checks (for example, missing-value rate), drift detection (for example, the Population Stability Index, PSI), and alerting,
6. what backend you would use for querying time-series monitoring data,
7. scalability, security, and reliability considerations.
Quick Answer: This question evaluates the ability to design an end-to-end machine learning platform: experiment tracking, model registry and versioning, deployment and promotion workflows, production monitoring, API and database schema design, time-series storage for monitoring data, and scalability, security, and reliability.
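
To make requirement 5 concrete, the following is a minimal sketch of two of the monitoring metrics it names: missing-value rate and PSI. It is an illustration under stated assumptions, not a prescribed implementation. The function names (`missing_rate`, `psi`), the equal-width binning over the baseline range, the default of 10 bins, and the `eps` floor for empty bins are all assumptions made for this example; a real platform might use quantile bins or precomputed histograms instead.

```python
import math
from typing import Optional, Sequence


def missing_rate(values: Sequence[Optional[float]]) -> float:
    """Fraction of values that are None or NaN (0.0 for an empty batch)."""
    if not values:
        return 0.0
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def psi(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index between a baseline and a live sample.

    Assumption: equal-width bins spanning the baseline's min..max;
    live values outside that range are clamped into the edge bins.
    """
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 when the baseline is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(sample: Sequence[float]) -> list:
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        total = len(sample)
        # Floor each proportion at eps so log() never sees zero.
        return [max(c / total, eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    # PSI = sum over bins of (q_i - p_i) * ln(q_i / p_i); each term is >= 0.
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    shifted = [x + 0.3 for x in baseline]
    print("missing rate:", missing_rate([1.0, None, float("nan"), 2.0]))
    print("PSI (same data):", psi(baseline, baseline))
    print("PSI (shifted):", psi(baseline, shifted))
```

In a design discussion, a job like this would typically run per feature on a schedule, write the results to the time-series store, and feed the alerting rules. A commonly cited rule of thumb treats PSI above roughly 0.1 as moderate shift and above roughly 0.25 as significant, but those thresholds are conventions and should be tuned per feature.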