Design a pipeline orchestration system on Kubernetes
Company: Salesforce
Role: Software Engineer
Category: System Design
Difficulty: Medium
Interview Round: Onsite
Design a **pipeline/workflow orchestration system** (similar to a DAG-based scheduler) that runs workloads on **Kubernetes**.
### Functional requirements
- Users can define pipelines as **DAGs** of tasks (task dependencies).
- Tasks run as containerized jobs on Kubernetes.
- Support scheduling (cron / interval), ad-hoc runs, and manual retries.
- Track task/run states (queued, running, success, failed, cancelled).
- Provide logs per task and basic UI/API to view pipeline status.
- Support retry policies, per-task timeouts, and failure notifications.
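A pipeline defined as a DAG of task dependencies can be scheduled by computing a topological order and dispatching each task only once its dependencies have succeeded. The sketch below illustrates the idea with Kahn's algorithm over an in-memory dependency map; the task names (`extract`, `transform`, `load`, `report`) are illustrative assumptions, not part of the prompt.

```python
from collections import deque

# Hypothetical pipeline: each task maps to the tasks it depends on.
pipeline = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "report": ["transform"],
}

def execution_order(dag):
    """Return tasks in a valid topological order (Kahn's algorithm).

    Raises ValueError if the graph contains a cycle, i.e. is not a DAG.
    """
    indegree = {task: len(deps) for task, deps in dag.items()}
    dependents = {task: [] for task in dag}
    for task, deps in dag.items():
        for dep in deps:
            dependents[dep].append(task)

    ready = deque(task for task, n in indegree.items() if n == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in dependents[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(dag):
        raise ValueError("pipeline contains a cycle; not a DAG")
    return order
```

In a real orchestrator the scheduler would not run tasks serially in this order; it would dispatch every task whose in-degree reaches zero concurrently, which the same bookkeeping supports.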
### Non-functional requirements
- Multi-tenant support (teams/namespaces) and RBAC.
- Scale to many concurrent runs (e.g., thousands of tasks/minute).
- High availability of control plane.
- Handle worker/pod failures and scheduler restarts.
- Observability: metrics, tracing, structured logs.
### Constraints / assumptions
- You can assume Kubernetes is available.
- You can choose the storage, queue, and execution model.
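Since tasks run as containerized jobs on Kubernetes, one common execution model is to launch each task as a `batch/v1` Job and let Kubernetes handle pod-level restarts, while the orchestrator's own retry policy governs whole-task retries. The manifest below is a minimal sketch; the name, image, and limits are illustrative assumptions.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: transform-run-42        # hypothetical: <task>-run-<run_id>
  labels:
    pipeline: example-pipeline  # used by the orchestrator to track ownership
spec:
  backoffLimit: 0               # disable Job-level retries; orchestrator owns retry policy
  activeDeadlineSeconds: 3600   # enforces the per-task timeout
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: task
          image: registry.example.com/pipelines/transform:1.0  # illustrative
          command: ["python", "run_task.py"]
```

Setting `backoffLimit: 0` keeps retry semantics in one place (the control plane), which simplifies state tracking and notification logic.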
Describe the high-level architecture, core components, data model, and how you handle scheduling, execution, retries, and failure recovery.
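The run states listed in the requirements (queued, running, success, failed, cancelled) imply a small state machine in the data model. A minimal sketch, assuming that a failed task may be re-queued by a retry policy or a manual retry, and that success and cancelled are terminal:

```python
from enum import Enum

class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    CANCELLED = "cancelled"

# Allowed transitions; terminal states have no outgoing edges.
# FAILED -> QUEUED models a policy-driven or manual retry (assumption).
TRANSITIONS = {
    TaskState.QUEUED: {TaskState.RUNNING, TaskState.CANCELLED},
    TaskState.RUNNING: {TaskState.SUCCESS, TaskState.FAILED, TaskState.CANCELLED},
    TaskState.FAILED: {TaskState.QUEUED},
    TaskState.SUCCESS: set(),
    TaskState.CANCELLED: set(),
}

def transition(current: TaskState, target: TaskState) -> TaskState:
    """Validate and apply a state transition, rejecting illegal ones."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Enforcing transitions centrally makes failure recovery simpler: after a scheduler restart, any row found in a non-terminal state can be reconciled against the actual Kubernetes Job status.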
Quick Answer: This question evaluates a candidate's ability to design a scalable, highly available workflow orchestration system on Kubernetes, covering DAG-based scheduling, containerized task execution, retry and failure handling, multi-tenant RBAC, and observability.