Take‑Home: Design a Jenkins Pipeline for a GPU Graphics Test Matrix
Context
You need to design a Jenkins-based CI/CD system that builds a graphics application/library, deploys it to GPU-equipped test hosts, and runs automated graphics tests across a matrix of GPU hardware and driver versions. The system must scale to dozens of machines and be reliable in the face of flaky tests and occasional host/driver problems.
Assume:
- You have a fleet of bare‑metal agents with different GPU models and multiple driver versions.
- Tests are GPU‑dependent, may use headless EGL or virtual displays, and produce JUnit XML, logs, and performance metrics.
- Builds can run on CPU-only builders; tests require GPUs.
Task
Propose a Jenkins pipeline design and describe how you will handle:
- Job orchestration and pipeline-as-code (Jenkinsfile, shared libraries).
- Parallelization across a hardware/driver matrix without oversubscribing GPUs.
- Artifact management (builds, logs, test results, metrics) and provenance.
- Test sharding (balanced by historical duration) and re-run of failed tests.
- Retry policies for transient infra vs. test-level flakiness.
- Quarantining flaky tests so they don’t gate merges, while still being exercised.
- Notifications and PR status reporting with a clear matrix summary.
- Promotion gates from CI → pre-prod → prod (quality and performance thresholds, manual approval when needed).
- Safe driver updates and rollback across the fleet (canaries, labels, blue/green, health checks).
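As an illustration of what "balanced by historical duration" in the sharding item could mean, here is a minimal Python sketch. It assumes per-test durations are already available as a mapping from test name to seconds (for example, parsed from earlier JUnit XML reports) and uses a greedy longest-first assignment. The function name `shard_tests` and the sample test names are hypothetical, and the greedy heuristic is one possible approach, not a required one.

```python
import heapq


def shard_tests(durations, num_shards):
    """Split tests into shards with roughly equal total historical duration.

    durations: dict mapping test name -> historical duration in seconds
               (hypothetical input; the source of this data is an assumption).
    num_shards: number of shards to produce.
    Returns a list of lists of test names, one list per shard.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")

    # Min-heap of (accumulated_seconds, shard_index): the lightest shard pops first.
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]

    # Place the longest tests first; ties are broken by name for determinism.
    ordered = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in ordered:
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + seconds, idx))
    return shards


if __name__ == "__main__":
    history = {
        "render_basic": 10,
        "shader_compile": 8,
        "texture_upload": 5,
        "egl_init": 4,
        "swapchain": 3,
    }
    for i, shard in enumerate(shard_tests(history, 2)):
        print(i, shard, sum(history[t] for t in shard))
```

In a pipeline, each shard would map to one parallel branch on a GPU agent. Tests with no recorded history would need a default duration, which this sketch leaves out.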
Deliver:
- A written design with rationale and guardrails.
- A Jenkinsfile sketch that demonstrates the matrix fan-out, sharding, retries, artifact handling, and gating.
- Any assumptions you make, kept minimal and stated explicitly.