System Design: Image Object Detection Platform
Context
Design a multi-tenant system that processes user-uploaded images and returns detected objects with bounding boxes and confidence scores. The platform must support both online (synchronous) inference for interactive use and batch (asynchronous) processing for large jobs. Provide a capacity plan, an end-to-end architecture, and a deployment and monitoring strategy, and address privacy and security.
If any requirement is unspecified, state and justify your assumptions before proceeding.
Clarify and Lock Requirements
Provide concrete values or acceptable ranges for:
- Traffic and workloads
  - Online: target QPS (sustained and burst), latency SLOs (p50, p95, p99), availability SLO.
  - Batch: daily volume, completion windows (e.g., nightly 8 hours), acceptable queueing delay, and throughput targets (images/sec).
- Payloads
  - Image formats, max dimensions, avg and p95 sizes, hard max size.
- Accuracy and quality
  - Metrics (e.g., mAP@IoU), calibration targets (ECE), and per-class thresholds if needed. (An IoU reference implementation follows this list.)
- Tenancy and quotas
  - Per-tenant rate limits, storage retention, and result TTLs.
- Regions and data residency
  - Single region or multi-region; any residency constraints.
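As a shared reference for the accuracy discussion, a minimal IoU (intersection-over-union) implementation, the overlap measure underlying mAP@IoU. The box format `(x_min, y_min, x_max, y_max)` is an assumption; adapt it to whatever the locked schema uses.

```python
def iou(a, b):
    # Boxes are (x_min, y_min, x_max, y_max) in pixels.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # -> 25/175, roughly 0.143
```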
Deliverables
- API and data model
  - Propose REST endpoints for uploads and detection (sync and async). Include request/response shapes and idempotency.
  - Define the result schema: classes, bounding boxes, confidence, model version, processing time. (A schema sketch follows this item.)
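To make the expected shape concrete, a minimal sketch of one possible result schema in Python; every field name and type here is an illustrative assumption, not a fixed contract.

```python
from dataclasses import asdict, dataclass, field
from typing import List
import json


@dataclass
class BoundingBox:
    # Pixel coordinates in the original (pre-resize) image.
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass
class Detection:
    label: str         # class name, e.g. "dog"
    confidence: float  # calibrated score in [0, 1]
    box: BoundingBox


@dataclass
class DetectionResult:
    request_id: str     # echoes the client's idempotency key
    model_version: str  # e.g. "detector-v7"
    processing_time_ms: float
    detections: List[Detection] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    result = DetectionResult(
        request_id="req-123",
        model_version="detector-v7",
        processing_time_ms=42.0,
        detections=[Detection("dog", 0.93, BoundingBox(10, 20, 150, 220))],
    )
    print(result.to_json())
```

Returning the model version with every result is what makes model-version-aware caching and A/B analysis possible downstream.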
- Capacity planning and SLO budget
  - Estimate network ingress, storage, compute (GPU/CPU), and concurrency. Show key calculations and headroom. (A worked example follows this item.)
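For the style of calculation expected, a back-of-the-envelope example; every input value below is an assumption to be replaced once real traffic numbers and SLOs are locked.

```python
# All inputs are placeholder assumptions.
avg_image_mb = 2.0            # assumed average upload size
peak_qps = 500                # assumed online peak
batch_images_per_day = 50_000_000
batch_window_hours = 8
gpu_throughput_ips = 200      # assumed images/sec per GPU with batching
headroom = 0.6                # target max utilization (40% headroom)

ingress_gbps = peak_qps * avg_image_mb * 8 / 1000
batch_ips = batch_images_per_day / (batch_window_hours * 3600)
online_gpus = peak_qps / (gpu_throughput_ips * headroom)
batch_gpus = batch_ips / (gpu_throughput_ips * headroom)

print(f"peak ingress        ~{ingress_gbps:.1f} Gbps")
print(f"batch throughput    ~{batch_ips:,.0f} images/sec")
print(f"online GPUs needed  ~{online_gpus:.0f}")
print(f"batch GPUs needed   ~{batch_gpus:.0f}")
```

Note the headroom factor: planning to 100% utilization leaves no margin for bursts, retries, or losing a zone.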
- End-to-end architecture
  - Ingestion: API gateway, authN/authZ, WAF, rate limiting, quotas, idempotency.
  - Storage: object storage for images, metadata database for jobs/results, CDN strategy if applicable.
  - Preprocessing: decode, resize/normalize, EXIF orientation, content hash for dedupe (see the preprocessing sketch below).
  - Model serving: framework choice, GPU autoscaling, batching, model warm pools, and caching (see the micro-batching sketch below).
  - Asynchronous workflows: queue/stream, workers, retries, dead-letter handling, webhooks/polling (see the worker sketch below).
  - Caching: content-hash and model-version aware caching policies (the preprocessing sketch below also derives such a cache key).
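A minimal preprocessing sketch, assuming Pillow (9.1 or later for the Resampling enum) and a 640x640 model input; the hash choice and cache-key format are illustrative assumptions.

```python
import hashlib
import io

from PIL import Image, ImageOps

TARGET_SIZE = (640, 640)  # assumed model input size


def preprocess(raw_bytes: bytes, model_version: str):
    # Content hash over the raw upload: a stable input for dedupe and caching.
    content_hash = hashlib.sha256(raw_bytes).hexdigest()

    img = Image.open(io.BytesIO(raw_bytes))
    img = ImageOps.exif_transpose(img)  # honor EXIF orientation before resize
    img = img.convert("RGB")            # normalize channel layout
    img = img.resize(TARGET_SIZE, Image.Resampling.BILINEAR)

    # Making the cache key model-version aware means a model rollout
    # naturally invalidates stale results without a global cache flush.
    cache_key = f"{model_version}:{content_hash}"
    return img, content_hash, cache_key
```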
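A toy illustration of server-side micro-batching: requests arriving within a short window share one forward pass. The window and batch size are assumed tuning knobs, and mature serving stacks (e.g., Triton Inference Server) provide this natively; the sketch only shows the mechanism.

```python
import queue
import threading
import time

MAX_BATCH = 32    # assumed GPU-friendly batch size
MAX_WAIT_MS = 10  # assumed latency budget for forming a batch

requests: queue.Queue = queue.Queue()  # (payload, reply queue) pairs


def batcher(run_model):
    while True:
        batch = [requests.get()]  # block until the first request arrives
        deadline = time.monotonic() + MAX_WAIT_MS / 1000
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break
        outputs = run_model([payload for payload, _ in batch])  # one GPU call
        for (_, reply), out in zip(batch, outputs):
            reply.put(out)  # hand each caller its own result


if __name__ == "__main__":
    # Stand-in "model" that just reports payload sizes.
    threading.Thread(target=batcher, args=(lambda xs: [len(x) for x in xs],),
                     daemon=True).start()
    reply: queue.Queue = queue.Queue()
    requests.put((b"fake-image-bytes", reply))
    print(reply.get())  # -> 16
```

The trade-off to surface: a larger window raises GPU utilization but adds up to MAX_WAIT_MS to online p99 latency.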
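And the shape of an asynchronous worker with bounded retries and a dead-letter queue, demonstrated against in-process queues; in production these would be a durable broker (SQS, Pub/Sub, Kafka), and the job fields are placeholders.

```python
import queue

MAX_ATTEMPTS = 3

jobs: queue.Queue = queue.Queue()          # main work queue
dead_letters: queue.Queue = queue.Queue()  # poison messages for triage


def process(job: dict) -> None:
    # Placeholder for download -> preprocess -> infer -> persist result.
    if job.get("poison"):
        raise RuntimeError("simulated model failure")


def worker() -> None:
    while not jobs.empty():
        job = jobs.get()
        try:
            process(job)
        except Exception:
            job["attempts"] = job.get("attempts", 0) + 1
            if job["attempts"] >= MAX_ATTEMPTS:
                dead_letters.put(job)  # park for inspection and alerting
            else:
                jobs.put(job)          # retry; real systems add backoff
        finally:
            jobs.task_done()


if __name__ == "__main__":
    jobs.put({"image_id": "a1"})
    jobs.put({"image_id": "b2", "poison": True})
    worker()
    print("dead-lettered:", dead_letters.qsize())  # -> 1
```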
- ML lifecycle and experimentation
  - Model versioning and registry, A/B testing or shadow traffic, offline evaluation metrics, drift detection. (A shadow-mode sketch follows this item.)
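A minimal sketch of shadow traffic, assuming two callable model endpoints and a logging hook (all placeholders): the candidate sees a sample of live inputs but never answers the user.

```python
import random

SHADOW_FRACTION = 0.1  # assumed sampling rate for the candidate model


def serve(image: bytes, stable_model, candidate_model, log):
    result = stable_model(image)         # always the user-facing answer
    if random.random() < SHADOW_FRACTION:
        shadow = candidate_model(image)  # evaluated, never returned
        log({"stable": result, "shadow": shadow})
    return result


if __name__ == "__main__":
    out = serve(b"img", lambda b: ["dog"], lambda b: ["dog", "cat"], print)
    print(out)  # -> ['dog']; any shadow diff was only logged
```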
- Monitoring and reliability
  - Metrics and alerts: latency, error rates, GPU utilization/memory, queue lag, data/label drift, cost per 1k images. (A burn-rate alerting sketch follows this item.)
  - Failure handling and graceful degradation strategies.
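One way to turn the availability SLO into an alert is a multi-window burn-rate check; the SLO value, window sizes, and the 14.4x/6x thresholds below are conventional starting points, not requirements.

```python
SLO = 0.999
BUDGET = 1 - SLO  # allowed error fraction


def burn_rate(errors: int, total: int) -> float:
    # How many times faster than "exactly on budget" we are burning.
    return (errors / total) / BUDGET if total else 0.0


# Page only when both a short and a long window burn hot: this filters
# out brief blips while still catching sustained budget leaks.
fast = burn_rate(errors=9, total=600)          # e.g., last 5 minutes
slow = burn_rate(errors=2_800, total=400_000)  # e.g., last hour
print(fast, slow, fast > 14.4 and slow > 6.0)  # -> ~15, ~7, True
```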
- Training pipeline
  - Data labeling, augmentation, dataset versioning, experiments, validation, and promotion criteria. (A promotion-gate sketch follows this item.)
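Promotion criteria are easiest to enforce as an automated gate. A minimal sketch, where the metric names, margins, and thresholds are all assumptions to negotiate.

```python
def should_promote(candidate: dict, baseline: dict) -> bool:
    return (
        # Quality must improve by a practically meaningful margin.
        candidate["map_50"] >= baseline["map_50"] + 0.005
        # Latency may regress at most 10% against the serving budget.
        and candidate["p95_latency_ms"] <= baseline["p95_latency_ms"] * 1.10
        # Calibration error stays under an absolute ceiling.
        and candidate["ece"] <= 0.05
    )


print(should_promote(
    {"map_50": 0.462, "p95_latency_ms": 118.0, "ece": 0.031},
    {"map_50": 0.455, "p95_latency_ms": 112.0, "ece": 0.028},
))  # -> True
```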
- Deployment strategy
  - Blue/green or canary; rollback plan; compatibility and schema versioning. (A canary-routing sketch follows this item.)
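For canary rollouts, deterministic routing keeps a given tenant or image on the same model version for the whole experiment. A minimal sketch; the version names and slice size are assumptions.

```python
import hashlib

CANARY_PERCENT = 5  # assumed initial canary slice


def pick_model_version(routing_key: str) -> str:
    # Hash-based bucketing: stable for a given key, uniform across keys.
    bucket = int(hashlib.sha256(routing_key.encode()).hexdigest(), 16) % 100
    return "detector-canary" if bucket < CANARY_PERCENT else "detector-stable"


print(pick_model_version("tenant-42/img-0001"))  # same key -> same version
```

Rollback then reduces to setting CANARY_PERCENT to zero, with no per-request state to unwind.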
- Privacy, security, and compliance
  - Encryption, access controls, audit logging, data minimization/retention, right-to-be-forgotten, and applicable frameworks (e.g., GDPR/CCPA, SOC 2, HIPAA if relevant).
- Cost vs. performance trade-offs
  - GPU choices, quantization/distillation, batching impacts, spot vs. on-demand, and caching ROI. (A cost-per-1k-images example follows this item.)
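Cost comparisons land best when normalized to cost per 1k images. An illustrative calculation; all prices and throughputs are made-up placeholders to be swapped for measured numbers and current cloud pricing.

```python
# name -> (assumed $/GPU-hour, assumed images/sec at target batch size)
options = {
    "on-demand, fp16": (2.50, 200),
    "on-demand, int8": (2.50, 380),  # quantized model on the same GPU
    "spot, fp16":      (0.90, 200),  # cheaper, but preemptible
}

for name, (price_per_hour, images_per_sec) in options.items():
    cost_per_1k = price_per_hour / (images_per_sec * 3600) * 1000
    print(f"{name:16s} ${cost_per_1k:.4f} per 1k images")
```

The same framing makes caching ROI legible: every cache hit avoids roughly the per-image cost above in exchange for storage and a lookup.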
Provide an annotated architecture diagram description (text is fine) and justify key trade-offs.