System Design: End-to-End Image Object Detection Service
You are designing a production image object-detection service that ingests user images, runs detection models, and serves results via APIs. Assume this powers both synchronous user-facing requests and asynchronous batch/large uploads.
Provide a concrete, engineering-focused design that covers:
1) Requirements
- Functional:
  - Ingest images via REST APIs (sync and async).
  - Run object detection and return bounding boxes, class labels, and confidences.
  - Store results and expose retrieval APIs.
- Non-functional (state your targets and rationale):
  - Accuracy (e.g., mAP@0.5 / mAP@[0.5:0.95]).
  - Latency (p50/p95 for sync API), throughput (QPS), and availability (SLA/SLO).
  - Multi-tenant limits, quotas, and idempotency.
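The latency/throughput targets above should be backed by capacity math. A back-of-envelope sketch, where every number (batch size, per-batch latency, target QPS, headroom) is an illustrative assumption rather than a measured value:

```python
import math

# Back-of-envelope capacity math; all numbers are illustrative assumptions.
BATCH_SIZE = 16          # images per GPU inference batch
BATCH_LATENCY_S = 0.040  # 40 ms per batch on one GPU
TARGET_QPS = 2000        # assumed peak synchronous load
HEADROOM = 0.6           # run GPUs at ~60% utilization to protect tail latency

per_gpu_qps = BATCH_SIZE / BATCH_LATENCY_S            # 400 images/s per GPU
gpus_needed = TARGET_QPS / (per_gpu_qps * HEADROOM)   # ~8.3 GPUs
gpus_provisioned = math.ceil(gpus_needed)             # provision 9
```

The headroom factor matters: sizing to 100% utilization makes p95 latency explode under any load spike, so a good answer states both the utilization target and why.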
2) High-Level Architecture
Describe the components and data flow for:
- API gateway, auth, rate limiting.
- Image ingestion and storage.
- Preprocessing (resize/normalize/format conversions).
- Model serving (GPU inference), batching, and asynchronous workers/queues.
- Result storage and retrieval.
- Observability: metrics, logs, traces.
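To make "result storage and retrieval" concrete, a minimal sketch of a stored result record, pinned to the model version that produced it (the field names and the `yolo-v8.1` version string are hypothetical):

```python
# Hypothetical wire/storage format for detection results.
from dataclasses import dataclass, asdict
import json

@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixel coordinates

@dataclass
class DetectionResult:
    image_id: str
    model_version: str   # pin each result to the model that produced it
    detections: list

result = DetectionResult(
    image_id="img-123",
    model_version="yolo-v8.1",
    detections=[Detection("dog", 0.91, (34, 50, 210, 340))],
)
payload = json.dumps(asdict(result))  # asdict recurses into nested dataclasses
```

Storing `model_version` alongside every result is what later makes cache invalidation, A/B analysis, and backfills tractable.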
3) Data and Version Management
- Dataset versioning, schema, and lineage.
- Model registry and artifact/version rollout.
- Backward/forward compatibility of APIs and stored results.
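One lightweight way to get dataset lineage is content-addressed manifest entries; a sketch under assumed field names (`source`, `label_set_version` are hypothetical):

```python
# Content-addressed dataset manifest entry for lineage tracking.
import hashlib

def manifest_entry(image_bytes: bytes, source: str, label_set_version: str) -> dict:
    return {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),  # stable content id
        "source": source,                                   # where the image came from
        "label_set_version": label_set_version,             # which label schema applies
    }

entry = manifest_entry(b"...raw jpeg bytes...", "upload-api", "labels-v3")
```

Because the id is derived from content, the same image ingested twice deduplicates naturally, and any training run can be traced back to exact inputs.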
4) Performance Strategies
- Batching strategies (dynamic vs. fixed, max delay caps).
- GPU utilization (concurrency, memory pinning, quantization, mixed precision).
- Autoscaling policies (HPA based on queue depth/GPU metrics, cluster autoscaler).
- Caching of results (keyed by image hash + model version), CDN considerations.
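The dynamic-batching-with-delay-cap idea can be sketched as a queue drain loop: flush when the batch is full or the delay budget is spent, whichever comes first (`MAX_BATCH` and `MAX_DELAY_S` are assumed tuning values):

```python
# Minimal dynamic batcher: flush on max size OR max added delay.
import queue
import time

MAX_BATCH = 16
MAX_DELAY_S = 0.005  # cap the latency added by batching at 5 ms

def collect_batch(q: "queue.Queue") -> list:
    batch = [q.get()]  # block until the first request arrives
    deadline = time.monotonic() + MAX_DELAY_S
    while len(batch) < MAX_BATCH:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # delay budget exhausted; ship a partial batch
    return batch
```

The delay cap is the key trade-off knob: a larger cap improves GPU utilization at low traffic but adds directly to p95 latency on the sync path.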
5) Modeling Choices
- Single-stage (e.g., YOLO/RetinaNet) vs. two-stage (e.g., Faster R-CNN): trade-offs and when to choose each.
- Model export/serving format (ONNX/TensorRT/TorchScript) and optimizations.
6) Training and Labeling Pipeline
- Labeling workflow (tooling, QC, consensus, active learning).
- Data preprocessing/augmentation.
- Training orchestration, hyperparameter tuning, and reproducibility.
7) Evaluation and Monitoring
- Offline metrics: mAP, precision/recall, per-class breakdown, calibration.
- Online metrics: latency, cost/request, drift, user feedback signals.
- Alerting thresholds and dashboards.
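Offline mAP rests on IoU matching between predicted and ground-truth boxes at a chosen threshold (e.g., 0.5); the IoU kernel itself is small enough to show in full:

```python
# IoU between two axis-aligned boxes given as (x_min, y_min, x_max, y_max).
def iou(a, b) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

For example, two 2x2 boxes overlapping in a 1x2 strip give IoU = 2 / 6 = 1/3, below a 0.5 threshold, so that prediction would count as a false positive at mAP@0.5.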
8) Experimentation
- A/B and canary strategies (traffic splits, sticky assignment, shadow mode).
- Success metrics, guardrails, and rollback triggers.
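Sticky assignment is typically implemented by deterministically hashing a stable id into buckets, so a given tenant or user always sees the same variant; a sketch (the 100-bucket granularity and 5% default are assumptions):

```python
# Deterministic sticky A/B assignment via hash bucketing.
import hashlib

def assign_variant(user_id: str, canary_pct: int = 5) -> str:
    # Stable hash of the id -> bucket in [0, 100); the first canary_pct
    # buckets go to the canary, the rest to control.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_pct else "control"
```

Ramping the canary is then just raising `canary_pct`; users already in the canary stay there, which keeps metrics clean across the ramp.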
9) Reliability and Backpressure
- Failure modes (GPU node loss, queue spikes, bad inputs) and mitigations.
- Timeouts, retries with backoff, circuit breakers.
- Degradation strategies (lower-res inference, smaller model, async fallback).
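"Retries with backoff" usually means capped exponential backoff with jitter, so synchronized clients do not re-spike the queue. A sketch (attempt counts and delays are assumed values, and production code should catch narrower exception types than `Exception`):

```python
# Retry with capped exponential backoff and full jitter.
import random
import time

def call_with_retries(fn, max_attempts=4, base=0.1, cap=2.0):
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate to the caller
            # Full jitter: sleep a uniform amount up to the capped backoff.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Pairing this with a circuit breaker matters: retries alone amplify load on an already-degraded backend, which is exactly the queue-spike failure mode listed above.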
10) Privacy, Compliance, and Security
- Data retention, encryption (in transit/at rest), access controls, and audit logging.
- Data residency and deletion (e.g., DSRs), PII handling.
11) Cost Controls
- Right-sizing, autoscaling, spot capacity, and utilization targets.
- Model compression (quantization/pruning), caching, and image downscaling.
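The result cache mentioned here and in section 4 keys on image content plus model version, so a model rollout naturally invalidates stale entries; a sketch (in practice any preprocessing parameters that affect output should also be folded into the key):

```python
# Cache key combining image content hash and model version.
import hashlib

def cache_key(image_bytes: bytes, model_version: str) -> str:
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"{model_version}:{digest}"
```

Identical re-uploads (common with retried clients) then hit the cache instead of the GPU, which is often the cheapest throughput win available.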
12) Deployment
- Blue/green and canary deployments for both models and serving infra.
- Rollback mechanics and configuration/version pinning.
Be explicit about assumptions where needed and include small numeric examples (e.g., latency/throughput math) to justify design choices.