# System Design: Image Object Detection Service

## Scenario

Design an image object detection service that accepts user-uploaded images and returns detected objects with bounding boxes and confidence scores. The service must support both real-time online inference and high-throughput batch processing.

## Clarify and Quantify Requirements

State assumptions for any missing numbers, and justify them.

**Traffic and performance**

- Online inference: target end-to-end latency (P50/P95/P99), expected and peak QPS, regional distribution, and acceptable tail behavior.
- Batch processing: target throughput (images/sec), acceptable end-to-end SLA (e.g., minutes/hours), and concurrency.
- Payloads: typical and max image size (KB/MB), formats (JPEG/PNG/WebP), and max resolution.
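
As a sanity check on these numbers, a back-of-the-envelope sizing sketch helps anchor the discussion; every figure below (QPS, batch size, per-batch latency, utilization target) is an assumed placeholder, not a requirement:

```python
# Rough GPU sizing for the online path. All numbers are assumptions.
PEAK_QPS = 500            # assumed peak online request rate
BATCH_SIZE = 8            # assumed serving batch size
BATCH_LATENCY_S = 0.040   # assumed GPU forward pass per batch (40 ms)
TARGET_UTIL = 0.6         # keep headroom for spikes and tail latency

images_per_gpu_per_s = BATCH_SIZE / BATCH_LATENCY_S        # 200 images/s
gpus_at_peak = PEAK_QPS / (images_per_gpu_per_s * TARGET_UTIL)
print(f"~{gpus_at_peak:.1f} GPUs at peak")                 # ~4.2
```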

**Quality**

- Accuracy metrics: required mAP@0.5 and mAP@[0.5:0.95], precision/recall, calibration targets.
- Classes: expected number of object classes and class imbalance considerations.

**Constraints**

- Cost budget, multi-tenancy/isolation needs, regions, and compliance requirements.

## Deliverables

Propose and justify an end-to-end architecture that includes:

**Ingestion**

- API gateway, auth (e.g., OAuth2/JWT), WAF, rate limiting/quotas, request validation.
- Upload flow: direct-to-object-store with pre-signed URLs vs. proxying uploads through the API.
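
A minimal sketch of the pre-signed-URL flow, assuming an S3-compatible store via boto3; the `uploads` bucket name, key layout, and five-minute expiry are placeholders:

```python
import uuid

import boto3

s3 = boto3.client("s3")

def create_upload_url(tenant_id: str, content_type: str = "image/jpeg") -> dict:
    """Issue a short-lived pre-signed PUT URL so the client uploads
    directly to object storage instead of streaming through the API tier."""
    key = f"raw/{tenant_id}/{uuid.uuid4()}"  # hypothetical key layout
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "uploads", "Key": key, "ContentType": content_type},
        ExpiresIn=300,  # keep the window short to limit replay
    )
    return {"upload_url": url, "object_key": key}
```

The detection request then references `object_key`, which keeps multi-megabyte payloads off the request path and lets the API tier stay small.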

**Storage and data model**

- Object storage for raw/derived images; metadata database schema for requests, results, and model versions.
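
One plausible shape for the metadata records, sketched as dataclasses; every field name here is illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

def _now() -> datetime:
    return datetime.now(timezone.utc)

@dataclass
class DetectionRequest:
    request_id: str          # primary key; doubles as the idempotency key
    tenant_id: str
    object_key: str          # raw image location in object storage
    model_version: str       # pinned at request time for reproducibility
    status: str = "pending"  # pending | running | done | failed
    created_at: datetime = field(default_factory=_now)

@dataclass
class Detection:
    request_id: str          # foreign key to DetectionRequest
    label: str
    confidence: float        # in [0, 1]
    bbox: tuple[float, float, float, float]  # normalized (x0, y0, x1, y1)

@dataclass
class ModelVersion:
    version: str             # e.g., a training run id
    artifact_uri: str        # packaged model artifact in object storage
    eval_map: float          # offline mAP@[0.5:0.95] recorded at release
```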

**Preprocessing and postprocessing**

- Image validation, resizing/normalization, EXIF handling, virus scanning, and output formatting.
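
A preprocessing sketch using Pillow; the 10 MB cap and 640x640 target are assumptions, and virus scanning is left to a separate service:

```python
from io import BytesIO

from PIL import Image, ImageOps

MAX_BYTES = 10 * 1024 * 1024   # assumed upload cap
ALLOWED = {"JPEG", "PNG", "WEBP"}
TARGET = (640, 640)            # assumed model input resolution

def preprocess(data: bytes) -> Image.Image:
    if len(data) > MAX_BYTES:
        raise ValueError("image too large")
    probe = Image.open(BytesIO(data))
    probe.verify()                    # cheap structural validation
    img = Image.open(BytesIO(data))   # reopen: verify() consumes the handle
    if img.format not in ALLOWED:
        raise ValueError(f"unsupported format: {img.format}")
    img = ImageOps.exif_transpose(img)  # bake in EXIF orientation
    img = img.convert("RGB")            # drop alpha, normalize mode
    # Letterbox to the model input size, preserving aspect ratio.
    return ImageOps.pad(img, TARGET, color=(114, 114, 114))
```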

**Model serving**

- GPU-backed serving, autoscaling policy, batching strategy, model formats (e.g., ONNX/TensorRT), and multi-model/version hosting.
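
The batching strategy is the core of GPU efficiency. Below is a dynamic-batching sketch in plain asyncio: requests accumulate until the batch is full or a small deadline expires, whichever comes first. `run_model` stands in for the actual ONNX Runtime/TensorRT session, and both tuning constants are assumptions:

```python
import asyncio

MAX_BATCH = 16        # assumed; tune against GPU memory and the latency SLO
MAX_WAIT_S = 0.005    # assumed 5 ms batching window

queue: asyncio.Queue = asyncio.Queue()

async def infer(image):
    """Called once per request; resolves to that image's detections."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((image, fut))
    return await fut

async def batcher(run_model):
    """Single consumer that drains the queue into GPU-sized batches."""
    while True:
        image, fut = await queue.get()
        batch, futs = [image], [fut]
        loop = asyncio.get_running_loop()
        deadline = loop.time() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                image, fut = await asyncio.wait_for(queue.get(), timeout)
            except asyncio.TimeoutError:
                break
            batch.append(image)
            futs.append(fut)
        for f, result in zip(futs, run_model(batch)):  # one forward pass
            f.set_result(result)
```

Bigger batches buy throughput at the cost of queueing delay, so the window should be sized against the P95 target; serving frameworks with built-in dynamic batching (e.g., Triton) implement the same idea.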

**Caching**

- Result caching strategy (keys, TTL), CDN considerations, and cache invalidation on model updates.
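
A sketch of the cache-key scheme: keying on a digest of the image bytes plus the model version means a rollout changes every key, so stale results age out without an explicit invalidation pass. The `det:` prefix and version string are illustrative:

```python
import hashlib

def cache_key(image_bytes: bytes, model_version: str) -> str:
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"det:{model_version}:{digest}"
```

TTLs should be bounded by the data-retention policy; if users can delete images, cached results derived from them need the same treatment.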

**Asynchronous workflows**

- Queues/streams, idempotency, retries/DLQs for batch and overflow traffic.
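
A worker-loop sketch of the idempotency and DLQ pattern. The `store`, `process`, `dlq`, and message objects are hypothetical interfaces, and `MAX_ATTEMPTS` is an assumption; the shape of the logic is the point:

```python
MAX_ATTEMPTS = 3  # assumed retry budget before parking the message

def handle(msg, store, process, dlq) -> None:
    """Process one queue message under at-least-once delivery."""
    if store.exists(msg.request_id):       # duplicate delivery: drop it
        return
    try:
        result = process(msg)              # run detection on the image
        store.put_if_absent(msg.request_id, result)
    except Exception:
        if msg.attempt + 1 >= MAX_ATTEMPTS:
            dlq.send(msg)                  # park for manual inspection
        else:
            msg.retry(backoff_s=2 ** msg.attempt)  # exponential backoff
```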

**Model lifecycle**

- Versioning, A/B testing or shadow traffic, rollout/rollback.
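
A routing sketch for canary rollout: a stable hash of the request id assigns each request to a version, so a given request always routes the same way, and the canary share can be dialed up or rolled back by changing one number. Version ids and the 5% fraction are placeholders:

```python
import hashlib

STABLE = "v1"             # placeholder version ids
CANARY = "v2-candidate"
CANARY_FRACTION = 0.05    # 5% of traffic; raise gradually, set 0 to roll back

def pick_version(request_id: str) -> str:
    # Stable hash so routing is deterministic per request id.
    bucket = int(hashlib.md5(request_id.encode()).hexdigest(), 16) % 10_000
    return CANARY if bucket < CANARY_FRACTION * 10_000 else STABLE
```

Shadow traffic is the same router with the candidate invoked asynchronously on a copy of the request; its output is logged for offline comparison instead of being returned.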

**Monitoring and reliability**

- Metrics (latency percentiles, throughput, GPU utilization/memory, queue depth), drift detection, alerting SLOs, failure modes and fallbacks.
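
Concept drift often shows up first as a shift in the confidence-score distribution. Below is a sketch using the population stability index (PSI) against a reference window; the bin count and 0.2 alert threshold are conventional rules of thumb, not requirements:

```python
import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population stability index between two score distributions."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf      # cover the full range
    ref = np.histogram(reference, edges)[0] / len(reference)
    cur = np.histogram(live, edges)[0] / len(live)
    ref = np.clip(ref, 1e-6, None)             # avoid log(0)
    cur = np.clip(cur, 1e-6, None)
    return float(np.sum((cur - ref) * np.log(cur / ref)))

# Assumed rule of thumb: PSI > 0.2 on confidences is worth an alert.
```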

**Offline training pipeline**

- Data labeling, augmentation, experiment tracking, evaluation, and packaging for serving.
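
For packaging, one common route is exporting the trained detector to ONNX with a dynamic batch axis so the serving tier can batch freely. A sketch assuming a PyTorch model with NCHW 640x640 input; the opset and tensor names are placeholders:

```python
import torch

def export_onnx(model: torch.nn.Module, path: str) -> None:
    model.eval()
    dummy = torch.randn(1, 3, 640, 640)  # assumed input layout
    torch.onnx.export(
        model,
        dummy,
        path,
        input_names=["images"],
        output_names=["detections"],
        dynamic_axes={"images": {0: "batch"}, "detections": {0: "batch"}},
        opset_version=17,  # assumed; match the runtime's supported opset
    )
```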

**Deployment and rollout**

- Blue/green or canary strategy; CI/CD and validation gates.
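
A sketch of one validation gate in the CI pipeline: promote only if the candidate matches the baseline's accuracy and stays within a latency budget. Metric names and thresholds are assumptions:

```python
def promotion_gate(candidate: dict, baseline: dict) -> bool:
    """Inputs are eval summaries, e.g. {"map": 0.41, "p95_ms": 38.0}."""
    max_latency_regression = 1.10  # assumed: tolerate up to 10% slower P95
    return (
        candidate["map"] >= baseline["map"]
        and candidate["p95_ms"] <= baseline["p95_ms"] * max_latency_regression
    )
```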

**Cost/performance trade-offs**

- GPU types, batching, quantization, spot vs. on-demand, and multi-tenancy isolation.
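
These trade-offs reduce to cost per million images, which makes the options comparable. A sketch where every price and throughput figure is a placeholder for measured numbers:

```python
def cost_per_million(price_per_hour: float, images_per_s: float) -> float:
    return price_per_hour / (images_per_s * 3600) * 1_000_000

# Illustrative placeholders: large GPU, small GPU, small GPU on spot.
print(cost_per_million(4.00, 900))  # ~$1.23 per 1M images
print(cost_per_million(1.20, 250))  # ~$1.33 per 1M images
print(cost_per_million(0.40, 250))  # ~$0.44 if interruptions are tolerable
```

Quantization (FP16/INT8) moves the throughput term directly, which is why it sits in the same trade-off as instance choice.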

**Privacy, security, compliance**

- Data retention/deletion, encryption, access control, audit logging, regionalization, and acceptable use/content controls.
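
Retention is usually enforced with the object store's native lifecycle rules; as a fallback, a sweep job can enforce the same policy explicitly. A sketch assuming an S3-compatible store, with the bucket, prefix, and 30-day window as placeholders:

```python
from datetime import datetime, timedelta, timezone

import boto3

RETENTION = timedelta(days=30)  # assumed policy window

def sweep(bucket: str = "uploads", prefix: str = "raw/") -> None:
    s3 = boto3.client("s3")
    cutoff = datetime.now(timezone.utc) - RETENTION
    for page in s3.get_paginator("list_objects_v2").paginate(
        Bucket=bucket, Prefix=prefix
    ):
        for obj in page.get("Contents", []):
            if obj["LastModified"] < cutoff:
                s3.delete_object(Bucket=bucket, Key=obj["Key"])
                # Every deletion should also emit an audit-log record.
```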

Provide diagrams where useful (ASCII is fine), capacity-planning math, and a brief risk/mitigation section.