System Design: End-to-End Image Object Detection Service
You are designing a production image object-detection service that ingests user images, runs detection models, and serves results via APIs. Assume this powers both synchronous user-facing requests and asynchronous batch/large uploads.
Provide a concrete, engineering-focused design that covers:
1) Requirements
- Functional:
  - Ingest images via REST APIs (sync and async).
  - Run object detection and return bounding boxes, class labels, and confidences.
  - Store results and expose retrieval APIs.
- Non-functional (state your targets and rationale):
  - Accuracy (e.g., mAP@0.5 / mAP@[0.5:0.95]).
  - Latency (p50/p95 for sync API), throughput (QPS), and availability (SLA/SLO).
  - Multi-tenant limits, quotas, and idempotency.
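The latency/throughput targets above should be backed by capacity math. A back-of-envelope sketch, where every number (batch size, per-batch latency, target QPS, headroom) is an illustrative assumption rather than a measured value:

```python
import math

# Back-of-envelope capacity math; all numbers are illustrative assumptions.
BATCH_SIZE = 16          # images per GPU inference batch
BATCH_LATENCY_S = 0.040  # 40 ms per batch on one GPU
TARGET_QPS = 2000        # assumed peak synchronous load
HEADROOM = 0.6           # run GPUs at ~60% utilization to protect tail latency

per_gpu_qps = BATCH_SIZE / BATCH_LATENCY_S            # 400 images/s per GPU
gpus_needed = TARGET_QPS / (per_gpu_qps * HEADROOM)   # ~8.3 GPUs
gpus_provisioned = math.ceil(gpus_needed)             # provision 9
```

The headroom factor matters: sizing to 100% utilization makes p95 latency explode under any load spike, so a good answer states both the utilization target and why.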
2) High-Level Architecture
Describe the components and data flow for:
- API gateway, auth, rate limiting.
- Image ingestion and storage.
- Preprocessing (resize/normalize/format conversions).
- Model serving (GPU inference), batching, and asynchronous workers/queues.
- Result storage and retrieval.
- Observability: metrics, logs, traces.
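To make "result storage and retrieval" concrete, a minimal sketch of a stored result record, pinned to the model version that produced it (the field names and the `yolo-v8.1` version string are hypothetical):

```python
# Hypothetical wire/storage format for detection results.
from dataclasses import dataclass, asdict
import json

@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixel coordinates

@dataclass
class DetectionResult:
    image_id: str
    model_version: str   # pin each result to the model that produced it
    detections: list

result = DetectionResult(
    image_id="img-123",
    model_version="yolo-v8.1",
    detections=[Detection("dog", 0.91, (34, 50, 210, 340))],
)
payload = json.dumps(asdict(result))  # asdict recurses into nested dataclasses
```

Storing `model_version` alongside every result is what later makes cache invalidation, A/B analysis, and backfills tractable.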
3) Data and Version Management
- Dataset versioning, schema, and lineage.
- Model registry and artifact/version rollout.
- Backward/forward compatibility of APIs and stored results.
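One lightweight way to get dataset lineage is content-addressed manifest entries; a sketch under assumed field names (`source`, `label_set_version` are hypothetical):

```python
# Content-addressed dataset manifest entry for lineage tracking.
import hashlib

def manifest_entry(image_bytes: bytes, source: str, label_set_version: str) -> dict:
    return {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),  # stable content id
        "source": source,                                   # where the image came from
        "label_set_version": label_set_version,             # which label schema applies
    }

entry = manifest_entry(b"...raw jpeg bytes...", "upload-api", "labels-v3")
```

Because the id is derived from content, the same image ingested twice deduplicates naturally, and any training run can be traced back to exact inputs.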
4) Performance Strategies
- Batching strategies (dynamic vs. fixed, max delay caps).
- GPU utilization (concurrency, memory pinning, quantization, mixed precision).
- Autoscaling policies (HPA based on queue depth/GPU metrics, cluster autoscaler).
- Caching of results (keyed by image hash + model version), CDN considerations.
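The dynamic-batching-with-delay-cap idea can be sketched as a queue drain loop: flush when the batch is full or the delay budget is spent, whichever comes first (`MAX_BATCH` and `MAX_DELAY_S` are assumed tuning values):

```python
# Minimal dynamic batcher: flush on max size OR max added delay.
import queue
import time

MAX_BATCH = 16
MAX_DELAY_S = 0.005  # cap the latency added by batching at 5 ms

def collect_batch(q: "queue.Queue") -> list:
    batch = [q.get()]  # block until the first request arrives
    deadline = time.monotonic() + MAX_DELAY_S
    while len(batch) < MAX_BATCH:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # delay budget exhausted; ship a partial batch
    return batch
```

The delay cap is the key trade-off knob: a larger cap improves GPU utilization at low traffic but adds directly to p95 latency on the sync path.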
5) Modeling Choices
- Single-stage (e.g., YOLO/RetinaNet) vs. two-stage (e.g., Faster R-CNN): trade-offs and when to choose each.
- Model export/serving format (ONNX/TensorRT/TorchScript) and optimizations.
6) Training and Labeling Pipeline
- Labeling workflow (tooling, QC, consensus, active learning).
- Data preprocessing/augmentation.
- Training orchestration, hyperparameter tuning, and reproducibility.
7) Evaluation and Monitoring
- Offline metrics: mAP, precision/recall, per-class breakdown, calibration.
- Online metrics: latency, cost/request, drift, user feedback signals.
- Alerting thresholds and dashboards.
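Offline mAP rests on IoU matching between predicted and ground-truth boxes at a chosen threshold (e.g., 0.5); the IoU kernel itself is small enough to show in full:

```python
# IoU between two axis-aligned boxes given as (x_min, y_min, x_max, y_max).
def iou(a, b) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

For example, two 2x2 boxes overlapping in a 1x2 strip give IoU = 2 / 6 = 1/3, below a 0.5 threshold, so that prediction would count as a false positive at mAP@0.5.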
8) Experimentation
- A/B and canary strategies (traffic splits, sticky assignment, shadow mode).
- Success metrics, guardrails, and rollback triggers.
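Sticky assignment is typically implemented by deterministically hashing a stable id into buckets, so a given tenant or user always sees the same variant; a sketch (the 100-bucket granularity and 5% default are assumptions):

```python
# Deterministic sticky A/B assignment via hash bucketing.
import hashlib

def assign_variant(user_id: str, canary_pct: int = 5) -> str:
    # Stable hash of the id -> bucket in [0, 100); the first canary_pct
    # buckets go to the canary, the rest to control.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_pct else "control"
```

Ramping the canary is then just raising `canary_pct`; users already in the canary stay there, which keeps metrics clean across the ramp.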
9) Reliability and Backpressure
- Failure modes (GPU node loss, queue spikes, bad inputs) and mitigations.
- Timeouts, retries with backoff, circuit breakers.
- Degradation strategies (lower-res inference, smaller model, async fallback).
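"Retries with backoff" usually means capped exponential backoff with jitter, so synchronized clients do not re-spike the queue. A sketch (attempt counts and delays are assumed values, and production code should catch narrower exception types than `Exception`):

```python
# Retry with capped exponential backoff and full jitter.
import random
import time

def call_with_retries(fn, max_attempts=4, base=0.1, cap=2.0):
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate to the caller
            # Full jitter: sleep a uniform amount up to the capped backoff.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Pairing this with a circuit breaker matters: retries alone amplify load on an already-degraded backend, which is exactly the queue-spike failure mode listed above.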
10) Privacy, Compliance, and Security
- Data retention, encryption (in transit/at rest), access controls, and audit logging.
- Data residency and deletion (e.g., DSRs), PII handling.
11) Cost Controls
- Right-sizing, autoscaling, spot capacity, and utilization targets.
- Model compression (quantization/pruning), caching, and image downscaling.
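The result cache mentioned here and in section 4 keys on image content plus model version, so a model rollout naturally invalidates stale entries; a sketch (in practice any preprocessing parameters that affect output should also be folded into the key):

```python
# Cache key combining image content hash and model version.
import hashlib

def cache_key(image_bytes: bytes, model_version: str) -> str:
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"{model_version}:{digest}"
```

Identical re-uploads (common with retried clients) then hit the cache instead of the GPU, which is often the cheapest throughput win available.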
12) Deployment
- Blue/green and canary deployments for both models and serving infra.
- Rollback mechanics and configuration/version pinning.
Be explicit about assumptions where needed and include small numeric examples (e.g., latency/throughput math) to justify design choices.