PracHub

Design an image detection system

Last updated: May 7, 2026

Quick Overview

This question evaluates a candidate's competence in designing scalable, reliable end-to-end machine learning systems for image object detection, including ingestion, preprocessing, model serving, data and version management, monitoring, and operational concerns.


Company: Datadog

Role: Software Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

Design an end-to-end image object-detection system that ingests user images, runs detection models, and serves results via APIs. Specify functional/non-functional requirements (accuracy, latency, throughput, availability), high-level architecture (ingestion, storage, preprocessing, model serving, asynchronous workers/queues), data/version management, and how you would handle batching, GPU utilization, autoscaling, and caching. Describe model choices (single-stage vs. two-stage), training/labeling pipeline, evaluation metrics, online/offline monitoring, A/B testing strategy, failure modes/backpressure, privacy/compliance, cost controls, and rollback/blue-green deployments.

Sep 6, 2025, 12:00 AM
System Design: End-to-End Image Object-Detection Service

Context

Design a production-grade service that ingests user-uploaded images, runs object detection models, and returns detections via APIs. Assume both synchronous (low-latency) and asynchronous (high-throughput) use cases. If you need concrete numbers to reason about trade-offs, you may assume a moderate scale (e.g., 1–5k RPS peak, average image ~1 MB, typical image sizes 640–2048 px on the long side), but state any assumptions you make.
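A back-of-the-envelope estimate helps turn the stated scale into concrete targets. The sketch below uses the upper end of the prompt's range (5k RPS, 1 MB images); the per-GPU throughput and the utilization target are hypothetical values chosen only for illustration and should be replaced with measured numbers.

```python
import math

# All inputs below are illustrative assumptions, not measured values.
PEAK_RPS = 5_000            # upper end of the 1-5k RPS range in the prompt
AVG_IMAGE_MB = 1.0          # average image size from the prompt
GPU_IMAGES_PER_SEC = 200    # hypothetical per-GPU throughput with batching
TARGET_UTILIZATION = 0.7    # hypothetical headroom target


def peak_ingress_mb_per_sec(rps: float, image_mb: float) -> float:
    """Upload bandwidth the ingestion tier must absorb at peak."""
    return rps * image_mb


def gpus_needed(rps: float, per_gpu: float, utilization: float) -> int:
    """GPUs required so peak load stays at or below the utilization target."""
    return math.ceil(rps / (per_gpu * utilization))


ingress = peak_ingress_mb_per_sec(PEAK_RPS, AVG_IMAGE_MB)
gpus = gpus_needed(PEAK_RPS, GPU_IMAGES_PER_SEC, TARGET_UTILIZATION)
print(f"peak ingress: {ingress:.0f} MB/s, GPUs at peak: {gpus}")
```

Under these assumptions the ingestion path must sustain roughly 5 GB/s at peak, which is one argument for uploading directly to object storage rather than proxying image bytes through the API tier.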

Requirements

Specify the following:

  1. Functional requirements
  • Public APIs to submit images and retrieve detections
  • Synchronous detection for small/latency-sensitive requests
  • Asynchronous detection for large images/bulk loads
  • Idempotency, pagination, authentication/authorization
  • Result formats (bounding boxes, classes, confidences; optional masks)
  2. Non-functional requirements
  • Accuracy targets (e.g., mAP@0.5, mAP@[0.5:0.95])
  • Latency SLOs (p50/p95 for sync vs. async)
  • Throughput targets (RPS or jobs/sec)
  • Availability (e.g., 99.9%+), durability, cost constraints
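The result format and the accuracy targets above both rest on bounding boxes and intersection-over-union (IoU); mAP@0.5 counts a prediction as correct when its IoU with a ground-truth box of the same class is at least 0.5. The following is a minimal sketch of one possible result record and an IoU helper. The `Detection` type and its field names are hypothetical, and corner-format pixel coordinates are an assumption.

```python
from dataclasses import dataclass, asdict
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class Detection:
    """One detected object as it might appear in an API response (hypothetical schema)."""
    label: str
    confidence: float
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


pred = Detection(label="dog", confidence=0.91, box=(0.0, 0.0, 2.0, 2.0))
truth_box = (1.0, 0.0, 3.0, 2.0)
print(asdict(pred), iou(pred.box, truth_box))
```

mAP@[0.5:0.95] averages the same computation over IoU thresholds from 0.5 to 0.95 in steps of 0.05, so it rewards tighter localization than mAP@0.5 alone.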

High-Level Architecture

Describe at a high level:

  • Ingestion (upload endpoints, pre-signed URLs)
  • Storage (object store for images, DB for metadata/results)
  • Preprocessing pipeline (resize/normalize/EXIF/format conversion)
  • Model serving tier (GPU inference, batching)
  • Asynchronous workers and queues (with DLQs/backpressure)
  • APIs for submit/status/results
  • Observability (metrics/logs/traces)
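The submit/status/results flow with a bounded queue can be sketched in memory as below. This is a single-process illustration, not a production design: `JobService`, `QueueFullError`, and the content-hash idempotency key are hypothetical names and choices, and a real system would use a durable queue and a database instead of Python dictionaries.

```python
import hashlib
import queue
import uuid


class QueueFullError(Exception):
    """Raised when the queue is saturated; an API layer could map this to HTTP 429."""


class JobService:
    def __init__(self, max_queue_size: int = 1000):
        self._queue = queue.Queue(maxsize=max_queue_size)
        self._jobs = {}    # job_id -> {"status": str, "result": object}
        self._by_key = {}  # idempotency key -> job_id

    def submit(self, image_bytes: bytes, model_version: str) -> str:
        # Same image + same model version yields the same job (idempotent submit).
        key = hashlib.sha256(model_version.encode() + b":" + image_bytes).hexdigest()
        if key in self._by_key:
            return self._by_key[key]
        job_id = uuid.uuid4().hex
        try:
            self._queue.put_nowait((job_id, image_bytes))
        except queue.Full:
            # Backpressure: reject instead of buffering without bound.
            raise QueueFullError("queue full, retry later")
        self._jobs[job_id] = {"status": "queued", "result": None}
        self._by_key[key] = job_id
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]["status"]

    def result(self, job_id: str):
        return self._jobs[job_id]["result"]

    def work_once(self, detect) -> None:
        """Pop one job and run the supplied detection callable on it."""
        job_id, image_bytes = self._queue.get_nowait()
        self._jobs[job_id] = {"status": "done", "result": detect(image_bytes)}


svc = JobService(max_queue_size=1)
job = svc.submit(b"fake-image-bytes", "v1")
svc.work_once(lambda data: [{"label": "cat", "confidence": 0.8}])
print(svc.status(job), svc.result(job))
```

Rejecting at submit time keeps latency bounded for accepted work; the alternative of unbounded buffering tends to hide overload until memory or deadlines are exhausted.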

Data/Version Management

  • Model registry, dataset versioning, schema evolution
  • Reproducible training and rollbacks (model and data)
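One way to make rollbacks cheap is to keep an immutable record per model version, tied to the dataset version it was trained on, plus a promotion history. The sketch below is an in-memory illustration under that assumption; `ModelRegistry`, `ModelVersion`, and the example URIs are hypothetical and do not refer to any particular registry product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    version: str
    artifact_uri: str      # where the weights live (hypothetical URI)
    dataset_version: str   # dataset snapshot used for training


class ModelRegistry:
    def __init__(self):
        self._versions = {}  # version -> ModelVersion
        self._history = []   # promoted versions, most recent last

    def register(self, mv: ModelVersion) -> None:
        if mv.version in self._versions:
            raise ValueError(f"version {mv.version} already registered")
        self._versions[mv.version] = mv

    def promote(self, version: str) -> None:
        if version not in self._versions:
            raise KeyError(version)
        self._history.append(version)

    def current(self) -> ModelVersion:
        if not self._history:
            raise RuntimeError("no model has been promoted")
        return self._versions[self._history[-1]]

    def rollback(self) -> ModelVersion:
        """Revert to the previously promoted version."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self.current()


registry = ModelRegistry()
registry.register(ModelVersion("v1", "s3://example-models/det/v1", "data-2025-01"))
registry.register(ModelVersion("v2", "s3://example-models/det/v2", "data-2025-03"))
registry.promote("v1")
registry.promote("v2")
print(registry.rollback().version)
```

Because each record names its dataset version, rolling back the model also identifies which data snapshot to reproduce it from.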

Performance/Operations

  • Batching strategy, GPU utilization, concurrency
  • Autoscaling strategy (request- and queue-driven)
  • Caching strategies (results, model artifacts)
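A common batching strategy is to close a batch when it reaches a maximum size or when the oldest request in it has waited a maximum time, whichever comes first. The sketch below simulates that policy over a sorted list of arrival timestamps so the trade-off can be reasoned about offline. The function name and parameters are hypothetical, and a real server would apply the same rule with timers rather than a replay.

```python
from typing import List


def form_batches(arrivals_ms: List[float], max_batch_size: int,
                 max_wait_ms: float) -> List[List[int]]:
    """Group request indices into batches.

    Assumes arrivals_ms is sorted ascending. A batch is closed when it is
    full, or when the next request arrives more than max_wait_ms after the
    first request in the batch.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    batch_start = 0.0
    for i, t in enumerate(arrivals_ms):
        if current and (len(current) >= max_batch_size
                        or t - batch_start > max_wait_ms):
            batches.append(current)
            current = []
        if not current:
            batch_start = t  # the wait window starts with the first request
        current.append(i)
    if current:
        batches.append(current)
    return batches


print(form_batches([0, 1, 2, 3, 20, 21], max_batch_size=3, max_wait_ms=5))
# [[0, 1, 2], [3], [4, 5]]
```

Larger `max_batch_size` values tend to improve GPU utilization, while `max_wait_ms` caps the latency added to the first request in each batch, so the sync path would typically use a much smaller wait than the async path.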

Modeling & ML Ops

  • Model choices: single-stage vs two-stage, and when to use each
  • Training and labeling pipeline (active learning, QA)
  • Evaluation metrics and validation gates
  • Online/offline monitoring (drift, quality, SLIs/SLOs)
  • A/B testing and rollout/guardrails
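For A/B tests and gradual rollouts, assignment is usually deterministic so that the same caller sees the same model across requests. A minimal sketch using a hash of an experiment name and a request key follows; `assign_variant`, the experiment name, and the choice of key (for example a tenant or user ID) are assumptions made for illustration.

```python
import hashlib


def assign_variant(request_key: str, treatment_percent: int,
                   experiment: str = "detector-v2") -> str:
    """Deterministically map a key to 'treatment' or 'control'.

    Salting with the experiment name keeps bucket assignments independent
    across experiments.
    """
    if not 0 <= treatment_percent <= 100:
        raise ValueError("treatment_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{experiment}:{request_key}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "treatment" if bucket < treatment_percent else "control"


# The same key always lands in the same arm for a given experiment.
print(assign_variant("tenant-42", treatment_percent=10))
```

Raising `treatment_percent` over time only moves keys from control to treatment, never the reverse, which keeps a ramp-up from reshuffling callers who are already on the new model.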

Reliability & Compliance

  • Failure modes, retries, backpressure, timeouts, circuit breakers
  • Privacy, compliance, data retention, regionality
  • Cost controls (GPU choice, right-sizing, spot/preemptible instances)
  • Deployment strategy (blue/green, canary, rollback)
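Among the failure-handling items above, a circuit breaker is the piece that stops callers from piling retries onto an unhealthy model server. The sketch below is a minimal consecutive-failure breaker with a cool-down; the class name, thresholds, and the injectable clock are hypothetical choices, and production breakers often add rolling windows and limits on trial requests.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock          # injectable so tests can control time
        self._failures = 0           # consecutive failures seen so far
        self._opened_at = None       # time the breaker opened, or None if closed

    def allow(self) -> bool:
        """Return True if a request may be sent to the backend."""
        if self._opened_at is None:
            return True
        # After the cool-down, let trial requests through (half-open state).
        return self._clock() - self._opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # open, or re-open after a failed trial


breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10.0)
breaker.record_failure()
breaker.record_failure()
print(breaker.allow())  # False until the cool-down elapses
```

While the breaker is open, the sync API can fail fast or fall back to the async path, which keeps timeouts from cascading into the upstream tiers.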
