
Design an image/video near-duplicate detection system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in ML system design and large-scale multimedia retrieval, focusing on perceptual fingerprinting versus embedding strategies, scalable indexing and nearest-neighbor retrieval, and robustness to resizing, re-encoding, watermarks, minor edits, and adversarial manipulations.



Company: OpenAI

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Onsite


Dec 15, 2025, 12:00 AM

Question

Design a system to detect near-duplicate images/videos (e.g., reuploads, minor edits, different encodes) at large scale.

Requirements

  • Support both images and videos.
  • Robust to resizing, cropping, re-encoding, watermarks, small edits.
  • High-throughput ingestion and low-latency queries for takedown, merge, and dedup workflows.
  • Handle billions of media items.
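To make the robustness requirement concrete, here is a minimal sketch of one well-known perceptual hash, the difference hash (dHash). It is an illustration under stated assumptions, not a reference solution: images are plain nested lists of grayscale values, and the names `box_resize`, `dhash`, and `hamming` are hypothetical helpers introduced here. A real pipeline would decode and resize with an image library, and for video would typically hash sampled frames.

```python
from typing import List

# Assumption: a grayscale image is a row-major list of rows, values 0..255.
Gray = List[List[float]]


def box_resize(img: Gray, out_w: int, out_h: int) -> Gray:
    """Resize by averaging the source pixels that fall into each output cell."""
    in_h, in_w = len(img), len(img[0])
    out: Gray = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max(y0 + 1, (oy + 1) * in_h // out_h)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max(x0 + 1, (ox + 1) * in_w // out_w)
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def dhash(img: Gray, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair of cells.

    The image is reduced to (hash_size + 1) x hash_size cells, and each bit
    records whether a cell is brighter than its right-hand neighbour.
    """
    small = box_resize(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Because each bit depends only on the relative brightness of neighbouring cells, the hash tends to survive resizing, re-encoding, and global brightness or contrast shifts. Heavy cropping, mirroring, and large overlays usually change many bits, which is one reason the deliverables ask for a comparison with learned embeddings.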

Deliverables

  • Fingerprinting approach (perceptual hashing vs embeddings).
  • Indexing and retrieval architecture.
  • Thresholding, evaluation, and operational concerns (false positives, adversarial behavior).
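For the indexing and thresholding deliverables, one common way to search binary fingerprints is to split each hash into bands and look up exact band matches. The sketch below assumes 64-bit hashes and an in-memory dictionary per band; the class `BandedHashIndex` and its methods are hypothetical names for illustration. It relies on the pigeonhole principle: if two hashes differ in fewer bits than there are bands, at least one band must match exactly, so candidates found by band lookup can then be verified against the full Hamming threshold. Embedding vectors would need an approximate nearest-neighbor index instead, which this sketch does not cover.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class BandedHashIndex:
    """Exact Hamming-radius search over fixed-width hashes via band lookup."""

    def __init__(self, bits: int = 64, bands: int = 4) -> None:
        if bits % bands:
            raise ValueError("bits must divide evenly into bands")
        self.bands = bands
        self.band_bits = bits // bands
        self.mask = (1 << self.band_bits) - 1
        # One table per band: band value -> ids of items with that value.
        self.tables: List[Dict[int, Set[str]]] = [
            defaultdict(set) for _ in range(bands)
        ]
        self.hashes: Dict[str, int] = {}

    def _band_keys(self, h: int) -> List[int]:
        return [(h >> (i * self.band_bits)) & self.mask for i in range(self.bands)]

    def add(self, media_id: str, h: int) -> None:
        self.hashes[media_id] = h
        for table, key in zip(self.tables, self._band_keys(h)):
            table[key].add(media_id)

    def query(self, h: int, max_distance: int) -> List[Tuple[int, str]]:
        """Return (distance, media_id) pairs within max_distance, nearest first."""
        if max_distance >= self.bands:
            # Pigeonhole guarantee only holds when max_distance < bands.
            raise ValueError("max_distance must be smaller than the band count")
        candidates: Set[str] = set()
        for table, key in zip(self.tables, self._band_keys(h)):
            candidates |= table.get(key, set())
        # Verify every candidate against the full hash.
        scored = ((bin(self.hashes[c] ^ h).count("1"), c) for c in candidates)
        return sorted((d, c) for d, c in scored if d <= max_distance)
```

In this sketch the threshold `max_distance` is the tuning knob for the false-positive versus false-negative trade-off: raising it catches more edited copies but also admits more unrelated matches, so it would normally be chosen from a labeled evaluation set rather than fixed up front. At billions of items the per-band tables would be sharded across machines, which is outside the scope of this example.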

