Question
Design a system to detect near-duplicate images/videos (e.g., reuploads, minor edits, different encodes) at large scale.
Requirements
- Support both images and videos.
- Robust to resizing, cropping, re-encoding, watermarks, small edits.
- High-throughput ingestion; low-latency query for takedown/merge/dedup.
- Handle billions of media items.
Deliverables
- Fingerprinting approach (perceptual hashing vs embeddings).
- Indexing and retrieval architecture.
- Thresholding, evaluation, and operational concerns (false positives, adversarial behavior).
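
As a starting point for the fingerprinting and retrieval deliverables, here is a minimal sketch of the perceptual-hashing option, not a reference answer. It assumes a 64-bit difference hash (dHash) computed on a grayscale image that has already been downscaled to 8 rows by 9 columns, a Hamming-distance threshold of 3, and a 4-band exact-match index. The names `dhash`, `hamming`, and `BandedIndex` are hypothetical, and the threshold is illustrative rather than tuned. For video, one possible extension is to apply the same hash to sampled frames.

```python
from collections import defaultdict


def dhash(pixels):
    """Difference hash of a grayscale image given as rows of ints.

    Assumption: `pixels` is already downscaled to 8 rows x 9 columns,
    so the 8 * 8 left/right comparisons yield a 64-bit integer.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class BandedIndex:
    """Exact-match lookup on 4 bands of 16 bits each.

    Pigeonhole argument: two 64-bit hashes within Hamming distance 3
    differ in at most 3 bands, so at least one band is identical.
    Larger thresholds would need more bands to keep that guarantee.
    """

    BANDS = 4
    BAND_BITS = 16

    def __init__(self):
        self.tables = [defaultdict(list) for _ in range(self.BANDS)]

    def _bands(self, h):
        mask = (1 << self.BAND_BITS) - 1
        return [(h >> (i * self.BAND_BITS)) & mask for i in range(self.BANDS)]

    def add(self, item_id, h):
        for table, band in zip(self.tables, self._bands(h)):
            table[band].append((item_id, h))

    def query(self, h, max_distance=3):
        """Return (item_id, distance) pairs, closest first."""
        found = {}
        for table, band in zip(self.tables, self._bands(h)):
            for item_id, other in table.get(band, ()):
                d = hamming(h, other)
                if d <= max_distance:
                    found[item_id] = d
        return sorted(found.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    original = [[r * 9 + c for c in range(9)] for r in range(8)]
    brighter = [[v + 10 for v in row] for row in original]  # uniform brightness shift
    index = BandedIndex()
    index.add("original", dhash(original))
    print(index.query(dhash(brighter)))
```

Because dHash only compares neighboring pixels, a uniform brightness change leaves the hash unchanged in this toy example. Heavier edits such as aggressive crops or large watermarks usually push the distance past a small threshold, which is where the embedding-based option in the deliverables comes in.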