PracHub

Design a scalable photo deduplication service

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competence in large-scale system design, including distributed deduplication indexing, batch and incremental processing, concurrency correctness, safety and rollback mechanisms, reliability and observability, multi-tenant security, and cost-throughput capacity planning.



Company: Abnormal Security

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

How would you productionize the duplicate-photo removal algorithm for tens of millions of files across multiple machines and storage locations? Cover: choosing a distributed KV store (e.g., Redis) or alternatives for the dedupe index, key schema, memory sizing, and eviction; a batch pipeline (e.g., MapReduce/Spark) for scanning and hashing at scale, partitioning strategy and data locality; idempotency, exactly-once processing, and race conditions when workers overlap; verification before deletion, canarying, soft-delete/restore, and audit logs; fault tolerance, retries, backpressure, and monitoring (latency, throughput, error rates); incremental runs for newly added files and periodic re-hashing; security/permissions and multi-tenant isolation; and cost and throughput estimates.
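The key-schema, idempotency, and overlapping-worker points can be made concrete with a small sketch. It is a minimal in-memory illustration, assuming the index store offers an atomic set-if-absent operation (Redis exposes this as `SET key value NX`). The key layout `dedupe:{tenant}:{algo_version}:{hash}`, the `InMemoryIndex` and `Deduper` classes, and the `read_bytes` storage callback are hypothetical names, not part of the question. The first worker to claim a hash keeps its file as canonical. Later files with the same hash are verified byte-for-byte, soft-deleted into a trash set, and recorded in an append-only audit list, and a replayed task finds the same state and logs nothing new.

```python
import hashlib
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


def index_key(tenant: str, algo_version: str, content_hash: str) -> str:
    # Hypothetical schema: tenant first for isolation, algo version so a re-hash
    # run can fill a fresh keyspace without clobbering the current one.
    return f"dedupe:{tenant}:{algo_version}:{content_hash}"


class InMemoryIndex:
    """Stand-in for a KV store with atomic set-if-absent (cf. Redis SET key value NX)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}
        self._lock = threading.Lock()

    def set_if_absent(self, key: str, value: str) -> str:
        # Returns the stored value: ours if the key was new, otherwise the first writer's.
        with self._lock:
            return self._data.setdefault(key, value)


@dataclass
class AuditEvent:
    ts: float
    tenant: str
    action: str
    file_id: str
    canonical_id: str


@dataclass
class Deduper:
    index: InMemoryIndex
    read_bytes: Callable[[str], bytes]  # hypothetical storage accessor
    algo_version: str = "sha256-v1"
    trash: set = field(default_factory=set)    # soft-deleted (tenant, file_id) pairs
    audit: list = field(default_factory=list)  # append-only list of AuditEvent
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def process(self, tenant: str, file_id: str) -> Optional[str]:
        """Return the canonical file ID if file_id is (or already was) soft-deleted."""
        data = self.read_bytes(file_id)
        key = index_key(tenant, self.algo_version, hashlib.sha256(data).hexdigest())
        canonical = self.index.set_if_absent(key, file_id)
        if canonical == file_id:
            return None  # this file owns the hash (also true when the task is replayed)
        with self._lock:
            if self.read_bytes(canonical) != data:
                # Never delete on a hash match alone: flag it and keep both files.
                self.audit.append(AuditEvent(time.time(), tenant, "hash_mismatch", file_id, canonical))
                return None
            if (tenant, file_id) not in self.trash:  # a replayed task must not log twice
                self.trash.add((tenant, file_id))
                self.audit.append(AuditEvent(time.time(), tenant, "soft_delete", file_id, canonical))
        return canonical

    def restore(self, tenant: str, file_id: str) -> bool:
        with self._lock:
            if (tenant, file_id) not in self.trash:
                return False
            self.trash.discard((tenant, file_id))
            self.audit.append(AuditEvent(time.time(), tenant, "restore", file_id, ""))
            return True


files = {"a.jpg": b"\xff\xd8same", "b.jpg": b"\xff\xd8same", "c.jpg": b"\xff\xd8other"}
dedupe = Deduper(InMemoryIndex(), files.__getitem__)
print(dedupe.process("t1", "a.jpg"))  # None: a.jpg becomes canonical
print(dedupe.process("t1", "b.jpg"))  # a.jpg: b.jpg is soft-deleted
```

Because the tenant ID is part of every key, identical bytes owned by two tenants never match each other in this sketch.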


Date: Jul 16, 2025

System Design: Productionizing Duplicate-Photo Removal at Scale

Context

Design a production system that detects and removes duplicate or near-duplicate photos across tens of millions of files held in multiple storage locations (e.g., object stores, NAS, user devices) and processed by multiple machines. Assume batch and incremental runs, multi-tenant data, and strict safety and observability requirements.
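Exact duplicates can be caught with a content hash, while near-duplicates (re-encodes, resizes, brightness changes) need a perceptual hash compared by Hamming distance. The sketch below pairs a streaming SHA-256 with a difference hash (dHash), one common perceptual hash chosen here for illustration rather than a method the question prescribes. It assumes decoding and resizing to an 8-row by 9-column grayscale grid happen upstream (for example in an imaging library); the helper names and the 10-bit threshold are hypothetical tuning choices.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(fileobj: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Exact-match key: hash the file in chunks so large photos never sit fully in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def dhash_from_grid(gray: list[list[int]]) -> int:
    """64-bit difference hash from 8 rows of 9 grayscale pixels (resize assumed done upstream)."""
    if len(gray) != 8 or any(len(row) != 9 for row in gray):
        raise ValueError("expected 8 rows of 9 pixels")
    bits = 0
    for row in gray:
        # One bit per adjacent pair: 1 if the left pixel is brighter than the right.
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def is_near_duplicate(h1: int, h2: int, threshold: int = 10) -> bool:
    # Hypothetical threshold: tune it against labelled pairs for your corpus.
    return hamming(h1, h2) <= threshold


grid = [[r * 9 + c for c in range(9)] for r in range(8)]
brighter = [[p + 20 for p in row] for row in grid]  # uniform brightness shift
print(is_near_duplicate(dhash_from_grid(grid), dhash_from_grid(brighter)))
print(sha256_stream(io.BytesIO(b"abc")))
```

A uniform brightness change leaves every left/right comparison unchanged, so the two grids above hash identically. One reasonable policy is to let exact matches feed the delete path after byte verification and to treat perceptual matches only as candidates for review, since the threshold trades recall against false positives.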

Requirements

Cover the following decisions and trade-offs:

  1. Dedupe index
    • Choose a distributed key-value store or database (e.g., Redis vs. alternatives) for the dedupe index.
    • Key schema, memory sizing, and eviction strategy.
  2. Batch pipeline
    • Use MapReduce/Spark (or similar) to scan and hash at scale.
    • Partitioning strategy and data locality across storage locations/regions.
  3. Correctness under concurrency
    • Idempotency, exactly-once processing, and handling race conditions when workers overlap.
  4. Safety and rollout
    • Verification before deletion, canarying, soft-delete/restore workflow, and immutable audit logs.
  5. Reliability
    • Fault tolerance, retries, backpressure, and monitoring (latency, throughput, error rates).
  6. Incrementality
    • Incremental runs for newly added files and periodic re-hashing when algorithms change.
  7. Security and isolation
    • Permissions model and multi-tenant isolation.
  8. Cost and throughput
    • Provide rough cost and throughput estimates and capacity sizing assumptions.
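For the cost and throughput item, a back-of-envelope model makes the sizing assumptions explicit and easy to revise. Every default in the sketch is a placeholder assumption, not a figure from the question: 50 million files, 3 MB average size, 200 bytes per index entry with two copies, 500 MB/s of SHA-256 per core, 200 MB/s of storage reads per worker, 50 workers, and $0.40 per worker-hour. The `Assumptions` and `estimate` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    # All defaults are illustrative placeholders; substitute measured values.
    num_files: int = 50_000_000          # "tens of millions" of photos
    avg_file_mb: float = 3.0             # average photo size
    index_bytes_per_entry: int = 200     # key + value + store overhead per entry
    replicas: int = 2                    # primary + one replica of the index
    hash_mb_s_per_core: float = 500.0    # SHA-256 throughput per core
    cores_per_worker: int = 8
    read_mb_s_per_worker: float = 200.0  # sustained storage read rate per worker
    workers: int = 50
    usd_per_worker_hour: float = 0.40    # compute price per worker-hour


def estimate(a: Assumptions) -> dict[str, float]:
    corpus_mb = a.num_files * a.avg_file_mb
    # A worker is bound by the slower of CPU hashing and storage reads.
    worker_mb_s = min(a.hash_mb_s_per_core * a.cores_per_worker, a.read_mb_s_per_worker)
    cluster_mb_s = worker_mb_s * a.workers
    scan_hours = corpus_mb / cluster_mb_s / 3600
    return {
        "corpus_tb": corpus_mb / 1e6,
        "index_gb": a.num_files * a.index_bytes_per_entry * a.replicas / 1e9,
        "files_per_s": cluster_mb_s / a.avg_file_mb,
        "full_scan_hours": scan_hours,
        "compute_usd": scan_hours * a.workers * a.usd_per_worker_hour,
    }


for name, value in estimate(Assumptions()).items():
    print(f"{name}: {value:,.1f}")
```

With these defaults the model gives about 150 TB to scan, 20 GB of replicated index, roughly 3,300 files/s, and a full scan of a little over 4 hours for about $83 of compute. Storage reads, not hashing, are the bottleneck here, so adding cores per worker would not shorten the run. The model ignores egress, per-request storage charges, and index hosting cost.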
