PracHub

Design logo infringement detection system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in ML system design for visual search, covering image representation and embeddings, metric learning, large-scale retrieval and indexing, and low-latency inference within the Machine Learning Engineer domain.

Company: Amazon

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Onsite

Related Interview Questions

  • Design systems for global request detection and labeling - Amazon (hard)
  • Design a computer-use agent end-to-end - Amazon (medium)
  • Debug online worse than offline model performance - Amazon (medium)
  • Approach an ambiguous business problem - Amazon (medium)
  • Explain parallelism and collectives in training - Amazon (medium)

Amazon
Nov 18, 2025, 12:00 AM

Scenario

You work for a large e-commerce company. Brands register their official logos with you (e.g., the Nike swoosh or the Apple logo). Third-party sellers upload product images, and some may use these brand logos illegally (e.g., on counterfeit goods or in unauthorized resale listings).

You are asked to design an ML-powered logo infringement detection system. The interviewer specifically wants a search-based solution: given reference logo images, the system should search over product images to find those that contain the logo or are visually very similar.
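To make the search-based framing concrete, here is a minimal sketch of embedding search with an inverted-file (IVF) style index, written in plain Python. It assumes product images have already been turned into embedding vectors by some model and that the coarse centroids were trained offline (for example with k-means on a sample of embeddings). `IVFIndex`, `add`, and `search` are hypothetical names for illustration; at tens of millions of images a production system would more likely rely on a dedicated approximate-nearest-neighbour library or service.

```python
import heapq
import math
from collections import defaultdict


def normalize(v):
    """Scale a vector to unit L2 norm so the dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class IVFIndex:
    """Toy inverted-file index: each vector is stored in the bucket of its nearest centroid."""

    def __init__(self, centroids):
        # Assumption: centroids were learned offline (e.g., k-means on sampled embeddings).
        self.centroids = [normalize(c) for c in centroids]
        self.lists = defaultdict(list)  # centroid id -> [(image_id, unit vector)]

    def _nearest_centroids(self, vec, n):
        scored = [(dot(vec, c), i) for i, c in enumerate(self.centroids)]
        return [i for _, i in heapq.nlargest(n, scored)]

    def add(self, image_id, vec):
        """Incremental insert: a new upload goes into one bucket, no index rebuild."""
        vec = normalize(vec)
        bucket = self._nearest_centroids(vec, 1)[0]
        self.lists[bucket].append((image_id, vec))

    def search(self, query, k=10, nprobe=2):
        """Scan only the nprobe closest buckets, then rank those candidates exactly."""
        query = normalize(query)
        candidates = []
        for bucket in self._nearest_centroids(query, nprobe):
            for image_id, vec in self.lists[bucket]:
                candidates.append((dot(query, vec), image_id))
        return [(image_id, score) for score, image_id in heapq.nlargest(k, candidates)]


if __name__ == "__main__":
    index = IVFIndex(centroids=[[1.0, 0.0], [0.0, 1.0]])
    index.add("img-1", [0.9, 0.1])
    index.add("img-2", [0.1, 0.9])
    index.add("img-3", [0.7, 0.3])
    # A query logo embedding near the first centroid; only that bucket is scanned.
    for image_id, score in index.search([1.0, 0.0], k=2, nprobe=1):
        print(image_id, round(score, 3))
```

Raising `nprobe` scans more buckets, trading latency for recall; setting it to the number of centroids degenerates to exact brute-force search.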

Requirements

Functional

  1. Given a new product image, automatically flag whether it likely contains any protected brand logo.
  2. Allow brand-protection teams to:
    • Query: "Find all product images similar to this logo image." (image-to-image search)
    • See ranked search results (most visually similar first).
  3. Support multiple brands and potentially many logo variants per brand (different colors, orientations, backgrounds).
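One way to support many variants per brand is to register several reference embeddings per brand and score an image by its best-matching variant. The sketch below assumes embeddings already exist and that every brand has at least one registered variant; `score_brands` and the nested-dictionary registry layout are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def score_brands(image_vec, registry):
    """Score an image against every brand; a brand's score is its best-matching variant.

    registry: {brand: {variant_name: embedding}}, e.g. colour or orientation variants.
    Assumes each brand has at least one variant.
    Returns [(brand, best_variant, score)] sorted from most to least similar.
    """
    results = []
    for brand, variants in registry.items():
        best_variant, best_score = max(
            ((name, cosine(image_vec, vec)) for name, vec in variants.items()),
            key=lambda item: item[1],
        )
        results.append((brand, best_variant, best_score))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Taking the maximum over variants means adding a new variant (say, an inverted-colour version) can only raise a brand's score, so it widens coverage without retraining the model.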

Non-functional

  1. Scale to tens of millions of product images.
  2. Low latency for online checks at upload time (e.g., < 500 ms per image for initial screening).
  3. High recall (don’t miss infringements) while keeping false positives low enough that human reviewers can manage the queue.
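The recall versus reviewer-load trade-off can be made explicit by choosing the decision threshold on a labeled validation set. This sketch assumes a list of similarity scores with human-confirmed labels (1 = infringing, 0 = clean) and flags any image whose score is at or above the threshold. `threshold_for_recall` is a hypothetical helper, and the number of flagged images is used as a proxy for review-queue size.

```python
import math


def threshold_for_recall(scores, labels, target_recall):
    """Highest threshold whose recall on labeled data is at least target_recall.

    Returns (threshold, recall, precision, num_flagged); an image is flagged when
    score >= threshold, and num_flagged approximates the human-review workload.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # How many positives must be caught; the small epsilon guards against
    # floating-point products such as 0.7 * 10 = 7.000000000000001.
    needed = max(1, math.ceil(target_recall * len(positives) - 1e-9))
    threshold = positives[needed - 1]
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    recall = sum(flagged) / len(positives)
    precision = sum(flagged) / len(flagged)
    return threshold, recall, precision, len(flagged)
```

Sweeping `target_recall` and plotting precision and queue size against it gives the operating curve that product and review teams can agree on.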

Constraints and assumptions

  • Each product can have multiple images.
  • Logos may be small, rotated, partially occluded, or overlaid on complex backgrounds.
  • Adversarial sellers may modify logos slightly (color shifts, aspect ratio changes, adding noise, mirroring, etc.).
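Because sellers perturb logos and logos often appear small on busy backgrounds, synthetic training positives can be generated by perturbing a reference logo and pasting it onto a product-like background. The sketch below uses tiny grayscale images stored as nested lists so it runs with the standard library only; the perturbations (mirroring, aspect-ratio stretch, intensity shift, Gaussian noise) follow the constraints listed above, and every function name is hypothetical. A real pipeline would work on RGB images with an image library and would also add rotation, occlusion, and perspective changes.

```python
import random


def hflip(img):
    """Mirror the image left to right (sellers often mirror logos)."""
    return [list(reversed(row)) for row in img]


def stretch(img, new_w, new_h):
    """Nearest-neighbour resize, used to simulate aspect-ratio changes."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def shift_intensity(img, delta):
    """Uniform brightness shift (a grayscale stand-in for colour shifts), clipped to [0, 1]."""
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in img]


def add_noise(img, sigma, rng):
    """Additive Gaussian pixel noise, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def perturb_logo(logo, rng):
    """Apply a random combination of the seller-style edits above."""
    out = hflip(logo) if rng.random() < 0.5 else logo
    h, w = len(out), len(out[0])
    out = stretch(out, max(1, int(w * rng.uniform(0.7, 1.3))), h)
    out = shift_intensity(out, rng.uniform(-0.2, 0.2))
    return add_noise(out, 0.05, rng)


def synthetic_positive(logo, background, rng):
    """Paste a perturbed logo at a random position on a copy of the background.

    Returns the composite image and the logo's bounding box (x0, y0, x1, y1),
    which doubles as a free detection label. Assumes the background is larger
    than the perturbed logo.
    """
    view = perturb_logo(logo, rng)
    top = rng.randint(0, len(background) - len(view))
    left = rng.randint(0, len(background[0]) - len(view[0]))
    out = [row[:] for row in background]
    for y, row in enumerate(view):
        for x, p in enumerate(row):
            out[top + y][left + x] = p
    return out, (left, top, left + len(view[0]), top + len(view))
```

Pairing each composite with its clean reference logo yields (anchor, positive) pairs for metric learning, while composites of other brands' logos supply negatives.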

Tasks

  1. High-level architecture
    Describe the overall system architecture, including:
    • Offline indexing pipeline for product images.
    • Online inference pipeline when a new image is uploaded.
    • How search is performed given a query logo image.
  2. Representation & model choice
    Explain how you will represent images and logos so that visually similar logos are close in some embedding space. Consider:
    • Backbone architecture (e.g., CNN, Vision Transformer).
    • Whether you do global image embeddings, local (patch/region) embeddings, or both.
    • How you handle small logos within large images.
  3. Search system
    Given the embeddings, design a search subsystem that can:
    • Index millions of product image embeddings.
    • Quickly retrieve the top-K most similar images to a query logo embedding.
    • Support incremental updates as new products are added.
  4. Training strategy
    Describe how you would train the model(s):
    • What labeled data do you need (e.g., logo bounding boxes, positive/negative pairs)?
    • How would you leverage metric learning (e.g., contrastive or triplet loss) for logo retrieval?
    • How can you generate additional training data (e.g., synthetic logo overlays)?
  5. Decision logic & thresholds
    Once you retrieve candidate matches, how do you decide whether an image truly contains an infringing logo?
    • How do you combine similarity scores, brand-specific thresholds, and possibly a second-stage classifier or detector?
    • How would you handle different brands having different risk tolerances (e.g., very strict brands vs. more lenient ones)?
  6. Evaluation & monitoring
    Propose metrics and an evaluation strategy:
    • How do you measure performance (precision, recall, ROC/PR curves) both for retrieval and final decisions?
    • How do you monitor the system in production and collect feedback from human reviewers to improve the models?
  7. Scaling & extensions
    Discuss how you would:
    • Scale the system as the catalog grows (index sharding, approximate nearest neighbor search, caching).
    • Handle updates when new brands or new logo variants are registered.
    • Deal with adversarial attacks and continuously evolving counterfeit techniques.
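For the metric-learning part of task 4, one common choice is a triplet loss with batch-hard mining: for each anchor in a batch, use its farthest same-logo example and its closest different-logo example. The function below only computes the loss value in plain Python to show the arithmetic; an actual training loop would implement it in an autodiff framework so gradients reach the backbone. Cosine distance and the 0.2 margin are illustrative assumptions, and `batch_hard_triplet_loss` is a hypothetical name.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity (assumes non-zero vectors); 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean batch-hard triplet loss over anchors that have a positive and a negative.

    labels: logo identity per embedding (e.g., a brand or brand-variant id).
    Per anchor: max(0, d(anchor, hardest_positive) - d(anchor, hardest_negative) + margin).
    """
    losses = []
    for i, anchor in enumerate(embeddings):
        pos = [cosine_distance(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == labels[i]]
        neg = [cosine_distance(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # anchor cannot form a triplet in this batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

Batches therefore need at least two examples per logo identity, which the synthetic-overlay step makes easy to guarantee.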

Your answer should walk through the end-to-end pipeline and clearly separate ML components (modeling, training, embeddings) from system components (storage, search index, services, monitoring).
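A minimal sketch of that separation, assuming hypothetical `Embedder` (ML component) and `LogoIndex` (system component) interfaces: the screening function depends only on the interfaces, so the model can be retrained and the index resharded independently. Per-brand `BrandPolicy` thresholds route matches to a block, human-review, or pass outcome, which is one way to express brands' different risk tolerances. Crops are assumed to come from an upstream step (for example a logo detector or sliding windows) so that small logos are not lost in a single global embedding.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Sequence, Tuple


class Embedder(Protocol):
    """ML component: maps an image or crop to an embedding vector."""

    def embed(self, image) -> List[float]: ...


class LogoIndex(Protocol):
    """System component: nearest-neighbour search over registered reference logos."""

    def search(self, vec: List[float], k: int) -> List[Tuple[str, float]]: ...  # (brand, score)


@dataclass
class BrandPolicy:
    review_threshold: float  # score at or above this: send to the human review queue
    block_threshold: float   # score at or above this: auto-flag the listing (>= review_threshold)


def screen_upload(crops: Sequence, embedder: Embedder, logo_index: LogoIndex,
                  policies: Dict[str, BrandPolicy], default_policy: BrandPolicy,
                  k: int = 5) -> Dict[str, str]:
    """Embed each crop, search reference logos, and apply each brand's policy.

    Returns {brand: "block" | "review"} for brands that triggered; empty means pass.
    """
    decisions: Dict[str, str] = {}
    for crop in crops:
        vec = embedder.embed(crop)
        for brand, score in logo_index.search(vec, k):
            policy = policies.get(brand, default_policy)
            if score >= policy.block_threshold:
                decision = "block"
            elif score >= policy.review_threshold:
                decision = "review"
            else:
                continue
            # Keep the most severe decision seen for this brand across all crops.
            if decisions.get(brand) != "block":
                decisions[brand] = decision
    return decisions
```

A strict brand gets low thresholds (more reviews, fewer misses) and a lenient brand higher ones; reviewer verdicts on the "review" bucket become the labeled data for retuning those thresholds and retraining the embedder.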

© 2026 PracHub. All rights reserved.