
Design logo infringement detection system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in ML system design for visual search, covering image representation and embeddings, metric learning, large-scale retrieval and indexing, and low-latency inference within the Machine Learning Engineer domain.

  • medium
  • Amazon
  • ML System Design
  • Machine Learning Engineer


Company: Amazon

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Onsite


Nov 18, 2025, 12:00 AM

Scenario

You work for a large e-commerce company. Brands register their official logos with you (e.g., the Nike swoosh or the Apple logo). Third-party sellers upload product images, and some may use these brand logos illegally, for example on counterfeit goods or through unauthorized resale.

You are asked to design an ML-powered logo infringement detection system. The interviewer specifically wants a search-based solution: given reference logo images, the system should search over product images to find those that contain the logo or are visually very similar.

Requirements

Functional

  1. Given a new product image, automatically flag whether it likely contains any protected brand logo.
  2. Allow brand-protection teams to:
    • Query: "Find all product images similar to this logo image." (image-to-image search)
    • See ranked search results (most visually similar first).
  3. Support multiple brands and potentially many logo variants per brand (different colors, orientations, backgrounds).
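The image-to-image query and ranked results described above can be illustrated with a brute-force cosine-similarity search. This is a minimal sketch, not a prescribed solution: the function names, the toy three-dimensional embeddings, and the in-memory dictionary index are all assumptions made for illustration. A production system would replace the linear scan with an approximate nearest neighbor index.

```python
import math
from typing import Dict, List, Tuple


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_top_k(
    query: List[float], index: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (image_id, similarity) pairs most similar to the query."""
    q = l2_normalize(query)
    scored = []
    for image_id, emb in index.items():
        e = l2_normalize(emb)
        scored.append((image_id, sum(a * b for a, b in zip(q, e))))
    # Most visually similar first, as the requirement asks.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would come from a trained model.
toy_index = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
results = cosine_top_k([1.0, 0.0, 0.0], toy_index, k=2)
```

Here `results` holds the two product images closest to the query logo embedding, ordered by decreasing similarity.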

Non-functional

  1. Scale to tens of millions of product images.
  2. Low latency for online checks at upload time (e.g., < 500 ms per image for initial screening).
  3. High recall (don’t miss infringements) while keeping false positives low enough that human reviewers can manage the queue.
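A quick sizing estimate helps ground the scale requirement. The sketch below assumes 50 million images, one 512-dimensional float32 embedding per image, and a hypothetical 64-byte compressed code per vector. None of these numbers come from the question; they are placeholders to show the arithmetic.

```python
def index_size_gb(num_vectors: int, bytes_per_vector: int) -> float:
    """Raw storage for the vectors alone, in decimal gigabytes."""
    return num_vectors * bytes_per_vector / 1e9


NUM_IMAGES = 50_000_000      # assumption: "tens of millions" taken as 50M
DIM = 512                    # assumption: embedding dimension
FLOAT32_BYTES = 4
COMPRESSED_BYTES = 64        # assumption: compressed code size per vector

raw_gb = index_size_gb(NUM_IMAGES, DIM * FLOAT32_BYTES)
compressed_gb = index_size_gb(NUM_IMAGES, COMPRESSED_BYTES)
print(f"uncompressed: {raw_gb:.1f} GB, compressed: {compressed_gb:.1f} GB")
```

Under these assumptions the uncompressed vectors need roughly 100 GB, which suggests either sharding or vector compression. Storing several region embeddings per image would multiply both figures accordingly.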

Constraints and assumptions

  • Each product can have multiple images.
  • Logos may be small, rotated, partially occluded, or overlaid on complex backgrounds.
  • Adversarial sellers may modify logos slightly (color shifts, aspect ratio changes, adding noise, mirroring, etc.).
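The perturbations listed above can be simulated when building training or evaluation data. The following sketch works on tiny grayscale images stored as nested lists so that it stays dependency-free. The helper names are hypothetical, and a real pipeline would use an image library and a much richer set of transforms.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values in [0, 255]


def mirror(img: Image) -> Image:
    """Flip horizontally, as a seller might to evade exact matching."""
    return [list(reversed(row)) for row in img]


def rotate90(img: Image) -> Image:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def shift_intensity(img: Image, delta: int) -> Image:
    """Brighten or darken, clamping to the valid pixel range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def overlay(background: Image, logo: Image, top: int, left: int) -> Image:
    """Paste the logo onto a copy of the background.

    Pixels falling outside the background are dropped, which gives a
    crude stand-in for a partially cropped logo.
    """
    out = [list(row) for row in background]
    for r, logo_row in enumerate(logo):
        for c, pixel in enumerate(logo_row):
            rr, cc = top + r, left + c
            if 0 <= rr < len(out) and 0 <= cc < len(out[0]):
                out[rr][cc] = pixel
    return out


logo = [[1, 2], [3, 4]]
background = [[0] * 3 for _ in range(3)]
synthetic = overlay(background, mirror(logo), top=1, left=1)
```

Because the logo position is known at generation time, each synthetic image comes with a free bounding-box label.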

Tasks

  1. High-level architecture
    Describe the overall system architecture, including:
    • Offline indexing pipeline for product images.
    • Online inference pipeline when a new image is uploaded.
    • How search is performed given a query logo image.
  2. Representation & model choice
    Explain how you will represent images and logos so that visually similar logos are close in some embedding space. Consider:
    • Backbone architecture (e.g., CNN, Vision Transformer).
    • Whether you do global image embeddings, local (patch/region) embeddings, or both.
    • How you handle small logos within large images.
  3. Search system
    Given the embeddings, design a search subsystem that can:
    • Index millions of product image embeddings.
    • Quickly retrieve the top-K most similar images to a query logo embedding.
    • Support incremental updates as new products are added.
  4. Training strategy
    Describe how you would train the model(s):
    • What labeled data do you need (e.g., logo bounding boxes, positive/negative pairs)?
    • How would you leverage metric learning (e.g., contrastive or triplet loss) for logo retrieval?
    • How can you generate additional training data (e.g., synthetic logo overlays)?
  5. Decision logic & thresholds
    Once you retrieve candidate matches, how do you decide whether an image truly contains an infringing logo?
    • How do you combine similarity scores, brand-specific thresholds, and possibly a second-stage classifier or detector?
    • How would you handle different brands having different risk tolerances (e.g., very strict brands vs. more lenient ones)?
  6. Evaluation & monitoring
    Propose metrics and an evaluation strategy:
    • How do you measure performance (precision, recall, ROC/PR curves) both for retrieval and final decisions?
    • How do you monitor the system in production and collect feedback from human reviewers to improve the models?
  7. Scaling & extensions
    Discuss how you would:
    • Scale the system as the catalog grows (index sharding, approximate nearest neighbor search, caching).
    • Handle updates when new brands or new logo variants are registered.
    • Deal with adversarial attacks and continuously evolving counterfeit techniques.
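For the metric-learning part of the training task, one common choice is a triplet loss. The sketch below uses squared Euclidean distance and a margin of 0.2. Both are assumptions, since some formulations use unsquared distance or cosine distance, and the embeddings here are hand-written toy vectors rather than model outputs.

```python
from typing import List


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(
    anchor: List[float],
    positive: List[float],
    negative: List[float],
    margin: float = 0.2,
) -> float:
    """Hinge loss that is zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    return max(
        0.0, squared_l2(anchor, positive) - squared_l2(anchor, negative) + margin
    )


anchor = [0.0, 0.0]          # reference logo
positive = [0.1, 0.0]        # same logo seen on a product image
easy_negative = [1.0, 0.0]   # unrelated image, already far away
hard_negative = [0.2, 0.0]   # lookalike logo, too close to the anchor

easy_loss = triplet_loss(anchor, positive, easy_negative)
hard_loss = triplet_loss(anchor, positive, hard_negative)
```

The easy negative contributes no loss while the hard negative does, which is why mining hard negatives such as lookalike logos tends to matter for this kind of training.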

Your answer should walk through the end-to-end pipeline and clearly separate ML components (modeling, training, embeddings) from system components (storage, search index, services, monitoring).
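One way to keep that separation visible is to pass the ML component in as a function and keep the logo index and thresholds as plain system configuration. The sketch below is an assumption-laden illustration: `screen_upload`, the action names, the default thresholds, and the identity `embed` stub are all hypothetical, and embeddings are assumed to be unit-normalized so that a dot product equals cosine similarity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vector = List[float]


@dataclass
class Decision:
    brand: str
    score: float
    action: str  # "flag", "review", or "pass"


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def screen_upload(
    image: object,
    embed: Callable[[object], Vector],          # ML component
    logo_index: Dict[str, List[Vector]],        # system component: brand -> variants
    thresholds: Dict[str, Tuple[float, float]], # per-brand (review, flag) cutoffs
    default: Tuple[float, float] = (0.8, 0.95),
) -> List[Decision]:
    """Score an upload against every brand's logo variants and pick an action."""
    emb = embed(image)
    decisions = []
    for brand, variants in logo_index.items():
        score = max(dot(emb, v) for v in variants)  # best-matching variant
        review_t, flag_t = thresholds.get(brand, default)
        if score >= flag_t:
            action = "flag"
        elif score >= review_t:
            action = "review"  # e.g. second-stage model or human queue
        else:
            action = "pass"
        decisions.append(Decision(brand, score, action))
    return decisions


# Stub embedder: treats the "image" as an already-computed unit vector.
decisions = screen_upload(
    image=[0.6, 0.8],
    embed=lambda img: img,
    logo_index={"strict_brand": [[1.0, 0.0]], "lenient_brand": [[0.0, 1.0]]},
    thresholds={"strict_brand": (0.5, 0.8), "lenient_brand": (0.85, 0.95)},
)
```

In this toy run the strict brand sends a 0.6 match to review while the lenient brand lets a 0.8 match pass, showing how brand-specific risk tolerance can change the outcome independently of the model.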
