Design logo infringement detection system

Q: Design logo infringement detection system

This question evaluates a candidate's competency in ML system design for visual search, covering image representation and embeddings, metric learning, large-scale retrieval and indexing, and low-latency inference within the Machine Learning Engineer domain.


Question


Scenario

You work for a large e-commerce company. Brands register their official logos with you (e.g., the Nike swoosh or the Apple logo). Third-party sellers upload product images, and some may use these brand logos illegally (e.g., on counterfeit goods or as unauthorized resellers).

You are asked to design an ML-powered logo infringement detection system. The interviewer specifically wants a search-based solution: given reference logo images, the system should search over product images to find those that contain the logo or are visually very similar.

Requirements

Functional

Given a new product image, automatically flag whether it likely contains any protected brand logo.
Allow brand-protection teams to:
- Query: "Find all product images similar to this logo image." (image-to-image search)
- See ranked search results (most visually similar first).
Support multiple brands and potentially many logo variants per brand (different colors, orientations, backgrounds).

Non-functional

Scale to tens of millions of product images.
Low latency for online checks at upload time (e.g., < 500 ms per image for initial screening).
High recall (don’t miss infringements) while keeping false positives low enough that human reviewers can manage the queue.

Constraints and assumptions

Each product can have multiple images.
Logos may be small, rotated, partially occluded, or overlaid on complex backgrounds.
Adversarial sellers may modify logos slightly (color shifts, aspect ratio changes, adding noise, mirroring, etc.).

Tasks

High-level architecture
Describe the overall system architecture, including:
- Offline indexing pipeline for product images.
- Online inference pipeline when a new image is uploaded.
- How search is performed given a query logo image.
Representation & model choice
Explain how you will represent images and logos so that visually similar logos are close in some embedding space. Consider:
- Backbone architecture (e.g., CNN, Vision Transformer).
- Whether you do global image embeddings, local (patch/region) embeddings, or both.
- How you handle small logos within large images.
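One possible way to handle small logos is to embed overlapping crops at several scales in addition to the whole image, so that a small logo fills most of at least one crop. The sketch below only enumerates the crop boxes; the function names, the scale set, and the 50% overlap are illustrative assumptions, and a trained logo detector could replace this step entirely.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def sliding_windows(width: int, height: int, window: int, stride: int) -> List[Box]:
    """Enumerate square crops of side `window` that cover the image."""
    if window > width or window > height:
        # Window does not fit: fall back to the full image.
        return [(0, 0, width, height)]
    xs = list(range(0, width - window + 1, stride))
    ys = list(range(0, height - window + 1, stride))
    # Make sure the right and bottom edges are always covered.
    if xs[-1] != width - window:
        xs.append(width - window)
    if ys[-1] != height - window:
        ys.append(height - window)
    return [(x, y, x + window, y + window) for y in ys for x in xs]


def multi_scale_regions(
    width: int,
    height: int,
    scales: Tuple[float, ...] = (1.0, 0.5, 0.25),  # assumed scale set
    overlap: float = 0.5,                          # assumed crop overlap
) -> List[Box]:
    """Collect crop boxes at several scales relative to the short side."""
    regions: List[Box] = []
    short_side = min(width, height)
    for scale in scales:
        window = max(1, int(short_side * scale))
        stride = max(1, int(window * (1.0 - overlap)))
        regions.extend(sliding_windows(width, height, window, stride))
    return regions
```

Each box would be cropped, resized to the backbone's input size, and embedded separately. The trade-off is index size: more scales and more overlap improve small-logo recall but multiply the number of vectors per image.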
Search system
Given the embeddings, design a search subsystem that can:
- Index millions of product image embeddings.
- Quickly retrieve the top-K most similar images to a query logo embedding.
- Support incremental updates as new products are added.
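To make the search contract concrete, here is a minimal sketch of an exact (brute-force) cosine-similarity index that supports incremental adds and top-K queries. The class name `FlatIndex` and its methods are hypothetical; at tens of millions of vectors this would be replaced by an approximate nearest neighbor index, but the interface (add, search, ranked results) would stay much the same.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


class FlatIndex:
    """Exact cosine index (hypothetical stand-in for an ANN index)."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, image_id: str, embedding: Sequence[float]) -> None:
        # Incremental update: new or re-embedded images overwrite by id.
        self._vectors[image_id] = l2_normalize(embedding)

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        """Return the k most similar (image_id, score) pairs, best first."""
        q = l2_normalize(query)
        scored = (
            (sum(a * b for a, b in zip(q, v)), image_id)
            for image_id, v in self._vectors.items()
        )
        return [(image_id, score) for score, image_id in heapq.nlargest(k, scored)]
```

A usage example under the same assumptions: call `index.add("img-1", embedding)` for each product image (or each crop), then `index.search(logo_embedding, k=100)` to get candidates for the later decision stage.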
Training strategy
Describe how you would train the model(s):
- What labeled data do you need (e.g., logo bounding boxes, positive/negative pairs)?
- How would you leverage metric learning (e.g., contrastive or triplet loss) for logo retrieval?
- How can you generate additional training data (e.g., synthetic logo overlays)?
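As a reference point for the metric-learning question, the standard triplet loss for an anchor a, a positive p (same logo), and a negative n (different logo or no logo) is max(0, d(a, p) - d(a, n) + margin). The sketch below computes it on plain Python lists with Euclidean distance; the margin value of 0.2 is an assumption, and a real training loop would use a deep learning framework with batched tensors and hard-negative mining.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # assumed value; tuned in practice
) -> float:
    """Zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def mean_triplet_loss(
    triplets: List[Tuple[Sequence[float], Sequence[float], Sequence[float]]],
    margin: float = 0.2,
) -> float:
    """Average the loss over a batch of (anchor, positive, negative) triplets."""
    if not triplets:
        return 0.0
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

Triplets that already satisfy the margin contribute zero loss, which is why sampling hard or semi-hard negatives (e.g., look-alike logos from other brands) tends to matter more than the raw number of triplets.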
Decision logic & thresholds
Once you retrieve candidate matches, how do you decide whether an image truly contains an infringing logo?
- How do you combine similarity scores, brand-specific thresholds, and possibly a second-stage classifier or detector?
- How would you handle different brands having different risk tolerances (e.g., very strict brands vs. more lenient ones)?
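One way to express brand-specific risk tolerance is a per-brand policy with two thresholds: one that routes an image to human review and a higher one that flags it automatically. The sketch below is an assumption-laden illustration: the `BrandPolicy` type, the action names, and all threshold values are hypothetical, and the input scores could come either from raw similarity or from a second-stage verifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class BrandPolicy:
    review_threshold: float  # at or above: send to human review
    flag_threshold: float    # at or above: flag automatically


# Assumed defaults for brands without a custom policy.
DEFAULT_POLICY = BrandPolicy(review_threshold=0.80, flag_threshold=0.95)

_SEVERITY = {"pass": 0, "human_review": 1, "auto_flag": 2}


def decide(
    matches: List[Tuple[str, float]],
    policies: Dict[str, BrandPolicy],
) -> Tuple[str, Optional[str]]:
    """Return the most severe (action, brand) over all (brand, score) matches."""
    best: Tuple[str, Optional[str]] = ("pass", None)
    for brand, score in matches:
        policy = policies.get(brand, DEFAULT_POLICY)
        if score >= policy.flag_threshold:
            action = "auto_flag"
        elif score >= policy.review_threshold:
            action = "human_review"
        else:
            action = "pass"
        if _SEVERITY[action] > _SEVERITY[best[0]]:
            best = (action, brand)
    return best
```

A strict brand would simply register lower thresholds than the default, trading reviewer workload for recall, while a lenient brand would raise them.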
Evaluation & monitoring
Propose metrics and an evaluation strategy:
- How do you measure performance (precision, recall, ROC/PR curves) both for retrieval and final decisions?
- How do you monitor the system in production and collect feedback from human reviewers to improve the models?
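For the retrieval stage, two commonly used offline metrics are precision and recall at K and average precision over the ranked list. The sketch below assumes a labeled evaluation set in which, for each query logo, the set of truly infringing image ids is known (for example, from reviewer decisions); the function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple


def precision_recall_at_k(
    ranked_ids: Sequence[str], relevant_ids: Set[str], k: int
) -> Tuple[float, float]:
    """Precision and recall computed over the top-k retrieved ids."""
    top = ranked_ids[:k]
    hits = sum(1 for image_id in top if image_id in relevant_ids)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def average_precision(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant id appears."""
    hits = 0
    total = 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(
    runs: List[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Average AP over several (ranked_ids, relevant_ids) queries."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Relevant images that never appear in the ranked list still count in the denominators, so misses caused by the index (rather than the ranking) lower recall and average precision as they should.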
Scaling & extensions
Discuss how you would:
- Scale the system as the catalog grows (index sharding, approximate nearest neighbor search, caching).
- Handle updates when new brands or new logo variants are registered.
- Deal with adversarial attacks and continuously evolving counterfeit techniques.
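Index sharding usually comes down to two small pieces of logic: a stable mapping from image id to shard, and a merge of the per-shard top-K lists into a global top-K. The sketch below shows both under simple assumptions (hash-based placement, a fixed shard count, and each shard returning `(image_id, score)` pairs); the function names are hypothetical.

```python
import hashlib
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[str, float]  # (image_id, similarity score)


def shard_for(image_id: str, num_shards: int) -> int:
    """Map an image id to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(image_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def merge_top_k(shard_results: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Combine per-shard result lists into the global top-k, best first."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[1])
```

Because each shard returns its own top-K, the merged result is exact with respect to whatever the shards returned. Note that plain modulo placement reshuffles most ids when the shard count changes; consistent hashing is a common alternative if shards are added often.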

Your answer should walk through the end-to-end pipeline and clearly separate ML components (modeling, training, embeddings) from system components (storage, search index, services, monitoring).
