PracHub

Design system to detect privacy-leak records

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design scalable, ML-driven systems that detect and classify records containing privacy-sensitive data or PII across structured and unstructured data. It tests data engineering, machine learning (including deep learning and LLM/RAG integration), system architecture, and privacy/security considerations. Interviewers commonly use it to probe reasoning about functional and non-functional requirements, trade-offs among detection and classification approaches, evaluation metrics such as precision and recall, feedback loops, and operational scaling. It falls under ML system design, with a practical, application-level focus rather than a purely conceptual one.


Company: TikTok

Role: Software Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Technical Screen



Dec 8, 2025, 12:00 AM

You are given a very large database that contains user data (both structured fields and unstructured text such as logs, messages, and documents). The company wants to automatically:

  1. Identify records that may contain privacy-sensitive data or PII (personally identifiable information), such as names, phone numbers, email addresses, or more subtle leaks (e.g., combinations of attributes that uniquely identify a person).
  2. Classify these records by type and severity of privacy risk.
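To make the two goals above concrete, here is a minimal Python sketch of a rule-based first pass. It is an illustration under stated assumptions, not a reference solution: the regular expressions, the `SEVERITY` mapping, and the function names are all hypothetical, and a production detector would need locale-aware, validated patterns plus ML models for names and free text. The sketch flags direct identifiers by pattern, flags "subtle" leaks by finding attribute combinations that occur only once in a dataset, and assigns each record the highest severity among its detected types.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; real detectors need
# locale-aware rules and validation (e.g., checksum or format checks).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

# Hypothetical severity policy; in practice this comes from legal/privacy teams.
SEVERITY = {"email": "medium", "phone": "medium", "quasi_identifier": "high"}
SEVERITY_RANK = {"none": 0, "medium": 1, "high": 2}


def detect_direct_pii(text):
    """Return the sorted PII types whose pattern matches somewhere in the text."""
    return sorted(name for name, pat in PII_PATTERNS.items() if pat.search(text))


def unique_combinations(records, keys):
    """Return indices of records whose values for `keys` occur exactly once.

    A combination shared by only one record can identify a person even
    when no single field is PII on its own.
    """
    combos = [tuple(r.get(k) for k in keys) for r in records]
    counts = Counter(combos)
    return [i for i, combo in enumerate(combos) if counts[combo] == 1]


def classify(text, is_unique_combo=False):
    """Return detected types and the highest severity among them."""
    types = detect_direct_pii(text)
    if is_unique_combo:
        types.append("quasi_identifier")
    severity = max(
        (SEVERITY[t] for t in types), key=SEVERITY_RANK.get, default="none"
    )
    return {"types": types, "severity": severity}
```

For example, `classify("reach me at alice@example.com")` reports the `email` type with `medium` severity, while a record with no pattern match but a unique attribute combination is reported as a `quasi_identifier` with `high` severity.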

You may use traditional ML, deep learning, and LLM-based approaches (e.g., retrieval-augmented generation, RAG).

Design an end-to-end system that solves this problem. In your design, describe:

  • Functional and non-functional requirements.
  • High-level architecture and main components.
  • How you detect and classify privacy leaks (including any rule-based, ML, and LLM/RAG parts).
  • How the system scales to large datasets.
  • How you evaluate quality (precision/recall) and build a feedback loop.
  • Any privacy or security concerns in the detection pipeline itself.
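The last two bullets can be illustrated with a short Python sketch. It assumes reviewers have labeled a sample of records that includes both flagged and unflagged records (otherwise recall cannot be estimated), and that the pipeline stores a keyed fingerprint of each detected value rather than the raw value. All function names and the finding schema are hypothetical.

```python
import hashlib
import hmac


def precision_recall(flagged, confirmed):
    """Compute precision and recall from two sets of record IDs.

    flagged:   IDs the system marked as leaks (within the labeled sample).
    confirmed: IDs reviewers confirmed as true leaks in that sample.
    """
    true_positives = len(flagged & confirmed)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(confirmed) if confirmed else 0.0
    return precision, recall


def feedback_sets(flagged, confirmed):
    """Split errors into the two groups a feedback loop would send to retraining."""
    return {
        "false_positives": flagged - confirmed,
        "false_negatives": confirmed - flagged,
    }


def redacted_finding(record_id, pii_type, raw_value, key):
    """Build a finding that never stores the raw sensitive value.

    The HMAC-SHA256 fingerprint lets the same value be correlated across
    records without being readable by anyone who lacks the key.
    """
    fingerprint = hmac.new(key, raw_value.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"record_id": record_id, "type": pii_type, "fingerprint": fingerprint}
```

The design choice here is that the detection pipeline's own outputs (findings, logs, review queues) should not become a second copy of the sensitive data, so only the type and a keyed fingerprint are persisted.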

Assume the database could have billions of rows, with multiple data sources and schemas.
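One common way to cope with billions of rows is a cascade: scan in batches, apply a cheap filter to every row, and send only the surviving candidates to an expensive model. The Python sketch below shows the shape of that idea on a single machine. It is an assumption-laden illustration: `CHEAP_FILTER`, `expensive_model`, and the `(record_id, text)` row format are hypothetical, and in a real deployment the thread pool would be replaced by a distributed batch or streaming framework.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical cheap pre-filter: an "@" sign or a run of 7+ digits.
CHEAP_FILTER = re.compile(r"@|\d{7,}")


def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def expensive_model(text):
    """Placeholder for an NER model or LLM call (hypothetical logic)."""
    return "@" in text


def scan_batch(batch):
    """Run the cheap filter on every row, the expensive model only on survivors."""
    candidates = [(rid, text) for rid, text in batch if CHEAP_FILTER.search(text)]
    return [rid for rid, text in candidates if expensive_model(text)]


def scan(rows, batch_size=2, workers=4):
    """Scan rows in parallel batches and return flagged record IDs in input order.

    Note: Executor.map submits every batch up front, so this sketch suits
    bounded inputs; a real pipeline would bound the number of in-flight batches.
    """
    flagged = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(scan_batch, batches(rows, batch_size)):
            flagged.extend(result)
    return flagged
```

Given rows such as `(1, "a@b.com")`, `(2, "hello")`, `(3, "id 12345678")`, and `(4, "x@y.org")`, the cheap filter discards row 2 without invoking the model, and the placeholder model then flags rows 1 and 4.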
