PracHub

Design video captioning under compute limits

Last updated: Mar 29, 2026

Quick Overview

This question evaluates expertise in ML system design for multimodal large models, covering deployment under compute and GPU memory constraints as well as large-scale retrieval and processing of video captions and embeddings.

Company: TikTok

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Technical Screen

Related Interview Questions

  • Design a model to choose dynamic K - TikTok (medium)
  • Design training for multimodal embedding model - TikTok (medium)
  • What skills are needed for AI infra roles? - TikTok (hard)
  • Design system to detect privacy-leak records - TikTok (medium)
  • Design LLM-enhanced recommendation solutions - TikTok (hard)
TikTok
Feb 12, 2026
Machine Learning Engineer
Technical Screen
ML System Design

Scenario

You are deploying a multimodal large model that generates captions for videos.

Part A — Deployment under compute / VRAM constraints

  • The model takes video (frames/audio optional) and outputs a text caption.
  • You must meet latency/throughput goals while staying within tight compute and GPU memory (VRAM) limits.
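To make "tight VRAM" concrete, a back-of-envelope budget separates the three consumers of GPU memory during caption generation: model weights, the per-request key/value (KV) cache, and a fixed overhead for activations and the runtime. This is a minimal sketch under assumed numbers: the `ModelConfig` values (7B parameters, 32 layers, 8 KV heads, head dimension 128), the 24 GiB card, the 2 GiB overhead, and the 2,048 tokens per request are all hypothetical, and the functions are illustrative helpers rather than any library's API.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical decoder configuration; replace with the deployed model's real values.
    n_params: float   # total parameters (vision encoder + language model)
    n_layers: int
    n_kv_heads: int   # key/value heads (fewer than query heads under grouped-query attention)
    head_dim: int


def weight_bytes(cfg: ModelConfig, bits_per_param: int) -> float:
    """Weight memory at a given precision (16 = fp16/bf16; 8 or 4 = quantized)."""
    return cfg.n_params * bits_per_param / 8


def kv_cache_bytes_per_seq(cfg: ModelConfig, seq_len: int, kv_bits: int = 16) -> float:
    """K and V for every layer: 2 * layers * kv_heads * head_dim * tokens * bytes per value."""
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * seq_len * kv_bits / 8


def max_concurrent_seqs(cfg: ModelConfig, vram_gib: float, bits_per_param: int,
                        seq_len: int, overhead_gib: float = 2.0, kv_bits: int = 16) -> int:
    """Largest number of in-flight sequences whose KV cache fits after the weights and a
    fixed allowance (assumed) for activations, CUDA context, and fragmentation."""
    free = vram_gib * GIB - weight_bytes(cfg, bits_per_param) - overhead_gib * GIB
    if free <= 0:
        return 0  # the weights alone do not fit
    return int(free // kv_cache_bytes_per_seq(cfg, seq_len, kv_bits))


if __name__ == "__main__":
    # Assumed: 7B-parameter model, ~2,048 tokens per request (visual tokens + prompt + caption).
    cfg = ModelConfig(n_params=7e9, n_layers=32, n_kv_heads=8, head_dim=128)
    for bits in (16, 4):
        n = max_concurrent_seqs(cfg, vram_gib=24, bits_per_param=bits, seq_len=2048)
        print(f"{bits}-bit weights: up to {n} concurrent sequences on a 24 GiB GPU")
```

Under these assumptions, 4-bit weights raise capacity from 35 to 74 concurrent sequences, and because KV cost grows linearly with `seq_len`, halving the tokens per request (sampling fewer frames or pooling visual tokens) halves the per-sequence KV cost. The estimate folds vision-encoder activation peaks into the fixed overhead, so treat it as an upper bound to confirm with profiling.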

Prompt: Describe how you would design the end-to-end system (modeling + serving) to reliably deploy this capability under constrained compute/VRAM.
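For the latency/throughput side of Part A, one common serving technique is dynamic batching: group concurrent caption requests so each forward pass amortizes the cost of reading the weights, while capping how long any request waits for a batch to fill. The sketch below simulates that dispatch rule offline; `Request`, `form_batches`, `max_batch_size`, and `max_wait_ms` are hypothetical names, and a real server would apply the same rule inside an event loop rather than over a recorded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    request_id: str
    arrival_ms: float


def form_batches(requests, max_batch_size, max_wait_ms):
    """Group requests into batches by arrival time.

    A batch is dispatched when it reaches max_batch_size, or when a new request
    arrives after the open batch's deadline (oldest arrival + max_wait_ms).
    Any batch still open at the end of the stream is dispatched as-is.
    """
    batches = []
    current = []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        # The open batch's wait limit expired before this request arrived: flush it.
        if current and req.arrival_ms - current[0].arrival_ms > max_wait_ms:
            batches.append(current)
            current = []
        current.append(req)
        # The batch is full: dispatch immediately without waiting further.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return [[r.request_id for r in batch] for batch in batches]


if __name__ == "__main__":
    reqs = [Request(name, t) for name, t in zip("abcdef", [0, 1, 2, 3, 20, 21])]
    print(form_batches(reqs, max_batch_size=3, max_wait_ms=10))
```

With `max_batch_size=3` and `max_wait_ms=10`, requests arriving at 0, 1, 2, 3, 20 and 21 ms form the batches `[a, b, c]`, `[d]`, `[e, f]`: the first fills up, the second is flushed once its wait limit expires, and the last is flushed at the end of the stream. Raising `max_wait_ms` yields larger batches and higher throughput at the cost of tail latency, which is the knob to tune against the latency goal.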

Part B — Fast retrieval for brand ads + watermarking

Assume you already have:

  • A caption for each video (possibly multiple captions per video)
  • An embedding vector per video (or per caption)

A brand advertiser wants to quickly find videos relevant to a query (text and/or example creative) and then apply a watermark to matched videos.

Prompt: How would you design and optimize the retrieval + processing pipeline to make this search and watermarking fast at scale? Include indexing, filtering/ranking, caching, and system trade-offs.
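A minimal sketch of the search-then-watermark path, assuming unit-normalized per-video embeddings and two hypothetical metadata fields (`region`, `brand_safe`). It shows one possible order of operations: a cheap metadata filter from an inverted index, cosine top-k over the survivors, an LRU cache keyed on the exact query, and an idempotent watermark queue. Brute-force scoring stands in for an approximate nearest-neighbor index (for example HNSW or IVF-PQ), which is what would make this fast over millions of videos; the class names are illustrative, and the watermark rendering itself is left to asynchronous video-processing workers and is not shown.

```python
import heapq
import math
from collections import OrderedDict
from dataclasses import dataclass


def normalize(vec):
    """Scale to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


@dataclass
class Video:
    video_id: str
    embedding: list   # unit-normalized once at ingest time
    region: str       # hypothetical metadata used for pre-filtering
    brand_safe: bool  # hypothetical eligibility flag from a separate classifier


class LRUCache:
    """Small least-recently-used cache for repeated advertiser queries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


class BrandSearch:
    def __init__(self, videos, cache_size=1024):
        self.videos = {v.video_id: v for v in videos}
        # Inverted index region -> video ids: a cheap filter applied before vector scoring.
        self.by_region = {}
        for v in videos:
            self.by_region.setdefault(v.region, []).append(v.video_id)
        self.cache = LRUCache(cache_size)
        self.watermarked = set()  # ids already sent for watermarking

    def search(self, query_embedding, region, k):
        key = (tuple(query_embedding), region, k)
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        q = normalize(query_embedding)
        # Brute-force cosine scoring over filtered candidates; an ANN index replaces this at scale.
        scored = (
            (sum(a * b for a, b in zip(q, v.embedding)), v.video_id)
            for v in (self.videos[i] for i in self.by_region.get(region, []))
            if v.brand_safe
        )
        top = tuple(vid for _, vid in heapq.nlargest(k, scored))  # immutable, safe to cache
        self.cache.put(key, top)
        return top

    def enqueue_watermarks(self, video_ids):
        """Return only ids not yet watermarked, so re-running a campaign is idempotent."""
        jobs = []
        for vid in video_ids:
            if vid not in self.watermarked:
                self.watermarked.add(vid)
                jobs.append(vid)
        return jobs


if __name__ == "__main__":
    videos = [
        Video("v1", normalize([1.0, 0.0]), "US", True),
        Video("v2", normalize([0.8, 0.6]), "US", True),
        Video("v3", normalize([0.0, 1.0]), "US", True),
        Video("v4", normalize([1.0, 0.1]), "US", False),  # excluded: not brand-safe
        Video("v5", normalize([1.0, 0.0]), "EU", True),   # excluded: other region
    ]
    engine = BrandSearch(videos)
    hits = engine.search([2.0, 0.0], region="US", k=2)
    print(hits)                             # ('v1', 'v2')
    print(engine.enqueue_watermarks(hits))  # ['v1', 'v2']
    print(engine.enqueue_watermarks(hits))  # []
```

Filtering before scoring keeps the scored set small. One common trade-off once an ANN index replaces the brute-force loop is that a very selective filter can leave too few survivors among the index's candidates, so teams either partition the index by frequent filter values or over-fetch and filter afterward.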

© 2026 PracHub. All rights reserved.