PracHub

Design multimodal deployment under compute limits

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in Machine Learning systems engineering for a Data Scientist role, covering multimodal model deployment and inference optimization, scalable retrieval (ANN and hybrid search), generalization and regularization concepts (overfitting, dropout), normalization methods, and RLHF, emphasizing both theoretical principles and engineering feasibility. It is commonly asked to assess reasoning about trade-offs between quality, latency, and cost in resource-constrained environments, validation via offline/online metrics, and the ability to bridge conceptual understanding with practical deployment and system-level design considerations.

Company: TikTok

Role: Data Scientist

Category: Machine Learning

Difficulty: easy

Interview Round: Onsite

Round 1:

  • Discuss how to deploy multimodal models under compute and GPU memory constraints.
  • Follow-up: Given existing captions and embeddings, how to speed up video retrieval.
  • What is overfitting and how to mitigate it.
  • Coding: Implement MinStack that returns the minimum value in O(1) time.

Round 2:

  • Discuss methods to mitigate overfitting in deep learning and the principles behind Dropout.
  • Compare different normalization methods and how to handle them during inference.
  • Discuss the application of reinforcement learning in LLM post-training (RLHF).
  • Coding: Implement MaxStack.
  • Follow-up: How to compute the median in real-time from a data stream, and how to modify MaxStack to achieve this.

Round 3:

  • Explain Dropout again and why it maintains distribution consistency.
  • Coding: Given a binary tree, determine whether there exists a path starting from any node, moving only upward, whose sum equals a target value.
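For the MinStack task and the streaming-median follow-up, one possible approach is sketched below. It is a minimal illustration, not the interviewer's reference answer: the class and method names (`MinStack`, `RunningMedian`, `get_min`, `add`, `median`) are assumptions. MinStack stores the running minimum next to each value. The median sketch uses the common two-heap technique instead of modifying MaxStack, which is a swapped-in approach.

```python
import heapq


class MinStack:
    """Stack that reports its current minimum in O(1) time."""

    def __init__(self):
        # Each entry is (value, minimum of the stack up to and including this entry).
        self._items = []

    def push(self, value):
        current_min = value if not self._items else min(value, self._items[-1][1])
        self._items.append((value, current_min))

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty MinStack")
        return self._items.pop()[0]

    def get_min(self):
        if not self._items:
            raise IndexError("min of empty MinStack")
        return self._items[-1][1]


class RunningMedian:
    """Median of a stream using two heaps: O(log n) insert, O(1) median."""

    def __init__(self):
        self._low = []   # lower half, stored negated so heapq acts as a max-heap
        self._high = []  # upper half, a plain min-heap

    def add(self, value):
        heapq.heappush(self._low, -value)
        # Move the largest element of the lower half into the upper half.
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so the lower half is never smaller than the upper half.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("median of empty stream")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2
```

A MaxStack can follow the same pattern as `MinStack` by tracking the running maximum instead of the minimum.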

Feb 17, 2026, 10:50 PM

In this interview you answer a set of questions on multimodal model deployment and post-training optimization. Give systematic explanations grounded in engineering feasibility and ML principles (bullet points or mini-frameworks are fine).

1) How to Deploy Multimodal Models Under Compute and Memory Constraints

Assume you need to deploy a multimodal model (e.g., image-text/video-text retrieval or understanding model) in a resource-constrained environment (possibly a single mid-range GPU or edge device), with the goal of providing stable service at acceptable latency and cost.

Please explain:

  • How would you approach end-to-end inference optimization and system design (covering both model-side and system-side)?
  • What are common strategies for dealing with GPU memory bottlenecks vs. compute bottlenecks?
  • How do you make trade-offs between quality, latency, and cost, and what offline/online metrics and monitoring do you need for regression validation?
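When reasoning about GPU memory bottlenecks, a quick back-of-the-envelope estimate of weight memory at different precisions is a useful starting point. The sketch below is an assumption-laden illustration: the function names, the 7-billion-parameter example, and the 20% overhead reserve for activations and runtime buffers are hypothetical, and real quantized formats carry extra scale/zero-point storage that is ignored here.

```python
# Bytes needed to store one parameter at each precision (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params, dtype):
    """Approximate weight memory in GiB for a model with num_params parameters."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def fits_on_gpu(num_params, dtype, gpu_gib, overhead_fraction=0.2):
    """Check whether the weights fit, reserving an assumed share of GPU memory
    (overhead_fraction) for activations, caches, and runtime buffers."""
    budget_gib = gpu_gib * (1.0 - overhead_fraction)
    return weight_memory_gib(num_params, dtype) <= budget_gib


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical model size
    for dtype in ("fp32", "fp16", "int8", "int4"):
        size = weight_memory_gib(params, dtype)
        print(f"{dtype}: {size:.1f} GiB, fits on 16 GiB: {fits_on_gpu(params, dtype, 16)}")
```

Under these assumptions, halving the precision halves the weight footprint, which is why quantization is usually the first lever for memory-bound deployments, while compute-bound cases lean more on batching, kernel fusion, or smaller architectures.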

2) How to Speed Up Video Retrieval with Existing Captions and Embeddings

You have already generated the following offline for videos:

  • caption: text descriptions of videos or video segments
  • embedding: vectors for semantic retrieval (may include text/visual/multimodal vectors)

At query time, given a user query (primarily text), you need to return Top-K videos (or segments) with low latency and high throughput.

Please explain:

  • How to design a two-stage/multi-stage retrieval architecture for acceleration (e.g., candidate recall + fine ranking/re-ranking).
  • How to optimize on the vector retrieval side: ANN indexing, sharding, compression, caching, etc.
  • How to do hybrid retrieval combining captions and embeddings, and potential failure modes (e.g., semantic drift, popularity bias, insufficient long-tail recall).
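One common way to combine a caption-based (lexical) ranking with an embedding-based (ANN) ranking is reciprocal rank fusion (RRF), which needs only ranks and so avoids calibrating two different score scales. The sketch below is illustrative: the function name, the video IDs, and the choice of `k=60` (a commonly used constant) are assumptions, and the two input lists stand in for the outputs of whatever recall stages are actually used.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_k=10):
    """Fuse several ranked lists of IDs into one list.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_k]]


if __name__ == "__main__":
    caption_hits = ["v1", "v2", "v3"]    # hypothetical keyword/BM25 results
    embedding_hits = ["v3", "v1", "v4"]  # hypothetical ANN results
    print(reciprocal_rank_fusion([caption_hits, embedding_hits]))
```

The fused list would then feed a smaller, more expensive re-ranking stage. Items found by only one retriever still survive fusion, which helps long-tail recall, though it does not by itself correct popularity bias.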

3) What Is Overfitting? How to Mitigate It?

  • Define overfitting (from the perspectives of training/validation error, generalization, and model capacity).
  • Provide at least 5 categories of common mitigation techniques, and explain their applicable scenarios and side effects.
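Early stopping is one of the simplest mitigation techniques to demonstrate, because it acts directly on the gap between training and validation error. The sketch below is a minimal illustration under assumed names and defaults (`early_stopping_epoch`, `patience=3`, `min_delta=0.0`). It takes a list of per-epoch validation losses and reports where training would stop and which epoch's checkpoint would be kept.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. If that never happens,
    stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


if __name__ == "__main__":
    # Hypothetical curve: validation loss bottoms out at epoch 2, then rises.
    losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.80, 0.90]
    print(early_stopping_epoch(losses))
```

A side effect worth mentioning in an answer: with a noisy validation curve, a small patience can stop too early, so patience and `min_delta` are themselves hyperparameters.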

4) Dropout Principles and Inference-Time Handling

  • Explain what Dropout does during training.
  • Why is scaling needed (to maintain distribution/expectation consistency)?
  • How is Dropout handled during inference (and how does it differ from training)?
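The scaling question can be made concrete with inverted dropout. Each activation is kept with probability 1 - p and divided by 1 - p, so its expected value is (1 - p) · x / (1 - p) = x, which matches what the next layer sees at inference, when dropout is an identity. The sketch below is a minimal pure-Python illustration; the function signature is an assumption rather than any framework's API.

```python
import random


def dropout(values, p, training, rng=None):
    """Inverted dropout over a list of floats.

    Training: zero each value with probability p and scale survivors by
    1 / (1 - p) so the expected output equals the input.
    Inference: return the values unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # rng.random() is uniform on [0, 1), so a value is kept with probability 1 - p.
    return [v * scale if rng.random() >= p else 0.0 for v in values]


if __name__ == "__main__":
    x = [1.0] * 10_000
    train_out = dropout(x, p=0.5, training=True, rng=random.Random(0))
    print(sum(train_out) / len(train_out))           # close to 1.0 in expectation
    print(dropout(x, p=0.5, training=False) == x)    # inference is the identity
```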

5) Compare Different Normalization Methods and Explain Inference-Time Handling

Compare and explain the core differences and applicable scenarios of at least the following normalization methods:

  • BatchNorm (BN)
  • LayerNorm (LN)
  • GroupNorm (GN) / RMSNorm (choose one or more)

Also answer:

  • What statistics/formulas does each use during inference?
  • What issues may arise with small batches, distribution shift, or mixed precision, and how to mitigate them?
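The inference-time difference between these methods shows up clearly in code: LayerNorm and RMSNorm compute statistics from the current input's features, so they behave identically in training and inference, while BatchNorm at inference uses running statistics accumulated during training. The sketch below operates on plain Python lists for a single example; the function names and the `eps=1e-5` default are assumptions for illustration.

```python
import math


def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one example across its features (same formula in train and inference)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    gamma = gamma or [1.0] * n
    beta = beta or [0.0] * n
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]


def rms_norm(x, gamma=None, eps=1e-5):
    """Like LayerNorm but without mean subtraction or bias: divide by the root mean square."""
    n = len(x)
    rms = math.sqrt(sum(v * v for v in x) / n + eps)
    gamma = gamma or [1.0] * n
    return [g * v / rms for v, g in zip(x, gamma)]


def batch_norm_inference(x, running_mean, running_var, gamma, beta, eps=1e-5):
    """BatchNorm at inference: per-feature running statistics from training,
    not statistics of the current batch."""
    return [
        g * (v - m) / math.sqrt(var + eps) + b
        for v, m, var, g, b in zip(x, running_mean, running_var, gamma, beta)
    ]


if __name__ == "__main__":
    print(layer_norm([1.0, 2.0, 3.0, 4.0]))
    print(rms_norm([3.0, 4.0]))
    print(batch_norm_inference([2.0], [1.0], [4.0], [1.0], [0.0]))
```

Because the BatchNorm output depends on stored statistics, a shift between training and serving distributions changes its behavior, whereas the per-example methods are unaffected by batch size.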

6) Reinforcement Learning in LLM Post-Training (RLHF)

Outline the typical RLHF pipeline and key components:

  • Preference data and the Reward Model
  • Policy optimization (e.g., PPO-based methods) and KL constraints

Also discuss:

  • Benefits and common risks of RLHF (reward hacking, alignment tax, degeneration, etc.).
  • Possible alternatives (e.g., DPO/IPO, RLAIF, best-of-N/rejection sampling, etc.) and their trade-offs.
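The core objectives behind this pipeline fit in a few lines. The sketch below shows, on scalar inputs, the pairwise (Bradley-Terry) reward-model loss, a KL-penalized reward of the kind used in PPO-style RLHF, and the DPO loss. It is an illustrative sketch only: function names, argument names, and the `beta` defaults are assumptions, and real implementations work on batched sequence log-probabilities.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def reward_model_loss(r_chosen, r_rejected):
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected)."""
    return -log_sigmoid(r_chosen - r_rejected)


def kl_penalized_reward(reward, logp_policy, logp_ref, beta=0.1):
    """Reward minus a KL-style penalty that keeps the policy near the reference model."""
    return reward - beta * (logp_policy - logp_ref)


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO: the preference loss applied to policy-vs-reference log-ratio margins,
    with no separate reward model or RL loop."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


if __name__ == "__main__":
    print(reward_model_loss(2.0, 0.0))              # small: ranking agrees with the label
    print(reward_model_loss(0.0, 2.0))              # large: ranking disagrees
    print(kl_penalized_reward(1.0, -1.0, -2.0, beta=0.5))
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The trade-off is visible in the signatures: the PPO-style path needs a trained reward model plus online sampling, while DPO needs only log-probabilities from the policy and a frozen reference on an offline preference dataset.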
