Explain deployment, retrieval, regularization, and RLHF post-training
Company: Bytedance
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Onsite
You are interviewing for a machine-learning role at a large-scale short-video platform. Answer the following conceptual questions.
1. Under tight GPU compute and VRAM constraints, how would you deploy a multimodal model for tasks such as video retrieval or ranking? Discuss model architecture choices, compression, batching, caching, and how you would trade off quality, p99 latency, throughput, and serving cost.
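A core compression lever under tight VRAM budgets is low-bit weight quantization. The following is a minimal sketch of symmetric per-tensor int8 quantization (the function names and the 4-element weight list are illustrative, not from any particular library): weights are mapped to integers in [-127, 127] via a single scale, cutting memory roughly 4x versus fp32 at the cost of bounded rounding error.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate fp values from int8 codes."""
    return [qi * scale for qi in q]

# Toy weight tensor; max rounding error is bounded by half a quantization step.
weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
```

In practice per-channel scales, calibration data, and activation quantization matter too; this only shows the core scale/round/clamp step.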
2. Suppose captions and video embeddings have already been precomputed and stored. How would you accelerate online video retrieval? Discuss indexing, approximate nearest neighbor search, hybrid text-plus-vector retrieval, reranking, memory footprint, and freshness.
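To make the indexing discussion concrete, here is a toy inverted-file (IVF) index in pure Python, the same coarse-quantization idea used by production ANN libraries: each vector is routed to its nearest centroid, and a query scans only the lists of the `nprobe` closest centroids instead of the full corpus. The class name, the fixed 2-D centroids, and the random corpus are assumptions for illustration only.

```python
import heapq
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]

class IVFIndex:
    """Toy IVF index: assign vectors to their nearest centroid's posting
    list; at query time search only the top-nprobe lists exhaustively."""

    def __init__(self, centroids):
        self.centroids = [normalize(c) for c in centroids]
        self.lists = [[] for _ in centroids]  # one posting list per centroid

    def add(self, vid, vec):
        v = normalize(vec)
        c = max(range(len(self.centroids)),
                key=lambda i: dot(v, self.centroids[i]))
        self.lists[c].append((vid, v))

    def search(self, query, k=3, nprobe=1):
        q = normalize(query)
        probe = heapq.nlargest(nprobe, range(len(self.centroids)),
                               key=lambda i: dot(q, self.centroids[i]))
        cands = [(dot(q, v), vid) for i in probe for vid, v in self.lists[i]]
        return heapq.nlargest(k, cands)  # (cosine score, video id) pairs

random.seed(0)
index = IVFIndex(centroids=[[1, 0], [0, 1]])
for vid in range(100):
    index.add(vid, [random.random(), random.random()])
hits = index.search([1.0, 0.1], k=3, nprobe=1)
```

Raising `nprobe` trades latency for recall; a reranker over the returned candidates (e.g. a cross-encoder on full embeddings) would then refine the top-k.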
3. What is overfitting, how would you detect it, and how would you mitigate it in deep learning systems?
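The standard detection signal for overfitting is a widening train-validation gap, usually acted on via early stopping. A minimal sketch (the function name, `patience`, and the toy loss curve are illustrative assumptions): stop once validation loss has failed to improve by `min_delta` for `patience` consecutive epochs, and report the best epoch.

```python
def early_stop(val_losses, patience=3, min_delta=1e-4):
    """Return the best epoch: halt once validation loss has not improved
    by min_delta for `patience` consecutive epochs (overfitting onset)."""
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, bad = loss, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                return best_epoch
    return best_epoch

# Validation loss bottoms out at epoch 3, then climbs: the classic signature.
val = [1.0, 0.8, 0.7, 0.65, 0.66, 0.7, 0.75, 0.8]
stop = early_stop(val)
```

Mitigations beyond early stopping include more data or augmentation, weight decay, dropout, and reducing model capacity.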
4. Explain the intuition and mathematics of Dropout, including why inverted Dropout keeps the expected activation scale consistent between training and inference.
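The key identity behind inverted Dropout is that scaling surviving activations by 1/(1-p) at training time makes the expectation match the unscaled inference path: E[x_i / (1-p) * Bernoulli(1-p)] = x_i. A small numerical sketch (function name and the all-ones input are illustrative):

```python
import random

def inverted_dropout(x, p, rng):
    """Zero each activation with probability p; scale survivors by 1/(1-p)
    so E[output] equals the input, and inference needs no rescaling."""
    keep = 1.0 - p
    return [(xi / keep) if rng.random() < keep else 0.0 for xi in x]

rng = random.Random(0)
x = [1.0] * 100_000
p = 0.3
y = inverted_dropout(x, p, rng)
mean_train = sum(y) / len(y)  # should hover near 1.0, the inference value
```

With classic (non-inverted) Dropout the scaling by (1-p) is instead applied at inference, which couples serving code to the training hyperparameter; inverted Dropout avoids that.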
5. Compare common normalization methods such as BatchNorm, LayerNorm, GroupNorm, and RMSNorm. When is each appropriate, and how are statistics handled at inference time?
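A minimal numerical sketch of the difference between LayerNorm and RMSNorm (gain/bias parameters omitted for brevity): LayerNorm subtracts the per-example mean and divides by the standard deviation over the feature dimension, while RMSNorm skips mean-centering and divides by the root-mean-square only, which is cheaper and common in modern LLMs. Neither keeps running statistics, which is why, unlike BatchNorm, they behave identically at training and inference.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize by mean and variance over the feature dimension."""
    mu = sum(x) / len(x)
    var = sum((xi - mu) ** 2 for xi in x) / len(x)
    return [(xi - mu) / math.sqrt(var + eps) for xi in x]

def rms_norm(x, eps=1e-5):
    """RMSNorm: no mean-centering, divide by root-mean-square only."""
    rms = math.sqrt(sum(xi * xi for xi in x) / len(x) + eps)
    return [xi / rms for xi in x]

x = [2.0, 4.0, 6.0, 8.0]
ln = layer_norm(x)   # zero mean, unit variance
rn = rms_norm(x)     # unit mean-square, mean not removed
```

BatchNorm, by contrast, normalizes over the batch dimension and must freeze running mean/variance estimates at inference; GroupNorm normalizes over channel groups per example, like LayerNorm requiring no running statistics.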
6. Explain how reinforcement learning is used in LLM post-training, especially RLHF. Describe the roles of supervised fine-tuning, preference data, reward modeling, policy optimization, KL regularization, and common failure modes.
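One concrete piece of the RLHF pipeline that a worked example clarifies is the KL-regularized reward used in PPO-style policy optimization: each token receives a penalty proportional to the policy/reference log-probability gap, and the scalar reward-model score is added at the final token. The function name, `beta`, and the three-token log-probabilities below are illustrative assumptions, not a specific library's API.

```python
def rlhf_token_rewards(logp_policy, logp_ref, reward_model_score, beta=0.1):
    """Per-token reward for PPO-style RLHF: -beta * (log pi - log pi_ref)
    at every token, plus the reward-model score on the last token."""
    kl_terms = [lp - lr for lp, lr in zip(logp_policy, logp_ref)]
    rewards = [-beta * kl for kl in kl_terms]
    rewards[-1] += reward_model_score
    return rewards, sum(kl_terms)  # sequence-level KL estimate

# Toy 3-token completion: policy drifts from the reference on tokens 1 and 3.
logp_policy = [-1.0, -0.5, -2.0]
logp_ref    = [-1.2, -0.5, -1.0]
rewards, kl = rlhf_token_rewards(logp_policy, logp_ref, 1.5, beta=0.1)
```

The KL term is what keeps the policy anchored to the SFT reference; with `beta` too small the policy reward-hacks and degenerates, with `beta` too large it barely moves, which connects directly to the "common failure modes" part of the question.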
Quick Answer: These questions evaluate competencies in machine-learning systems engineering, covering multimodal model deployment under GPU/VRAM constraints, scalable video retrieval and indexing, regularization and normalization methods, and reinforcement learning–based post-training for language models.