
Write self-attention and cross-entropy pseudocode

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of Transformer internals in the machine learning / deep learning domain: scaled dot-product self-attention (including Q/K/V projections and attention masks), multi-class cross-entropy loss computation, and conceptual points such as the role of the position-wise feed-forward network and the scaling of attention scores.


Company: TikTok

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite



Related Interview Questions

  • Design multimodal deployment under compute limits - TikTok (easy)
  • Explain overfitting, dropout, normalization, RL post-training - TikTok (medium)
  • Implement AUC-ROC, softmax, and logistic regression - TikTok (medium)
  • Answer ML fundamentals and diagnostics questions - TikTok (hard)
  • Explain FlashAttention, KV cache, and RoPE - TikTok (medium)

You are asked to explain core Transformer / deep learning components.

Part A — Self-attention pseudocode

Write clear pseudocode (not full code) for scaled dot-product self-attention for a single attention head; an illustrative reference sketch follows the list below. Your pseudocode should include:

  • Inputs/outputs and tensor shapes (batch size B, sequence length T, model dim d_model, head dim d_k)
  • Computing Q, K, V via linear projections
  • Computing attention logits and applying scaling
  • Softmax and weighted sum
  • (Optional but recommended) Handling an attention mask (padding mask or causal mask)
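
For reference, here is a minimal NumPy sketch of one acceptable answer. The function name self_attention, the explicit weight arguments W_q, W_k, W_v, and the mask convention (1 = attend, 0 = blocked) are illustrative choices for this sketch, not the only valid ones.

```python
# Minimal sketch of single-head scaled dot-product self-attention (NumPy).
# Shapes follow the prompt: B = batch, T = sequence length,
# d_model = model dim, d_k = head dim.
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v, mask=None):
    """X: [B, T, d_model]; W_q, W_k, W_v: [d_model, d_k];
    mask: [T, T] or [B, T, T], 1 = attend, 0 = blocked.
    Returns attended values of shape [B, T, d_k]."""
    Q = X @ W_q                                        # [B, T, d_k]
    K = X @ W_k                                        # [B, T, d_k]
    V = X @ W_v                                        # [B, T, d_k]
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # [B, T, T]
    if mask is not None:
        # Push blocked positions to a large negative value so softmax
        # assigns them (effectively) zero attention weight.
        scores = np.where(mask == 1, scores, -1e9)
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    return weights @ V                                 # [B, T, d_k]

# Tiny usage example with a causal (lower-triangular) mask.
B, T, d_model, d_k = 2, 4, 8, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((B, T, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
causal_mask = np.tril(np.ones((T, T)))
out = self_attention(X, W_q, W_k, W_v, mask=causal_mask)
print(out.shape)  # (2, 4, 3)
```

Masking with a large negative value (here -1e9) before the softmax is the standard trick: blocked positions receive essentially zero weight while the remaining weights still normalize to 1.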

Part B — Cross-entropy pseudocode

Write pseudocode for multi-class cross-entropy loss for a batch of examples (a reference sketch follows the list), given:

  • Model logits z of shape [B, C]
  • Ground-truth labels either as class indices [B] or one-hot [B, C]
  • Return a scalar loss (mean over batch)
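
A minimal NumPy sketch covering both label encodings is below; the function name cross_entropy is illustrative. It computes log-softmax via the log-sum-exp shift rather than taking log(softmax(z)) directly, which can underflow for large-magnitude logits.

```python
# Minimal sketch of mean multi-class cross-entropy from raw logits (NumPy).
import numpy as np

def cross_entropy(z, labels):
    """z: logits of shape [B, C];
    labels: class indices [B] or one-hot [B, C].
    Returns the scalar mean loss over the batch."""
    # log softmax(z) = z - logsumexp(z); shift by the row max for stability.
    z_shifted = z - z.max(axis=1, keepdims=True)
    log_probs = z_shifted - np.log(np.exp(z_shifted).sum(axis=1, keepdims=True))
    if labels.ndim == 1:
        # Index form: pick the log-probability of each true class.
        picked = log_probs[np.arange(z.shape[0]), labels]
    else:
        # One-hot form: the elementwise product selects the same entries.
        picked = (labels * log_probs).sum(axis=1)
    return -picked.mean()

# Both label encodings give the same loss.
z = np.array([[2.0, 0.5, -1.0],
              [0.1, 1.2, 0.3]])
idx = np.array([0, 1])
one_hot = np.eye(3)[idx]
print(cross_entropy(z, idx), cross_entropy(z, one_hot))  # identical values
```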

Part C — Concept questions

  1. In a Transformer block, what is the role of the position-wise feed-forward network (FFN) relative to attention? Why is it needed?
  2. Why do we scale the dot-product attention scores by 1 / sqrt(d_k) before applying softmax? What problem does it address? (A small numerical illustration follows the list.)
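
As a pointer toward question 2, the following small experiment (an illustration added here, not part of the original question) checks the usual variance argument: for q and k with i.i.d. zero-mean, unit-variance entries, the dot product q·k has variance d_k, so unscaled logits grow with head size and push softmax into its saturated, near-one-hot regime where gradients vanish. Dividing by sqrt(d_k) keeps the logit variance near 1.

```python
# Empirical check: var(q·k) grows like d_k; var(q·k / sqrt(d_k)) stays near 1.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 50_000
for d_k in (4, 64, 256):
    q = rng.standard_normal((n_samples, d_k))
    k = rng.standard_normal((n_samples, d_k))
    dots = (q * k).sum(axis=1)              # one dot product per sample
    print(f"d_k={d_k:4d}  var(q.k)={dots.var():7.1f}  "
          f"var(q.k/sqrt(d_k))={(dots / np.sqrt(d_k)).var():.2f}")
# Prints var(q.k) close to 4, 64, 256 and scaled variance close to 1.
```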

