PracHub

Write self-attention and cross-entropy pseudocode

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of Transformer internals—specifically scaled dot-product self-attention (including Q/K/V projections and attention masks), multi-class cross-entropy loss computation, and conceptual roles like the position-wise feed-forward network and attention score scaling—in the Machine Learning / deep learning domain.


Company: TikTok

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite


Feb 12, 2026, 12:00 AM

You are asked to explain core Transformer / deep learning components.

Part A — Self-attention pseudocode

Write clear pseudocode (not full code) for scaled dot-product self-attention for a single attention head. Your pseudocode should include:

  • Inputs/outputs and tensor shapes (batch size B, sequence length T, model dim d_model, head dim d_k)
  • Computing Q, K, V via linear projections
  • Computing attention logits and applying scaling
  • Softmax and weighted sum
  • (Optional but recommended) Handling an attention mask (padding mask or causal mask)
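The steps listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference answer: the names `self_attention`, `softmax`, and `causal_mask` are hypothetical, the value dimension is assumed equal to d_k, the mask convention (True means "may attend") is an assumption, and masked logits are filled with a large negative constant rather than negative infinity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max first so exp() cannot overflow.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v, mask=None):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X:    [B, T, d_model] input sequence
    W_*:  [d_model, d_k]  projection matrices (d_v == d_k is assumed here)
    mask: boolean, broadcastable to [B, T, T]; True = position may be attended
    Returns (output [B, T, d_k], attention weights [B, T, T]).
    """
    d_k = W_q.shape[-1]
    Q = X @ W_q                                   # [B, T, d_k]
    K = X @ W_k                                   # [B, T, d_k]
    V = X @ W_v                                   # [B, T, d_k]
    # Attention logits, scaled by 1 / sqrt(d_k).
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # [B, T, T]
    if mask is not None:
        # Blocked positions get a very negative logit, so softmax gives ~0.
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)            # each row sums to 1
    return weights @ V, weights                   # weighted sum of values

def causal_mask(T):
    # Lower-triangular mask: position t may attend to positions <= t.
    return np.tril(np.ones((T, T), dtype=bool))[None, :, :]
```

A padding mask would follow the same convention: a boolean array of shape [B, 1, T] that is False at padded key positions.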

Part B — Cross-entropy pseudocode

Write pseudocode for multi-class cross-entropy loss for a batch of examples, given:

  • Model logits z of shape [B, C]
  • Ground-truth labels either as class indices [B] or one-hot [B, C]
  • Return a scalar loss (mean over batch)
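One way to sketch the loss described above in NumPy is shown below. The function name `cross_entropy` is hypothetical, and the choice to tell the two label formats apart by the number of array dimensions is an assumption made for illustration. The log-softmax is computed with the log-sum-exp trick so that large logits do not overflow.

```python
import numpy as np

def cross_entropy(z, y):
    """Mean multi-class cross-entropy (illustrative sketch).

    z: [B, C] logits
    y: [B] integer class indices, or [B, C] one-hot (or soft) labels
    Returns a scalar: the mean loss over the batch.
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y)
    # Numerically stable log-softmax: shift by the row max, then log-sum-exp.
    shifted = z - z.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    if y.ndim == 1:
        # Class indices: pick the log-probability of the true class per row.
        nll = -log_probs[np.arange(z.shape[0]), y]
    else:
        # One-hot labels: the sum over classes keeps only the true-class term.
        nll = -(y * log_probs).sum(axis=1)
    return nll.mean()
```

With all-zero logits the predicted distribution is uniform, so the loss equals log(C) for either label format.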

Part C — Concept questions

  1. In a Transformer block, what is the role of the position-wise feed-forward network (FFN) relative to attention? Why is it needed?
  2. Why do we scale the dot-product attention scores by 1 / sqrt(d_k) before applying softmax? What problem does it address?
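Both concept questions can be probed with a small numerical experiment. The sketch below is illustrative only and rests on stated assumptions: query and key components are drawn as independent standard normals, the FFN uses a ReLU with hypothetical weights `W1`, `b1`, `W2`, `b2`, and the helper name `ffn` is invented for this example. Under those assumptions the unscaled dot product has variance close to d_k, while dividing by sqrt(d_k) brings it back to about 1, which keeps the softmax inputs in a range where it does not saturate. The FFN, in contrast to attention, transforms each position independently with the same weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Question 2: variance of dot products with and without scaling ---
d_k = 64
n = 20000
q = rng.normal(size=(n, d_k))        # assumed: unit-variance, independent components
k = rng.normal(size=(n, d_k))
raw = (q * k).sum(axis=1)            # n unscaled dot products
scaled = raw / np.sqrt(d_k)          # same dot products after 1 / sqrt(d_k)
var_raw = raw.var()                  # close to d_k
var_scaled = scaled.var()            # close to 1

# --- Question 1: the FFN acts on each position separately ---
def ffn(X, W1, b1, W2, b2):
    # X: [B, T, d_model]. The same two-layer MLP is applied to every token;
    # no information moves between positions here (that is attention's job).
    return np.maximum(X @ W1 + b1, 0.0) @ W2 + b2

B, T, d_model, d_ff = 2, 4, 8, 32
X = rng.normal(size=(B, T, d_model))
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

perm = [2, 0, 1, 3]
# Reordering the tokens simply reorders the outputs in the same way.
position_wise = np.allclose(ffn(X[:, perm], W1, b1, W2, b2),
                            ffn(X, W1, b1, W2, b2)[:, perm])
```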