PracHub

Implement AUC-ROC, softmax, and logistic regression

Last updated: Mar 29, 2026

Quick Overview

This question evaluates practical implementation skills for core machine learning components: computing AUC-ROC (including ROC point generation and handling ties or uniform labels), implementing a numerically stable softmax, and training logistic regression with cross-entropy loss and optional L2 regularization.



Company: TikTok

Role: Software Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite



Posted: Jan 22, 2026

You are asked to implement a few core ML building blocks from scratch (no ML libraries such as scikit-learn). You may use basic numeric operations and standard data structures.

Part A — AUC-ROC

Given:

  • y_true: length n list/array of binary labels in {0,1}
  • y_score: length n list/array of real-valued prediction scores (higher means more likely positive)

Task:

  1. Compute the ROC curve points (TPR vs FPR) as the threshold varies.
  2. Compute AUC (area under the ROC curve).

Clarifications:

  • Handle ties in y_score correctly.
  • Define what you return when all labels are the same (all 0s or all 1s).
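One way to satisfy Part A, offered here as a sketch rather than the canonical answer, is the rank-based (Mann-Whitney U) formulation of AUC, which is equivalent to the area under the ROC curve and handles score ties by assigning average ranks. The function name `roc_auc` and the choice to return `None` for uniform labels are illustrative conventions, not requirements from the prompt:

```python
def roc_auc(y_true, y_score):
    """AUC via the Mann-Whitney U statistic; ties get average ranks."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return None  # AUC is undefined when only one class is present

    # Sort indices by score ascending, then assign 1-based average ranks
    # to each group of tied scores.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # AUC = (sum of positive ranks - min possible sum) / (pos * neg)
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum_pos - pos * (pos + 1) / 2) / (pos * neg)
```

This runs in O(n log n) from the sort. An explicit ROC-curve construction (sweeping thresholds at distinct scores and applying the trapezoidal rule) gives the same value and is the route to take if the interviewer also wants the (FPR, TPR) points.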

Part B — Softmax

Given a vector of logits z = [z1, z2, ..., zk], implement softmax:

\[
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}
\]

Task:

  • Return a probability vector of length k.
  • Make your implementation numerically stable.
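A minimal sketch of the stable version: subtracting the maximum logit before exponentiating leaves the result unchanged (the factor e^{-max} cancels in numerator and denominator) but keeps every exponent non-positive, so `exp` cannot overflow:

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)                              # shift so the largest exponent is 0
    exps = [math.exp(v - m) for v in z]     # all values in (0, 1]
    total = sum(exps)
    return [e / total for e in exps]
```

Without the shift, `softmax([1000.0, 1000.0])` would overflow; with it, the call returns `[0.5, 0.5]` as expected.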

Part C — Logistic Regression

Given:

  • Feature matrix X of shape (n, d)
  • Binary labels y of shape (n,) in {0,1}

Task:

  1. Implement logistic regression training using gradient descent (batch or mini-batch).
  2. Specify the loss you optimize (cross-entropy / negative log-likelihood).
  3. Optionally include L2 regularization and explain how it changes the gradient.
  4. Return learned parameters and a predict_proba / predict function.

Constraints/expectations:

  • Discuss time complexity per training epoch.
  • Mention common pitfalls (learning rate, feature scaling, overflow, class imbalance).
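A possible batch-gradient-descent implementation covering the points above; the function names (`train_logreg`, `predict_proba`, `predict`) and hyperparameter defaults are illustrative assumptions. The sigmoid is written in a branch that avoids overflow for large negative inputs, and the bias is deliberately left out of the L2 penalty:

```python
import math

def sigmoid(t):
    # Branch so that exp() is only called on non-positive arguments,
    # avoiding overflow for large |t|.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.1, epochs=1000, l2=0.0):
    """Batch gradient descent on L2-regularized cross-entropy.

    Loss:     -(1/n) * sum[y log p + (1 - y) log(1 - p)] + (l2/2) * ||w||^2
    Gradient: (1/n) * X^T (p - y) + l2 * w      (bias not regularized)
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def predict(w, b, X, threshold=0.5):
    return [1 if p >= threshold else 0 for p in predict_proba(w, b, X)]
```

Each epoch costs O(n * d) time. On the pitfalls: too large a learning rate diverges; unscaled features make convergence slow and learning-rate tuning fragile; the raw sigmoid overflows without the branch above; and under class imbalance, accuracy at a 0.5 threshold is misleading, so class weights or a tuned threshold are worth mentioning.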
