PracHub

Explain Transformer Encoder and Decoder Behavior

Last updated: May 2, 2026

Quick Overview

This question evaluates knowledge of Transformer architecture and generative language modeling, specifically the encoder versus decoder roles, attention patterns and causal masking, plus mechanisms behind stochastic text generation such as sampling and temperature.

Company: Point72

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

Mar 6, 2026, 12:00 AM

Answer the following Transformer fundamentals questions in a machine learning interview:

  1. What are the main differences between a Transformer encoder and a Transformer decoder? Your answer should discuss attention patterns and the role of a causal mask.
  2. For the same input prompt, why can an autoregressive decoder-based language model produce different outputs on different runs? Discuss sampling, temperature, and when generation would be deterministic.
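
For part 1, the attention-pattern difference can be made concrete with a small sketch. It is illustrative only and assumes a single attention head, four tokens, and uniform raw scores; the names `attention_weights`, `encoder_w`, and `decoder_w` are hypothetical, not taken from any library. With no mask (encoder-style), every position attends to every other position. With a causal mask (decoder-style), position i can only attend to positions 0..i, because future scores are set to negative infinity before the softmax.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(scores, causal):
    """Return per-query attention weights; row i is query position i.

    scores: square list of lists of raw attention scores (an assumption
            standing in for scaled query-key dot products).
    causal: if True, hide key positions j > i from query i.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        row = [
            float("-inf") if (causal and j > i) else scores[i][j]
            for j in range(n)
        ]
        weights.append(softmax(row))
    return weights

# Uniform scores make the effect of the mask easy to read.
scores = [[0.0] * 4 for _ in range(4)]

encoder_w = attention_weights(scores, causal=False)  # bidirectional
decoder_w = attention_weights(scores, causal=True)   # left-to-right only

for name, w in (("encoder", encoder_w), ("decoder", decoder_w)):
    print(name)
    for row in w:
        print("  ", [round(x, 2) for x in row])
```

In the decoder output the weights form a lower-triangular pattern: the first token attends only to itself, and each later token spreads its weight over itself and earlier tokens. This is what lets a decoder be trained on all positions in parallel while still predicting each token without seeing the ones that follow it.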

Solution
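
For part 2 of the question, a minimal sketch of next-token selection shows where the run-to-run variation comes from. It assumes a toy four-token vocabulary with made-up logits; `next_token` and its parameters are hypothetical names, not a real library API. The logits are divided by the temperature before the softmax, so a low temperature sharpens the distribution and a high temperature flattens it. A temperature of 0 is treated here as greedy decoding (argmax), which is a common convention rather than a mathematical definition.

```python
import math
import random

def next_token(logits, temperature, rng):
    """Pick the index of the next token from raw logits.

    temperature == 0 -> greedy argmax (deterministic).
    temperature > 0  -> sample from softmax(logits / temperature).
    rng is a random.Random instance, so the seed is under the caller's control.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    # Unnormalized softmax weights; random.choices normalizes them internally.
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # illustrative values only

# Greedy decoding: the same logits always give the same token.
greedy = [next_token(logits, 0, random.Random()) for _ in range(5)]

# Sampling with a fixed seed: repeatable across runs.
sample_a = [next_token(logits, 1.0, random.Random(0)) for _ in range(5)]
sample_b = [next_token(logits, 1.0, random.Random(0)) for _ in range(5)]

# Sampling with an unseeded generator: may differ from run to run.
unseeded = random.Random()
sample_c = [next_token(logits, 1.0, unseeded) for _ in range(5)]

print("greedy:  ", greedy)
print("seeded:  ", sample_a, sample_b)
print("unseeded:", sample_c)
```

In this sketch generation is deterministic in two cases: greedy selection, and sampling with a fixed random seed. With an unseeded generator and a temperature above 0, the same prompt can yield different tokens, and because each sampled token is fed back as input, one early difference changes everything generated after it. In real deployments, small numerical differences across hardware or batch configurations can also cause variation even under greedy decoding.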
