
Compare CNN, RNN, and LSTM rigorously

Last updated: Mar 29, 2026

Quick Overview

This question tests core sequence-modeling skills: comparing the inductive biases of CNNs, RNNs, and LSTMs; explaining gradient dynamics and gating mechanisms; reasoning about parameter and computational trade-offs; and designing experiments under tight constraints, all in the context of deep learning architectures for time series.

  • hard
  • Microsoft
  • Machine Learning
  • Data Scientist

Compare CNN, RNN, and LSTM rigorously

Company: Microsoft

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

Compare CNNs, RNNs, and LSTMs rigorously for sequence modeling. Answer all parts:

  1. Inductive biases and use-cases: When would you prefer a 1D dilated CNN over an RNN/LSTM for time series? When does an LSTM strictly dominate a vanilla RNN?
  2. Vanishing/exploding gradients: Write the recurrence for a vanilla RNN hidden state and explain why gradients vanish or explode. Then write the LSTM gate equations (input, forget, output, cell) and explain how they mitigate the issue via additive paths and gating.
  3. Parameter/computation comparison: For input of shape (batch=32, time=100, features=64), compute parameter counts for (a) a 1D CNN with 128 filters, kernel size 3, stride 1 (a standard convolution with one bias per filter); (b) a single-layer unidirectional GRU with 128 hidden units; (c) a single-layer unidirectional LSTM with 128 hidden units. Show formulas and totals, and comment on parallelism and latency implications.
  4. Experimental design: You have only 50k labeled sequences and a strict latency budget (<5 ms per sample). Propose an ablation plan to choose among the above models, including regularization, data augmentation, and early stopping criteria. Define primary metrics and stopping rules.
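As a starting point for part 2, the standard textbook formulations look like this (a sketch; the notation here is this sketch's own choice, not from the original page):

```latex
% Vanilla RNN recurrence (tanh activation), with pre-activation a_t:
a_t = W_x x_t + W_h h_{t-1} + b, \qquad h_t = \tanh(a_t)

% Backpropagating through T - t steps multiplies T - t Jacobians:
\frac{\partial h_T}{\partial h_t}
  = \prod_{s=t+1}^{T} \operatorname{diag}\!\big(\tanh'(a_s)\big)\, W_h
% Repeated multiplication by W_h shrinks the gradient when its spectral
% norm is below 1 (vanishing) and grows it when above 1 (exploding).

% Standard LSTM gates:
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \qquad
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o), \qquad
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad
h_t = o_t \odot \tanh(c_t)
% The cell update is additive: \partial c_t / \partial c_{t-1} = diag(f_t),
% so gradients survive long spans whenever the forget gate stays near 1.
```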


Posted: Oct 13, 2025, 9:49 PM

Sequence Modeling: Rigorous Comparison of CNNs, RNNs, and LSTMs

Context and assumptions:

  • We are modeling 1D sequences of shape (batch=32, time=100, features=64).
  • Unless stated otherwise: vanilla RNN uses tanh activation; LSTM/GRU use standard gate definitions; biases are included; parameter counts assume a single bias per gate (we will note the common two-bias variant for completeness).
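Under the single-bias-per-gate convention stated above, the part-3 parameter counts can be checked with a short script (a sketch; the helper names `conv1d_params` and `rnn_gate_params` are invented here for illustration):

```python
def conv1d_params(in_ch, out_ch, kernel):
    # Standard 1D convolution: out_ch filters of shape (in_ch, kernel),
    # plus one bias per filter.
    return out_ch * in_ch * kernel + out_ch

def rnn_gate_params(in_dim, hidden, n_gates, biases_per_gate=1):
    # Each gate has an input matrix (hidden x in_dim), a recurrent matrix
    # (hidden x hidden), and one bias vector per convention (frameworks
    # such as cuDNN-style implementations use two).
    per_gate = hidden * (in_dim + hidden) + biases_per_gate * hidden
    return n_gates * per_gate

print(conv1d_params(64, 128, 3))                        # 24704
print(rnn_gate_params(64, 128, 3))                      # GRU, 3 gates: 74112
print(rnn_gate_params(64, 128, 4))                      # LSTM, 4 gates: 98816
print(rnn_gate_params(64, 128, 4, biases_per_gate=2))   # two-bias LSTM: 99328
```

The two-bias variant matters when cross-checking against deep learning frameworks, which typically report the larger totals (e.g., 74496 for the GRU).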

Answer all parts:

  1. Inductive biases and use-cases
  • When would you prefer a 1D dilated CNN over an RNN/LSTM for time-series tasks?
  • When does an LSTM strictly dominate a vanilla RNN in practice?
  2. Vanishing/exploding gradients
  • Write the vanilla RNN hidden-state recurrence and explain why gradients vanish or explode.
  • Write the LSTM gate equations (input, forget, output, cell) and explain how additive paths and gating mitigate the issue.
  3. Parameter/computation comparison
  • For input shape (batch=32, time=100, features=64), compute parameter counts for: a) 1D CNN with 128 filters, kernel size 3, stride 1 (standard conv; per-filter bias). b) Single-layer unidirectional GRU with 128 hidden units. c) Single-layer unidirectional LSTM with 128 hidden units.
  • Show formulas and totals. Comment on parallelism and latency implications.
  4. Experimental design under constraints
  • You have 50k labeled sequences and a strict latency budget (<5 ms per sample). Propose an ablation plan to choose among the above models, including regularization, data augmentation, and early stopping. Define primary metrics and stopping rules.
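One concrete argument for the dilated CNN in part 1: with stride 1 and kernel size k, the receptive field of stacked dilated convolutions grows as 1 + sum of (k-1)*d over layers, so a handful of layers cover the full 100-step input while every timestep is computed in parallel (unlike the sequential RNN/LSTM recurrence). A minimal sketch (the function name is this sketch's own):

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated 1D convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the field by (k-1)*d
    return rf

# Kernel 3 with dilations 1, 2, 4, 8, 16, 32 covers 127 timesteps --
# more than the 100-step input -- in just six fully parallel layers.
print(receptive_field(3, [1, 2, 4, 8, 16, 32]))  # 127
```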

