Compare CNNs, RNNs, and LSTMs rigorously
Company: Microsoft
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Onsite
Compare CNNs, RNNs, and LSTMs rigorously for sequence modeling.
Answer all parts:
1) Inductive biases and use-cases: When would you prefer a 1D dilated CNN over an RNN/LSTM for time series? When does an LSTM strictly dominate a vanilla RNN?
2) Vanishing/exploding gradients: Write the recurrence for a vanilla RNN hidden state and explain why gradients vanish or explode. Then write the LSTM gate equations (input, forget, output, cell) and explain how they mitigate the issue via additive paths and gating. (Reference equations are sketched after this list.)
3) Parameter/computation comparison: For input of shape (batch=32, time=100, features=64), compute parameter counts for: (a) a 1D CNN with 128 filters, kernel size 3, stride 1, with bias terms and no factorization or sharing tricks; (b) a single-layer unidirectional GRU with 128 hidden units; (c) a single-layer unidirectional LSTM with 128 hidden units. Show formulas and totals, and comment on parallelism and latency implications. (A worked count is sketched after the Quick Answer.)
4) Experimental design: You have only 50k labeled sequences and strict latency (<5 ms per sample). Propose an ablation plan to choose among the above models, including regularization, data augmentation, and early stopping criteria. Define primary metrics and stopping rules. (A sketch of one stopping rule follows the worked count below.)
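For part 2, a standard reference formulation in conventional notation (the weight and bias names are generic, not taken from a specific source):

```latex
% Vanilla RNN recurrence: backprop through time repeatedly multiplies by
% W_{hh}, so gradients shrink or grow geometrically over long horizons.
h_t = \tanh\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right)
\qquad
\frac{\partial h_T}{\partial h_t}
  = \prod_{k=t+1}^{T} \operatorname{diag}\!\left(1 - h_k^2\right) W_{hh}

% LSTM gates ([h_{t-1}, x_t] denotes concatenation, \odot elementwise product):
\begin{aligned}
f_t &= \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget)}\\
i_t &= \sigma\!\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input)}\\
o_t &= \sigma\!\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output)}\\
\tilde{c}_t &= \tanh\!\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(cell candidate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}

% Holding the gates fixed, \partial c_t / \partial c_{t-1} = diag(f_t):
% the additive cell path lets gradients pass ungated through time when
% f_t is near 1, instead of being forced through a squashing Jacobian.
```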
Quick Answer: This question evaluates core sequence-modeling competencies: comparative understanding of CNN, RNN, and LSTM inductive biases; gradient dynamics and gating mechanisms; parameter and computational trade-offs; and constrained experimental design, all within the deep-learning-for-time-series area of Machine Learning.
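As a companion to part 3, a minimal worked count in Python. It assumes one bias vector per gate; frameworks such as PyTorch keep separate b_ih and b_hh vectors, which adds h extra parameters per gate. Note that batch=32 and time=100 affect activations and FLOPs, not parameter counts:

```python
x = 64   # input features per timestep
h = 128  # hidden units / number of conv filters
k = 3    # 1D conv kernel size

# (a) 1D CNN: each of the 128 filters spans k timesteps x 64 input
# channels, plus one bias per filter.
cnn = (k * x + 1) * h            # (3*64 + 1) * 128 = 24,704

# (b) GRU: 3 gates (update, reset, candidate), each with an
# input-to-hidden matrix, a hidden-to-hidden matrix, and a bias.
gru = 3 * (h * x + h * h + h)    # 3 * 24,704 = 74,112

# (c) LSTM: 4 gates (input, forget, output, cell candidate),
# same per-gate structure as the GRU.
lstm = 4 * (h * x + h * h + h)   # 4 * 24,704 = 98,816

print(f"CNN:  {cnn:,}")          # 24,704
print(f"GRU:  {gru:,}")          # 74,112
print(f"LSTM: {lstm:,}")         # 98,816
```

On parallelism: the CNN computes all 100 timesteps at once, while the GRU/LSTM recurrences are sequential in time, which usually matters more for latency than the raw parameter counts above.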
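For part 4, one concrete way to pin down "stopping rules" is a patience-based criterion on validation loss. This is a minimal sketch; the function name and the patience/min_delta values are illustrative assumptions, not prescribed by the question:

```python
def should_stop(val_losses, patience=10, min_delta=1e-3):
    """Stop when the best loss in the last `patience` evaluations
    fails to beat the best loss before that window by `min_delta`."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta

# Example: a validation curve that plateaus triggers the rule.
history = [0.90, 0.70, 0.60, 0.55, 0.55] + [0.55] * 10
print(should_stop(history))  # True
```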