Conv2D Forward Pass, Vectorization, and Parameter Counts
Asked of: Machine Learning Engineer
What's being tested
Candidates must show they understand low-level Conv2D mechanics (multi-channel dot products, stride, padding) and can turn a looped implementation into an efficient vectorized NumPy implementation. Interviewers probe correct output-shape math, the memory/time tradeoffs of unfolding (im2col), and the simple algebra behind parameter counts.
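A looped reference implementation is the usual baseline before vectorizing, so a minimal sketch follows. It assumes NCHW input (N, C_in, H, W), filters shaped (C_out, C_in, K_h, K_w), symmetric zero padding `pad`, and one `stride` for both spatial axes. Both function names are hypothetical. Like most deep-learning frameworks, it computes cross-correlation and does not flip the kernel. The parameter-count helper encodes C_out * (K_h * K_w * C_in) + C_out, because each filter has K_h * K_w * C_in weights plus one bias.

```python
import numpy as np


def conv2d_naive(x, w, b, stride=1, pad=0):
    """Looped Conv2D forward pass (cross-correlation, no kernel flip).

    x: (N, C_in, H, W) input in NCHW layout
    w: (C_out, C_in, K_h, K_w) filters
    b: (C_out,) bias
    Returns y with shape (N, C_out, H_out, W_out).
    """
    N, C_in, H, W = x.shape
    C_out, C_in_w, K_h, K_w = w.shape
    assert C_in == C_in_w, "input and filter channel counts must match"
    H_out = (H + 2 * pad - K_h) // stride + 1
    W_out = (W + 2 * pad - K_w) // stride + 1
    assert H_out > 0 and W_out > 0, "kernel larger than padded input"
    # Zero-pad only the two spatial axes.
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    y = np.zeros((N, C_out, H_out, W_out), dtype=np.result_type(x, w))
    for n in range(N):
        for co in range(C_out):
            for i in range(H_out):
                for j in range(W_out):
                    r, c = i * stride, j * stride
                    window = xp[n, :, r:r + K_h, c:c + K_w]  # (C_in, K_h, K_w)
                    # Multi-channel dot product over C_in, K_h and K_w, plus bias.
                    y[n, co, i, j] = np.sum(window * w[co]) + b[co]
    return y


def conv2d_param_count(c_in, c_out, k_h, k_w, bias=True):
    """C_out filters of K_h * K_w * C_in weights each, plus one bias per filter."""
    return c_out * (k_h * k_w * c_in) + (c_out if bias else 0)
```

For example, a 3x3 convolution from 3 to 64 channels has 64 * 27 + 64 = 1,792 parameters. The four nested Python loops are what the vectorized version removes.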
Patterns & templates
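- im2col / unfold: reshape the sliding windows into an (N * H_out * W_out, K_h * K_w * C_in) matrix, then matrix-multiply it with the reshaped filters.
- Filter reshape: turn the filters into a (C_out, K_h * K_w * C_in) matrix and use np.dot, np.tensordot, or np.einsum for fast contraction. The filter flattening order must match the column order of the unfolded patches.
- Output size formula: H_out = floor((H_in + 2 * P - K_h) / S) + 1 for padding P and stride S (same for width with W_in and K_w); validate that the result is a positive integer.
- Bias handling: broadcast a (C_out,) bias across the spatial dims after the convolution using NumPy broadcasting rules (e.g. reshape it to (C_out, 1, 1) for NCHW outputs).
- Vectorized idiom: avoid Python loops over spatial positions and aim for one large GEMM per batch. The cost is then dominated by the matrix multiply, roughly N * H_out * W_out * C_out * K_h * K_w * C_in multiply-adds.
- Memory tradeoff: im2col copies each input element up to K_h * K_w times (at stride 1), so memory grows by roughly that factor. For large kernels, prefer np.einsum over strided window views with small intermediates, or batched GEMMs over chunks of the input.
- Edge cases: kernel larger than the input, zero padding, non-unit stride, and uneven division; test shapes with asserts.
- Data types & perf: prefer float32 for parity with GPU frameworks; float64 doubles memory and typically slows BLAS calls.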
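The patterns above combine into one vectorized forward pass, shown here as a sketch rather than a definitive implementation. It assumes NCHW input (N, C_in, H, W), filters shaped (C_out, C_in, K_h, K_w), a single stride, symmetric padding, and NumPy 1.20 or newer for `numpy.lib.stride_tricks.sliding_window_view`. The function name is hypothetical. Patch columns are flattened in (C_in, K_h, K_w) order rather than (K_h, K_w, C_in). Either order works, as long as the filter reshape uses the same one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_im2col(x, w, b, stride=1, pad=0):
    """Vectorized Conv2D forward: im2col + one GEMM + broadcast bias (NCHW)."""
    N, C_in, H, W = x.shape
    C_out, C_in_w, K_h, K_w = w.shape
    assert C_in == C_in_w, "input and filter channel counts must match"
    H_out = (H + 2 * pad - K_h) // stride + 1
    W_out = (W + 2 * pad - K_w) // stride + 1
    assert H_out > 0 and W_out > 0, "kernel larger than padded input"

    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    # Read-only view of every K_h x K_w window (no copy):
    # (N, C_in, H_p - K_h + 1, W_p - K_w + 1, K_h, K_w)
    win = sliding_window_view(xp, (K_h, K_w), axis=(2, 3))
    # Apply the stride by subsampling window positions -> (N, C_in, H_out, W_out, K_h, K_w).
    win = win[:, :, ::stride, ::stride]
    # im2col: one row per output position, one column per (C_in, K_h, K_w) entry.
    # This transpose + reshape is where the patch matrix is actually copied.
    cols = win.transpose(0, 2, 3, 1, 4, 5).reshape(N * H_out * W_out, C_in * K_h * K_w)
    # Filter reshape with the same (C_in, K_h, K_w) flattening order.
    w_mat = w.reshape(C_out, -1)
    # One GEMM; the (C_out,) bias broadcasts across all rows.
    out = cols @ w_mat.T + b
    # (N * H_out * W_out, C_out) -> (N, C_out, H_out, W_out)
    return out.reshape(N, H_out, W_out, C_out).transpose(0, 3, 1, 2)
```

The strided window view does not copy data. The reshape does, and it materializes N * H_out * W_out * C_in * K_h * K_w elements. Passing float32 inputs keeps both that buffer and the GEMM in single precision.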
Common pitfalls
Pitfall: Miscomputing output spatial dimensions, e.g. forgetting floor division or making an off-by-one error when the padding/stride combination does not tile the input exactly.
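One way to guard against this is to cross-check the closed-form size against a brute-force count of valid window start positions. Here is a small plain-Python sketch. The helper names are hypothetical.

```python
def conv_out_size(size, k, stride=1, pad=0):
    """Output length along one axis: floor((size + 2*pad - k) / stride) + 1."""
    span = size + 2 * pad - k
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // stride + 1  # integer floor division, never true division


def count_windows(size, k, stride=1, pad=0):
    """Brute force: count start offsets whose window fits inside the padded input."""
    padded = size + 2 * pad
    return sum(1 for start in range(0, padded, stride) if start + k <= padded)


# Cross-check the formula on a grid of small shapes, including non-tiling ones.
for size in range(1, 12):
    for k in range(1, 6):
        for stride in range(1, 4):
            for pad in range(3):
                if size + 2 * pad >= k:
                    assert conv_out_size(size, k, stride, pad) == count_windows(size, k, stride, pad)
```

For example, with H_in = 8, K_h = 3, stride 2 and no padding, (8 - 3) / 2 + 1 = 3.5 is not an integer. Floor division gives 3, because a window starting at row 6 would run past the input.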
Pitfall: Channel-ordering mix-ups. Confusing (N, H, W, C) with (N, C, H, W) causes silent shape bugs; assert the layout up front.
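The next sketch converts between the two layouts and fails fast when the channel axis is not where the convolution expects it. The helper names are hypothetical. A shape assert cannot tell the layouts apart when C happens to equal H or W (for example, a 3x3 image with 3 channels), so the layout should also be pinned down where the data is produced.

```python
import numpy as np


def nhwc_to_nchw(x):
    """(N, H, W, C) -> (N, C, H, W). Returns a strided view; no data is copied."""
    assert x.ndim == 4, f"expected a 4-D batch, got {x.ndim}-D"
    return x.transpose(0, 3, 1, 2)


def nchw_to_nhwc(x):
    """(N, C, H, W) -> (N, H, W, C). Also a view."""
    assert x.ndim == 4, f"expected a 4-D batch, got {x.ndim}-D"
    return x.transpose(0, 2, 3, 1)


def assert_nchw(x, c_in):
    """Fail fast if axis 1 does not hold the expected channel count."""
    assert x.ndim == 4 and x.shape[1] == c_in, (
        f"expected NCHW with C={c_in}, got shape {x.shape}"
    )
```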
Pitfall: Memory blow-up from naive im2col on large batches or kernels. State the O(N * H_out * W_out * K_h * K_w * C_in) memory cost and offer streamed or batched alternatives.
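The memory cost is easy to estimate before anything is materialized. This sketch also shows one streamed alternative: contracting with `np.einsum` directly over a strided window view, `chunk` images at a time. It assumes NCHW layouts and NumPy 1.20+ for `sliding_window_view`. It also assumes that NumPy's default, unoptimized einsum path iterates over the view instead of building a patch matrix, so measure peak memory on your NumPy version before relying on that. The helper names and the `chunk` parameter are hypothetical. The tradeoff is that this path avoids the im2col buffer but typically does not dispatch to BLAS, so it is usually slower than one large GEMM.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def im2col_bytes(n, h_out, w_out, k_h, k_w, c_in, itemsize=4):
    """Bytes for a dense (N*H_out*W_out, K_h*K_w*C_in) im2col matrix."""
    return n * h_out * w_out * k_h * k_w * c_in * itemsize


def conv2d_einsum_chunked(x, w, b, stride=1, pad=0, chunk=8):
    """NCHW Conv2D that contracts over strided window views, `chunk` images at a time.

    No full im2col matrix is built, and chunking bounds the size of the padded copy.
    """
    N, C_in, H, W = x.shape
    C_out, _, K_h, K_w = w.shape
    H_out = (H + 2 * pad - K_h) // stride + 1
    W_out = (W + 2 * pad - K_w) // stride + 1
    y = np.empty((N, C_out, H_out, W_out), dtype=np.result_type(x, w, b))
    for start in range(0, N, chunk):
        xs = np.pad(x[start:start + chunk], ((0, 0), (0, 0), (pad, pad), (pad, pad)))
        # Read-only view (n, C_in, H_out, W_out, K_h, K_w) sharing memory with xs.
        win = sliding_window_view(xs, (K_h, K_w), axis=(2, 3))[:, :, ::stride, ::stride]
        # Sum over channels and kernel offsets; optimize=False is einsum's default.
        y[start:start + chunk] = np.einsum("nchwkl,ockl->nohw", win, w) + b[:, None, None]
    return y
```

For instance, a float32 batch with N = 32, a 56x56 output, a 3x3 kernel and C_in = 64 needs 231,211,008 bytes (about 220 MiB) for the im2col matrix alone.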
Related concepts
- Transformer Self-Attention and Backpropagation (Machine Learning)
- GPU Programming, Graphics APIs, And Shader Compilers (System Design)
- Transformer Architectures And Attention (Machine Learning)
- ML Fundamentals: Backprop, Attention, And RL (Machine Learning)
- Algorithms, Data Structures, And Complexity Analysis (Coding & Algorithms)
- ML Frameworks, Model Compilation, And Parallelism (ML System Design)