Implement 2D convolution forward pass
Company: Tesla
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of 2D convolution mechanics and practical low-level tensor manipulation, including kernel shape handling, padding and stride effects, multi-channel input/output interactions, and optional bias integration.
Constraints
- 1 <= N, C_in, C_out and all spatial dimensions are small (interview-scale unit tests)
- Kernel may be non-square (K_h != K_w)
- Stride and padding may differ between height and width (s_h != s_w, p_h != p_w)
- Bias may be absent (empty list / None) or a list of length C_out
- Inputs are floating-point numbers; output must be a dense (N, C_out, H_out, W_out) tensor
- Padding is zero-padding; you must not use a built-in convolution operator
Examples
Input: ([[[[1.0, 2.0], [3.0, 4.0]]]], [[[[1.0, 0.0], [0.0, 1.0]]]], None, 1, 1, 0, 0)
Expected Output: [[[[5.0]]]]
Explanation: 1x1x2x2 input, identity-diagonal 2x2 kernel, no padding/stride 1 -> single output 1*1 + 4*1 = 5.
Input: ([[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]], [[[[1.0, 1.0], [1.0, 1.0]]]], [0.0], 1, 1, 0, 0)
Expected Output: [[[[12.0, 16.0], [24.0, 28.0]]]]
Explanation: 3x3 input, all-ones 2x2 kernel (box sum), bias 0 -> each output is the sum of a 2x2 window (e.g. top-left 1+2+4+5 = 12).
Input: ([[[[1.0, 1.0], [1.0, 1.0]]]], [[[[1.0]]]], [5.0], 1, 1, 1, 1)
Expected Output: [[[[5.0, 5.0, 5.0, 5.0], [5.0, 6.0, 6.0, 5.0], [5.0, 6.0, 6.0, 5.0], [5.0, 5.0, 5.0, 5.0]]]]
Explanation: 1x1 kernel with padding (1,1) grows a 2x2 input to 4x4; bias 5 appears everywhere, and the inner 2x2 (the real pixels) add an extra 1 to make 6.
Input: ([[[[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0], [13.0, 14.0, 15.0, 16.0]]]], [[[[1.0, 0.0], [0.0, 1.0]]]], None, 2, 2, 0, 0)
Expected Output: [[[[7.0, 11.0], [23.0, 27.0]]]]
Explanation: Stride (2,2) with a diagonal 2x2 kernel on a 4x4 input -> 2x2 output; top-left window picks 1 and 6 -> 7.
Input: ([[[[1.0, 2.0], [3.0, 4.0]], [[10.0, 20.0], [30.0, 40.0]]]], [[[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]], [1.0], 1, 1, 0, 0)
Expected Output: [[[[211.0]]]]
Explanation: Two input channels summed: channel0 box-sum (1+2+3+4)=10, channel1 weighted by 2 (10+20+30+40)*2=200, plus bias 1 = 211.
Input: ([[[[2.0, -1.0, 3.0], [0.0, 5.0, -2.0], [1.0, 1.0, 1.0]]]], [[[[1.0, -1.0, 1.0]]]], None, 1, 1, 0, 1)
Expected Output: [[[[-3.0, 6.0, -4.0], [5.0, -7.0, 7.0], [0.0, 1.0, 0.0]]]]
Explanation: Asymmetric padding p_h=0, p_w=1 with a 1x3 kernel [1,-1,1] and no bias; preserves a 3x3 output, exercising horizontal zero-padding and negative values.
Hints
- First compute the output dimensions: H_out = (H + 2*p_h - K_h) // s_h + 1 and W_out = (W + 2*p_w - K_w) // s_w + 1.
- You don't have to physically build a padded input. Map each output position (i, j) back to the source index row = i*s_h + u - p_h and col = j*s_w + v - p_w, and simply skip (treat as zero) any index that falls outside [0, H) or [0, W).
- Loop order n -> c_out -> i -> j, then accumulate over c_in -> u -> v. Initialize the accumulator with b[c_out] when bias is present, otherwise 0.
- Watch the edge cases the unit test targets: no bias, non-square kernels, and asymmetric stride/padding.