Problem
Implement the forward pass of a 2D convolution (conv2d) from scratch (no deep learning libraries).
You are given:
- Input tensor `x` with shape `(N, C_in, H, W)` (NCHW layout)
- Filter weights `w` with shape `(C_out, C_in, K_h, K_w)`
- Optional bias `b` with shape `(C_out,)` (may be `None`)
- Integer strides `s_h, s_w`
- Integer padding `p_h, p_w` (zero-padding applied to height/width)
Compute the output tensor `y` with shape `(N, C_out, H_out, W_out)`, where:

H_out = ⌊(H + 2·p_h − K_h) / s_h⌋ + 1,  W_out = ⌊(W + 2·p_w − K_w) / s_w⌋ + 1

For each output element:

y[n, c_out, i, j] = Σ_{c_in=0}^{C_in−1} Σ_{u=0}^{K_h−1} Σ_{v=0}^{K_w−1} x_pad[n, c_in, i·s_h + u, j·s_w + v] · w[c_out, c_in, u, v] + b[c_out]

where `x_pad` is `x` padded with zeros by `(p_h, p_w)`.
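The output-shape formula above can be sketched as a small helper (function name and default arguments are illustrative, not part of the problem statement):

```python
def conv2d_output_shape(H, W, K_h, K_w, s_h=1, s_w=1, p_h=0, p_w=0):
    """Spatial output size of a conv2d: floor((dim + 2*pad - kernel) / stride) + 1."""
    H_out = (H + 2 * p_h - K_h) // s_h + 1
    W_out = (W + 2 * p_w - K_w) // s_w + 1
    return H_out, W_out
```

For instance, a 32×32 input with a 3×3 kernel, stride 1, and padding 1 keeps its spatial size ("same" padding).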
Requirements
- Return `y` as a dense numeric array/tensor.
- Do not use existing convolution operators.
- Handle edge cases such as `b is None`, non-square kernels, and differing strides/padding.
Constraints (typical for an interview unit test)
- Sizes are small enough that an O(N · C_out · C_in · H_out · W_out · K_h · K_w) implementation passes.
- Inputs are floating-point numbers.
Example (shape check)
If `x` is `(1, 3, 32, 32)`, `w` is `(8, 3, 3, 3)`, stride is `(1, 1)`, and padding is `(1, 1)`, then the output shape is `(1, 8, 32, 32)`.
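A minimal sketch of the requested forward pass, directly following the summation formula above. NumPy is used only for array storage and zero-padding, not for the convolution itself; the function name and signature are illustrative:

```python
import numpy as np

def conv2d(x, w, b=None, stride=(1, 1), padding=(0, 0)):
    """Naive conv2d forward pass over NCHW input; no convolution operators used."""
    N, C_in, H, W = x.shape
    C_out, C_in_w, K_h, K_w = w.shape
    assert C_in == C_in_w, "input and filter channel counts must match"
    s_h, s_w = stride
    p_h, p_w = padding

    # Zero-pad only the height and width dimensions.
    x_pad = np.pad(x, ((0, 0), (0, 0), (p_h, p_h), (p_w, p_w)))

    H_out = (H + 2 * p_h - K_h) // s_h + 1
    W_out = (W + 2 * p_w - K_w) // s_w + 1
    y = np.zeros((N, C_out, H_out, W_out), dtype=x.dtype)

    for n in range(N):
        for c_out in range(C_out):
            for i in range(H_out):
                for j in range(W_out):
                    # Window of the padded input under the kernel at (i, j);
                    # the elementwise product + sum covers the triple sum
                    # over c_in, u, v in the formula.
                    window = x_pad[n, :,
                                   i * s_h:i * s_h + K_h,
                                   j * s_w:j * s_w + K_w]
                    y[n, c_out, i, j] = np.sum(window * w[c_out])
            if b is not None:
                y[n, c_out] += b[c_out]  # broadcast bias over (H_out, W_out)
    return y
```

The seven nested loops (three explicit, plus the three summed dimensions inside `np.sum`, over the batch) give exactly the O(N · C_out · C_in · H_out · W_out · K_h · K_w) cost stated in the constraints, which is acceptable at these sizes.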