You are given an input tensor X with shape H×W×C = 64×64×3. Consider the following convolutional neural network (CNN):
(a) Compute the output shape H×W×C after each layer.
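Part (a) reduces to the standard convolution output-size formula. Since the layer table is not reproduced above, the kernel/stride/padding values below are placeholders for illustration, not the actual layers of the problem:

```python
import math

def conv_out(size, k, s=1, p=0, d=1):
    """Spatial output size of a conv/pool layer:
    floor((size + 2p - d*(k-1) - 1) / s) + 1."""
    return math.floor((size + 2 * p - d * (k - 1) - 1) / s) + 1

# Hypothetical settings: a 3x3 conv with stride 1, padding 1 preserves
# a 64-pixel axis; stride 2 halves it.
print(conv_out(64, k=3, s=1, p=1))  # -> 64
print(conv_out(64, k=3, s=2, p=1))  # -> 32
```

Apply the formula independently to H and W; the channel dimension of the output is simply the layer's number of filters.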
(b) Compute parameter counts and MACs for L1, L3, and L4. Compare L4’s MACs to a standard 3×3 conv with 64→128 channels (same input size as L4). Show formulas and numbers.
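For part (b), the standard formulas are params = k·k·C_in·C_out + C_out (with bias) and MACs = k·k·C_in·C_out·H_out·W_out. The numbers below use the 64→128, 3×3 reference conv named in the question; the 16×16 spatial size is a placeholder, not taken from the problem statement:

```python
def conv_params(k, c_in, c_out, bias=True):
    # One k x k x c_in filter per output channel, plus optional biases.
    return k * k * c_in * c_out + (c_out if bias else 0)

def conv_macs(k, c_in, c_out, h_out, w_out):
    # One multiply-accumulate per weight per output position.
    return k * k * c_in * c_out * h_out * w_out

print(conv_params(3, 64, 128))        # -> 73856
print(conv_macs(3, 64, 128, 16, 16))  # -> 18874368
```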
(c) Compute the receptive field size (in input pixels) of a single activation after L4.
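Part (c) follows the standard receptive-field recursion r_l = r_{l-1} + (k_l − 1)·j_{l-1}, j_l = j_{l-1}·s_l, where j is the cumulative stride ("jump") in input pixels. The four-layer stack below is a hypothetical example, since the actual layer list is not reproduced here:

```python
def receptive_field(layers):
    """layers: list of (kernel, stride) pairs, input-side first.
    Returns the receptive field of one top-layer activation, in input pixels."""
    r, j = 1, 1  # start from a single input pixel with unit jump
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Hypothetical stack: four 3x3 convs with strides 1, 2, 1, 1.
print(receptive_field([(3, 1), (3, 2), (3, 1), (3, 1)]))  # -> 13
```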
(d) Where would you place BatchNorm relative to activation for stable training (pre‑act vs post‑act) and why?
(e) When would you prefer stride vs pooling vs dilation to preserve information while controlling compute?
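The trade-off in (e) can be made concrete by comparing output resolution: a strided conv downsamples with learned weights, pooling downsamples with a fixed aggregation, and dilation enlarges the effective kernel to k + (k−1)(d−1) while keeping full resolution. The settings below are illustrative:

```python
import math

def out_size(size, k, s=1, p=0, d=1):
    # floor((size + 2p - d*(k-1) - 1) / s) + 1
    return math.floor((size + 2 * p - d * (k - 1) - 1) / s) + 1

# Three ways to grow context on a 64-pixel axis:
print(out_size(64, k=3, s=2, p=1))            # strided conv -> 32 (learned downsample)
print(out_size(out_size(64, 3, 1, 1), 2, 2))  # conv + 2x2 max pool -> 32 (fixed downsample)
print(out_size(64, k=3, s=1, p=2, d=2))       # dilated conv -> 64 (no downsample)
```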
(f) Explain the bias–variance and optimization trade‑offs of using depthwise separable convolutions in tiny‑model regimes.
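On the compute side of (f): a depthwise-separable conv factorizes a k×k conv into a per-channel depthwise k×k conv plus a 1×1 pointwise conv, cutting MACs by roughly 1/(1/C_out + 1/k²). The 64→128 channel counts mirror part (b); the 16×16 spatial size is again a placeholder:

```python
def standard_macs(k, c_in, c_out, hw):
    return k * k * c_in * c_out * hw

def dw_separable_macs(k, c_in, c_out, hw):
    depthwise = k * k * c_in * hw  # one k x k filter per input channel
    pointwise = c_in * c_out * hw  # 1x1 conv mixes channels
    return depthwise + pointwise

hw = 16 * 16  # placeholder spatial size
std = standard_macs(3, 64, 128, hw)
sep = dw_separable_macs(3, 64, 128, hw)
print(std, sep, round(std / sep, 1))  # -> 18874368 2244608 8.4
```

The ~8.4× MAC reduction is the upside; the bias side of the trade-off is that the factorization restricts the hypothesis class (spatial and channel mixing are decoupled), which matters most in tiny-model regimes where capacity is already scarce.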