Normalize Columns in Binomial Matrix Efficiently
Company: Google
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
##### Scenario
Write code that creates a 100×100 matrix of Binomial(1, 0.
5) samples and normalizes each column so it sums to 1.
##### Question
Provide an efficient algorithm (language of your choice) to perform the above task in O(n²) time.
##### Hints
Vectorized NumPy operations beat nested loops.
Quick Answer: This question evaluates proficiency in matrix manipulation, handling of binomial random samples, column normalization, and algorithmic efficiency for processing large numerical arrays.
Given an n x m binary matrix (entries are 0 or 1), return a new matrix of floats where each column is normalized to sum to 1. For each column j with sum s > 0, set output[i][j] = matrix[i][j] / s for all rows i. If a column sum is 0, leave that column as all zeros. Preserve the original shape and row/column order.
Constraints
- 1 <= n, m <= 1000
- matrix is rectangular with all rows of equal length
- Each entry matrix[i][j] is 0 or 1
- Return a new matrix of floats
- Time complexity should be O(n*m) (O(n^2) for square matrices)
- If a column sum is 0, the entire column remains zeros
Solution
from typing import List
def normalize_columns(matrix: List[List[int]]) -> List[List[float]]:
n = len(matrix)
if n == 0:
return []
m = len(matrix[0])
# Compute column sums
col_sums = [0] * m
for i in range(n):
row = matrix[i]
if len(row) != m:
raise ValueError("All rows must have the same length")
for j in range(m):
col_sums[j] += row[j]
# Prepare result matrix of zeros
result: List[List[float]] = [[0.0] * m for _ in range(n)]
# Fill normalized values
for j in range(m):
s = col_sums[j]
if s > 0:
inv = 1.0 / s
for i in range(n):
if matrix[i][j] != 0:
result[i][j] = matrix[i][j] * inv
# else leave the column as zeros
return result
Explanation
The algorithm performs two passes: first accumulate sums for each column, then divide each nonzero entry by its column sum to produce normalized values. Columns with sum zero are left as zeros to avoid division by zero. This avoids repeated summation and ensures linear work in the number of elements.
Time complexity: O(n*m) (O(n^2) for n x n matrices). Space complexity: O(n*m) for the output matrix (O(1) extra if normalizing in place).
Hints
- Compute all column sums in a single pass over the matrix.
- Then divide entries by their column sum to fill the output.
- If a column sum is zero, skip dividing that column to avoid division by zero.
- Avoid recomputing column sums per row or per element.