Python, pandas, NumPy, and R Data Manipulation

What's being tested

This tests vectorized tabular manipulation in pandas, NumPy, and dplyr: create derived columns, join lookup tables, compute group aggregates, and run small simulations without row-by-row loops. Interviewers are probing whether you can write correct, scalable analysis code while handling missing values, type coercion, random sampling, and edge cases.

Patterns & templates

Vectorized conditionals: use np.select, np.where, case_when, or boolean masks. np.select and case_when return the value for the first condition that matches, so order conditions from most specific to least specific.
Group-wise transforms: use df.groupby(keys)[col].transform('mean') to broadcast aggregates back to rows; in dplyr, use group_by() plus mutate(), then ungroup().
Join then mutate: use left_join() / merge(..., how='left') to attach treatment parameters from a lookup table, then compute adjusted values; confirm the row count is unchanged after the join (merge can enforce this with validate='many_to_one').
Random simulation: in dplyr, use slice_sample() (it supersedes sample_n()); in NumPy, prefer a Generator from np.random.default_rng(seed), e.g. rng.binomial, over the legacy np.random.binomial. Set seeds (set.seed() in R) for reproducibility, and draw all samples in one vectorized call instead of looping.
Column normalization: compute column sums with axis=0, divide via broadcasting, and decide how zero-sum columns should behave before coding.
Missing-value semantics: in pandas, comparisons against NaN evaluate to False (except !=, which evaluates to True), while nullable dtypes propagate pd.NA instead; use isna(), notna(), fillna(), and nullable dtypes deliberately.
Complexity expectations: most solutions should run in O(n) or O(n + k) time, where k is the size of the lookup table or the number of groups, with linear memory; avoid apply(axis=1) unless the data is tiny or the logic cannot be vectorized.
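
A minimal sketch of the first-match rule with np.select, assuming a hypothetical orders table (the column names, thresholds, and tier labels are illustrative). Each condition is a single vectorized boolean Series, which is what replaces a row-by-row apply(axis=1).

```python
import numpy as np
import pandas as pd

# Hypothetical order data; column names and values are illustrative only.
orders = pd.DataFrame({
    "amount": [250.0, 40.0, 120.0, np.nan],
    "is_vip": [True, False, True, False],
})

# Conditions ordered from most specific to least specific.
# np.select uses the choice of the FIRST condition that is True for each row.
conditions = [
    orders["is_vip"] & (orders["amount"] >= 200),  # VIP and large order
    orders["amount"] >= 100,                        # any large order
    orders["amount"].notna(),                       # any known amount
]
choices = ["vip_large", "large", "small"]

# Rows matching no condition (here, the missing amount) receive the default.
orders["tier"] = np.select(conditions, choices, default="unknown")
```

Row 2 is a VIP order that fails the first condition but matches the second, so it lands in "large"; swapping the first two conditions would hide the "vip_large" tier entirely. dplyr's case_when() follows the same first-match rule, with a final TRUE ~ "unknown" branch acting as the default.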
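
A minimal sketch of a group-wise transform, assuming a hypothetical sales table with region and revenue columns. transform() returns a result aligned to the original index, so every row keeps its place; calling .mean() on the groupby alone would collapse the table to one row per region.

```python
import pandas as pd

# Hypothetical sales data; names and values are illustrative only.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "revenue": [100.0, 300.0, 50.0, 150.0, 100.0],
})

grouped = sales.groupby("region")["revenue"]

# Broadcast each group's mean back onto its rows (no merge needed).
sales["region_mean"] = grouped.transform("mean")

# Same idea with the sum: each row's share of its region's total.
sales["share_of_region"] = sales["revenue"] / grouped.transform("sum")
```

The dplyr analogue is group_by(region) followed by mutate(region_mean = mean(revenue), share_of_region = revenue / sum(revenue)).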
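
A sketch of a seeded, vectorized simulation, assuming a hypothetical setup of 10,000 trials with 50 users each and a 10% conversion rate (all parameter values are illustrative). It also shows seeded row sampling with DataFrame.sample, the rough pandas counterpart to dplyr's slice_sample().

```python
import numpy as np
import pandas as pd

# Hypothetical parameters: 10,000 trials, 50 users per trial, p = 0.1.
n_trials, n_users, p = 10_000, 50, 0.1

rng = np.random.default_rng(seed=42)  # seeded Generator for reproducibility

# One vectorized call draws every trial at once; no Python loop.
conversions = rng.binomial(n=n_users, p=p, size=n_trials)

# Monte Carlo estimate of P(at least 10 conversions in a trial).
p_at_least_10 = (conversions >= 10).mean()

# Re-creating the Generator with the same seed reproduces the same draws.
again = np.random.default_rng(seed=42).binomial(n=n_users, p=p, size=n_trials)

# Seeded row sampling from a hypothetical user table.
users = pd.DataFrame({"user_id": range(100)})
sample = users.sample(n=5, random_state=7)
```

Passing the seed to a local Generator rather than calling np.random.seed() keeps the randomness scoped to this code, so other code drawing random numbers cannot shift the results.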

Common pitfalls

Pitfall: Treating NaN == NaN as true, or filtering missing numeric fields with ordinary comparisons (a mask such as x > 0 is False for every NaN row, so those rows disappear silently); use isna() / notna() instead.
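
A minimal sketch of how pandas comparisons treat missing values, using a small illustrative Series. The choice to keep missing rows in the filter is an assumed policy for the example, not a general rule.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

eq_nan = (s == np.nan).tolist()   # [False, False, False]: NaN equals nothing, itself included
ne_nan = (s != np.nan).tolist()   # [True, True, True]: even the NaN row counts as "not equal"
missing = s.isna().tolist()       # [False, True, False]: the reliable missing-value test

# s > 2 is False for the NaN row, so a plain filter would drop it silently.
# Here missing rows are kept on purpose (an illustrative policy choice).
kept = s[(s > 2) | s.isna()]

# Nullable dtypes use three-valued logic: a comparison involving a
# missing value yields pd.NA rather than False.
nullable = pd.Series([1, None, 3], dtype="Int64")
cmp = nullable > 2                # the middle element is pd.NA, not False
```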

Pitfall: Creating many-to-many joins accidentally and inflating rows; check key uniqueness and compare pre/post row counts.
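
A minimal sketch of guarding a left join against row inflation, assuming hypothetical orders and prices tables. It combines a key-uniqueness check, merge's validate argument, and a before/after row count, then shows what a duplicated lookup key does.

```python
import pandas as pd

# Hypothetical tables; names and values are illustrative only.
orders = pd.DataFrame({"order_id": [1, 2, 3], "product": ["a", "b", "a"]})
prices = pd.DataFrame({"product": ["a", "b"], "price": [9.99, 4.50]})

# Guard 1: the lookup key must be unique, or each order row can fan out.
assert prices["product"].is_unique

# Guard 2: validate="many_to_one" raises MergeError if right-hand keys repeat.
merged = orders.merge(prices, on="product", how="left", validate="many_to_one")

# Guard 3: a left join against a unique key preserves the left row count.
assert len(merged) == len(orders)

# Failure case: duplicate the "a" price row in the lookup table.
dup_prices = pd.concat([prices, prices.iloc[[0]]], ignore_index=True)

# Without validation the join silently inflates 3 order rows to 5.
inflated = orders.merge(dup_prices, on="product", how="left")

# With validation the same join is rejected up front.
rejected = False
try:
    orders.merge(dup_prices, on="product", how="left", validate="many_to_one")
except pd.errors.MergeError:
    rejected = True
```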

Pitfall: Normalizing by a zero column sum and returning inf or NaN unintentionally; specify whether to keep zeros, return NaN, or skip the column.
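
A minimal sketch of column normalization with an explicit zero-sum policy, using a small illustrative count matrix. The policy here (zero-sum columns become NaN) is an assumption; swapping the out= buffer for np.zeros_like(counts) would keep them as zeros instead.

```python
import numpy as np

# Illustrative counts; the middle column sums to zero.
counts = np.array([
    [2.0, 0.0, 1.0],
    [6.0, 0.0, 3.0],
])

col_sums = counts.sum(axis=0)  # shape (3,): one sum per column

# Divide each column by its sum via broadcasting, but only where the sum
# is non-zero; skipped cells keep the pre-filled NaN from the out= buffer,
# so no inf values or divide-by-zero warnings are produced.
normalized = np.divide(
    counts,
    col_sums,
    out=np.full_like(counts, np.nan),
    where=col_sums != 0,
)
```

In pandas, a comparable approach is df.div(df.sum(axis=0).replace(0, np.nan), axis=1), which turns zero-sum columns into NaN through the division itself.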

Practice these

The practice cards below cover the canonical variants — solve all of them and time yourself.