Python/Pandas Data Manipulation

What's being tested

This topic tests analysis-grade data manipulation in pandas and SQL: cleaning messy inputs, joining heterogeneous tables, aggregating by time/customer/product segments, and ranking or deduplicating records correctly. Interviewers probe whether you can produce trustworthy metric tables under realistic ambiguity: duplicate events, currency normalization, date granularity, missing values, and tie-breaking.

Patterns & templates

Groupby aggregation in pandas: df.groupby(keys).agg(...) for revenue, counts, kWh, or sales totals; validate row grain before aggregating.
Time bucketing with pd.to_datetime, .dt.date, .dt.to_period("M"), or SQL DATE_TRUNC; avoid mixing timestamps and dates accidentally.
Deduplication by business key using drop_duplicates(subset=..., keep=...) or ROW_NUMBER() OVER (...); declare deterministic tie-breakers.
Join-then-normalize pattern: merge fact rows to lookup tables such as exchange rates; check many-to-one assumptions (for example with the validate argument of merge) before computing converted metrics.
Ranking within groups via rank, sort_values, cumcount, or SQL DENSE_RANK; specify whether ties should share rank.
Conditional segmentation using np.where, pd.cut, CASE WHEN, and boolean masks for Prime/non-Prime, price buckets, or customer cohorts.
Streaming/counting basics for text-like inputs: count with collections.Counter or a plain dict; normalize Unicode with unicodedata.normalize and tokenize with a regex.
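
As a minimal sketch of how several of these patterns chain together, the example below deduplicates by business key, buckets by month, aggregates, and ranks within groups. The table, column names (order_id, customer, updated_at, revenue), and values are hypothetical, and "keep the latest update per order" is an assumed business rule, not the only reasonable one.

```python
import pandas as pd

# Hypothetical order events; order_id 2 appears twice (a later update).
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "customer": ["a", "a", "a", "b", "b"],
    "updated_at": pd.to_datetime([
        "2024-01-05 10:00", "2024-01-20 09:00", "2024-01-20 12:00",
        "2024-02-03 08:00", "2024-02-10 15:00",
    ]),
    "revenue": [10.0, 20.0, 25.0, 40.0, 40.0],
})

# Deduplicate by business key. Sorting first makes the tie-breaker explicit:
# the row with the latest updated_at wins for each order_id (assumed rule).
latest = (
    orders.sort_values(["order_id", "updated_at"])
          .drop_duplicates(subset="order_id", keep="last")
)

# Time bucketing: derive a monthly period from the timestamp.
latest = latest.assign(month=latest["updated_at"].dt.to_period("M"))

# Groupby aggregation with named outputs, one row per (customer, month).
monthly = (
    latest.groupby(["customer", "month"], as_index=False)
          .agg(revenue=("revenue", "sum"), n_orders=("order_id", "nunique"))
)

# Ranking within groups: dense rank, so tied revenues share a rank.
latest["rev_rank"] = (
    latest.groupby("customer")["revenue"]
          .rank(method="dense", ascending=False)
          .astype(int)
)
```

With method="dense", the two orders for customer b share rank 1; switching to method="first" would break the tie by row order instead, which is worth stating out loud in an interview.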

Common pitfalls

Pitfall: Aggregating before fixing grain. If order lines are duplicated or salaries repeat by country/date, totals and ranks become silently wrong.
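
One way to catch this is to assert the expected grain before aggregating. The sketch below uses a hypothetical order-lines table where (order_id, line_no) is assumed to be the unique key, and it assumes duplicates are exact repeats that are safe to drop.

```python
import pandas as pd

# Hypothetical order lines; (order_id, line_no) should be unique,
# but line (1, 2) was loaded twice.
lines = pd.DataFrame({
    "order_id": [1, 1, 1, 2],
    "line_no": [1, 2, 2, 1],
    "amount": [5.0, 7.0, 7.0, 3.0],
})

grain = ["order_id", "line_no"]

# keep=False flags every row whose key occurs more than once,
# so the offending rows can be inspected before deciding what to drop.
dupes = lines[lines.duplicated(subset=grain, keep=False)]

# The naive total double-counts the repeated line.
naive_total = lines["amount"].sum()

# Total after restoring the assumed grain (exact repeats dropped).
clean_total = lines.drop_duplicates(subset=grain)["amount"].sum()
```

Here the naive total is 22.0 while the deduplicated total is 15.0. If the repeated rows differed in amount, dropping them blindly would hide a data problem, so inspecting dupes first is the safer habit.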

Pitfall: Treating date joins as exact timestamp joins. Exchange rates, sales days, and meter readings often require explicit date extraction or as-of logic.
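
The sketch below contrasts an exact date join with as-of logic for exchange rates. The tables, column names, and rates are hypothetical, and "use the most recent rate on or before the sale date" is an assumed rule that should be confirmed with the interviewer.

```python
import pandas as pd

# Hypothetical sales with full timestamps, and one daily rate per currency.
sales = pd.DataFrame({
    "sold_at": pd.to_datetime(
        ["2024-03-01 14:30", "2024-03-02 09:15", "2024-03-04 18:00"]
    ),
    "currency": ["EUR", "EUR", "EUR"],
    "amount": [100.0, 50.0, 20.0],
})
rates = pd.DataFrame({
    "rate_date": pd.to_datetime(["2024-03-01", "2024-03-02"]),
    "currency": ["EUR", "EUR"],
    "usd_rate": [1.10, 1.20],
})

# Explicit date extraction: truncate the timestamp to midnight so it can
# match the daily rate key. Joining on sold_at directly would match nothing.
sales["sale_date"] = sales["sold_at"].dt.normalize()

# Exact date join; validate raises if rates has duplicate (date, currency) keys.
exact = sales.merge(
    rates.rename(columns={"rate_date": "sale_date"}),
    on=["sale_date", "currency"],
    how="left",
    validate="many_to_one",
)
# The 2024-03-04 sale has no rate for that day, so its usd_rate is NaN.

# As-of join: take the latest rate on or before each sale date, per currency.
# Both frames must be sorted by their join key.
asof = pd.merge_asof(
    sales.sort_values("sale_date"),
    rates.sort_values("rate_date"),
    left_on="sale_date",
    right_on="rate_date",
    by="currency",
    direction="backward",
)
asof["amount_usd"] = asof["amount"] * asof["usd_rate"]
```

The exact join leaves a missing rate for the day without a quote, while the as-of join carries the 2024-03-02 rate forward. Which behavior is correct depends on the business rule, so it is worth naming the choice explicitly.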

Pitfall: Returning code without explaining assumptions. Say how you handle nulls, duplicates, ties, currencies, and timezone/date boundaries.

Practice these

The practice cards below cover the canonical variants — solve all of them and time yourself.
