Pandas Data Wrangling
Asked of: Data Scientist

What's being tested
These exercises test pandas data wrangling for product analytics: cleaning event/transaction data, deduplicating records, joining lookup tables, and producing grouped metrics. Interviewers are probing whether you can translate ambiguous metric logic into reliable pandas code that handles nulls, ties, timestamps, and category mappings without overengineering.
Patterns & templates
- Filter-then-aggregate: use df.loc[mask] before groupby; exclude null, zero, or negative values before computing mean, sum, or rates.
- Lookup enrichment: join IDs to readable labels with merge, map, or dictionary lookup; validate unmatched IDs with isna().mean().
- Deduplication before metrics: use drop_duplicates(subset=[...]) or sort_values(...).drop_duplicates(..., keep='last'); wrong grain creates inflated counts.
- Grouped summaries: groupby(...).agg(...) with named aggregations; compute nunique, mean, sum, and derived columns after aggregation.
- Top-N and tie-breaking: prefer sort_values([metric, tie_col], ascending=[False, True]).head(n); state deterministic tie logic explicitly.
- Nested data flattening: use explode for lists, pd.json_normalize for dicts, and apply(lambda x: ...) only when vectorization is awkward.
- Time-window metrics: convert with pd.to_datetime, use .dt.date, Timedelta, and self-joins or shifted dates for next-day retention.
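Several of these patterns usually chain together in a single answer. The sketch below is one possible ordering (deduplicate, filter, enrich, aggregate, rank), not a prescribed solution. The tables, column names (event_id, user_id, category_id, watch_sec, category), and values are all hypothetical and chosen only to make the steps visible.

```python
import pandas as pd

# Hypothetical event log and lookup table; every name and value is illustrative.
events = pd.DataFrame({
    "event_id":    [1, 1, 2, 3, 4, 5, 6],
    "user_id":     [10, 10, 10, 11, 11, 12, 12],
    "category_id": [100, 100, 101, 100, 101, 102, 999],
    "watch_sec":   [30.0, 30.0, 60.0, None, 90.0, -5.0, 90.0],
})
categories = pd.DataFrame({
    "category_id": [100, 101, 102],
    "category":    ["cooking", "travel", "art"],
})

# 1. Deduplicate to one row per event so repeated logs do not inflate metrics.
deduped = events.drop_duplicates(subset=["event_id"])

# 2. Filter before aggregating: drop null and non-positive durations.
mask = deduped["watch_sec"].notna() & (deduped["watch_sec"] > 0)
valid = deduped.loc[mask]

# 3. Enrich with readable labels; a left join keeps unmatched IDs visible.
enriched = valid.merge(categories, on="category_id", how="left")
unmatched_rate = enriched["category"].isna().mean()  # share of rows with no label

# 4. Grouped summary with named aggregations, derived column computed afterwards.
summary = (
    enriched.dropna(subset=["category"])
    .groupby("category", as_index=False)
    .agg(total_sec=("watch_sec", "sum"), users=("user_id", "nunique"))
)
summary["sec_per_user"] = summary["total_sec"] / summary["users"]

# 5. Top-N with a deterministic tie-break: highest total first, then name A to Z.
top = summary.sort_values(
    ["total_sec", "category"], ascending=[False, True]
).head(1)

print(f"unmatched rate: {unmatched_rate:.2f}")
print(top)
```

Whether unmatched IDs should be dropped, bucketed as "unknown", or reported back to the interviewer depends on the prompt; here they are dropped after the unmatched rate is measured, which is only one reasonable choice.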
Common pitfalls
Pitfall: Computing averages at the wrong grain, for example averaging per-user averages when the prompt wants the mean over all events. Weight users equally only when the prompt explicitly asks for it.
Pitfall: Forgetting to deduplicate shopping or pin events before count, which silently turns repeated logs into fake engagement.
Pitfall: Using row-wise apply everywhere; it may pass small examples but signals weak pandas fluency when groupby, merge, explode, or vectorized masks are cleaner.
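A small made-up example can make these pitfalls concrete. The sketch below assumes a hypothetical log with columns event_id, user_id, and watch_sec. It shows how a duplicated row changes the count, how the event-level mean differs from the mean of per-user means, and how a row-wise apply compares with the equivalent vectorized mask.

```python
import pandas as pd

# Hypothetical log: event 4 was written twice; all names and values are illustrative.
logs = pd.DataFrame({
    "event_id":  [1, 2, 3, 4, 4],
    "user_id":   ["a", "a", "a", "b", "b"],
    "watch_sec": [10.0, 10.0, 10.0, 50.0, 50.0],
})

# Counting before deduplicating treats the repeated log line as extra engagement.
raw_count = len(logs)
events = logs.drop_duplicates(subset=["event_id"])
true_count = len(events)

# Event-level mean: every event carries equal weight.
event_level_mean = events["watch_sec"].mean()

# Mean of per-user means: every user carries equal weight, however many events they have.
user_level_mean = events.groupby("user_id")["watch_sec"].mean().mean()

# Row-wise apply and a vectorized mask give the same flags; the mask is shorter and faster.
flag_apply = events.apply(lambda row: row["watch_sec"] > 30, axis=1)
flag_vec = events["watch_sec"] > 30

print(raw_count, true_count)              # 5 4
print(event_level_mean, user_level_mean)  # 20.0 30.0
```

The two means differ because user "a" contributes three short events and user "b" one long one. Neither number is wrong in itself; the point is to say which weighting the prompt calls for.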
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions
- Find top category by video time spent · Pinterest · Data Scientist · Technical Screen · Medium
- Transform nested dicts with pandas apply/lambda · Pinterest · Data Scientist · Onsite · Medium
- Write SQL and pandas for shopping events · Pinterest · Data Scientist · Technical Screen · Medium
- Find top video category by average time · Pinterest · Data Scientist · Technical Screen · Medium
- Aggregate video time and unique pins in Python · Pinterest · Data Scientist · Technical Screen · Medium
- Clean and Aggregate Transactions for Finance Dashboard · Pinterest · Data Scientist · Onsite · Medium
Related concepts
- Pandas Data Manipulation · Data Manipulation (SQL/Python)
- Python/Pandas Data Manipulation · Data Manipulation (SQL/Python)
- Python Data Manipulation And Core Coding · Coding & Algorithms
- Python, Pandas, NumPy, And R Data Manipulation · Data Manipulation (SQL/Python)
- SQL/Python Data Manipulation And Joins · Data Manipulation (SQL/Python)
- SQL And Python Data Manipulation · Data Manipulation (SQL/Python)