Python Data Manipulation And Core Coding
Asked of: Data Scientist

What's being tested
Interviewers are probing Python data manipulation fluency for DS workflows: cleaning transaction-like records, computing statistics, joining/ranking data, and transforming text or sequences. You need to show correct logic, edge-case handling, and clear complexity reasoning using plain Python, pandas, and SQL-equivalent patterns.
Patterns & templates
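- Variance computation: population $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$; sample $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$; handle empty and single-value lists explicitly.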
- Numerically stable aggregation: prefer two-pass variance for clarity; mention Welford's algorithm for streaming data, which runs in `O(n)` time and `O(1)` space.
- Transaction cleaning with `pandas`: use `dropna`, `astype`, `to_datetime`, `sort_values`, `groupby`, `agg`, `diff`; validate duplicate IDs and negative amounts.
- User-level time features: `df.sort_values(["user_id", "timestamp"])`, then `groupby("user_id")["timestamp"].diff()` for inter-event intervals.
- Join semantics: know `merge(..., how="inner|left|right|outer|cross")`; always check row-count changes and duplicate keys after joins.
- Window-function equivalents: SQL `ROW_NUMBER() OVER (PARTITION BY user ORDER BY ts)` maps to `sort_values` plus `groupby().cumcount()` in Python.
- Sequence transformations: generate bigrams with `zip(tokens, tokens[1:])` or a list comprehension; time `O(n)`, space `O(n)` unless using a generator.
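The variance patterns above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `variance` and `welford_variance` and the choice to raise `ValueError` on empty or single-value input are assumptions made for the example; an interviewer may prefer returning `None` or `nan` instead.

```python
from typing import Iterable, Sequence


def variance(values: Sequence[float], sample: bool = True) -> float:
    """Two-pass variance: first pass for the mean, second for squared deviations."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty list is undefined")
    if sample and n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    squared_dev = sum((x - mean) ** 2 for x in values)
    # Divide by n - 1 for a sample, by n for a full population.
    return squared_dev / (n - 1 if sample else n)


def welford_variance(stream: Iterable[float], sample: bool = True) -> float:
    """Welford's single-pass update: O(n) time, O(1) extra space."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values to compute variance")
    return m2 / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]
population_var = variance(data, sample=False)   # squared deviations sum to 32, n = 8
sample_var = variance(data, sample=True)        # same sum divided by n - 1 = 7
streaming_var = welford_variance(iter(data), sample=False)
```

The two-pass version is easier to explain on a whiteboard; the Welford version is the one to mention when the data arrives as a stream and cannot be held in memory.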
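The pandas cleaning, time-feature, and window-function patterns above might be combined roughly as follows. The column names (`txn_id`, `user_id`, `timestamp`, `amount`), the helper name `clean_transactions`, and the policy of dropping negative amounts and keeping the first duplicate are all assumptions for illustration; real data would need its own rules.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a hypothetical transactions table and add per-user features."""
    required = ["txn_id", "user_id", "timestamp", "amount"]
    out = df.dropna(subset=required).copy()
    # Cast types; unparseable values become NaT/NaN and are dropped next.
    out["timestamp"] = pd.to_datetime(out["timestamp"], errors="coerce")
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce").astype("float64")
    out = out.dropna(subset=["timestamp", "amount"])
    # Assumed policy: negative amounts are invalid, first duplicate ID wins.
    out = out[out["amount"] >= 0]
    out = out.drop_duplicates(subset="txn_id", keep="first")
    # Sorting first is what makes diff() and cumcount() meaningful.
    out = out.sort_values(["user_id", "timestamp"]).reset_index(drop=True)
    out["secs_since_prev"] = (
        out.groupby("user_id")["timestamp"].diff().dt.total_seconds()
    )
    # cumcount() is zero-based, so add 1 to mirror SQL ROW_NUMBER().
    out["txn_rank"] = out.groupby("user_id").cumcount() + 1
    return out


raw = pd.DataFrame(
    {
        "txn_id": [1, 2, 2, 3, 4, 5],
        "user_id": ["a", "a", "a", "b", "b", "a"],
        "timestamp": [
            "2024-01-01 10:00:00",
            "2024-01-01 10:05:00",
            "2024-01-01 10:05:00",
            "2024-01-02 09:00:00",
            "2024-01-02 09:30:00",
            None,
        ],
        "amount": ["10.5", "20", "20", "-5", "7", "3"],
    }
)

clean = clean_transactions(raw)
summary = clean.groupby("user_id").agg(
    n_txn=("txn_id", "count"),
    total_amount=("amount", "sum"),
)
```

In an interview it helps to state each cleaning decision aloud (what counts as a duplicate, whether negatives are refunds or errors) rather than silently dropping rows.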
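For the sequence-transformation pattern, here is a small sketch of three ways to build bigrams. The function names are hypothetical; the generator variant is included to show how the `O(n)` space cost can be avoided when the tokens come from an arbitrary iterable.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def bigrams_zip(tokens: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each token with its successor; materializes O(n) pairs."""
    return list(zip(tokens, tokens[1:]))


def bigrams_comprehension(tokens: Sequence[str]) -> List[Tuple[str, str]]:
    """Same result using index arithmetic in a list comprehension."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]


def iter_bigrams(tokens: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Lazy variant: keeps only the previous token in memory."""
    it = iter(tokens)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input yields nothing
    for cur in it:
        yield prev, cur
        prev = cur


tokens = "the quick brown fox".split()
pairs = bigrams_zip(tokens)
```

All three return an empty result for empty or single-token input, which is worth saying explicitly when asked about edge cases.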
Common pitfalls
Pitfall: Confusing sample variance and population variance; always ask whether the list is the full population or an observed sample.
Pitfall: Treating joins as harmless; many-to-many joins can silently inflate transaction counts, revenue, fraud labels, or conversion metrics.
Pitfall: Writing clever one-liners without explaining nulls, ties, type casting, sorting assumptions, or empty-input behavior.
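The join pitfall can be made concrete with a toy example. The tables `orders` and `labels` and their columns are invented for illustration; the sketch shows how a duplicated key on the right side inflates row counts and revenue, and how the `validate` argument of `merge` can surface the problem.

```python
import pandas as pd

orders = pd.DataFrame({"user_id": [1, 1, 2], "amount": [10.0, 20.0, 5.0]})
# user_id 1 appears twice on the right side, so the join becomes many-to-many.
labels = pd.DataFrame({"user_id": [1, 1, 2], "label": ["ok", "fraud", "ok"]})

joined = orders.merge(labels, on="user_id", how="left")
true_total = orders["amount"].sum()
inflated_total = joined["amount"].sum()  # each user-1 order is counted twice

# Ask pandas to enforce the expected cardinality instead of trusting the data.
try:
    orders.merge(labels, on="user_id", how="left", validate="many_to_one")
    merge_failed = False
except pd.errors.MergeError:
    merge_failed = True

# One possible fix (assumed policy: keep the first label per user).
labels_unique = labels.drop_duplicates(subset="user_id", keep="first")
safe = orders.merge(labels_unique, on="user_id", how="left", validate="many_to_one")
assert len(safe) == len(orders)  # row count unchanged after the join
```

Comparing `len()` before and after a join, as in the final assertion, is a cheap habit that catches most silent inflation.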
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions
- Compute variance of a list in Python | PayPal · Data Scientist · Technical Screen · Easy
- Explain differences between Python list and tuple | PayPal · Data Scientist · Technical Screen · Hard
- Compute Variance from a Python List | PayPal · Data Scientist · Technical Screen · Hard
- Implement sliding-window device anomaly | PayPal · Data Scientist · Technical Screen · Medium
- Generate Bigrams Using Python List Comprehension and Zip | PayPal · Data Scientist · Technical Screen · Medium
- Clean and Summarize User Purchase Data Efficiently | PayPal · Data Scientist · Onsite · Medium
- Count Word Frequency and Print Top Three Words | PayPal · Data Scientist · Onsite · Medium
- Explain Window Functions and Joins in SQL and Python | PayPal · Data Scientist · Onsite · Medium
- Clean and Analyze User Transactions with Python Functions | PayPal · Data Scientist · Onsite · Medium
Related concepts
- Python/Pandas Data Manipulation | Data Manipulation (SQL/Python)
- Python, Pandas, NumPy, And R Data Manipulation | Data Manipulation (SQL/Python)
- Pandas Data Manipulation | Data Manipulation (SQL/Python)
- SQL And Python Data Manipulation | Data Manipulation (SQL/Python)
- Pandas Data Wrangling | Data Manipulation (SQL/Python)
- SQL/Python Data Manipulation And Joins | Data Manipulation (SQL/Python)