Add a conditional column in Python
Company: Google
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Using pandas, add a derived column to a table based on multiple conditions with strict precedence and missing-value handling. Given the sample DataFrame below, create a new column 'risk_tier' with the rules: if returns >= 2 OR last_review_rating <= 2.0 then 'high'; else if amount >= 200 AND country in {'US','CA'} then 'medium'; else if signup_date is within the last 30 days relative to 2025-09-01 then 'new'; else 'low'. Requirements: vectorized solution (no Python loops), correct dtype handling for dates and floats, NaNs in last_review_rating should not trigger 'high' unless returns >= 2, and write one simple unit test.
Sample DataFrame df:
+----------+---------+--------+---------+-------------+-------------------+---------+
| order_id | user_id | amount | country | signup_date | last_review_rating| returns |
+----------+---------+--------+---------+-------------+-------------------+---------+
| 1 | 101 | 120.0 | 'US' | '2025-08-15'| 4.5 | 0 |
| 2 | 102 | 350.0 | 'CA' | '2025-06-01'| 2.0 | 1 |
| 3 | 103 | 50.0 | 'FR' | '2025-08-25'| null | 2 |
| 4 | 104 | 500.0 | 'US' | '2025-09-01'| 5.0 | 0 |
+----------+---------+--------+---------+-------------+-------------------+---------+
Quick Answer: This question evaluates a candidate's ability to perform vectorized conditional feature engineering in pandas, including dtype-aware date and float handling, strict precedence in multi-condition logic, correct NaN semantics for numeric fields, and the ability to express and verify the behavior with a simple unit test.