Detect fraud events and extract PII
Company: Affirm
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question tests parsing nested event data, extracting and deduplicating PII fields, and correlating events by a shared key (SSN) to label underwriting events tied to a fraud flag.
Part 1: Extract All Unique PII Values
Constraints
- 0 <= len(events) <= 100000
- Each event can be processed independently.
- Only these keys count as PII: 'address', 'email', 'phone', 'ssn'.
- Some events may be missing 'customer_details' or some PII keys.
- If a PII key exists but its value is None, ignore it.
Examples
Input: ([{'customer_details': {'email': 'cash.olson@yahoo.com', 'phone': '309-144-1261', 'address': '2143 Scarborough Ave', 'ssn': '329340719', 'credit_score': 720}}, {'customer_details': {'email': 'hinson.parrott@hotmail.com', 'phone': '117-570-8961', 'address': '8941 Curry St', 'ssn': '634841077', 'credit_score': 680}}, {'customer_details': {'email': 'cash.olson@yahoo.com', 'phone': None, 'address': None, 'ssn': None, 'credit_score': 710}}, {'customer_details': {'credit_score': 800}}],)
Expected Output: {'cash.olson@yahoo.com', '309-144-1261', '117-570-8961', '2143 Scarborough Ave', '329340719', 'hinson.parrott@hotmail.com', '634841077', '8941 Curry St'}
Explanation: Only the four PII keys are collected. Duplicate email values are stored once, and None values are ignored.
Input: ([],)
Expected Output: set()
Explanation: An empty event list contains no PII values.
Input: ([{'customer_details': {'email': 'a@example.com', 'credit_score': 500}}, {'customer_details': {'email': 'a@example.com', 'phone': '555-0000', 'address': None}}, {'customer_details': {'ssn': '111-22-3333', 'phone': '555-0000'}}, {'customer_details': None}, {}],)
Expected Output: {'a@example.com', '555-0000', '111-22-3333'}
Explanation: Duplicate values appear once, None is ignored, and events without usable customer_details are skipped.
Input: ([{'customer_details': {'credit_score': 810, 'address': None, 'email': None, 'phone': None, 'ssn': None}}, {'other_key': 1}],)
Expected Output: set()
Explanation: There are no non-None values for the allowed PII keys.
Hints
- The list of PII keys is fixed and very small, so you only need to check those keys for each event.
- Use a set so that each PII value is stored only once, no matter how many events contain it.
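The hints above can be sketched as follows. This is one possible approach, not a reference solution: the function name `extract_pii` and the constant `PII_KEYS` are illustrative names that do not appear in the original problem, and the sketch assumes every event is a dict.

```python
from typing import Any, Dict, List, Set

# The four keys the constraints name as PII (constant name is illustrative).
PII_KEYS = ("address", "email", "phone", "ssn")


def extract_pii(events: List[Dict[str, Any]]) -> Set[Any]:
    """Collect every unique non-None PII value across all events."""
    pii: Set[Any] = set()
    for event in events:
        details = event.get("customer_details")
        # Skip events whose customer_details is missing or None.
        if not isinstance(details, dict):
            continue
        for key in PII_KEYS:
            value = details.get(key)
            # A key that is present but set to None is ignored.
            if value is not None:
                pii.add(value)
    return pii
```

Because the key list has a fixed size of four, this runs in O(n) time over the events, with extra space proportional to the number of distinct PII values.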
Part 2: Label Underwriting Events as Fraudulent by SSN
Constraints
- 0 <= len(events) <= 100000
- Aim for an O(n) or O(n log n) solution.
- Customer identity is determined only by matching 'customer_details["ssn"]'.
- Some events may be missing 'customer_details' or 'ssn'.
- A matching fraud_flag may appear either before or after an underwriting event.
Examples
Input: ([{'event_type': 'underwriting', 'customer_details': {'ssn': '111-11-1111'}}, {'event_type': 'other', 'customer_details': {'ssn': '999-99-9999'}}, {'event_type': 'fraud_flag', 'customer_details': {'ssn': '111-11-1111'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '222-22-2222'}}, {'event_type': 'fraud_flag', 'customer_details': {'ssn': '333-33-3333'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '333-33-3333'}}],)
Expected Output: [True, False, True]
Explanation: The first and third underwriting SSNs appear in fraud_flag events somewhere in the list. The second does not.
Input: ([{'event_type': 'fraud_flag', 'customer_details': {'ssn': '555-55-5555'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '555-55-5555'}}],)
Expected Output: [True]
Explanation: The underwriting event has the same SSN as a fraud_flag event.
Input: ([{'event_type': 'fraud_flag', 'customer_details': {}}, {'event_type': 'underwriting', 'customer_details': {}}, {'event_type': 'underwriting'}, {'event_type': 'fraud_flag', 'customer_details': {'ssn': '999-99-9999'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '999-99-9999'}}],)
Expected Output: [False, False, True]
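Explanation: Missing SSNs cannot match, even against a fraud_flag event that also lacks an SSN, so the first two underwriting events are labeled False. The final underwriting event matches the valid fraud_flag SSN.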
Input: ([],)
Expected Output: []
Explanation: There are no events, so there are no underwriting labels to return.
Input: ([{'event_type': 'fraud_flag', 'customer_details': {'ssn': '777-77-7777'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '777-77-7777'}}, {'event_type': 'other', 'customer_details': {'ssn': '777-77-7777'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '777-77-7777'}}],)
Expected Output: [True, True]
Explanation: Both underwriting events share an SSN that appears in a fraud_flag event.
Hints
- Because a fraud_flag can appear anywhere in the list, think globally about which SSNs have been flagged.
- A common approach is: first collect flagged SSNs, then scan again to label underwriting events in order.
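The two-pass approach from the hints might look like the sketch below. The names `get_ssn` and `label_underwriting_events` are hypothetical helpers introduced here for illustration, and the sketch assumes each event is a dict with an optional 'event_type' key.

```python
from typing import Any, Dict, List, Optional, Set


def get_ssn(event: Dict[str, Any]) -> Optional[Any]:
    """Return the event's SSN, or None if customer_details or ssn is missing."""
    details = event.get("customer_details")
    if not isinstance(details, dict):
        return None
    return details.get("ssn")


def label_underwriting_events(events: List[Dict[str, Any]]) -> List[bool]:
    """Return one bool per underwriting event, in order of appearance."""
    # Pass 1: collect every SSN that appears on a fraud_flag event.
    flagged: Set[Any] = set()
    for event in events:
        if event.get("event_type") == "fraud_flag":
            ssn = get_ssn(event)
            # A missing SSN is never recorded, so it can never match later.
            if ssn is not None:
                flagged.add(ssn)

    # Pass 2: label each underwriting event against the full flagged set.
    labels: List[bool] = []
    for event in events:
        if event.get("event_type") == "underwriting":
            ssn = get_ssn(event)
            labels.append(ssn is not None and ssn in flagged)
    return labels
```

Collecting the flagged SSNs before labeling is what lets a fraud_flag that appears after an underwriting event still count. Both passes are linear, so the total cost is O(n) time with O(k) extra space for k distinct flagged SSNs.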