PracHub

Quick Overview

This question evaluates skills in parsing nested event data, extracting and deduplicating PII fields, and performing key-based correlation to detect related fraud events.

Detect fraud events and extract PII

Company: Affirm

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

You are given a list of event objects (dictionaries/JSON). Each event has:

- `event_type`: either `"underwriting"` or `"fraud_flag"`
- `customer_details`: a nested object that may contain PII fields such as `address`, `email`, `phone`, `ssn` (and may contain non-PII fields like `credit_score`)
- other event-specific fields (e.g., `loan_amount`)

Example input:

```json
[
  {
    "customer_details": {
      "address": "8941 Curry St",
      "credit_score": 505,
      "email": "cash.olson@yahoo.com",
      "phone": "309-144-1261",
      "ssn": "329340719"
    },
    "event_type": "underwriting",
    "loan_amount": 256
  },
  {
    "customer_details": {
      "address": "2143 Scarborough Ave",
      "credit_score": 774,
      "email": "hinson.parrott@hotmail.com",
      "phone": "117-570-8961",
      "ssn": "634841077"
    },
    "event_type": "fraud_flag"
  }
]
```

Implement a function that processes the events and returns:

1. **A set of PII values** seen across all events. Treat these fields as PII: `address`, `email`, `phone`, `ssn`. (Ignore non-PII fields like `credit_score`.)
2. For each `"underwriting"` event, determine whether it should be labeled as **fraudulent**. Use this rule:
   - An underwriting event is fraud if there exists at least one `"fraud_flag"` event for the **same customer**, where customer identity is determined by matching `customer_details.ssn`.

Additional requirements/constraints:

- Events are given as a list (can be assumed to fit in memory).
- `n` can be large (e.g., up to 100k events), so aim for an efficient solution.
- Some events may be missing some PII keys; only add present PII values to the set.

Clearly specify your output format (e.g., list of booleans aligned to underwriting events, or a list of underwriting events augmented with `is_fraud`).
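
One way to satisfy both requirements together is sketched below, using the "underwriting events augmented with `is_fraud`" output format. The function name `process_events`, the constant `PII_FIELDS`, and the choice to return a `(set, list)` tuple are illustrative assumptions, not part of the original question. The sketch also assumes that `customer_details`, when present, is either a dictionary or `None`.

```python
from typing import Any

# Assumption: the question names exactly these four fields as PII.
PII_FIELDS = ("address", "email", "phone", "ssn")


def process_events(
    events: list[dict[str, Any]],
) -> tuple[set[Any], list[dict[str, Any]]]:
    """Return (pii_values, underwriting events augmented with 'is_fraud')."""
    pii_values: set[Any] = set()
    flagged_ssns: set[Any] = set()

    # Pass 1: collect PII values and the SSNs of every fraud_flag event.
    for event in events:
        details = event.get("customer_details") or {}
        for field in PII_FIELDS:
            value = details.get(field)
            if value is not None:
                pii_values.add(value)
        if event.get("event_type") == "fraud_flag":
            ssn = details.get("ssn")
            if ssn is not None:
                flagged_ssns.add(ssn)

    # Pass 2: label underwriting events in their original order.
    labeled: list[dict[str, Any]] = []
    for event in events:
        if event.get("event_type") != "underwriting":
            continue
        ssn = (event.get("customer_details") or {}).get("ssn")
        # Shallow copy so the caller's input is not mutated.
        labeled.append({**event, "is_fraud": ssn in flagged_ssns})

    return pii_values, labeled
```

Both passes are linear in the number of events, and set membership checks are constant time on average, so the sketch stays within the efficiency goal stated above for roughly 100k events.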

Part 1: Extract All Unique PII Values

Given a list of event dictionaries, collect every unique PII value that appears inside event['customer_details']. Treat only these keys as PII: 'address', 'email', 'phone', and 'ssn'. Ignore non-PII keys such as 'credit_score'. If a PII key is missing or its value is None, do not add anything for that key.

Return a Python set containing all unique PII values seen across every event.

Constraints

  • 0 <= len(events) <= 100000
  • Each event can be processed independently.
  • Only these keys count as PII: 'address', 'email', 'phone', 'ssn'.
  • Some events may be missing 'customer_details' or some PII keys.
  • If a PII key exists but its value is None, ignore it.

Examples

Input: ([{'customer_details': {'email': 'cash.olson@yahoo.com', 'phone': '309-144-1261', 'address': '2143 Scarborough Ave', 'ssn': '329340719', 'credit_score': 720}}, {'customer_details': {'email': 'hinson.parrott@hotmail.com', 'phone': '117-570-8961', 'address': '8941 Curry St', 'ssn': '634841077', 'credit_score': 680}}, {'customer_details': {'email': 'cash.olson@yahoo.com', 'phone': None, 'address': None, 'ssn': None, 'credit_score': 710}}, {'customer_details': {'credit_score': 800}}],)

Expected Output: {'cash.olson@yahoo.com', '309-144-1261', '117-570-8961', '2143 Scarborough Ave', '329340719', 'hinson.parrott@hotmail.com', '634841077', '8941 Curry St'}

Explanation: Only the four PII keys are collected. Duplicate email values are stored once, and None values are ignored.

Input: ([],)

Expected Output: set()

Explanation: An empty event list contains no PII values.

Input: ([{'customer_details': {'email': 'a@example.com', 'credit_score': 500}}, {'customer_details': {'email': 'a@example.com', 'phone': '555-0000', 'address': None}}, {'customer_details': {'ssn': '111-22-3333', 'phone': '555-0000'}}, {'customer_details': None}, {}],)

Expected Output: {'a@example.com', '555-0000', '111-22-3333'}

Explanation: Duplicate values appear once, None is ignored, and events without usable customer_details are skipped.

Input: ([{'customer_details': {'credit_score': 810, 'address': None, 'email': None, 'phone': None, 'ssn': None}}, {'other_key': 1}],)

Expected Output: set()

Explanation: There are no non-None values for the allowed PII keys.

Hints

  1. The list of PII keys is fixed and very small, so you only need to check those keys for each event.
  2. Use a set so that each PII value is stored only once, no matter how many events contain it.
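
The two hints can be combined into a short sketch like the one below. The names `extract_pii` and `PII_KEYS` are hypothetical; the question does not prescribe a function signature. The `isinstance` check is an assumption about how to treat a `customer_details` value that is missing, `None`, or otherwise not a dictionary: such events are skipped.

```python
from typing import Any

# The fixed, small list of keys that count as PII (from the constraints).
PII_KEYS = ("address", "email", "phone", "ssn")


def extract_pii(events: list[dict[str, Any]]) -> set[Any]:
    """Collect every unique, non-None PII value across all events."""
    pii: set[Any] = set()
    for event in events:
        details = event.get("customer_details")
        # Skip events with missing or unusable customer_details.
        if not isinstance(details, dict):
            continue
        for key in PII_KEYS:
            value = details.get(key)
            # Missing keys and explicit None values are both ignored.
            if value is not None:
                pii.add(value)
    return pii
```

Because only four keys are checked per event, the work is proportional to the number of events, regardless of how many non-PII fields each event carries.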

Part 2: Label Underwriting Events as Fraudulent by SSN

You are given a list of event dictionaries. Each event has an 'event_type' and may contain a nested 'customer_details' dictionary.

For every event whose 'event_type' is 'underwriting', determine whether it should be labeled as fraudulent. An underwriting event is fraudulent if there exists at least one event whose 'event_type' is 'fraud_flag' and whose customer_details['ssn'] matches the underwriting event's customer_details['ssn'].

Return a list of booleans aligned to underwriting events in their original input order. If an underwriting event has no SSN, it cannot match and should be labeled False. A fraud_flag event without an SSN should be ignored.

Constraints

  • 0 <= len(events) <= 100000
  • Aim for an O(n) or O(n log n) solution.
  • Customer identity is determined only by matching 'customer_details["ssn"]'.
  • Some events may be missing 'customer_details' or 'ssn'.
  • A matching fraud_flag may appear either before or after an underwriting event.

Examples

Input: ([{'event_type': 'underwriting', 'customer_details': {'ssn': '111-11-1111'}}, {'event_type': 'other', 'customer_details': {'ssn': '999-99-9999'}}, {'event_type': 'fraud_flag', 'customer_details': {'ssn': '111-11-1111'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '222-22-2222'}}, {'event_type': 'fraud_flag', 'customer_details': {'ssn': '333-33-3333'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '333-33-3333'}}],)

Expected Output: [True, False, True]

Explanation: The first and third underwriting SSNs appear in fraud_flag events somewhere in the list. The second does not.

Input: ([{'event_type': 'fraud_flag', 'customer_details': {'ssn': '555-55-5555'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '555-55-5555'}}],)

Expected Output: [True]

Explanation: The underwriting event has the same SSN as a fraud_flag event.

Input: ([{'event_type': 'fraud_flag', 'customer_details': {}}, {'event_type': 'underwriting', 'customer_details': {}}, {'event_type': 'underwriting'}, {'event_type': 'fraud_flag', 'customer_details': {'ssn': '999-99-9999'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '999-99-9999'}}],)

Expected Output: [False, False, True]

Explanation: Missing SSNs cannot match. The final underwriting event matches the valid fraud_flag SSN.

Input: ([],)

Expected Output: []

Explanation: There are no events, so there are no underwriting labels to return.

Input: ([{'event_type': 'fraud_flag', 'customer_details': {'ssn': '777-77-7777'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '777-77-7777'}}, {'event_type': 'other', 'customer_details': {'ssn': '777-77-7777'}}, {'event_type': 'underwriting', 'customer_details': {'ssn': '777-77-7777'}}],)

Expected Output: [True, True]

Explanation: Both underwriting events share an SSN that appears in a fraud_flag event.

Hints

  1. Because a fraud_flag can appear anywhere in the list, think globally about which SSNs have been flagged.
  2. A common approach is: first collect flagged SSNs, then scan again to label underwriting events in order.
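
The two-pass approach from the hints might look like the following sketch. The names `label_underwriting` and `_ssn_of` are hypothetical helpers introduced here for illustration. The sketch assumes an SSN of `None` should be treated the same as a missing SSN.

```python
from typing import Any, Optional


def _ssn_of(event: dict[str, Any]) -> Optional[Any]:
    """Return the event's SSN, or None if customer_details or ssn is absent."""
    details = event.get("customer_details")
    if not isinstance(details, dict):
        return None
    return details.get("ssn")


def label_underwriting(events: list[dict[str, Any]]) -> list[bool]:
    """Return one boolean per underwriting event, in input order."""
    # Pass 1: gather every SSN that appears on a fraud_flag event.
    flagged: set[Any] = set()
    for event in events:
        if event.get("event_type") == "fraud_flag":
            ssn = _ssn_of(event)
            if ssn is not None:  # fraud_flag events without an SSN are ignored
                flagged.add(ssn)

    # Pass 2: label underwriting events; a missing SSN (None) is never in
    # `flagged`, so those events come out False.
    return [
        _ssn_of(event) in flagged
        for event in events
        if event.get("event_type") == "underwriting"
    ]
```

Collecting the flagged SSNs first is what lets a fraud_flag that appears later in the list still mark an earlier underwriting event. Each pass is O(n) with average constant-time set lookups.
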
Last updated: May 21, 2026

