This question evaluates skills in parsing nested event data, extracting and deduplicating PII fields, and performing key-based correlation to detect related fraud events.
You are given a list of event objects (dictionaries/JSON). Each event has:
- event_type: either "underwriting" or "fraud_flag"
- customer_details: a nested object that may contain PII fields such as address, email, phone, ssn (and may contain non-PII fields like credit_score)
- possibly other top-level fields (e.g., loan_amount)
Example input:
[
  {
    "customer_details": {
      "address": "8941 Curry St",
      "credit_score": 505,
      "email": "cash.olson@yahoo.com",
      "phone": "309-144-1261",
      "ssn": "329340719"
    },
    "event_type": "underwriting",
    "loan_amount": 256
  },
  {
    "customer_details": {
      "address": "2143 Scarborough Ave",
      "credit_score": 774,
      "email": "hinson.parrott@hotmail.com",
      "phone": "117-570-8961",
      "ssn": "634841077"
    },
    "event_type": "fraud_flag"
  }
]
Implement a function that processes the events and returns:
1. The unique values of each PII field across all events: address, email, phone, ssn. (Ignore non-PII fields like credit_score.)
2. For each "underwriting" event, whether it should be labeled as fraudulent. Use this rule: an underwriting event is fraudulent if there is a "fraud_flag" event for the same customer, where customer identity is determined by matching customer_details.ssn.
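The two outputs described above can be sketched in Python as follows. This is a minimal sketch, not a reference solution. The function name process_events and the constant PII_FIELDS are hypothetical. It also rests on assumptions the problem statement does not confirm: PII values are strings, a fraud_flag anywhere in the list counts regardless of order, and an event with no ssn never matches.

```python
from typing import Any

# The four PII fields named in the problem; credit_score is deliberately excluded.
PII_FIELDS = ("address", "email", "phone", "ssn")


def process_events(
    events: list[dict[str, Any]],
) -> tuple[dict[str, list[str]], list[bool]]:
    """Return (unique PII values per field, fraud labels for underwriting events)."""
    unique_pii: dict[str, set[str]] = {field: set() for field in PII_FIELDS}
    flagged_ssns: set[str] = set()

    # Pass 1: collect PII values and the SSNs that carry a fraud flag.
    for event in events:
        details = event.get("customer_details") or {}
        for field in PII_FIELDS:
            value = details.get(field)
            if value is not None:
                unique_pii[field].add(value)
        if event.get("event_type") == "fraud_flag" and details.get("ssn") is not None:
            flagged_ssns.add(details["ssn"])

    # Pass 2: one boolean per underwriting event, in input order.
    # A missing ssn yields None, which is never in flagged_ssns.
    labels = [
        (event.get("customer_details") or {}).get("ssn") in flagged_ssns
        for event in events
        if event.get("event_type") == "underwriting"
    ]

    # Sorting is an assumption made only to get deterministic output.
    return {field: sorted(values) for field, values in unique_pii.items()}, labels
```

For the example input, the only underwriting event has ssn 329340719 while the fraud_flag event has ssn 634841077, so the labels list would be [False].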
Additional requirements/constraints:
- The number of events, n, can be large (e.g., up to 100k events), so aim for an efficient solution.
- Clearly specify your output format (e.g., list of booleans aligned to underwriting events, or a list of underwriting events augmented with is_fraud).
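One way to meet both constraints is to choose the "augmented events" output format and build a set of flagged SSNs first, so that each lookup is a constant-time set membership test on average. The sketch below is illustrative only. The names annotate_underwriting and ssn_of are hypothetical, and treating a missing customer_details or ssn as "not fraudulent" is an assumption rather than something the problem specifies.

```python
from typing import Any


def annotate_underwriting(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return shallow copies of the underwriting events, each with an is_fraud key."""

    def ssn_of(event: dict[str, Any]) -> Any:
        # Tolerate a missing or null customer_details object.
        return (event.get("customer_details") or {}).get("ssn")

    # One pass to build the lookup set: O(n) time, O(k) space for k flagged SSNs.
    flagged = {
        ssn_of(e)
        for e in events
        if e.get("event_type") == "fraud_flag" and ssn_of(e) is not None
    }

    # One pass to label; the input events themselves are not mutated.
    return [
        {**e, "is_fraud": ssn_of(e) in flagged}
        for e in events
        if e.get("event_type") == "underwriting"
    ]
```

Compared with checking every underwriting event against every fraud_flag event, which is O(n^2) in the worst case, this keeps the total work linear in n. Returning copies rather than mutating the input is a design choice made here so callers can reuse the original list.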