Map sources to functional dataset with SQL
Company: EY
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You must produce a functional, consumption-ready dataset for daily exposure monitoring. Assume “today” is 2025-09-01. Use the last 7 calendar days by trade_dt (inclusive), handle late-arriving data, and deduplicate to the latest ingested record per trade_id.
Sample source tables (ASCII):
Customers
+---------+-----------+------------+---------+
| cust_id | cust_name | kyc_status | country |
+---------+-----------+------------+---------+
| 101     | Alpha LLC | PASS       | US      |
| 102     | Beta SA   | PASS       | FR      |
| 103     | Gamma AG  | REVIEW     | DE      |
+---------+-----------+------------+---------+
Accounts
+---------+---------+---------+---------------------+
| acct_id | cust_id | product | opened_at           |
+---------+---------+---------+---------------------+
| 5001    | 101     | MARGIN  | 2023-05-10 09:00:00 |
| 5002    | 102     | CASH    | 2024-11-01 10:00:00 |
+---------+---------+---------+---------------------+
Trades
+----------+---------+------------+-------------+---------------+------+----------+---------------------+
| trade_id | acct_id | trade_dt   | asset_class | notional_usd  | side | status   | ingested_at         |
+----------+---------+------------+-------------+---------------+------+----------+---------------------+
| T1       | 5001    | 2025-08-26 | EQ          |     1,000,000 | BUY  | BOOKED   | 2025-08-26 12:00:00 |
| T1       | 5001    | 2025-08-26 | EQ          |     1,000,000 | BUY  | CANCELED | 2025-08-27 08:00:00 |
| T2       | 5001    | 2025-08-28 | FI          |     2,500,000 | SELL | BOOKED   | 2025-08-28 11:30:00 |
| T3       | 5002    | 2025-08-30 | EQ          |       750,000 | BUY  | BOOKED   | 2025-09-01 02:00:00 |
+----------+---------+------------+-------------+---------------+------+----------+---------------------+
RiskLimits
+---------+-----------------+
| acct_id | daily_limit_usd |
+---------+-----------------+
| 5001    |       2,000,000 |
| 5002    |       1,000,000 |
+---------+-----------------+
Task: Write SQL to build a DailyExposure fact at grain (acct_id, trade_dt) over the closed interval [2025-08-25, 2025-09-01], that is, today minus 7 days through today with both endpoints included. Requirements:
- Deduplicate to the latest ingested row per (trade_id) before aggregation.
- Compute gross_notional = SUM(ABS(notional_usd)), net_notional (BUY counted as positive, SELL as negative), limit_utilization = gross_notional / daily_limit_usd, and breach_flag (true when limit_utilization > 1.0).
- Exclude trades with status = 'CANCELED'.
- Include only accounts with Customers.kyc_status = 'PASS'.
- Make the query idempotent for daily backfills (no double counting on re‑runs).
Provide the final SELECT and explain one edge case your SQL intentionally ignores.
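One possible shape for the final SELECT is sketched below, wrapped in Python's standard-library sqlite3 module so it can be run against the sample rows. Several things here are assumptions rather than part of the question: SQLite as the engine (window functions need SQLite 3.25 or later), the hypothetical helper name build_daily_exposure, the :start_dt and :end_dt parameters, notional_usd stored as an unsigned number, and an inner join to RiskLimits (accounts without a limit are dropped).

```python
import sqlite3

# Sample rows copied from the tables in the question; thousands separators removed.
SETUP = """
CREATE TABLE Customers (cust_id INT, cust_name TEXT, kyc_status TEXT, country TEXT);
CREATE TABLE Accounts (acct_id INT, cust_id INT, product TEXT, opened_at TEXT);
CREATE TABLE Trades (trade_id TEXT, acct_id INT, trade_dt TEXT, asset_class TEXT,
                     notional_usd REAL, side TEXT, status TEXT, ingested_at TEXT);
CREATE TABLE RiskLimits (acct_id INT, daily_limit_usd REAL);
INSERT INTO Customers VALUES (101,'Alpha LLC','PASS','US'), (102,'Beta SA','PASS','FR'),
                             (103,'Gamma AG','REVIEW','DE');
INSERT INTO Accounts VALUES (5001,101,'MARGIN','2023-05-10 09:00:00'),
                            (5002,102,'CASH','2024-11-01 10:00:00');
INSERT INTO Trades VALUES
 ('T1',5001,'2025-08-26','EQ',1000000,'BUY','BOOKED','2025-08-26 12:00:00'),
 ('T1',5001,'2025-08-26','EQ',1000000,'BUY','CANCELED','2025-08-27 08:00:00'),
 ('T2',5001,'2025-08-28','FI',2500000,'SELL','BOOKED','2025-08-28 11:30:00'),
 ('T3',5002,'2025-08-30','EQ',750000,'BUY','BOOKED','2025-09-01 02:00:00');
INSERT INTO RiskLimits VALUES (5001,2000000), (5002,1000000);
"""

DAILY_EXPOSURE_SQL = """
WITH latest AS (   -- rank the versions of each trade, newest ingestion first
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY trade_id ORDER BY ingested_at DESC) AS rn
    FROM Trades t
),
live AS (          -- filter only AFTER dedup, so a later CANCELED version wins
    SELECT * FROM latest
    WHERE rn = 1
      AND status <> 'CANCELED'
      AND trade_dt BETWEEN :start_dt AND :end_dt
)
SELECT l.acct_id,
       l.trade_dt,
       SUM(ABS(l.notional_usd)) AS gross_notional,
       SUM(CASE l.side WHEN 'BUY'  THEN  ABS(l.notional_usd)
                       WHEN 'SELL' THEN -ABS(l.notional_usd)
                       ELSE 0 END) AS net_notional,
       SUM(ABS(l.notional_usd)) / r.daily_limit_usd AS limit_utilization,
       CASE WHEN SUM(ABS(l.notional_usd)) / r.daily_limit_usd > 1.0
            THEN 1 ELSE 0 END AS breach_flag
FROM live l
JOIN Accounts   a ON a.acct_id = l.acct_id
JOIN Customers  c ON c.cust_id = a.cust_id AND c.kyc_status = 'PASS'
JOIN RiskLimits r ON r.acct_id = l.acct_id
GROUP BY l.acct_id, l.trade_dt, r.daily_limit_usd
ORDER BY l.acct_id, l.trade_dt
"""

def build_daily_exposure(conn, start_dt, end_dt):
    """Return DailyExposure rows for trade_dt in [start_dt, end_dt]."""
    params = {"start_dt": start_dt, "end_dt": end_dt}
    return conn.execute(DAILY_EXPOSURE_SQL, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = build_daily_exposure(conn, "2025-08-25", "2025-09-01")
for row in rows:
    print(row)
# (5001, '2025-08-28', 2500000.0, -2500000.0, 1.25, 1)
# (5002, '2025-08-30', 750000.0, 750000.0, 0.75, 0)
```

On the sample data, T1 disappears because its latest version is CANCELED, T2 breaches the 2,000,000 limit, and the late-arriving T3 is still picked up because the window is on trade_dt rather than ingested_at. The statement reads only the source tables and takes the window as parameters, so re-running it for the same window returns the same rows. One edge case this sketch intentionally ignores: two versions of a trade_id sharing the same ingested_at, in which case ROW_NUMBER picks one arbitrarily unless a further tiebreaker column is added to the ORDER BY.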
Quick Answer: This question evaluates competency in SQL-based data manipulation and data engineering concepts, including handling late-arriving data and deduplication to the latest ingested record, joining and filtering for enrichment and KYC status, and computing exposure metrics such as gross/net notional, limit utilization, and breach flags.
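The idempotency and late-arriving-data requirements can be illustrated with a delete-then-insert refresh of the whole window inside one transaction. This is a minimal sketch under stated assumptions, not a prescribed loading pattern: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), a reduced DailyExposure table holding only gross_notional, a trimmed Trades table, and a hypothetical helper named refresh_window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Trades (trade_id TEXT, acct_id INT, trade_dt TEXT,
                     notional_usd REAL, status TEXT, ingested_at TEXT);
CREATE TABLE DailyExposure (acct_id INT, trade_dt TEXT, gross_notional REAL,
                            PRIMARY KEY (acct_id, trade_dt));
INSERT INTO Trades VALUES
 ('T1',5001,'2025-08-26',1000000,'BOOKED','2025-08-26 12:00:00'),
 ('T2',5001,'2025-08-28',2500000,'BOOKED','2025-08-28 11:30:00');
""")

def refresh_window(conn, start_dt, end_dt):
    """Replace every DailyExposure row in [start_dt, end_dt] atomically."""
    params = {"s": start_dt, "e": end_dt}
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "DELETE FROM DailyExposure WHERE trade_dt BETWEEN :s AND :e", params)
        conn.execute("""
            INSERT INTO DailyExposure (acct_id, trade_dt, gross_notional)
            SELECT acct_id, trade_dt, SUM(ABS(notional_usd))
            FROM (SELECT t.*,
                         ROW_NUMBER() OVER (PARTITION BY trade_id
                                            ORDER BY ingested_at DESC) AS rn
                  FROM Trades t) AS latest
            WHERE rn = 1
              AND status <> 'CANCELED'
              AND trade_dt BETWEEN :s AND :e
            GROUP BY acct_id, trade_dt""", params)

def snapshot(conn):
    """Current contents of DailyExposure in a stable order."""
    return conn.execute(
        "SELECT * FROM DailyExposure ORDER BY acct_id, trade_dt").fetchall()

refresh_window(conn, "2025-08-25", "2025-09-01")
first = snapshot(conn)
refresh_window(conn, "2025-08-25", "2025-09-01")  # re-run: nothing is double counted
second = snapshot(conn)

# The cancellation of T1 arrives late; the next refresh removes its exposure row.
conn.execute("INSERT INTO Trades VALUES "
             "('T1',5001,'2025-08-26',1000000,'CANCELED','2025-08-27 08:00:00')")
refresh_window(conn, "2025-08-25", "2025-09-01")
third = snapshot(conn)
```

Replacing the full window on each run trades some extra work for simplicity: a re-run always converges to what the source currently says, including corrections that arrive after the first load. On engines that support it, an upsert or MERGE keyed on (acct_id, trade_dt) is an alternative, although it needs separate handling for rows that should disappear.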