Write SQL and Python for data prep
Company: Meta
Role: Data Engineer
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
Given clickstream events (user_id, event_type, ts, properties) and a users table (user_id, signup_ts, plan), write SQL to compute DAU/WAU/MAU, D1/W1 retention cohorts, and sessionized metrics. Implement an incremental daily job that updates the aggregates idempotently using window functions and MERGE/INSERT patterns, and diagnose and handle duplicate and late-arriving events. Then, using Python (no heavy frameworks), implement a data-cleaning script that parses the semi-structured JSON in the properties column, normalizes nested fields, and writes partitioned Parquet output, with basic unit tests.
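The SQL half can be sketched end to end with Python's built-in sqlite3 module, which keeps the example self-contained. This is a minimal sketch under stated assumptions, not a reference solution: the table names events_raw and daily_metrics are hypothetical, ts is assumed to be ISO-8601 UTC text, an exact repeat of (user_id, event_type, ts) is treated as a duplicate, and a session is assumed to end after 30 minutes of inactivity. SQLite has no MERGE, so the idempotent write uses its INSERT ... ON CONFLICT DO UPDATE upsert instead. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Dedup: keep one row per (user_id, event_type, ts). Treating exact repeats as
# duplicates is an assumption; a real feed may carry an event_id to key on.
DEDUP = """
SELECT user_id, event_type, ts FROM (
    SELECT user_id, event_type, ts,
           ROW_NUMBER() OVER (PARTITION BY user_id, event_type, ts ORDER BY ts) AS rn
    FROM events_raw
) WHERE rn = 1
"""

# Sessionize: an event starts a new session when the gap to the same user's
# previous event exceeds 30 minutes (assumed timeout). The first event has no
# LAG, so the comparison is NULL and falls through to 1. A running SUM of the
# flags then numbers each user's sessions.
SESSIONS = f"""
SELECT user_id, ts,
       SUM(is_new) OVER (PARTITION BY user_id ORDER BY ts
                         ROWS UNBOUNDED PRECEDING) AS session_no
FROM (
    SELECT user_id, ts,
           CASE WHEN (julianday(ts) - julianday(LAG(ts) OVER (
                          PARTITION BY user_id ORDER BY ts))) * 86400 <= 1800
                THEN 0 ELSE 1 END AS is_new
    FROM ({DEDUP})
)
"""

# Idempotent daily load: recompute one event-date from deduplicated raw data
# and upsert it. INSERT ... ON CONFLICT DO UPDATE stands in for MERGE, so
# re-running a day overwrites its row instead of double-counting.
# A session that spans midnight is counted on each day it touches.
UPSERT_DAY = f"""
INSERT INTO daily_metrics (dt, dau, sessions)
SELECT date(ts), COUNT(DISTINCT user_id),
       COUNT(DISTINCT user_id || ':' || session_no)
FROM ({SESSIONS})
WHERE date(ts) = :dt
GROUP BY date(ts)
ON CONFLICT(dt) DO UPDATE SET dau = excluded.dau, sessions = excluded.sessions
"""

# D1 retention per signup-date cohort: share of the cohort active on signup day + 1.
D1_RETENTION = f"""
SELECT date(u.signup_ts) AS cohort,
       COUNT(DISTINCT u.user_id) AS cohort_size,
       COUNT(DISTINCT e.user_id) * 1.0 / COUNT(DISTINCT u.user_id) AS d1
FROM users u
LEFT JOIN ({DEDUP}) e
  ON e.user_id = u.user_id AND date(e.ts) = date(u.signup_ts, '+1 day')
GROUP BY date(u.signup_ts)
"""


def run_day(con, dt):
    """One daily run; the connection context manager commits the upsert atomically."""
    with con:
        con.execute(UPSERT_DAY, {"dt": dt})


def run_demo():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE events_raw (user_id INTEGER, event_type TEXT, ts TEXT, properties TEXT);
        CREATE TABLE users (user_id INTEGER, signup_ts TEXT, plan TEXT);
        CREATE TABLE daily_metrics (dt TEXT PRIMARY KEY, dau INTEGER, sessions INTEGER);
    """)
    con.executemany("INSERT INTO users VALUES (?, ?, ?)",
                    [(1, "2024-01-01 09:00:00", "free"), (2, "2024-01-01 12:00:00", "pro")])
    con.executemany("INSERT INTO events_raw VALUES (?, ?, ?, '{}')", [
        (1, "view", "2024-01-01 10:00:00"),
        (1, "view", "2024-01-01 10:00:00"),   # exact duplicate
        (1, "click", "2024-01-01 10:10:00"),  # 10-minute gap: same session
        (1, "view", "2024-01-01 12:30:00"),   # >30-minute gap: new session
        (2, "view", "2024-01-01 13:00:00"),
        (1, "view", "2024-01-02 08:00:00"),   # user 1 returns on D1
    ])
    run_day(con, "2024-01-01")
    first = con.execute("SELECT * FROM daily_metrics").fetchall()
    # A late event for 2024-01-01 arrives after the first run; re-running is safe.
    con.execute("INSERT INTO events_raw VALUES (2, 'view', '2024-01-01 20:00:00', '{}')")
    run_day(con, "2024-01-01")
    run_day(con, "2024-01-01")
    rerun = con.execute("SELECT * FROM daily_metrics").fetchall()
    return first, rerun, con.execute(D1_RETENTION).fetchall()
```

Because each run recomputes a whole event-date from deduplicated raw data and overwrites that day's row, late events are absorbed by re-running the affected dates (for example, a trailing window of a few days) rather than by patching counts in place. WAU and MAU follow the same shape with the single-day filter replaced by a 7- or 30-day range ending on dt, and W1 retention can swap the '+1 day' match for a week-long range (exact W1 definitions vary).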
Quick Answer: This question evaluates data engineering competencies including SQL analytics with window functions and MERGE/INSERT patterns for idempotent incremental jobs, event-time handling (duplicates and late events), sessionization and retention metrics, plus Python-based JSON parsing, normalization, partitioned Parquet output and basic unit testing.
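For the Python half, a minimal sketch can use only the standard library for parsing, normalization, and tests. The standard library has no Parquet writer, so the write step assumes pyarrow is installed; it is imported inside write_parquet so the rest of the module and its tests run without it. The function names, the prop_ column prefix, the "_" separator for nested keys, and keeping lists as JSON strings are illustrative choices, not part of the question. Rows whose properties are not a JSON object come back as None so a caller can route them to a quarantine table.

```python
import json
import unittest
from collections import defaultdict


def flatten(obj, prefix="", sep="_"):
    """Flatten nested dicts into one level: {"a": {"b": 1}} -> {"a_b": 1}.
    Lists are kept as JSON strings so every output column stays scalar."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, f"{name}{sep}", sep))
        elif isinstance(value, list):
            out[name] = json.dumps(value, sort_keys=True)
        else:
            out[name] = value
    return out


def clean_event(row):
    """Parse one raw event's properties; return None for rows to quarantine."""
    raw = row.get("properties")
    try:
        props = raw if isinstance(raw, dict) else (json.loads(raw) if raw else {})
    except (json.JSONDecodeError, TypeError):
        return None  # malformed JSON or a non-string payload
    if not isinstance(props, dict):
        return None  # valid JSON but not an object, e.g. a bare list
    cleaned = {"user_id": row["user_id"], "event_type": row["event_type"], "ts": row["ts"]}
    cleaned.update({f"prop_{k}": v for k, v in flatten(props).items()})
    cleaned["dt"] = row["ts"][:10]  # event-time date; assumes ISO-8601 ts strings
    return cleaned


def partition_rows(rows):
    """Group cleaned rows by Hive-style partition directory, e.g. 'dt=2024-01-01'."""
    parts = defaultdict(list)
    for row in rows:
        cleaned = clean_event(row)
        if cleaned is not None:
            parts[f"dt={cleaned['dt']}"].append(cleaned)
    return dict(parts)


def write_parquet(rows, root):
    """Write cleaned rows to Parquet partitioned by dt. Assumes a recent pyarrow;
    imported here so the rest of the module (and its tests) runs without it."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    # from_pylist infers the schema from the first row, so pad every row to the
    # union of keys first; sparse property columns would otherwise be dropped.
    cols = sorted({k for r in rows for k in r})
    table = pa.Table.from_pylist([{c: r.get(c) for c in cols} for r in rows])
    # 'delete_matching' replaces the dt partitions being written, so re-running
    # a full day overwrites rather than appending duplicate files.
    pq.write_to_dataset(table, root_path=root, partition_cols=["dt"],
                        existing_data_behavior="delete_matching")


class CleanEventTests(unittest.TestCase):
    def test_nested_fields_are_flattened(self):
        out = clean_event({"user_id": 1, "event_type": "click", "ts": "2024-01-01T10:00:00",
                           "properties": '{"page": {"url": "/home", "ref": null}, "tags": ["a"]}'})
        self.assertEqual(out["prop_page_url"], "/home")
        self.assertIsNone(out["prop_page_ref"])
        self.assertEqual(out["prop_tags"], '["a"]')
        self.assertEqual(out["dt"], "2024-01-01")

    def test_malformed_json_is_rejected(self):
        row = {"user_id": 1, "event_type": "view", "ts": "2024-01-01T10:00:00",
               "properties": "{not json"}
        self.assertIsNone(clean_event(row))

    def test_rows_partition_by_event_date(self):
        rows = [
            {"user_id": 1, "event_type": "view", "ts": "2024-01-01T23:59:00", "properties": "{}"},
            {"user_id": 2, "event_type": "view", "ts": "2024-01-02T00:01:00", "properties": None},
            {"user_id": 3, "event_type": "view", "ts": "2024-01-02T09:00:00", "properties": "[1]"},
        ]
        parts = partition_rows(rows)
        self.assertEqual(sorted(parts), ["dt=2024-01-01", "dt=2024-01-02"])
        self.assertEqual(len(parts["dt=2024-01-02"]), 1)  # the bare-list row is rejected
```

Run the tests with `python -m unittest <module>`. Partitioning on the event-time date rather than the ingestion date sends a late event to its true partition; with 'delete_matching', re-running a day is idempotent provided each run rewrites that day from its full set of raw events, since the matching partition is replaced rather than merged.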