Profile and visualize an unfamiliar dataset
Company: Amazon
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: medium
Interview Round: Technical Screen
You receive an undocumented CSV combining user events and purchases with columns: order_id, user_id, event_ts (UTC), merchant_id, session_id, event_type (view/add_to_cart/purchase), amount_usd, device_type, country. In 30 minutes, outline exactly how you would understand the dataset and create executive-ready visualizations.
Requirements:
- Data understanding: list the first 10 checks you would run (e.g., nulls, duplicates, timestamp monotonicity by session, timezone sanity, categorical cardinality, outliers, unit consistency, consistency between event_type='purchase' and amount_usd, weekend/weekday patterns, country/device coverage).
- Visual plan: propose 3–5 specific charts (titles, axes, grain) that answer “What is happening?” and “So what?”, and justify each choice and the insight you expect it to surface.
- Granularity: choose daily or hourly aggregation for a launch week, defend the trade-offs, and explain how you’d switch between them with a parameter.
- Data quality: show how a single day with clock skew would appear in your visuals and how you’d annotate or adjust for it.
- Deliverable: describe a one-slide dashboard wireframe (sections, KPIs, filters) and how you’d validate it with a stakeholder in a 5‑minute readout.
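A minimal sketch of the data-understanding pass, written against the Python standard library so it runs anywhere. It assumes the file parses with csv.DictReader, that event_ts is ISO-8601 (for example 2024-05-01T10:00:00Z), and that a blank field is the only null marker. The function names and the 1.5 × IQR (Tukey) outlier rule are illustrative choices, not a prescribed method, and the sample rows are hand-made to trip each check once. Weekend/weekday patterns are better read from the time-series aggregation, and country/device coverage can be read off the category lists.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timedelta
from statistics import quantiles

COLUMNS = ["order_id", "user_id", "event_ts", "merchant_id", "session_id",
           "event_type", "amount_usd", "device_type", "country"]


def parse_ts(value):
    """Parse an ISO-8601 timestamp such as 2024-05-01T10:00:00Z; None if unparseable."""
    try:
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def to_float(value):
    """Parse an amount; None for blanks or non-numeric text such as '$20'."""
    try:
        return float(value)
    except ValueError:
        return None


def profile(raw_rows):
    """First-pass data-understanding checks over csv.DictReader rows (hypothetical helper)."""
    # Normalize: missing columns and whitespace-only fields become "" (the CSV null).
    rows = [{c: (r.get(c) or "").strip() for c in COLUMNS} for r in raw_rows]
    report = {"rows": len(rows)}
    report["nulls"] = {c: sum(1 for r in rows if not r[c]) for c in COLUMNS}
    # Exact duplicate rows, e.g. from a batch exported twice.
    dupes = Counter(tuple(r[c] for c in COLUMNS) for r in rows)
    report["duplicate_rows"] = sum(n - 1 for n in dupes.values() if n > 1)
    # Categorical cardinality: typos and unexpected codes show up as extra values.
    report["categories"] = {c: sorted({r[c] for r in rows})
                            for c in ("event_type", "device_type", "country")}
    # Cross-field consistency: purchases need an amount, other events should not carry one.
    report["purchase_without_amount"] = sum(
        1 for r in rows if r["event_type"] == "purchase" and not r["amount_usd"])
    report["non_purchase_with_amount"] = sum(
        1 for r in rows if r["event_type"] != "purchase" and r["amount_usd"])
    # Unit consistency: amounts that are present but are not plain numbers.
    report["unparseable_amounts"] = sum(
        1 for r in rows if r["amount_usd"] and to_float(r["amount_usd"]) is None)
    # Timezone sanity: every timestamp should parse and carry a zero UTC offset.
    stamps = [parse_ts(r["event_ts"]) for r in rows]
    report["bad_or_non_utc_ts"] = sum(
        1 for t in stamps if t is None or t.utcoffset() != timedelta(0))
    # Monotonicity: within a session, timestamps should not go backwards in file order.
    last_seen, bad_sessions = {}, set()
    for r, t in zip(rows, stamps):
        if t is None or t.utcoffset() is None:  # skip unparseable or naive values
            continue
        prev = last_seen.get(r["session_id"])
        if prev is not None and t < prev:
            bad_sessions.add(r["session_id"])
        last_seen[r["session_id"]] = t
    report["non_monotonic_sessions"] = sorted(bad_sessions)
    # Outliers: Tukey fences (1.5 x IQR) on parseable purchase amounts.
    purchase_amounts = (to_float(r["amount_usd"]) for r in rows if r["event_type"] == "purchase")
    amounts = sorted(a for a in purchase_amounts if a is not None)
    report["amount_outliers"] = []
    if len(amounts) >= 4:
        q1, _, q3 = quantiles(amounts, n=4, method="inclusive")
        lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
        report["amount_outliers"] = [a for a in amounts if a < lo or a > hi]
    return report


# Hand-made sample: one blank purchase amount, one view with an amount, one +02:00
# timestamp that also runs backwards within session s7, one duplicate row, one outlier.
SAMPLE = """order_id,user_id,event_ts,merchant_id,session_id,event_type,amount_usd,device_type,country
,u1,2024-05-01T10:00:00Z,m1,s1,view,,ios,US
o1,u1,2024-05-01T10:05:00Z,m1,s1,purchase,20,ios,US
o2,u2,2024-05-01T11:00:00Z,m2,s2,purchase,22,android,DE
o3,u3,2024-05-01T12:00:00Z,m1,s3,purchase,25,web,US
o4,u4,2024-05-01T13:00:00Z,m2,s4,purchase,30,web,GB
o5,u5,2024-05-01T14:00:00Z,m1,s5,purchase,10000,ios,US
o6,u6,2024-05-01T15:00:00Z,m1,s6,purchase,,ios,US
,u7,2024-05-01T16:10:00Z,m2,s7,view,5,web,US
,u7,2024-05-01T16:00:00+02:00,m2,s7,add_to_cart,,web,US
,u7,2024-05-01T16:00:00+02:00,m2,s7,add_to_cart,,web,US
"""

report = profile(csv.DictReader(io.StringIO(SAMPLE)))
for check, result in report.items():
    print(f"{check}: {result}")
```

In a real run the same profile call would take csv.DictReader(open(path, newline="")), and each non-empty result becomes a question for the data owner rather than a silent fix.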
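One chart from the visual plan can be sketched as the table that feeds it: a daily session funnel for launch week (x-axis: UTC date; bars: distinct sessions reaching view, add_to_cart and purchase; line: share of viewing sessions that also purchased). It answers “What is happening?” (traffic versus conversion) and sets up “So what?” (is a revenue change driven by volume or by conversion). The conversion definition, the per-date session attribution, and the name daily_funnel are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

STAGES = ("view", "add_to_cart", "purchase")


def utc_date(ts):
    """UTC calendar date of an ISO-8601 timestamp; naive values are assumed to be UTC."""
    t = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    t = t.replace(tzinfo=timezone.utc) if t.tzinfo is None else t.astimezone(timezone.utc)
    return t.date()


def daily_funnel(events):
    """events: iterable of (event_ts, session_id, event_type) tuples.

    Returns {utc_date: {stage: distinct sessions, "view_to_purchase": rate}}, where the
    rate is the share of viewing sessions that also purchased on the same UTC date.
    A session that crosses midnight is counted on each date it has events.
    """
    by_day = defaultdict(lambda: {stage: set() for stage in STAGES})
    for ts, session_id, event_type in events:
        if event_type in STAGES:  # ignore unexpected event types surfaced by profiling
            by_day[utc_date(ts)][event_type].add(session_id)
    table = {}
    for day in sorted(by_day):
        stages = by_day[day]
        row = {stage: len(stages[stage]) for stage in STAGES}
        viewers = stages["view"]
        row["view_to_purchase"] = (len(viewers & stages["purchase"]) / len(viewers)
                                   if viewers else None)
        table[day] = row
    return table


events = [
    ("2024-05-01T09:00:00Z", "s1", "view"),
    ("2024-05-01T09:01:00Z", "s1", "add_to_cart"),
    ("2024-05-01T09:02:00Z", "s1", "purchase"),
    ("2024-05-01T10:00:00Z", "s2", "view"),
    ("2024-05-01T10:00:30Z", "s2", "view"),
    ("2024-05-02T08:00:00Z", "s3", "view"),
    ("2024-05-02T08:05:00Z", "s3", "add_to_cart"),
]
funnel = daily_funnel(events)
for day, row in funnel.items():
    print(day, row)
```

Counting distinct sessions rather than raw events keeps repeated page views (session s2 above) from inflating the top of the funnel.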
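The daily/hourly switch can be a single argument. This sketch (hypothetical names purchases_by_bucket and GRAINS) buckets purchases by UTC hour or UTC day and zero-fills empty buckets so gaps plot as zeros instead of being interpolated across. Daily buckets give seven stable points that suit an executive view; hourly buckets (168 for the week) expose the launch-hour spike and any timestamp shift, at the cost of noise. It assumes each purchase row is one order, so duplicates should be removed first.

```python
from datetime import datetime, timedelta, timezone

# grain name -> (function truncating a UTC datetime to its bucket start, bucket width)
GRAINS = {
    "hour": (lambda t: t.replace(minute=0, second=0, microsecond=0), timedelta(hours=1)),
    "day": (lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0), timedelta(days=1)),
}


def to_utc(ts):
    """Parse ISO-8601; naive timestamps are assumed to already be UTC."""
    t = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return t.replace(tzinfo=timezone.utc) if t.tzinfo is None else t.astimezone(timezone.utc)


def purchases_by_bucket(rows, grain="day"):
    """rows: iterable of (event_ts, event_type, amount_usd) string tuples.

    Returns a zero-filled list of (bucket_start_utc, orders, revenue_usd).
    """
    if grain not in GRAINS:
        raise ValueError(f"grain must be one of {sorted(GRAINS)}, got {grain!r}")
    truncate, step = GRAINS[grain]
    buckets = {}
    for ts, event_type, amount in rows:
        if event_type != "purchase":
            continue
        key = truncate(to_utc(ts))
        orders, revenue = buckets.get(key, (0, 0.0))
        buckets[key] = (orders + 1, revenue + float(amount or 0))
    if not buckets:
        return []
    series, cursor, last = [], min(buckets), max(buckets)
    while cursor <= last:  # walk every bucket so empty hours/days appear as zeros
        orders, revenue = buckets.get(cursor, (0, 0.0))
        series.append((cursor, orders, round(revenue, 2)))
        cursor += step
    return series


rows = [
    ("2024-05-01T09:15:00Z", "purchase", "10"),
    ("2024-05-01T09:45:00Z", "purchase", "5.5"),
    ("2024-05-01T10:00:00Z", "view", ""),
    ("2024-05-01T11:05:00Z", "purchase", "20"),
    ("2024-05-02T01:00:00Z", "purchase", "4"),
]
daily = purchases_by_bucket(rows, grain="day")
hourly = purchases_by_bucket(rows, grain="hour")
print(daily)
print(len(hourly), "hourly buckets")
```

Keeping the grain in one dictionary means adding, say, a 15-minute grain for the launch hour is one more entry rather than a new code path.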
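A skewed day can be flagged before it reaches a chart. With no ingestion timestamp in the file, one proxy (an assumption, not the only option) is the share of purchasing sessions whose purchase is stamped before the session's first view, which should be near zero when clocks are sane. In the visuals, such a day would typically show an hourly curve shifted by the skew offset and a conversion line that jumps or dips on that date only. The flagged dates are the ones to shade and footnote on the chart, or to exclude from week-over-week baselines; the function name and the ratio and floor thresholds are arbitrary starting values.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import median


def to_utc(ts):
    """Parse ISO-8601; naive timestamps are assumed to already be UTC."""
    t = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return t.replace(tzinfo=timezone.utc) if t.tzinfo is None else t.astimezone(timezone.utc)


def flag_clock_skew_days(events, ratio=3.0, floor=0.05):
    """events: iterable of (event_ts, session_id, event_type) tuples.

    For each UTC date, computes the share of purchasing sessions whose first purchase is
    timestamped before that session's first view. A date is flagged when its share is at
    least `floor` and more than `ratio` times the median share across all dates.
    """
    first_view, first_purchase = {}, {}
    for ts, session_id, event_type in events:
        t = to_utc(ts)
        if event_type == "view":
            first_view[session_id] = min(t, first_view.get(session_id, t))
        elif event_type == "purchase":
            first_purchase[session_id] = min(t, first_purchase.get(session_id, t))
    buyers, inverted = defaultdict(int), defaultdict(int)
    for session_id, bought_at in first_purchase.items():
        day = bought_at.date()
        buyers[day] += 1
        if session_id in first_view and bought_at < first_view[session_id]:
            inverted[day] += 1
    share = {day: inverted[day] / buyers[day] for day in sorted(buyers)}
    baseline = median(list(share.values())) if share else 0.0
    flagged = [day for day, s in share.items() if s >= floor and s > ratio * baseline]
    return share, flagged


events = [
    ("2024-05-01T10:00:00Z", "s1", "view"), ("2024-05-01T10:05:00Z", "s1", "purchase"),
    ("2024-05-01T11:00:00Z", "s2", "view"), ("2024-05-01T11:10:00Z", "s2", "purchase"),
    ("2024-05-02T10:00:00Z", "s3", "view"), ("2024-05-02T10:20:00Z", "s3", "purchase"),
    ("2024-05-03T10:00:00Z", "s4", "view"), ("2024-05-03T10:02:00Z", "s4", "purchase"),
    # Simulated skewed clock on 2024-05-04: purchase stamped three hours before its view.
    ("2024-05-04T12:00:00Z", "s5", "view"), ("2024-05-04T09:00:00Z", "s5", "purchase"),
    ("2024-05-04T13:00:00Z", "s6", "view"), ("2024-05-04T13:05:00Z", "s6", "purchase"),
]
share, flagged = flag_clock_skew_days(events)
print(share)
print("flag for annotation:", flagged)
```

Using the median as the baseline keeps a single bad day from raising its own threshold, which a mean would do.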
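For the dashboard's KPI row, a minimal sketch of the tiles (revenue, orders, average order value, session conversion) with slicers passed as keyword filters and a relative change against a comparison period. The function names, the KPI definitions, and the choice to keep one amount per order_id are assumptions worth confirming with the stakeholder during the readout; in particular, whether conversion should be per session or per user is a question to ask explicitly.

```python
from datetime import date, datetime, timezone


def utc_date(ts):
    """UTC calendar date of an ISO-8601 timestamp; naive values are assumed to be UTC."""
    t = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    t = t.replace(tzinfo=timezone.utc) if t.tzinfo is None else t.astimezone(timezone.utc)
    return t.date()


def kpi_tiles(rows, start, end, **filters):
    """Headline KPIs for rows whose UTC date is in [start, end] and that match every
    exact-match filter, e.g. country="US" or device_type="ios"."""
    selected = [r for r in rows
                if start <= utc_date(r["event_ts"]) <= end
                and all(r.get(k) == v for k, v in filters.items())]
    purchases = [r for r in selected if r["event_type"] == "purchase"]
    # One amount per order_id, so duplicate purchase rows do not double-count revenue.
    order_amounts = {r["order_id"]: float(r["amount_usd"] or 0) for r in purchases}
    revenue, orders = sum(order_amounts.values()), len(order_amounts)
    sessions = {r["session_id"] for r in selected}
    buying_sessions = {r["session_id"] for r in purchases}
    return {
        "revenue_usd": round(revenue, 2),
        "orders": orders,
        "aov_usd": round(revenue / orders, 2) if orders else None,
        "session_conversion": round(len(buying_sessions) / len(sessions), 4) if sessions else None,
    }


def pct_change(current, previous):
    """Relative change per KPI; None when either side is missing or the base is zero."""
    return {k: None if current[k] is None or not previous[k]
            else round((current[k] - previous[k]) / previous[k], 4)
            for k in current}


FIELDS = ("event_ts", "session_id", "event_type", "order_id", "amount_usd", "country")
rows = [dict(zip(FIELDS, values)) for values in [
    ("2024-05-01T09:00:00Z", "s4", "view", "", "", "US"),
    ("2024-05-01T09:03:00Z", "s4", "purchase", "o3", "40", "US"),
    ("2024-05-01T10:00:00Z", "s5", "view", "", "", "US"),
    ("2024-05-08T09:00:00Z", "s1", "view", "", "", "US"),
    ("2024-05-08T09:04:00Z", "s1", "purchase", "o1", "50", "US"),
    ("2024-05-08T09:30:00Z", "s2", "view", "", "", "US"),
    ("2024-05-08T11:00:00Z", "s3", "view", "", "", "DE"),
    ("2024-05-08T11:02:00Z", "s3", "purchase", "o2", "30", "DE"),
]]
launch = kpi_tiles(rows, date(2024, 5, 8), date(2024, 5, 8))
launch_us = kpi_tiles(rows, date(2024, 5, 8), date(2024, 5, 8), country="US")
prior_us = kpi_tiles(rows, date(2024, 5, 1), date(2024, 5, 1), country="US")
print(launch)
print(pct_change(launch_us, prior_us))
```

Because every tile goes through the same function, the date-range, country and device filters on the slide change all KPIs consistently, which is one of the first things a stakeholder tends to test in a readout.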
Quick Answer: This question evaluates a data scientist's competency in exploratory data analysis, data quality assessment, time-series aggregation, visualization design, and concise stakeholder communication when working with an undocumented events-plus-purchases CSV.