PracHub

Quick Overview

This question evaluates a candidate's competency in analytics-oriented data warehousing and engineering: event-level schema design and partitioning, slowly changing dimensions, idempotent ingestion and deduplication, derived metrics modeling, and query performance and cost trade-offs in SQL and Python.

  • Medium
  • Snowflake
  • Data Manipulation (SQL/Python)
  • Data Scientist

Design an analytic warehouse for event data

Company: Snowflake

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

Design a warehouse-ready analytics data model and ingestion plan to support cohort retention, ARPU, and product-case analyses at scale (50M events/day). Assume BigQuery or Snowflake.

Sub-questions:

1) Schema: Propose fact and dimension tables (e.g., fact_events, fact_orders, dim_user, dim_country). Provide DDL-level details: partitioning (by event_date), clustering/sorting keys (user_id, event_type), and surrogate keys. Explain how you’d model slowly changing dimensions (Type 2) for user country and app version.

2) Idempotency & deduplication: Events can arrive late (up to 14 days) and out-of-order with occasional duplicates (same (user_id, event_ts, event_type)). Specify your dedupe key and merge/upsert strategy. How do you reprocess late data without double-counting cohorts? Include a backfill plan.

3) Metrics tables: Define a derived table grain for weekly retention by signup_week and ARPU by cohort. Show the SQL pattern (window PARTITION BY and date bucketing) you’d use and how materialization/incremental build works. Describe data quality checks (e.g., cohort_size non-increasing across week_index, retention ∈ [0,1]).

4) Performance: Estimate table sizes, choose file sizes/micro-partitions, and justify cluster keys for typical queries (top-N countries last 8 weeks, rolling DAU/WAU/MAU). Discuss cost controls (partition pruning, approximate distinct with HLL, result caching).
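For sub-question 1, one possible Snowflake DDL sketch (table and column names are illustrative, not prescribed by the question). Snowflake micro-partitions automatically, so "partitioning by event_date" is expressed through the clustering key; in BigQuery the same intent would be `PARTITION BY DATE(event_ts)` plus `CLUSTER BY`:

```sql
-- Fact table: one row per event, clustered so date-range scans prune
-- micro-partitions and per-user lookups stay co-located.
CREATE TABLE fact_events (
    event_sk     NUMBER AUTOINCREMENT,      -- surrogate key
    user_id      NUMBER NOT NULL,
    event_type   VARCHAR NOT NULL,
    event_ts     TIMESTAMP_NTZ NOT NULL,
    event_date   DATE NOT NULL,             -- derived from event_ts at load
    country_sk   NUMBER,                    -- FK to dim_country
    ingested_at  TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
)
CLUSTER BY (event_date, user_id, event_type);

-- Type 2 slowly changing dimension: a new row per change to country or
-- app_version, with a validity interval instead of an overwrite.
CREATE TABLE dim_user (
    user_sk      NUMBER AUTOINCREMENT,      -- surrogate key per version row
    user_id      NUMBER NOT NULL,           -- natural key
    country_code VARCHAR,
    app_version  VARCHAR,
    valid_from   TIMESTAMP_NTZ NOT NULL,
    valid_to     TIMESTAMP_NTZ,             -- NULL while the row is current
    is_current   BOOLEAN DEFAULT TRUE
);
```

Joining fact_events to dim_user on `user_id` with `event_ts BETWEEN valid_from AND COALESCE(valid_to, event_ts)` then attributes each event to the country and app version in effect at event time.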
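For sub-question 2, a sketch of a dedupe-then-merge pattern (the staging table name and columns are assumptions): duplicates are collapsed in staging, and the MERGE only inserts rows not already present, so replaying a batch inside the 14-day late-arrival window is idempotent:

```sql
-- Collapse duplicates in staging: keep the latest-ingested copy of each
-- (user_id, event_ts, event_type) triple, which is the dedupe key.
CREATE OR REPLACE TEMPORARY TABLE stg_events_dedup AS
SELECT *
FROM stg_events_raw
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY user_id, event_ts, event_type
    ORDER BY ingested_at DESC
) = 1;

-- Idempotent upsert: matched rows are left untouched, so rerunning the
-- same batch (or a backfill) never double-counts cohort members.
MERGE INTO fact_events t
USING stg_events_dedup s
  ON  t.user_id    = s.user_id
  AND t.event_ts   = s.event_ts
  AND t.event_type = s.event_type
WHEN NOT MATCHED THEN
  INSERT (user_id, event_type, event_ts, event_date, ingested_at)
  VALUES (s.user_id, s.event_type, s.event_ts, s.event_ts::DATE, s.ingested_at);
```

A backfill then amounts to re-running the merge over the affected date range and rebuilding only the downstream metric partitions whose input dates changed.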
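For sub-question 3, one way to express the weekly-retention grain with date bucketing (signup source and table names are assumptions). The output grain is one row per (signup_week, week_index):

```sql
WITH cohorts AS (
    -- Bucket each user into the week they signed up
    SELECT user_id, DATE_TRUNC('week', signup_ts)::DATE AS signup_week
    FROM stg_signups                       -- assumed signup source
),
cohort_size AS (
    SELECT signup_week, COUNT(*) AS n_users
    FROM cohorts
    GROUP BY signup_week
),
activity AS (
    -- One row per user per active week
    SELECT DISTINCT user_id, DATE_TRUNC('week', event_ts)::DATE AS event_week
    FROM fact_events
)
SELECT
    c.signup_week,
    DATEDIFF('week', c.signup_week, a.event_week) AS week_index,
    COUNT(DISTINCT a.user_id)                     AS retained_users,
    COUNT(DISTINCT a.user_id) / cs.n_users        AS retention  -- in [0, 1]
FROM cohorts c
JOIN activity a     ON a.user_id = c.user_id
JOIN cohort_size cs ON cs.signup_week = c.signup_week
WHERE a.event_week >= c.signup_week
GROUP BY c.signup_week, week_index, cs.n_users
ORDER BY c.signup_week, week_index;
```

For incremental builds, the same query restricted to recently changed event_date partitions can overwrite just the affected (signup_week, week_index) rows instead of recomputing all cohorts.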
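The data quality checks named in sub-question 3 can be phrased as assertion-style queries that should return zero rows (the metrics table name is an assumption):

```sql
-- Check 1: retention must lie in [0, 1].
SELECT *
FROM metrics_weekly_retention
WHERE retention < 0 OR retention > 1;

-- Check 2: within a cohort, retained_users should not increase as
-- week_index grows (per the non-increasing invariant in the prompt).
SELECT signup_week, week_index
FROM (
    SELECT signup_week, week_index, retained_users,
           LAG(retained_users) OVER (
               PARTITION BY signup_week ORDER BY week_index
           ) AS prev_retained
    FROM metrics_weekly_retention
)
WHERE prev_retained IS NOT NULL
  AND retained_users > prev_retained;
```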
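For sub-question 4, a sketch of the "top-N countries last 8 weeks" query showing two of the cost controls the prompt mentions: the event_date predicate enables partition pruning against the clustering key, and `APPROX_COUNT_DISTINCT` (HyperLogLog-based) trades a small relative error for far less memory than an exact distinct count:

```sql
SELECT d.country_code,
       APPROX_COUNT_DISTINCT(f.user_id) AS approx_users  -- HLL sketch
FROM fact_events f
JOIN dim_country d ON d.country_sk = f.country_sk
WHERE f.event_date >= DATEADD(week, -8, CURRENT_DATE)    -- prunes partitions
GROUP BY d.country_code
ORDER BY approx_users DESC
LIMIT 10;
```

Repeated dashboard runs of the same text against unchanged data are additionally served from the result cache at no compute cost.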


Last updated: Mar 29, 2026

Related Coding Questions

  • Build a cohort dashboard with Streamlit and SQL - Snowflake (Medium)
  • Query seven-day conversion with windows and dedupe - Snowflake (Medium)

