How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

What difficulty level is this interview question?

This is a hard difficulty System Design question, commonly asked during Technical Screen rounds at Nextdoor.

What role is this question designed for?

This question is commonly asked for Data Engineer candidates at Nextdoor during technical interviews.

Design KPI tables, retention, rollups, timezone

Q: Design KPI tables, retention, rollups, timezone

This question evaluates data engineering and analytical-system design competencies, including derived table schema design, ETL and incremental load strategies, time-series rollups, cohort/retention analysis, aggregation correctness, and timezone-aware event handling, and it sits in the System Design / Data Engineering domain.

You are building a fast KPI dashboard for Nextagram. The raw tables are:

users (user_id PK, username, created_at TIMESTAMP)
photos (photo_id PK, user_id, created_at TIMESTAMP)
follows (follower_user_id, followed_user_id, created_at TIMESTAMP)

Assume raw timestamps are stored in PST initially.

The dashboard is reported daily (one row per report date). Define the daily KPIs as:

new_users : users created on that report date
new_users_change_vs_7d_ago : new_users(today) − new_users(today − 7 days)
DAU : distinct users who performed at least one “active” action that day (define “active” as: posted a photo OR followed someone)
DAU_change_vs_7d_ago
new_photos : photos created on that report date
new_photos_per_DAU : new_photos / DAU
new_photos_change_vs_7d_ago
followers : total number of follow relationships as-of end of that report date (i.e., cumulative count of rows in follows with created_at <= end_of_day)
%_users_with_0_followers : percent of users whose follower count (inbound follows) is 0 as-of end of that date
%_users_with_1_2_followers
%_users_with_3plus_followers

Tasks:

A) Data model / table design

Design the derived table(s) that power this dashboard for a huge dataset (fast reads).
Specify table schemas (columns, grain, and keys), and what is precomputed.
List key considerations (partitioning/clustering, incremental loads, idempotency, late-arriving data, correctness vs cost, etc.).

B) Daily population strategy

Describe how you would populate/refresh these tables every day from the raw tables (high-level ETL steps; SQL/pseudocode is acceptable).

C) Add 1-week retention

Add a 1-week retention metric to the dashboard: for each cohort day D (users who joined on D), what fraction are active on D+7 (using the same “active” definition as DAU).
Describe required table changes and how you would compute it efficiently.

D) Switch to weekly rollups

The dashboard should support weekly rollups instead of daily. Describe what changes in the table design and computation (date grain, definitions of “change vs 7d ago”, handling partial weeks, etc.).

E) Report in a different timezone

A New York office wants metrics reported in EST instead of PST. Describe what must change (timestamp storage, date boundaries, ETL windows, backfills, and how to avoid double-counting at day boundaries).

You are building a fast KPI dashboard for Nextagram. The raw tables are:

users (user_id PK, username, created_at TIMESTAMP)
photos (photo_id PK, user_id, created_at TIMESTAMP)
follows (follower_user_id, followed_user_id, created_at TIMESTAMP)

Assume raw timestamps are stored in PST initially.

The dashboard is reported daily (one row per report date). Define the daily KPIs as:

new_users : users created on that report date
new_users_change_vs_7d_ago : new_users(today) − new_users(today − 7 days)
DAU : distinct users who performed at least one “active” action that day (define “active” as: posted a photo OR followed someone)
DAU_change_vs_7d_ago
new_photos : photos created on that report date
new_photos_per_DAU : new_photos / DAU
new_photos_change_vs_7d_ago
followers : total number of follow relationships as-of end of that report date (i.e., cumulative count of rows in follows with created_at <= end_of_day)
%_users_with_0_followers : percent of users whose follower count (inbound follows) is 0 as-of end of that date
%_users_with_1_2_followers
%_users_with_3plus_followers

Tasks:

A) Data model / table design

Design the derived table(s) that power this dashboard for a huge dataset (fast reads).
Specify table schemas (columns, grain, and keys), and what is precomputed.
List key considerations (partitioning/clustering, incremental loads, idempotency, late-arriving data, correctness vs cost, etc.).

B) Daily population strategy

Describe how you would populate/refresh these tables every day from the raw tables (high-level ETL steps; SQL/pseudocode is acceptable).

C) Add 1-week retention

Add a 1-week retention metric to the dashboard: for each cohort day D (users who joined on D), what fraction are active on D+7 (using the same “active” definition as DAU).
Describe required table changes and how you would compute it efficiently.

D) Switch to weekly rollups

The dashboard should support weekly rollups instead of daily. Describe what changes in the table design and computation (date grain, definitions of “change vs 7d ago”, handling partial weeks, etc.).

E) Report in a different timezone

A New York office wants metrics reported in EST instead of PST. Describe what must change (timestamp storage, date boundaries, ETL windows, backfills, and how to avoid double-counting at day boundaries).

Design KPI tables, retention, rollups, timezone

Quick Overview

Solution

Comments (0)

Design KPI tables, retention, rollups, timezone

Quick Overview

Solution

Comments (0)