How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a hard difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Nextdoor.

What role is this question designed for?

This question is commonly asked for Data Engineer candidates at Nextdoor during technical interviews.

Write SQL for app metrics | Nextdoor Coding Question

Quick Overview

This question evaluates proficiency in SQL-based data engineering, focusing on time-zone-aware timestamp handling, aggregation of event and relationship data, cohort retention analysis, and computation of product metrics like DAU, new users, photo activity, and follower distributions.

Write SQL for app metrics

Company: Nextdoor

Role: Data Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

You are given a simplified schema for a photo-sharing app: - `users(user_id, user_name, joined_at)` - `photos(photo_id, user_id, created_at)` - `follows(follower_user_id, followed_user_id, created_at)` - `user_events(event_id, user_id, event_type, occurred_at)` Assume all timestamps are stored in UTC. Unless otherwise stated, reporting should use the `America/Los_Angeles` timezone. A row in `follows` represents a follow created at `created_at` and still active afterward. `DAU` means the number of distinct users with at least one row in `user_events` on that local calendar day. Also assume the dashboard is backed by these aggregate tables: - `kpi_daily_base(report_date, new_users, total_users_eod, dau, new_photos, total_follow_edges_eod)` - `follower_bucket_daily(report_date, bucket, user_count)` where `bucket` is one of `0`, `1_2`, or `3_plus` - `retention_cohort_daily(cohort_date, cohort_size, retained_w1_users)` Write SQL for the following tasks: 1. Count the total number of users. 2. Find the calendar month with the highest number of new user signups. 3. Find the user with the largest follower count. 4. Populate the daily aggregate tables from the raw tables so the dashboard can report: - new users - change versus 7 days earlier - DAU - change versus 7 days earlier - new photos - new photos per DAU - change versus 7 days earlier - total follower relationships - percentage of users with 0 followers - percentage of users with 1-2 followers - percentage of users with 3+ followers 5. Extend the SQL to compute week-1 retention, defined as: among users who signed up on day `D`, the share who had at least one event on days `D+7` through `D+13` in the reporting timezone.

Quick Answer: This question evaluates proficiency in SQL-based data engineering, focusing on time-zone-aware timestamp handling, aggregation of event and relationship data, cohort retention analysis, and computation of product metrics like DAU, new users, photo activity, and follower distributions.

Part 1: Count the Total Number of Users

You are given a list of imported user rows from a photo-sharing app. Each row has the form (user_id, user_name, joined_at_utc). Because imports may be replayed, the same logical user can appear more than once. Count how many distinct users exist, where user_id uniquely identifies a user.

Constraints

0 <= len(users) <= 200000
Each row has exactly 3 fields
user_id is a non-empty string
joined_at_utc is a valid ISO 8601 UTC timestamp ending in 'Z'

Examples

Input: ([('u1', 'Ana', '2024-01-01T00:00:00Z'), ('u2', 'Bob', '2024-01-02T00:00:00Z'), ('u3', 'Cam', '2024-01-03T00:00:00Z')],)

Expected Output: 3

Explanation: There are 3 different user IDs.

Input: ([('u1', 'Ana', '2024-01-01T00:00:00Z'), ('u1', 'Ana', '2024-01-01T00:00:00Z'), ('u2', 'Bob', '2024-01-02T00:00:00Z')],)

Expected Output: 2

Explanation: The duplicate row for u1 should only be counted once.

Input: ([],)

Expected Output: 0

Explanation: No rows means no users.

Input: ([('solo', 'OnlyUser', '2024-05-10T12:00:00Z')],)

Expected Output: 1

Explanation: A single row represents one distinct user.

Hints

You only need the first field of each row.
A set is useful when you need to count unique identifiers.

Part 2: Calendar Month with the Most New User Signups

You are given user rows of the form (user_id, user_name, joined_at_utc). A signup belongs to the calendar month of joined_at after converting the UTC timestamp to the America/Los_Angeles timezone. Return the month with the highest number of signups in 'YYYY-MM' format. If multiple months tie, return the earliest such month. If there are no users, return an empty string.

Constraints

0 <= len(users) <= 200000
Each timestamp is a valid ISO 8601 UTC timestamp ending in 'Z'
Use the America/Los_Angeles timezone before extracting the calendar month

Examples

Input: ([('u1', 'Ana', '2024-01-01T10:00:00Z'), ('u2', 'Bob', '2024-01-15T12:00:00Z'), ('u3', 'Cam', '2024-02-01T09:00:00Z'), ('u4', 'Dan', '2024-02-20T18:00:00Z'), ('u5', 'Eli', '2024-02-28T23:00:00Z')],)

Expected Output: '2024-02'

Explanation: January has 2 signups and February has 3.

Input: ([('u1', 'A', '2024-01-01T07:30:00Z'), ('u2', 'B', '2024-01-31T23:00:00Z'), ('u3', 'C', '2024-02-01T07:30:00Z'), ('u4', 'D', '2024-02-15T12:00:00Z'), ('u5', 'E', '2024-03-01T07:30:00Z')],)

Expected Output: '2024-01'

Explanation: After timezone conversion, counts are 2023-12: 1, 2024-01: 2, 2024-02: 2. The tie is broken by earliest month.

Input: ([],)

Expected Output: ''

Explanation: No users means no winning month.

Input: ([('u1', 'A', '2024-01-01T00:30:00Z')],)

Expected Output: '2023-12'

Explanation: 2024-01-01 00:30 UTC is still 2023-12-31 in Los Angeles.

Hints

Convert each timestamp to local time before truncating to month.
Count signups per month, then apply the tie-break rule by earliest month.

Part 3: Find the User with the Largest Follower Count

You are given a users table and a follows table. Each user row is (user_id, user_name, joined_at_utc). Each follow row is (follower_user_id, followed_user_id, created_at_utc). Return the user_id of the user with the largest number of distinct followers. Only users present in the users table are eligible to win. If multiple users tie, return the lexicographically smallest user_id. If users is empty, return an empty string.

Constraints

0 <= len(users) <= 200000
0 <= len(follows) <= 200000
Count distinct follower_user_id values per followed_user_id
Timestamps are provided but are not needed for this task

Examples

Input: ([('u1', 'A', '2024-01-01T00:00:00Z'), ('u2', 'B', '2024-01-01T00:00:00Z'), ('u3', 'C', '2024-01-01T00:00:00Z')], [('u2', 'u1', '2024-01-02T00:00:00Z'), ('u3', 'u1', '2024-01-02T00:00:00Z'), ('u1', 'u2', '2024-01-03T00:00:00Z')])

Expected Output: 'u1'

Explanation: u1 has 2 distinct followers, more than any other user.

Expected Output: 'u1'

Explanation: u1 and u2 both have 2 followers, so the lexicographically smaller ID wins.

Input: ([('u2', 'B', '2024-01-01T00:00:00Z'), ('u1', 'A', '2024-01-01T00:00:00Z')], [])

Expected Output: 'u1'

Explanation: Both users have 0 followers, so return the smaller user_id.

Input: ([('u1', 'A', '2024-01-01T00:00:00Z'), ('u2', 'B', '2024-01-01T00:00:00Z'), ('u3', 'C', '2024-01-01T00:00:00Z')], [('u2', 'u1', '2024-01-02T00:00:00Z'), ('u2', 'u1', '2024-01-03T00:00:00Z'), ('u3', 'u1', '2024-01-04T00:00:00Z')])

Expected Output: 'u1'

Explanation: Duplicate follows from the same follower count only once because followers are distinct.

Input: ([], [('u1', 'u2', '2024-01-01T00:00:00Z')])

Expected Output: ''

Explanation: No users means no valid winner.

Hints

Build a set of followers for each followed user.
Do not forget users with zero followers; one of them could win if there are no follow rows.

Part 4: Populate Daily Aggregate Tables for Dashboard Metrics

You are given raw app tables as lists of rows. Build two daily aggregate tables over an inclusive reporting date range [start_date, end_date] using the America/Los_Angeles timezone. 1. kpi_daily_base rows must be [report_date, new_users, total_users_eod, dau, new_photos, total_follow_edges_eod] 2. follower_bucket_daily rows must be [report_date, bucket, user_count] for each bucket in the fixed order '0', '1_2', '3_plus' Definitions: - new_users: users whose joined_at falls on that local date - total_users_eod: users whose joined_at is on or before that local date - dau: number of distinct users with at least one event on that local date - new_photos: photos whose created_at falls on that local date - total_follow_edges_eod: follows whose created_at is on or before that local date - follower buckets: among users that exist by end of day, bucket each user by their active follower count using follows created on or before that day Return both aggregate tables in a dictionary with keys 'kpi_daily_base' and 'follower_bucket_daily'.

Constraints

0 <= len(users), len(photos), len(follows), len(user_events) <= 200000
start_date <= end_date
All timestamps are valid ISO 8601 UTC timestamps ending in 'Z'
The number of days in the requested date range is at most 366
Assume user IDs in users are unique and follow rows represent active edges that remain active forever

Examples

Input: ([('u1', 'A', '2024-01-01T10:00:00Z'), ('u2', 'B', '2024-01-02T07:30:00Z'), ('u3', 'C', '2024-01-03T08:30:00Z')], [('p1', 'u1', '2024-01-02T06:00:00Z'), ('p2', 'u2', '2024-01-03T09:00:00Z')], [('u2', 'u1', '2024-01-02T08:00:00Z'), ('u3', 'u1', '2024-01-03T18:00:00Z')], [('e1', 'u1', 'open', '2024-01-02T01:00:00Z'), ('e2', 'u1', 'like', '2024-01-02T23:00:00Z'), ('e3', 'u2', 'open', '2024-01-03T07:20:00Z'), ('e4', 'u2', 'open', '2024-01-03T09:00:00Z')], '2024-01-01', '2024-01-03')

Expected Output: {'kpi_daily_base': [['2024-01-01', 2, 2, 1, 1, 0], ['2024-01-02', 0, 2, 2, 0, 1], ['2024-01-03', 1, 3, 1, 1, 2]], 'follower_bucket_daily': [['2024-01-01', '0', 2], ['2024-01-01', '1_2', 0], ['2024-01-01', '3_plus', 0], ['2024-01-02', '0', 1], ['2024-01-02', '1_2', 1], ['2024-01-02', '3_plus', 0], ['2024-01-03', '0', 2], ['2024-01-03', '1_2', 1], ['2024-01-03', '3_plus', 0]]}

Explanation: This covers timezone conversion, DAU deduplication, cumulative users, cumulative follows, and bucket updates.

Input: ([], [], [], [], '2024-05-01', '2024-05-01')

Expected Output: {'kpi_daily_base': [['2024-05-01', 0, 0, 0, 0, 0]], 'follower_bucket_daily': [['2024-05-01', '0', 0], ['2024-05-01', '1_2', 0], ['2024-05-01', '3_plus', 0]]}

Explanation: The date still appears even when all raw tables are empty.

Input: ([('u1', 'A', '2024-02-01T09:00:00Z'), ('u2', 'B', '2024-02-01T10:00:00Z'), ('u3', 'C', '2024-02-01T11:00:00Z'), ('u4', 'D', '2024-02-01T12:00:00Z')], [], [('u2', 'u1', '2024-02-01T20:00:00Z'), ('u3', 'u1', '2024-02-01T21:00:00Z'), ('u4', 'u1', '2024-02-02T18:00:00Z')], [], '2024-02-01', '2024-02-02')

Expected Output: {'kpi_daily_base': [['2024-02-01', 4, 4, 0, 0, 2], ['2024-02-02', 0, 4, 0, 0, 3]], 'follower_bucket_daily': [['2024-02-01', '0', 3], ['2024-02-01', '1_2', 1], ['2024-02-01', '3_plus', 0], ['2024-02-02', '0', 3], ['2024-02-02', '1_2', 0], ['2024-02-02', '3_plus', 1]]}

Explanation: u1 moves from bucket '1_2' to '3_plus' on the second day.

Input: ([('u1', 'A', '2024-01-01T12:00:00Z'), ('u2', 'B', '2024-01-03T12:00:00Z')], [], [('u2', 'u1', '2024-01-02T12:00:00Z')], [], '2024-01-03', '2024-01-04')

Expected Output: {'kpi_daily_base': [['2024-01-03', 1, 2, 0, 0, 1], ['2024-01-04', 0, 2, 0, 0, 1]], 'follower_bucket_daily': [['2024-01-03', '0', 1], ['2024-01-03', '1_2', 1], ['2024-01-03', '3_plus', 0], ['2024-01-04', '0', 1], ['2024-01-04', '1_2', 1], ['2024-01-04', '3_plus', 0]]}

Explanation: Data before start_date must still affect cumulative end-of-day counts.

Hints

Precompute the local date for every raw row first.
A daily sweep with cumulative totals is simpler than recomputing each day from scratch.

Part 5: Compute Week-1 Retention Cohorts

You are given users and user_events raw tables. For each local signup date D in the America/Los_Angeles timezone, compute a cohort row [cohort_date, cohort_size, retained_w1_users]. A user is retained in week 1 if they had at least one event on local days D+7 through D+13 inclusive. Return all cohort rows sorted by cohort_date ascending.

Constraints

0 <= len(users), len(user_events) <= 200000
All timestamps are valid ISO 8601 UTC timestamps ending in 'Z'
Use the America/Los_Angeles timezone before determining dates
A user should be counted at most once in retained_w1_users for their cohort

Examples

Input: ([('u1', 'A', '2024-01-01T18:00:00Z'), ('u2', 'B', '2024-01-02T07:30:00Z'), ('u3', 'C', '2024-01-03T08:30:00Z')], [('e1', 'u1', 'open', '2024-01-08T20:00:00Z'), ('e2', 'u1', 'like', '2024-01-15T20:00:00Z'), ('e3', 'u2', 'open', '2024-01-14T07:30:00Z'), ('e4', 'u3', 'open', '2024-01-10T10:00:00Z')])

Expected Output: [['2024-01-01', 2, 2], ['2024-01-03', 1, 1]]

Explanation: u1 and u2 are both retained for the 2024-01-01 cohort, and u3 is retained for the 2024-01-03 cohort.

Input: ([('u1', 'A', '2024-01-01T12:00:00Z'), ('u2', 'B', '2024-01-01T13:00:00Z')], [('e1', 'u1', 'open', '2024-01-07T20:00:00Z'), ('e2', 'u2', 'open', '2024-01-15T08:30:00Z')])

Expected Output: [['2024-01-01', 2, 0]]

Explanation: One event is too early and the other is too late, so neither user is retained.

Input: ([('u1', 'A', '2024-02-01T18:00:00Z')], [('e1', 'u1', 'open', '2024-02-08T18:00:00Z'), ('e2', 'u1', 'like', '2024-02-09T18:00:00Z')])

Expected Output: [['2024-02-01', 1, 1]]

Explanation: Multiple qualifying events still count as one retained user.

Input: ([], [('e1', 'u1', 'open', '2024-01-08T18:00:00Z')])

Expected Output: []

Explanation: No users means no cohorts.

Hints

Group users by local signup date first.
For each user, storing the set of local event dates is enough; multiple events on the same day do not change retention.

Quick Overview

Write SQL for app metrics

Company: Nextdoor

Role: Data Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

Part 1: Count the Total Number of Users

Constraints

0 <= len(users) <= 200000
Each row has exactly 3 fields
user_id is a non-empty string
joined_at_utc is a valid ISO 8601 UTC timestamp ending in 'Z'

Examples

Input: ([('u1', 'Ana', '2024-01-01T00:00:00Z'), ('u2', 'Bob', '2024-01-02T00:00:00Z'), ('u3', 'Cam', '2024-01-03T00:00:00Z')],)

Expected Output: 3

Explanation: There are 3 different user IDs.

Input: ([('u1', 'Ana', '2024-01-01T00:00:00Z'), ('u1', 'Ana', '2024-01-01T00:00:00Z'), ('u2', 'Bob', '2024-01-02T00:00:00Z')],)

Expected Output: 2

Explanation: The duplicate row for u1 should only be counted once.

Input: ([],)

Expected Output: 0

Explanation: No rows means no users.

Input: ([('solo', 'OnlyUser', '2024-05-10T12:00:00Z')],)

Expected Output: 1

Explanation: A single row represents one distinct user.

Hints

You only need the first field of each row.
A set is useful when you need to count unique identifiers.

Part 2: Calendar Month with the Most New User Signups

Constraints

0 <= len(users) <= 200000
Each timestamp is a valid ISO 8601 UTC timestamp ending in 'Z'
Use the America/Los_Angeles timezone before extracting the calendar month

Examples

Expected Output: '2024-02'

Explanation: January has 2 signups and February has 3.

Input: ([('u1', 'A', '2024-01-01T07:30:00Z'), ('u2', 'B', '2024-01-31T23:00:00Z'), ('u3', 'C', '2024-02-01T07:30:00Z'), ('u4', 'D', '2024-02-15T12:00:00Z'), ('u5', 'E', '2024-03-01T07:30:00Z')],)

Expected Output: '2024-01'

Explanation: After timezone conversion, counts are 2023-12: 1, 2024-01: 2, 2024-02: 2. The tie is broken by earliest month.

Input: ([],)

Expected Output: ''

Explanation: No users means no winning month.

Input: ([('u1', 'A', '2024-01-01T00:30:00Z')],)

Expected Output: '2023-12'

Explanation: 2024-01-01 00:30 UTC is still 2023-12-31 in Los Angeles.

Hints

Convert each timestamp to local time before truncating to month.
Count signups per month, then apply the tie-break rule by earliest month.

Part 3: Find the User with the Largest Follower Count

Constraints

0 <= len(users) <= 200000
0 <= len(follows) <= 200000
Count distinct follower_user_id values per followed_user_id
Timestamps are provided but are not needed for this task

Examples

Expected Output: 'u1'

Explanation: u1 has 2 distinct followers, more than any other user.

Expected Output: 'u1'

Explanation: u1 and u2 both have 2 followers, so the lexicographically smaller ID wins.

Input: ([('u2', 'B', '2024-01-01T00:00:00Z'), ('u1', 'A', '2024-01-01T00:00:00Z')], [])

Expected Output: 'u1'

Explanation: Both users have 0 followers, so return the smaller user_id.

Input: ([('u1', 'A', '2024-01-01T00:00:00Z'), ('u2', 'B', '2024-01-01T00:00:00Z'), ('u3', 'C', '2024-01-01T00:00:00Z')], [('u2', 'u1', '2024-01-02T00:00:00Z'), ('u2', 'u1', '2024-01-03T00:00:00Z'), ('u3', 'u1', '2024-01-04T00:00:00Z')])

Expected Output: 'u1'

Explanation: Duplicate follows from the same follower count only once because followers are distinct.

Input: ([], [('u1', 'u2', '2024-01-01T00:00:00Z')])

Expected Output: ''

Explanation: No users means no valid winner.

Hints

Build a set of followers for each followed user.
Do not forget users with zero followers; one of them could win if there are no follow rows.

Part 4: Populate Daily Aggregate Tables for Dashboard Metrics

Constraints

0 <= len(users), len(photos), len(follows), len(user_events) <= 200000
start_date <= end_date
All timestamps are valid ISO 8601 UTC timestamps ending in 'Z'
The number of days in the requested date range is at most 366
Assume user IDs in users are unique and follow rows represent active edges that remain active forever

Examples

Explanation: This covers timezone conversion, DAU deduplication, cumulative users, cumulative follows, and bucket updates.

Input: ([], [], [], [], '2024-05-01', '2024-05-01')

Expected Output: {'kpi_daily_base': [['2024-05-01', 0, 0, 0, 0, 0]], 'follower_bucket_daily': [['2024-05-01', '0', 0], ['2024-05-01', '1_2', 0], ['2024-05-01', '3_plus', 0]]}

Explanation: The date still appears even when all raw tables are empty.

Explanation: u1 moves from bucket '1_2' to '3_plus' on the second day.

Input: ([('u1', 'A', '2024-01-01T12:00:00Z'), ('u2', 'B', '2024-01-03T12:00:00Z')], [], [('u2', 'u1', '2024-01-02T12:00:00Z')], [], '2024-01-03', '2024-01-04')

Explanation: Data before start_date must still affect cumulative end-of-day counts.

Hints

Precompute the local date for every raw row first.
A daily sweep with cumulative totals is simpler than recomputing each day from scratch.

Part 5: Compute Week-1 Retention Cohorts

Constraints

0 <= len(users), len(user_events) <= 200000
All timestamps are valid ISO 8601 UTC timestamps ending in 'Z'
Use the America/Los_Angeles timezone before determining dates
A user should be counted at most once in retained_w1_users for their cohort

Examples

Expected Output: [['2024-01-01', 2, 2], ['2024-01-03', 1, 1]]

Explanation: u1 and u2 are both retained for the 2024-01-01 cohort, and u3 is retained for the 2024-01-03 cohort.

Input: ([('u1', 'A', '2024-01-01T12:00:00Z'), ('u2', 'B', '2024-01-01T13:00:00Z')], [('e1', 'u1', 'open', '2024-01-07T20:00:00Z'), ('e2', 'u2', 'open', '2024-01-15T08:30:00Z')])

Expected Output: [['2024-01-01', 2, 0]]

Explanation: One event is too early and the other is too late, so neither user is retained.

Input: ([('u1', 'A', '2024-02-01T18:00:00Z')], [('e1', 'u1', 'open', '2024-02-08T18:00:00Z'), ('e2', 'u1', 'like', '2024-02-09T18:00:00Z')])

Expected Output: [['2024-02-01', 1, 1]]

Explanation: Multiple qualifying events still count as one retained user.

Input: ([], [('e1', 'u1', 'open', '2024-01-08T18:00:00Z')])

Expected Output: []

Explanation: No users means no cohorts.

Hints

Group users by local signup date first.
For each user, storing the set of local event dates is enough; multiple events on the same day do not change retention.