PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates proficiency in SQL-based data engineering, focusing on time-zone-aware timestamp handling, aggregation of event and relationship data, cohort retention analysis, and computation of product metrics like DAU, new users, photo activity, and follower distributions.

  • hard
  • Nextdoor
  • Coding & Algorithms
  • Data Engineer

Write SQL for app metrics

Company: Nextdoor

Role: Data Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

You are given a simplified schema for a photo-sharing app: - `users(user_id, user_name, joined_at)` - `photos(photo_id, user_id, created_at)` - `follows(follower_user_id, followed_user_id, created_at)` - `user_events(event_id, user_id, event_type, occurred_at)` Assume all timestamps are stored in UTC. Unless otherwise stated, reporting should use the `America/Los_Angeles` timezone. A row in `follows` represents a follow created at `created_at` and still active afterward. `DAU` means the number of distinct users with at least one row in `user_events` on that local calendar day. Also assume the dashboard is backed by these aggregate tables: - `kpi_daily_base(report_date, new_users, total_users_eod, dau, new_photos, total_follow_edges_eod)` - `follower_bucket_daily(report_date, bucket, user_count)` where `bucket` is one of `0`, `1_2`, or `3_plus` - `retention_cohort_daily(cohort_date, cohort_size, retained_w1_users)` Write SQL for the following tasks: 1. Count the total number of users. 2. Find the calendar month with the highest number of new user signups. 3. Find the user with the largest follower count. 4. Populate the daily aggregate tables from the raw tables so the dashboard can report: - new users - change versus 7 days earlier - DAU - change versus 7 days earlier - new photos - new photos per DAU - change versus 7 days earlier - total follower relationships - percentage of users with 0 followers - percentage of users with 1-2 followers - percentage of users with 3+ followers 5. Extend the SQL to compute week-1 retention, defined as: among users who signed up on day `D`, the share who had at least one event on days `D+7` through `D+13` in the reporting timezone.

Quick Answer: This question evaluates proficiency in SQL-based data engineering, focusing on time-zone-aware timestamp handling, aggregation of event and relationship data, cohort retention analysis, and computation of product metrics like DAU, new users, photo activity, and follower distributions.

Part 1: Count the Total Number of Users

You are given a list of imported user rows from a photo-sharing app. Each row has the form (user_id, user_name, joined_at_utc). Because imports may be replayed, the same logical user can appear more than once. Count how many distinct users exist, where user_id uniquely identifies a user.

Constraints

  • 0 <= len(users) <= 200000
  • Each row has exactly 3 fields
  • user_id is a non-empty string
  • joined_at_utc is a valid ISO 8601 UTC timestamp ending in 'Z'

Examples

Input: ([('u1', 'Ana', '2024-01-01T00:00:00Z'), ('u2', 'Bob', '2024-01-02T00:00:00Z'), ('u3', 'Cam', '2024-01-03T00:00:00Z')],)

Expected Output: 3

Explanation: There are 3 different user IDs.

Input: ([('u1', 'Ana', '2024-01-01T00:00:00Z'), ('u1', 'Ana', '2024-01-01T00:00:00Z'), ('u2', 'Bob', '2024-01-02T00:00:00Z')],)

Expected Output: 2

Explanation: The duplicate row for u1 should only be counted once.

Input: ([],)

Expected Output: 0

Explanation: No rows means no users.

Input: ([('solo', 'OnlyUser', '2024-05-10T12:00:00Z')],)

Expected Output: 1

Explanation: A single row represents one distinct user.

Hints

  1. You only need the first field of each row.
  2. A set is useful when you need to count unique identifiers.

Part 2: Calendar Month with the Most New User Signups

You are given user rows of the form (user_id, user_name, joined_at_utc). A signup belongs to the calendar month of joined_at after converting the UTC timestamp to the America/Los_Angeles timezone. Return the month with the highest number of signups in 'YYYY-MM' format. If multiple months tie, return the earliest such month. If there are no users, return an empty string.

Constraints

  • 0 <= len(users) <= 200000
  • Each timestamp is a valid ISO 8601 UTC timestamp ending in 'Z'
  • Use the America/Los_Angeles timezone before extracting the calendar month

Examples

Input: ([('u1', 'Ana', '2024-01-01T10:00:00Z'), ('u2', 'Bob', '2024-01-15T12:00:00Z'), ('u3', 'Cam', '2024-02-01T09:00:00Z'), ('u4', 'Dan', '2024-02-20T18:00:00Z'), ('u5', 'Eli', '2024-02-28T23:00:00Z')],)

Expected Output: '2024-02'

Explanation: January has 2 signups and February has 3.

Input: ([('u1', 'A', '2024-01-01T07:30:00Z'), ('u2', 'B', '2024-01-31T23:00:00Z'), ('u3', 'C', '2024-02-01T07:30:00Z'), ('u4', 'D', '2024-02-15T12:00:00Z'), ('u5', 'E', '2024-03-01T07:30:00Z')],)

Expected Output: '2024-01'

Explanation: After timezone conversion, counts are 2023-12: 1, 2024-01: 2, 2024-02: 2. The tie is broken by earliest month.

Input: ([],)

Expected Output: ''

Explanation: No users means no winning month.

Input: ([('u1', 'A', '2024-01-01T00:30:00Z')],)

Expected Output: '2023-12'

Explanation: 2024-01-01 00:30 UTC is still 2023-12-31 in Los Angeles.

Hints

  1. Convert each timestamp to local time before truncating to month.
  2. Count signups per month, then apply the tie-break rule by earliest month.

Part 3: Find the User with the Largest Follower Count

You are given a users table and a follows table. Each user row is (user_id, user_name, joined_at_utc). Each follow row is (follower_user_id, followed_user_id, created_at_utc). Return the user_id of the user with the largest number of distinct followers. Only users present in the users table are eligible to win. If multiple users tie, return the lexicographically smallest user_id. If users is empty, return an empty string.

Constraints

  • 0 <= len(users) <= 200000
  • 0 <= len(follows) <= 200000
  • Count distinct follower_user_id values per followed_user_id
  • Timestamps are provided but are not needed for this task

Examples

Input: ([('u1', 'A', '2024-01-01T00:00:00Z'), ('u2', 'B', '2024-01-01T00:00:00Z'), ('u3', 'C', '2024-01-01T00:00:00Z')], [('u2', 'u1', '2024-01-02T00:00:00Z'), ('u3', 'u1', '2024-01-02T00:00:00Z'), ('u1', 'u2', '2024-01-03T00:00:00Z')])

Expected Output: 'u1'

Explanation: u1 has 2 distinct followers, more than any other user.

Input: ([('u1', 'A', '2024-01-01T00:00:00Z'), ('u2', 'B', '2024-01-01T00:00:00Z'), ('u3', 'C', '2024-01-01T00:00:00Z')], [('u2', 'u1', '2024-01-02T00:00:00Z'), ('u3', 'u1', '2024-01-02T00:00:00Z'), ('u1', 'u2', '2024-01-03T00:00:00Z'), ('u3', 'u2', '2024-01-03T00:00:00Z')])

Expected Output: 'u1'

Explanation: u1 and u2 both have 2 followers, so the lexicographically smaller ID wins.

Input: ([('u2', 'B', '2024-01-01T00:00:00Z'), ('u1', 'A', '2024-01-01T00:00:00Z')], [])

Expected Output: 'u1'

Explanation: Both users have 0 followers, so return the smaller user_id.

Input: ([('u1', 'A', '2024-01-01T00:00:00Z'), ('u2', 'B', '2024-01-01T00:00:00Z'), ('u3', 'C', '2024-01-01T00:00:00Z')], [('u2', 'u1', '2024-01-02T00:00:00Z'), ('u2', 'u1', '2024-01-03T00:00:00Z'), ('u3', 'u1', '2024-01-04T00:00:00Z')])

Expected Output: 'u1'

Explanation: Duplicate follows from the same follower count only once because followers are distinct.

Input: ([], [('u1', 'u2', '2024-01-01T00:00:00Z')])

Expected Output: ''

Explanation: No users means no valid winner.

Hints

  1. Build a set of followers for each followed user.
  2. Do not forget users with zero followers; one of them could win if there are no follow rows.

Part 4: Populate Daily Aggregate Tables for Dashboard Metrics

You are given raw app tables as lists of rows. Build two daily aggregate tables over an inclusive reporting date range [start_date, end_date] using the America/Los_Angeles timezone. 1. kpi_daily_base rows must be [report_date, new_users, total_users_eod, dau, new_photos, total_follow_edges_eod] 2. follower_bucket_daily rows must be [report_date, bucket, user_count] for each bucket in the fixed order '0', '1_2', '3_plus' Definitions: - new_users: users whose joined_at falls on that local date - total_users_eod: users whose joined_at is on or before that local date - dau: number of distinct users with at least one event on that local date - new_photos: photos whose created_at falls on that local date - total_follow_edges_eod: follows whose created_at is on or before that local date - follower buckets: among users that exist by end of day, bucket each user by their active follower count using follows created on or before that day Return both aggregate tables in a dictionary with keys 'kpi_daily_base' and 'follower_bucket_daily'.

Constraints

  • 0 <= len(users), len(photos), len(follows), len(user_events) <= 200000
  • start_date <= end_date
  • All timestamps are valid ISO 8601 UTC timestamps ending in 'Z'
  • The number of days in the requested date range is at most 366
  • Assume user IDs in users are unique and follow rows represent active edges that remain active forever

Examples

Input: ([('u1', 'A', '2024-01-01T10:00:00Z'), ('u2', 'B', '2024-01-02T07:30:00Z'), ('u3', 'C', '2024-01-03T08:30:00Z')], [('p1', 'u1', '2024-01-02T06:00:00Z'), ('p2', 'u2', '2024-01-03T09:00:00Z')], [('u2', 'u1', '2024-01-02T08:00:00Z'), ('u3', 'u1', '2024-01-03T18:00:00Z')], [('e1', 'u1', 'open', '2024-01-02T01:00:00Z'), ('e2', 'u1', 'like', '2024-01-02T23:00:00Z'), ('e3', 'u2', 'open', '2024-01-03T07:20:00Z'), ('e4', 'u2', 'open', '2024-01-03T09:00:00Z')], '2024-01-01', '2024-01-03')

Expected Output: {'kpi_daily_base': [['2024-01-01', 2, 2, 1, 1, 0], ['2024-01-02', 0, 2, 2, 0, 1], ['2024-01-03', 1, 3, 1, 1, 2]], 'follower_bucket_daily': [['2024-01-01', '0', 2], ['2024-01-01', '1_2', 0], ['2024-01-01', '3_plus', 0], ['2024-01-02', '0', 1], ['2024-01-02', '1_2', 1], ['2024-01-02', '3_plus', 0], ['2024-01-03', '0', 2], ['2024-01-03', '1_2', 1], ['2024-01-03', '3_plus', 0]]}

Explanation: This covers timezone conversion, DAU deduplication, cumulative users, cumulative follows, and bucket updates.

Input: ([], [], [], [], '2024-05-01', '2024-05-01')

Expected Output: {'kpi_daily_base': [['2024-05-01', 0, 0, 0, 0, 0]], 'follower_bucket_daily': [['2024-05-01', '0', 0], ['2024-05-01', '1_2', 0], ['2024-05-01', '3_plus', 0]]}

Explanation: The date still appears even when all raw tables are empty.

Input: ([('u1', 'A', '2024-02-01T09:00:00Z'), ('u2', 'B', '2024-02-01T10:00:00Z'), ('u3', 'C', '2024-02-01T11:00:00Z'), ('u4', 'D', '2024-02-01T12:00:00Z')], [], [('u2', 'u1', '2024-02-01T20:00:00Z'), ('u3', 'u1', '2024-02-01T21:00:00Z'), ('u4', 'u1', '2024-02-02T18:00:00Z')], [], '2024-02-01', '2024-02-02')

Expected Output: {'kpi_daily_base': [['2024-02-01', 4, 4, 0, 0, 2], ['2024-02-02', 0, 4, 0, 0, 3]], 'follower_bucket_daily': [['2024-02-01', '0', 3], ['2024-02-01', '1_2', 1], ['2024-02-01', '3_plus', 0], ['2024-02-02', '0', 3], ['2024-02-02', '1_2', 0], ['2024-02-02', '3_plus', 1]]}

Explanation: u1 moves from bucket '1_2' to '3_plus' on the second day.

Input: ([('u1', 'A', '2024-01-01T12:00:00Z'), ('u2', 'B', '2024-01-03T12:00:00Z')], [], [('u2', 'u1', '2024-01-02T12:00:00Z')], [], '2024-01-03', '2024-01-04')

Expected Output: {'kpi_daily_base': [['2024-01-03', 1, 2, 0, 0, 1], ['2024-01-04', 0, 2, 0, 0, 1]], 'follower_bucket_daily': [['2024-01-03', '0', 1], ['2024-01-03', '1_2', 1], ['2024-01-03', '3_plus', 0], ['2024-01-04', '0', 1], ['2024-01-04', '1_2', 1], ['2024-01-04', '3_plus', 0]]}

Explanation: Data before start_date must still affect cumulative end-of-day counts.

Hints

  1. Precompute the local date for every raw row first.
  2. A daily sweep with cumulative totals is simpler than recomputing each day from scratch.

Part 5: Compute Week-1 Retention Cohorts

You are given users and user_events raw tables. For each local signup date D in the America/Los_Angeles timezone, compute a cohort row [cohort_date, cohort_size, retained_w1_users]. A user is retained in week 1 if they had at least one event on local days D+7 through D+13 inclusive. Return all cohort rows sorted by cohort_date ascending.

Constraints

  • 0 <= len(users), len(user_events) <= 200000
  • All timestamps are valid ISO 8601 UTC timestamps ending in 'Z'
  • Use the America/Los_Angeles timezone before determining dates
  • A user should be counted at most once in retained_w1_users for their cohort

Examples

Input: ([('u1', 'A', '2024-01-01T18:00:00Z'), ('u2', 'B', '2024-01-02T07:30:00Z'), ('u3', 'C', '2024-01-03T08:30:00Z')], [('e1', 'u1', 'open', '2024-01-08T20:00:00Z'), ('e2', 'u1', 'like', '2024-01-15T20:00:00Z'), ('e3', 'u2', 'open', '2024-01-14T07:30:00Z'), ('e4', 'u3', 'open', '2024-01-10T10:00:00Z')])

Expected Output: [['2024-01-01', 2, 2], ['2024-01-03', 1, 1]]

Explanation: u1 and u2 are both retained for the 2024-01-01 cohort, and u3 is retained for the 2024-01-03 cohort.

Input: ([('u1', 'A', '2024-01-01T12:00:00Z'), ('u2', 'B', '2024-01-01T13:00:00Z')], [('e1', 'u1', 'open', '2024-01-07T20:00:00Z'), ('e2', 'u2', 'open', '2024-01-15T08:30:00Z')])

Expected Output: [['2024-01-01', 2, 0]]

Explanation: One event is too early and the other is too late, so neither user is retained.

Input: ([('u1', 'A', '2024-02-01T18:00:00Z')], [('e1', 'u1', 'open', '2024-02-08T18:00:00Z'), ('e2', 'u1', 'like', '2024-02-09T18:00:00Z')])

Expected Output: [['2024-02-01', 1, 1]]

Explanation: Multiple qualifying events still count as one retained user.

Input: ([], [('e1', 'u1', 'open', '2024-01-08T18:00:00Z')])

Expected Output: []

Explanation: No users means no cohorts.

Hints

  1. Group users by local signup date first.
  2. For each user, storing the set of local event dates is enough; multiple events on the same day do not change retention.
Last updated: Jun 11, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Group Photos and Sort Feed Items - Nextdoor (medium)
  • Simulate Transaction Lock Scheduling - Nextdoor (hard)
  • Build Ranked Feed With Photo Batching - Nextdoor (medium)
  • Merge Weekly Time Intervals - Nextdoor (medium)
  • Merge overlapping weekly time intervals - Nextdoor (medium)