Compute early signals for user retention
Company: Chime
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: easy
Interview Round: Technical Screen
You are given two tables that track signups and user transactions.
## Tables
### `users`
- `user_id` (STRING, **PK**) — unique user
- `signup_ts` (TIMESTAMP) — time the user signed up (assume UTC)
### `transactions`
- `transaction_id` (STRING, **PK**)
- `user_id` (STRING, **FK → users.user_id**)
- `transaction_ts` (TIMESTAMP) — time of transaction (UTC)
- `amount` (NUMERIC) — transaction amount
## Task
You want an **early signal** that predicts **Day-30 retention**.
1) Define an **early engagement metric** as the number of transactions a user makes in the first **7 days after signup**:
- `early_txn_cnt` = count of `transactions` where `transaction_ts >= signup_ts` and `transaction_ts < signup_ts + 7 days`.
2) Define **Day-30 retention** as whether the user has **at least one transaction** in the window **[signup + 30 days, signup + 37 days)**.
- `is_retained_d30` ∈ {0,1}
3) Write a SQL query that outputs retention performance by early-engagement bucket:
- Bucket `early_txn_cnt` into: `0`, `1`, `2`, `3+`
## Required output
Return one row per bucket with:
- `early_txn_bucket`
- `users_in_bucket`
- `retained_users`
- `retention_rate` (= `retained_users` / `users_in_bucket`, as a decimal)
## Notes / assumptions
- Include users with zero transactions.
- If a user has transactions before `signup_ts`, ignore them.
- If multiple transactions occur in the retention window, the user still counts as retained once.
Quick Answer: This question evaluates the ability to compute cohort-based early engagement metrics and Day-30 retention using time-windowed joins, aggregation, bucketing, and handling edge cases such as users with zero or pre-signup transactions.