Compute each member’s current notification status
Company: LinkedIn
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: easy
Interview Round: Technical Screen
##### Question
You are given two tables describing LinkedIn members’ push-notification settings. Compute each member’s **current notification status as of `2020-02-29`** by combining the end-of-January baseline with the toggle actions recorded during February.
### Tables
`status` — each member’s **latest setting as of `2020-01-31`**
- `member_id` INT
- `status` VARCHAR — `'on'` or `'off'`
`actions` — all notification-toggle actions taken during **February 2020**
- `member_id` INT
- `action_date` DATE — a UTC calendar date between `2020-02-01` and `2020-02-29`
- `action` VARCHAR — `'turn_on'` or `'turn_off'`
### Assumptions
- A member performs **at most one action per day**.
- Every `actions` row occurs **after** the `2020-01-31` snapshot.
- `actions` may include **new members not present in `status`** (e.g., a member created after `2020-01-31`).
### Tasks
1. Return each member’s **current notification status as of `2020-02-29`**.
- Include **every member who appears in either table** — members only in `status`, members only in `actions`, and members in both.
- If a member has at least one February action, their status is determined by their **latest** February action (`turn_on` → `'on'`, `turn_off` → `'off'`).
- If a member has no February action, their status remains the `status` value from `2020-01-31`.
- Output columns: `member_id`, `current_status`.
2. **Follow-up A — no date column.** Suppose `actions` no longer contains `action_date`, so you cannot order the actions. Explain whether the final status can still be reconstructed, and specify what additional data, assumptions, or table design would be required to make the problem solvable.
3. **Follow-up B — no date column, valid-transition guarantee.** Now add the assumption that every recorded action is a **valid state transition** from the member’s previous state (a `turn_on` only ever follows an `'off'` state and vice versa, so no contradictory or repeated actions occur). Under that guarantee, describe (or write SQL for) how you would infer the final status as of `2020-02-29` using only the snapshot plus the **multiset** of February actions for each member.
Quick Answer: A LinkedIn data scientist SQL screen: given an end-of-January notification-status snapshot and a February log of turn_on/turn_off actions, compute each member's current status as of 2020-02-29, keeping members from either table. Follow-ups explore reconstructing state when the action log has no date column — unsolvable in general, but recoverable via action-count parity when every action is a guaranteed valid transition.
Solution
### Task 1 — current status as of 2020-02-29
The core idea: for each member, take the **last February action** if any exists; otherwise fall back to the January-31 snapshot. Because we must include members in either table, build the candidate-member universe with a `FULL OUTER JOIN` (or a `UNION` of ids) and then resolve status.
**Approach with a window function** (pick the latest action per member, then coalesce with the snapshot):
```sql
WITH latest_action AS (
SELECT
member_id,
action,
ROW_NUMBER() OVER (
PARTITION BY member_id
ORDER BY action_date DESC
) AS rn
FROM actions
),
last_per_member AS (
SELECT member_id, action
FROM latest_action
WHERE rn = 1
)
SELECT
COALESCE(s.member_id, l.member_id) AS member_id,
COALESCE(
CASE l.action
WHEN 'turn_on' THEN 'on'
WHEN 'turn_off' THEN 'off'
END,
s.status
) AS current_status
FROM status s
FULL OUTER JOIN last_per_member l
ON s.member_id = l.member_id;
```
- The `FULL OUTER JOIN` keeps members who appear only in `status` (no February action → `current_status = s.status`) and members who appear only in `actions` (new members → `current_status` comes from their latest action).
- `COALESCE(s.member_id, l.member_id)` produces a non-null `member_id` regardless of which side matched.
- `CASE` maps the verb-style action (`turn_on`/`turn_off`) to the state value (`on`/`off`), and the outer `COALESCE` falls back to the snapshot when there is no February action.
**Equivalent without a window function** (correlated max-date, useful if the dialect lacks `ROW_NUMBER` or for an easy variant):
```sql
WITH last_per_member AS (
SELECT a.member_id,
a.action
FROM actions a
JOIN (
SELECT member_id, MAX(action_date) AS max_date
FROM actions
GROUP BY member_id
) m
ON a.member_id = m.member_id
AND a.action_date = m.max_date
)
SELECT
COALESCE(s.member_id, l.member_id) AS member_id,
COALESCE(
CASE l.action WHEN 'turn_on' THEN 'on' WHEN 'turn_off' THEN 'off' END,
s.status
) AS current_status
FROM status s
FULL OUTER JOIN last_per_member l ON s.member_id = l.member_id;
```
The “at most one action per day” assumption guarantees `MAX(action_date)` is unambiguous — there is exactly one action on the latest day, so no tie-break is needed.
**Python / pandas equivalent:**
```python
# latest action per member
last = (actions.sort_values('action_date')
.groupby('member_id', as_index=False)
.last()[['member_id', 'action']])
last['act_status'] = last['action'].map({'turn_on': 'on', 'turn_off': 'off'})
merged = status.merge(last[['member_id', 'act_status']],
on='member_id', how='outer')
merged['current_status'] = merged['act_status'].fillna(merged['status'])
result = merged[['member_id', 'current_status']]
```
### Follow-up A — no `action_date`
Without a date (or any ordering key) and without further assumptions, the problem is **not solvable in general**. If a member toggled `on` then `off` (or vice versa) within February, the unordered set `{turn_on, turn_off}` is consistent with **both** final states — you cannot tell which came last. To make it solvable you need one of:
- An **ordering key**: the original `action_date`, a `created_at` timestamp, or a monotonically increasing `action_id` / sequence number you can sort by; OR
- A **per-member action count plus a parity rule** — only valid under an extra guarantee (see Follow-up B); OR
- A redesigned table that stores the **resulting state** (e.g., `new_status`) on each row instead of just the verb, so the latest row’s state is self-describing (still needs an ordering key to identify “latest”).
The key teaching point: an event log without an ordering attribute loses the information required to reconstruct state when transitions can repeat or cancel out.
### Follow-up B — no `action_date`, but every action is a valid transition
If every recorded action is a valid transition (a member only fires `turn_on` while currently `off` and `turn_off` while currently `on`), then the actions for a member must **strictly alternate** starting from their baseline state. The order no longer matters — only the **count of actions** does, because each action flips the state exactly once. So the final status is the baseline state flipped once for every February action, i.e. determined by the **parity** of the action count:
- Even number of actions (including zero) → state unchanged from the snapshot.
- Odd number of actions → state flipped from the snapshot.
For members with no snapshot row (new members appearing only in `actions`), their implicit starting state is `'off'` (notifications are off until first turned on), so an odd count → `'on'`, even count → `'off'`. (If the problem instead guarantees a new member’s first action is always `turn_on`, you can read the starting state directly from that.)
```sql
WITH cnt AS (
SELECT member_id, COUNT(*) AS n
FROM actions
GROUP BY member_id
)
SELECT
COALESCE(s.member_id, c.member_id) AS member_id,
CASE
WHEN (COALESCE(c.n, 0) % 2) = 1
THEN CASE COALESCE(s.status, 'off') -- flip the baseline
WHEN 'on' THEN 'off' ELSE 'on'
END
ELSE COALESCE(s.status, 'off') -- even count: unchanged
END AS current_status
FROM status s
FULL OUTER JOIN cnt c ON s.member_id = c.member_id;
```
This works precisely because the valid-transition guarantee removes the ordering ambiguity from Follow-up A: alternating flips are commutative, so the multiset (here just its size) is sufficient.
Explanation
Core task tests a snapshot-plus-event-log state reconstruction: pick each member’s latest February action (window function or correlated MAX-date) and COALESCE with the Jan-31 baseline, using a FULL OUTER JOIN so members in either table are kept. Follow-up A probes the candidate’s grasp that an unordered event log cannot reconstruct state when transitions can cancel out — you need an ordering key or a state-carrying schema. Follow-up B is the elegant special case: with a valid-alternating-transition guarantee, order is irrelevant and parity of the action count determines whether the baseline is flipped.