This question evaluates proficiency in SQL data manipulation—specifically use of window functions, CTEs, deduplication by earliest timestamp, temporal comparisons, and cohort-level aggregation—within the Data Manipulation (SQL/Python) domain for a Data Analyst role in health‑tech product analytics.
You are interviewing for a health-tech product analytics role.
Assume the following table contains one row per appointment with its final status.
Table: appointments
appointment_id
(BIGINT, primary key)
user_id
(BIGINT, not null)
scheduled_start_at
(TIMESTAMP, not null) — when the appointment was scheduled to start
status
(VARCHAR, not null) — one of:
'cancelled'
,
'returned'
,
'confirmed'
,
'completed'
,
'no_show'
status_updated_at
(TIMESTAMP, not null) — when the final
status
was set
Definitions / assumptions
scheduled_start_at
(break ties by smallest
appointment_id
).
status = 'cancelled'
.
scheduled_start_at
strictly after their first appointment’s
scheduled_start_at
and with
status IN ('returned','confirmed')
.
Task Write a SQL query to compute the percentage of first-cancelled users who never have another future returned/confirmed appointment.
Required output (single row)
first_cancelled_users
(integer)
never_returned_confirmed_users
(integer)
pct_never_returned_confirmed
(decimal between 0 and 1)
(You may use CTEs and window functions.)