You are given healthcare claims data split across member tables.
Tables
Assume the following schemas (types may be adapted to your SQL dialect):
mem1
-
member_id
INT PRIMARY KEY
-
age_band
VARCHAR
mem2
-
member_id
INT PRIMARY KEY
-
age_band
VARCHAR
claim
-
claim_id
INT PRIMARY KEY
-
member_id
INT
-
specialty
VARCHAR
-
paid_amt
DECIMAL(12,2)
-
claim_date
DATE
risk (provided but not necessarily needed)
-
member_id
INT
-
risk_score
DECIMAL(10,4)
Assume claim.member_id joins to either mem1.member_id or mem2.member_id.
Assume claim_date is in UTC and “year = 2017” means calendar year 2017.
Part A (SQL)
For each specialty, compute:
-
total_paid_amt
= total paid amount for that specialty
-
paid_share
= that specialty’s share of total paid amount across all claims
Output columns: specialty, total_paid_amt, paid_share.
Part B (SQL)
Considering only claims in calendar year 2017:
-
Union
mem1
and
mem2
into a single member set.
-
Join members to claims.
-
Compute claim counts per
age_band
.
-
Return the age band(s) with the maximum number of claims (keep ties).
Output columns: age_band, claim_cnt.
Part C (Python / Pandas)
You are given a Pandas DataFrame df_claims with columns:
-
paid_amt
(numeric)
-
gender
(values are
'M'
or
'F'
)
-
claim_timestamp
(string or datetime-like)
Tasks:
-
Recode
gender
from
M/F
to
male/female
.
-
Compute total
paid_amt
for
each month in 2017
, grouped by
gender
and month.
Output: a DataFrame with columns like gender, month (1–12), total_paid_amt (and optionally year if you keep it).