You are a Data Scientist supporting an e-commerce platform. You receive a weekly time-series dashboard covering the past three years. The dashboard initially shows weekly session count, and there is a large spike around May in the most recent year.
After that, the interviewer adds weekly conversion rate and one additional engagement or quality metric, such as average session duration, bounce rate, or orders per session. Around the same May spike, conversion rate drops sharply.
You are then given week-level data and asked to investigate. The data contains at least the following fields:
-
year_week
: a year-week label that may be stored as a string rather than a true date.
-
week_start_date
: the start date of the week, if available.
-
sessions
: number of sessions in that week.
-
orders
: number of sessions that converted or number of orders.
-
conversion_rate
: orders divided by sessions.
-
shop_type
: merchant or shop category.
-
session_duration_bucket
: duration bucket such as
0
,
0-30 seconds
,
30-60 seconds
, and
60+ seconds
.
Tasks:
-
State hypotheses for why weekly sessions spiked around May.
-
Explain why conversion rate could fall at the same time sessions spike.
-
Describe how you would validate the issue using week-level data, including how you would handle the
year_week
data quality issue.
-
Suppose your analysis shows that the session spike appears only for certain shop types and only for sessions with duration
0
or
0-30 seconds
. Interpret the result and recommend next steps.