Measure App Store success and debug a funnel anomaly
Company: Shopify
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: easy
Interview Round: Technical Screen
## Part A — Product case: measuring success for a new App Store
Shopify is launching a **Shopify App Store** where merchants can browse/install apps built by third-party developers (some paid, some free). You are the Data Scientist supporting the launch.
### 1) Define success
Propose a success measurement framework with:
- **Primary (north-star) metric(s)**
- **Input/leading metrics** (activation, engagement)
- **Diagnostic metrics** (funnel rates, segment cuts)
- **Guardrails** (latency, merchant churn, refunds/chargebacks, spam/fraud, support burden)
Be explicit about *whose success* you’re optimizing for (merchants, developers, Shopify) and how you’d balance tradeoffs.
### 2) Data + instrumentation
Specify what data you’d need and where it comes from.
- List key **event streams** (e.g., clickstream/browse/search, install/uninstall, subscription/billing, app usage, support tickets).
- Propose a minimal **data model** (example fact/dimension tables) that would support the metrics.
Assume events arrive in near-real-time; define any time windowing (e.g., daily in UTC) and identity rules (merchant_id, app_id, developer_id, session_id).
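A strong answer often sketches the data model concretely. Below is a minimal, hedged example in pandas: an install-event fact table joined to an app dimension and rolled up to a daily UTC grain. All table and column names here are illustrative assumptions, not a prescribed schema.

```python
# Illustrative star-schema sketch for App Store metrics (names are assumptions).
import pandas as pd

# Fact table: one row per install event (grain: event)
fact_install_events = pd.DataFrame({
    "event_ts": pd.to_datetime(
        ["2024-01-01 10:00", "2024-01-01 11:30", "2024-01-02 09:15"], utc=True
    ),
    "merchant_id": [101, 102, 101],
    "app_id": [7, 7, 9],
    "session_id": ["s1", "s2", "s3"],
})

# Dimension table: one row per app
dim_app = pd.DataFrame({
    "app_id": [7, 9],
    "developer_id": [55, 56],
    "app_category": ["marketing", "shipping"],
    "is_paid": [True, False],
})

# Daily rollup in UTC — supports install counts and distinct-merchant reach
daily = (
    fact_install_events
    .assign(date=lambda d: d["event_ts"].dt.floor("D"))
    .merge(dim_app, on="app_id")
    .groupby(["date", "app_category"], as_index=False)
    .agg(installs=("merchant_id", "size"),
         installing_merchants=("merchant_id", "nunique"))
)
```

The same pattern extends to uninstall, billing, and app-usage facts; the key design choice is keeping event grain in facts and computing rates only at aggregation time.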
### 3) Experimentation plan
Design at least one experiment to improve App Store outcomes (e.g., ranking algorithm, pricing surfaces, recommendation modules, onboarding prompts).
Include:
- Unit of randomization (merchant vs session), eligibility, and duration
- Primary/secondary/guardrail metrics
- Key threats to validity (network effects, interference, novelty effects, selection bias)
- How you’d analyze (e.g., CUPED, stratification) and make a ship/no-ship decision
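To make the CUPED mention concrete, here is a minimal sketch on synthetic data: adjust the post-period metric by its regression on a pre-period covariate, which preserves the treatment-effect estimate while shrinking variance. The data, effect size, and variable names are all assumptions for illustration.

```python
# Hedged CUPED sketch: variance reduction via a pre-experiment covariate.
import numpy as np

rng = np.random.default_rng(42)
n = 5000
pre = rng.normal(100, 20, n)                 # pre-period metric (covariate)
noise = rng.normal(0, 10, n)
treat = rng.integers(0, 2, n)                # merchant-level randomization
y = 0.8 * pre + noise + 2.0 * treat          # post metric; true effect = 2.0

theta = np.cov(y, pre)[0, 1] / np.var(pre)   # slope of y on pre
y_cuped = y - theta * (pre - pre.mean())     # adjusted metric, same mean

naive = y[treat == 1].mean() - y[treat == 0].mean()
adjusted = y_cuped[treat == 1].mean() - y_cuped[treat == 0].mean()
# Both estimate the effect; the CUPED estimate has much lower variance.
```

In practice the covariate is the same metric measured pre-experiment per merchant, and the adjustment is computed pooled across arms.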
---
## Part B — Data interpretation + visualization: traffic spike with worse funnel
You’re given a dataset with **3 years of daily metrics** for the App Store. You notice:
- A **large traffic spike** that is not explained by normal seasonality.
- **Add-to-cart (ATC) rate drops sharply** during the spike.
- **Conversion rate drops slightly**.
Assume the table below (you may create derived fields like YoY, WoW, and rolling averages):
### Table: `daily_app_store_metrics`
- `date` (DATE)
- `sessions` (INT) — total visits to the App Store
- `product_views` (INT)
- `add_to_cart` (INT)
- `purchases` (INT)
- `revenue` (NUMERIC)
- `channel` (STRING) — e.g., organic, paid_search, email, affiliate, referral
- `device_type` (STRING) — desktop/mobile/tablet
- `geo` (STRING)
- `merchant_tier` (STRING) — e.g., trial/basic/plus
- `landing_page` (STRING)
- `app_category` (STRING)
- `is_bot_suspected` (BOOL) — if available
### Tasks
1) List plausible hypotheses that could cause **sessions ↑** while **ATC rate ↓** and **conversion ↓/flat** (cover both product and data-quality causes).
2) Propose the *most useful charts* you would build (in Python or Google Sheets) to validate/refute your hypotheses.
3) Explain what follow-up data you would request if the dataset is insufficient.
*Define rates as:*
- `ATC_rate = add_to_cart / product_views` (or justify an alternative)
- `CVR = purchases / sessions`
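The derived fields and rate definitions above can be sketched in pandas. This example uses synthetic data shaped like `daily_app_store_metrics`, injects a traffic spike with no matching downstream lift (the bot-like pattern in the prompt), and flags spike days against a trailing baseline; the thresholds and synthetic values are assumptions.

```python
# Funnel diagnostics sketch on synthetic daily_app_store_metrics-shaped data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=60, freq="D"),
    "sessions": rng.poisson(10_000, 60),
    "product_views": rng.poisson(6_000, 60),
    "add_to_cart": rng.poisson(1_200, 60),
    "purchases": rng.poisson(400, 60),
})
# Inject an unexplained spike: sessions and views jump, carts/purchases don't
df.loc[40:45, "sessions"] += 30_000
df.loc[40:45, "product_views"] += 18_000

df["ATC_rate"] = df["add_to_cart"] / df["product_views"]
df["CVR"] = df["purchases"] / df["sessions"]
df["sessions_7d"] = df["sessions"].rolling(7).mean()

# Flag days > 3 trailing std devs above the trailing 28-day mean
roll = df["sessions"].rolling(28)
z = (df["sessions"] - roll.mean().shift(1)) / roll.std().shift(1)
df["spike"] = z > 3
```

Cutting the same rates by `channel`, `device_type`, `landing_page`, and `is_bot_suspected` (when populated) is then a matter of adding those columns to a `groupby`.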
Output expected: a structured investigation plan plus the key visualizations you’d generate.
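For the visualizations, a stacked small-multiples view of sessions, ATC rate, and CVR on a shared time axis is usually the highest-value first chart, since divergence at the spike is visible at a glance. A minimal matplotlib sketch, with synthetic placeholder data standing in for the real table:

```python
# Hedged chart sketch: traffic vs. funnel rates on a shared time axis.
import matplotlib
matplotlib.use("Agg")  # headless backend for scripting
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=90, freq="D"),
    "sessions": np.random.default_rng(1).poisson(10_000, 90),
    "ATC_rate": np.random.default_rng(2).normal(0.20, 0.01, 90),
    "CVR": np.random.default_rng(3).normal(0.04, 0.003, 90),
})

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(10, 8))
axes[0].plot(df["date"], df["sessions"])
axes[0].set_ylabel("sessions")
axes[1].plot(df["date"], df["ATC_rate"])
axes[1].set_ylabel("ATC rate")
axes[2].plot(df["date"], df["CVR"])
axes[2].set_ylabel("CVR")
axes[0].set_title("Traffic vs. funnel rates — look for divergence at the spike")
fig.savefig("funnel_overview.png")
```

Natural follow-ups are the same panels faceted by `channel` and `device_type`, plus a YoY overlay to rule out seasonality.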
Quick Answer: This question evaluates proficiency in product analytics, instrumentation design, experimentation planning, and funnel-level diagnostic analysis for an app marketplace; it falls squarely in the Analytics & Experimentation domain for data scientist roles.