Reduce airport ride cancellations under causal constraints
Company: PayPal
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: easy
Interview Round: Onsite
##### Question
You are a Data Scientist supporting an **airport rides / airport pickups** team at a ride-hailing marketplace. Airport pickups are operationally different from city pickups, and **cancellation rates are high**, hurting marketplace efficiency and the experience of both riders and drivers.
Key context that makes this hard:
- Drivers usually enter an **airport queue** (FIFO / priority rules). A driver or rider cancellation can be especially costly: the driver may lose their queue position and have to leave the holding lot and re-enter.
- Riders at airports are a special segment: navigation to pickup zones is confusing, many are **one-time users**, and trip intent (business vs personal, luggage, group) varies.
- There are likely **network effects / interference**: changing dispatch, pricing, or guidance for some users affects others waiting in the same shared queue, so the stable unit treatment value assumption (SUTVA) is violated.
- Standard experimentation is hard: geo tests are difficult (few airports, spillover), switchbacks may be operationally risky, and difference-in-differences is biased by strong **time-varying confounding** (flight arrival waves, weather, events, seasonality).
**Goal:** reduce the airport trip cancellation rate **without harming marketplace health**.
Assume access to standard marketplace logs (trip lifecycle events, dispatch events, queue position changes, app events), driver/rider attributes, and airport / terminal / pickup-zone identifiers.
Answer the following:
1. **Problem framing & metrics.** Define **supply**, **demand**, and **marketplace health** for airport pickups. Propose a **primary metric** (or small set) for "reduce cancellations" plus **diagnostic** and **guardrail** metrics. Be explicit about definitions (what counts as a cancellation, by whom, time windows). Discuss **segmentation** (business vs personal, first-time vs repeat, terminal / pickup-zone complexity) and how it guards against Simpson's paradox.
2. **Hypotheses.** Generate plausible, testable hypotheses for why cancellations happen at airports — for **(a) drivers** and **(b) riders** — and include at least one **behavioral / psychology** hypothesis. For each, name the signals you would look for.
3. **Causal identification / experiment design.** Given interference from the shared queue, time-varying confounding, and a limited number of airports (you cannot turn the feature off for an entire region), propose one or more practical designs to estimate the causal impact of an intervention. Address how each handles interference, time variation, and operational constraints. Cover the spectrum from randomized to quasi-experimental approaches.
4. **Satisfaction measurement, data needs, biases, and communication.** Propose **data-driven proxy metrics for driver and rider satisfaction** specific to airports, and explain their limitations. List the **data you would need** and the **main confounders / biases** you would worry about. Explain how you would **communicate the tradeoffs** of an intervention to stakeholders.
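The Simpson's paradox risk raised in part 1 is easiest to see with numbers. The sketch below is illustrative only: the segment names, the `before` / `after` periods, and every count are invented assumptions, not real marketplace data. It shows how a pooled cancellation rate can rise even though the rate falls within each rider segment, simply because the rider mix shifts toward first-time users, and how reweighting to a fixed segment mix removes that artifact.

```python
# Hypothetical counts: (segment, period) -> (cancelled_requests, total_requests).
# All numbers are invented to show the mechanism; they are not real data.
COUNTS = {
    ("first_time", "before"): (30, 100),
    ("first_time", "after"): (108, 400),
    ("repeat", "before"): (40, 400),
    ("repeat", "after"): (8, 100),
}


def segment_rates(counts):
    """Cancellation rate for each (segment, period) cell."""
    return {key: c / n for key, (c, n) in counts.items()}


def pooled_rates(counts):
    """Cancellation rate per period, ignoring segments."""
    totals = {}
    for (_, period), (c, n) in counts.items():
        prev_c, prev_n = totals.get(period, (0, 0))
        totals[period] = (prev_c + c, prev_n + n)
    return {period: c / n for period, (c, n) in totals.items()}


def standardized_rate(counts, period, weights):
    """Segment rates for one period, reweighted to a fixed segment mix."""
    rates = segment_rates(counts)
    return sum(w * rates[(segment, period)] for segment, w in weights.items())


if __name__ == "__main__":
    # Assumed reference mix: equal share of first-time and repeat riders.
    mix = {"first_time": 0.5, "repeat": 0.5}
    print("per-segment:", segment_rates(COUNTS))
    print("pooled:", pooled_rates(COUNTS))
    for period in ("before", "after"):
        print(period, "standardized:", standardized_rate(COUNTS, period, mix))
```

In this made-up example, each segment improves while the pooled rate worsens, which is why the primary metric is usually reported both overall and per segment, or standardized to a fixed mix.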
Quick Answer: A PayPal Data Scientist onsite product-sense and causal-inference case: design a measurement framework, hypotheses, and experimentation strategy to reduce airport ride cancellations in a ride-hailing marketplace without harming marketplace health. It tests metric trees, interference-aware experiment design (switchback, partial-population, encouragement/IV), satisfaction proxies, confounder handling, and stakeholder communication.
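As one concrete illustration of the switchback idea, here is a minimal sketch that treats the whole airport queue as a single unit and alternates it between arms over time blocks. The function names, the 60-minute block length, and the pair-wise randomization are assumptions chosen for illustration, not a prescribed design. A real rollout would also need to handle carryover between blocks (for example, by discarding a burn-in window) and the operational risk noted in the question.

```python
import random
from datetime import datetime, timedelta
from statistics import mean


def assign_switchback(start, n_blocks, block_minutes=60, seed=0):
    """Assign consecutive time blocks to arms.

    Blocks are randomized within adjacent pairs so that each pair holds one
    control and one treatment block, keeping the arms balanced over time.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    schedule = []
    for i in range(0, n_blocks, 2):
        pair = ["control", "treatment"]
        rng.shuffle(pair)
        for j, arm in enumerate(pair):
            if i + j < n_blocks:
                block_start = start + timedelta(minutes=block_minutes * (i + j))
                schedule.append((block_start, arm))
    return schedule


def block_level_effect(block_rates):
    """Difference in mean block-level cancellation rate (treatment - control).

    block_rates: list of (arm, cancellation_rate), one row per time block.
    Aggregating to the block level keeps the unit of analysis aligned with
    the unit of randomization.
    """
    treated = [rate for arm, rate in block_rates if arm == "treatment"]
    control = [rate for arm, rate in block_rates if arm == "control"]
    return mean(treated) - mean(control)


if __name__ == "__main__":
    # Hypothetical start time and block count.
    for block_start, arm in assign_switchback(datetime(2024, 1, 1, 6, 0), 8):
        print(block_start.isoformat(), arm)
```

Randomizing within adjacent pairs is one way to keep both arms exposed to similar flight-arrival waves. Inference should then use block-level (or pair-level) variation rather than treating individual trips as independent.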