##### Question

You are a Data Scientist supporting an **airport rides / airport pickups** team at a ride-hailing marketplace. Airport pickups are operationally different from city pickups, and **cancellation rates are high**, hurting marketplace efficiency and the experience of both riders and drivers.

Key context that makes this hard:

- Drivers usually enter an **airport queue** (FIFO / priority rules). A driver or rider cancellation can be especially costly: the driver may lose their queue position and have to leave the holding lot and re-enter.
- Riders at airports are a special segment: navigation to pickup zones is confusing, many are **one-time users**, and trip intent (business vs personal, luggage, group) varies.
- There are likely **network effects / interference**: changing dispatch, pricing, or guidance for some users affects others waiting in the same shared queue (SUTVA is violated).
- Standard experimentation is hard: geo tests are difficult (few airports, spillover), switchbacks may be operationally risky, and diff-in-diff is biased by strong **time-varying confounding** (flight arrival waves, weather, events, seasonality).

**Goal:** reduce the airport trip cancellation rate **without harming marketplace health**.

Assume access to standard marketplace logs (trip lifecycle events, dispatch events, queue position changes, app events), driver/rider attributes, and airport / terminal / pickup-zone identifiers.

Answer the following:

1. **Problem framing & metrics.** Define **supply**, **demand**, and **marketplace health** for airport pickups. Propose a **primary metric** (or small set) for "reduce cancellations" plus **diagnostic** and **guardrail** metrics. Be explicit about definitions (what counts as a cancellation, by whom, time windows). Discuss **segmentation** (business vs personal, first-time vs repeat, terminal / pickup-zone complexity) and how it guards against Simpson's paradox.
2. **Hypotheses.** Generate plausible, testable hypotheses for why cancellations happen at airports, for **(a) drivers** and **(b) riders**, and include at least one **behavioral / psychology** hypothesis. For each, name the signals you would look for.
3. **Causal identification / experiment design.** Given interference from the shared queue, time-varying confounding, and a limited number of airports (you cannot turn the feature off for an entire region), propose one or more practical designs to estimate the causal impact of an intervention. Address how each handles interference, time variation, and operational constraints. Cover the spectrum from randomized to quasi-experimental approaches.
4. **Satisfaction measurement, data needs, biases, and communication.** Propose **data-driven proxy metrics for driver and rider satisfaction** specific to airports, and explain their limitations. List the **data you would need** and the **main confounders / biases** you would worry about. Explain how you would **communicate the tradeoffs** of an intervention to stakeholders.
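To make the Simpson's paradox concern in part 1 concrete, the sketch below uses entirely hypothetical counts (the segment names, periods, and numbers are invented for illustration, not taken from any real marketplace). It shows how the overall cancellation rate can rise while the rate falls within every segment, simply because the mix shifts toward first-time riders, and how re-weighting to a fixed segment mix removes that artifact.

```python
# All counts below are hypothetical, invented only to illustrate a mix shift.
# Key: (period, segment) -> (requests, cancellations)
COUNTS = {
    ("before", "repeat"): (800, 80),
    ("before", "first_time"): (200, 80),
    ("after", "repeat"): (400, 32),
    ("after", "first_time"): (600, 210),
}
SEGMENTS = ("repeat", "first_time")


def segment_rate(period, segment):
    """Cancellation rate within one segment for one period."""
    requests, cancels = COUNTS[(period, segment)]
    return cancels / requests


def overall_rate(period):
    """Pooled cancellation rate across all segments for one period."""
    requests = sum(COUNTS[(period, s)][0] for s in SEGMENTS)
    cancels = sum(COUNTS[(period, s)][1] for s in SEGMENTS)
    return cancels / requests


def standardized_rate(period, reference_period="before"):
    """Cancellation rate re-weighted to the reference period's segment mix."""
    ref_total = sum(COUNTS[(reference_period, s)][0] for s in SEGMENTS)
    return sum(
        COUNTS[(reference_period, s)][0] / ref_total * segment_rate(period, s)
        for s in SEGMENTS
    )


if __name__ == "__main__":
    for period in ("before", "after"):
        print(
            period,
            "pooled:", round(overall_rate(period), 3),
            "mix-adjusted:", round(standardized_rate(period), 3),
        )
```

In this toy data the pooled rate goes up between periods even though both segment rates go down; the mix-adjusted rate moves in the same direction as the segments. Which reference mix to standardize to is a judgment call and should be stated alongside the metric.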

A PayPal Data Scientist onsite product-sense and causal-inference case: design a measurement framework, hypotheses, and experimentation strategy to reduce airport ride cancellations in a ride-hailing marketplace without harming marketplace health. It tests metric trees, interference-aware experiment design (switchback, partial-population, encouragement/IV), satisfaction proxies, confounder handling, and stakeholder communication.
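As one possible illustration of the switchback idea mentioned above, the sketch below randomizes whole time blocks at a single airport, so that everyone sharing the queue during a block sees the same arm. The function names, the 2-hour block length, and the choice to balance arms within each day are assumptions made for this example, not a prescribed design; a real schedule would also need to consider washout periods at block boundaries and balance across hours of the day, since flight waves recur at similar times.

```python
import random
from datetime import datetime, timedelta

ARMS = ("treatment", "control")


def switchback_schedule(start, n_days, block_hours=2, seed=0):
    """Map each block start time to an arm, balancing arms within each day.

    `start` is assumed to be midnight of the first day (hypothetical convention).
    """
    if 24 % block_hours != 0:
        raise ValueError("block_hours must divide 24 evenly")
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    blocks_per_day = 24 // block_hours
    schedule = {}
    for day in range(n_days):
        arms = list(ARMS) * (blocks_per_day // 2)
        if blocks_per_day % 2:  # odd number of blocks: assign the extra at random
            arms.append(rng.choice(ARMS))
        rng.shuffle(arms)
        for i, arm in enumerate(arms):
            block_start = start + timedelta(days=day, hours=i * block_hours)
            schedule[block_start] = arm
    return schedule


def arm_for(ts, schedule, start, block_hours=2):
    """Return the arm in force at timestamp `ts`."""
    block_seconds = block_hours * 3600
    idx = int((ts - start).total_seconds() // block_seconds)
    block_start = start + timedelta(hours=idx * block_hours)
    if block_start not in schedule:
        raise ValueError("timestamp falls outside the schedule")
    return schedule[block_start]


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    schedule = switchback_schedule(start, n_days=7, block_hours=2, seed=1)
    request_time = datetime(2024, 1, 3, 14, 25)
    print(request_time, arm_for(request_time, schedule, start))
```

Analysis under this kind of design would typically treat the block, not the individual trip, as the unit of randomization when computing standard errors.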

What difficulty level is this interview question?

This is an easy-difficulty Analytics & Experimentation question, commonly asked during Onsite rounds at PayPal.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at PayPal during technical interviews.

Reduce airport ride cancellations under causal constraints
