What difficulty level is this interview question?

This is a medium difficulty Analytics & Experimentation question, commonly asked during Technical Screen rounds at Amazon.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Amazon during technical interviews.

Reserving an Elevator for Food Deliveries | Amazon Interview Question

This question tests a data scientist's ability to design a rigorous A/B experiment for a real-world operational policy with competing stakeholder outcomes. It evaluates expertise in experiment design concepts including cluster randomization, switchback designs, metric selection, guardrail metrics, and intent-to-treat analysis — core skills assessed in analytics and experimentation interviews.

Reserving an Elevator for Food Deliveries

You are a data scientist working with a building-operations company that manages large residential apartment towers. A building owner has noticed that a large and growing share of elevator trips are made by food-delivery couriers entering the building to drop off orders. These courier trips are short, frequent, and often coincide with peak meal times, and the owner believes they crowd out residents and increase resident wait times.

The owner proposes an intervention: in a tower with 6 elevators, reserve one elevator exclusively for food-delivery couriers, leaving residents 5 general-use elevators. Couriers would be routed (via signage and the delivery-app pickup instructions) to use only the reserved car.

Design an A/B test to evaluate this policy, and define how you would measure success.

Constraints & Assumptions

A tower has 6 elevators; the policy converts 1 of them into a courier-only car (residents keep 5).
The company operates a portfolio of many similar towers (tens to low hundreds), which is what makes a between-building experiment feasible.
Elevator controllers log every call (button press, originating floor, timestamp) and every arrival (car, floor, timestamp), so per-trip wait time is instrumentable.
Food-delivery couriers can be (imperfectly) identified — e.g. via building-entry QR check-in used by delivery apps, or by trip patterns (lobby-to-upper-floor-and-back in a short window).
Compliance is imperfect: some couriers will still take general elevators, and some residents may take the courier car when it is idle.
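
As a rough illustration of how per-call wait time could be derived from the controller logs described above, the sketch below pairs each call with the first arrival at the same floor at or after the call time. The record layouts (`(floor, call_ts)` and `(car, floor, arrival_ts)`), the function name `wait_times`, and the sample values are assumptions made for illustration, not the real log schema; a production version would also need to handle travel direction and repeated button presses.

```python
from bisect import bisect_left
from collections import defaultdict


def wait_times(calls, arrivals):
    """Pair each call with the first arrival at the same floor at or after it.

    calls:    iterable of (floor, call_ts) tuples        (hypothetical schema)
    arrivals: iterable of (car, floor, arrival_ts) tuples (hypothetical schema)
    Returns a list of (floor, call_ts, wait_seconds) for calls that were served.
    """
    arrivals_by_floor = defaultdict(list)
    for _car, floor, ts in arrivals:
        arrivals_by_floor[floor].append(ts)
    for ts_list in arrivals_by_floor.values():
        ts_list.sort()

    served = []
    for floor, call_ts in calls:
        ts_list = arrivals_by_floor.get(floor, [])
        i = bisect_left(ts_list, call_ts)  # index of first arrival >= call time
        if i < len(ts_list):
            served.append((floor, call_ts, ts_list[i] - call_ts))
    return served


# Made-up timestamps in seconds, purely illustrative.
calls = [(1, 100.0), (1, 130.0), (7, 200.0)]
arrivals = [("A", 1, 125.0), ("B", 1, 160.0), ("C", 7, 245.0)]
waits = wait_times(calls, arrivals)
```

Calls with no later arrival in the log are dropped here; whether to drop, censor, or impute them is a design choice worth stating in an interview answer.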

Clarifying Questions to Ask

What is the precise primary objective — reduce resident wait time, reduce overall crowding/complaints, improve courier throughput, or something else? Whose experience defines "success"?
How many towers are available, and how similar are they in size, floor count, resident count, and delivery volume? Is there a pre-period of logged data for each?
How are couriers identified at the point of an elevator call, and how reliable is that identification? Is compliance enforced or voluntary?
What is the minimum effect size the owner would consider worth the operational cost, and what wait-time degradation for couriers (or residents) would make the policy unacceptable?
Are there strong time-of-day and day-of-week patterns (meal-time peaks) that the design must account for?
Is there a risk of spillover (the policy in one tower or time slice affecting behavior in another) or of novelty effects (residents or couriers behaving differently just because the policy is new)?

What a Strong Answer Covers

Metric design: a clearly named primary metric (e.g. resident wait time per call), guardrail metrics (courier wait time, overall throughput, resident complaints/NPS), and counter-metrics that catch harm.
Unit of randomization: recognition that this is a cluster-level (building-level) treatment, not a user-level one, and the consequences for power.
Design choice: a defensible design, either cluster-randomized across towers or a within-building switchback (time-sliced on/off) to control for between-building heterogeneity, with the trade-offs of each.
Sample size / power & MDE: how the cluster count or switchback period length limits the minimum detectable effect, and accounting for within-cluster correlation.
Confounders & validity threats: meal-time seasonality, compliance/non-adherence, spillover, novelty/primacy effects, and how the analysis handles them (e.g. intent-to-treat).
Heterogeneity: peak vs off-peak, tall vs short buildings, high- vs low-delivery towers.
Decision rule: what combination of primary lift and guardrail constraints would lead to a ship / no-ship / iterate decision.
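
To make the power point concrete, here is a minimal sketch of how within-cluster correlation inflates the minimum detectable effect in a cluster-randomized design. It uses the standard design effect 1 + (m - 1) * ICC with a normal approximation. The function names and every input (standard deviation, towers per arm, calls per tower, ICC) are illustrative assumptions rather than values from the problem, and with only a handful of towers a t-based or randomization-based calculation would be more appropriate.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(m, icc):
    """Variance inflation for clusters of size m with intra-cluster correlation icc."""
    return 1.0 + (m - 1) * icc


def mde(sigma, clusters_per_arm, m, icc, alpha=0.05, power=0.8):
    """Approximate two-sided MDE for a difference in means between two equal arms.

    sigma: per-call standard deviation of the metric (assumed known)
    m:     average number of calls observed per cluster (tower)
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_per_arm = clusters_per_arm * m
    return z * sigma * sqrt(2 * design_effect(m, icc) / n_per_arm)


# Hypothetical inputs: 30 s SD, 20 towers per arm, 5,000 calls per tower.
mde_clustered = mde(sigma=30.0, clusters_per_arm=20, m=5000, icc=0.05)
mde_if_independent = mde(sigma=30.0, clusters_per_arm=20, m=5000, icc=0.0)
```

Comparing `mde_clustered` with `mde_if_independent` shows why the number of towers, not the number of logged calls, is usually the binding constraint.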

Follow-up Questions

The owner only has 8 towers available and meal-time peaks dominate the signal. Would you still run a between-building test, or switch to a switchback design? Walk through the trade-offs and how you'd analyze each.
Compliance is only ~60% (many couriers ignore the reserved car). How does non-adherence bias your estimate, and how would you report intent-to-treat vs. treatment-on-the-treated effects?
Suppose resident wait time improves but courier delivery times get much worse, raising delivery-app complaints that hurt the building's reputation. How do you frame the trade-off and recommend a decision?
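
For the compliance follow-up, one common way to report both effects is to compute the intent-to-treat (ITT) difference and then rescale it by the compliance gap (a Wald-style estimator). The sketch below assumes tower-level mean resident waits as the outcome, treats the ~60% figure as the share of courier trips using the reserved car in treated towers (0% in control, where no reserved car exists), and relies on the usual instrumental-variable assumptions, including that the policy affects waits only through courier compliance. All numbers and function names are hypothetical.

```python
from statistics import mean


def itt_effect(treated, control):
    """Difference in mean outcome between assigned-treatment and control towers."""
    return mean(treated) - mean(control)


def wald_tot(itt, compliance_treated, compliance_control=0.0):
    """Rescale the ITT effect by the difference in compliance between arms."""
    gap = compliance_treated - compliance_control
    if gap <= 0:
        raise ValueError("compliance in treated arm must exceed control arm")
    return itt / gap


# Hypothetical tower-level mean resident wait times, in seconds.
treated_waits = [38.0, 41.0, 36.0, 40.0]
control_waits = [44.0, 45.0, 42.0, 46.0]

itt = itt_effect(treated_waits, control_waits)
tot = wald_tot(itt, compliance_treated=0.6)
```

The ITT figure answers "what happens if we roll out the policy as is", while the rescaled figure is only as credible as the assumptions behind it, so a strong answer would lead with ITT for the ship decision.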
