PracHub

Reserving an Elevator for Food Deliveries

Last updated: Jun 21, 2026

Quick Overview

This question tests a data scientist's ability to design a rigorous A/B experiment for a real-world operational policy with competing stakeholder outcomes. It evaluates expertise in experiment design concepts including cluster randomization, switchback designs, metric selection, guardrail metrics, and intent-to-treat analysis — core skills assessed in analytics and experimentation interviews.



Company: Amazon

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: medium

Interview Round: Technical Screen

## Reserving an Elevator for Food Deliveries

You are a data scientist working with a building-operations company that manages large residential apartment towers. A building owner has noticed that a large and growing share of elevator trips are made by **food-delivery couriers** entering the building to drop off orders. These courier trips are short, frequent, and often coincide with peak meal times, and the owner believes they crowd out residents and increase resident wait times.

The owner proposes an intervention: in a tower with **6 elevators**, **reserve one elevator exclusively for food-delivery couriers**, leaving residents 5 general-use elevators. Couriers would be routed (via signage and the delivery-app pickup instructions) to use only the reserved car.

Design an A/B test to evaluate this policy, and define how you would measure success.

```hint
Where to start

The owner's stated goal (reduce resident wait time / crowding) and the obvious cost (residents lose a sixth of general capacity; couriers may be funneled into one slow car) point in opposite directions. Name the **primary metric** that captures the goal and the **guardrail metrics** that capture the costs before you touch the design.
```

```hint
Unit of randomization

The treatment is a physical, building-level change: you cannot randomize individual residents within one tower (everyone shares the same 6 elevators). Think about what the smallest independently treatable unit actually is, and what that implies for sample size and analysis (clustered or switchback designs).
```

```hint
Measuring "success" for two populations

There are two distinct populations with potentially opposing outcomes: **residents** and **couriers**. A win for one can be a loss for the other. Decide whose experience is the primary success metric and whose is a guardrail, and define a metric you can actually instrument (e.g. wait time from button press to car arrival).
```

### Constraints & Assumptions

- A tower has 6 elevators; the policy converts 1 of them into a courier-only car (residents keep 5).
- The company operates a portfolio of many similar towers (tens to low hundreds), which is what makes a between-building experiment feasible.
- Elevator controllers log every **call** (button press, originating floor, timestamp) and every **arrival** (car, floor, timestamp), so per-trip wait time is instrumentable.
- Food-delivery couriers can be (imperfectly) identified, e.g. via building-entry QR check-in used by delivery apps, or by trip patterns (lobby to an upper floor and back within a short window).
- Compliance is imperfect: some couriers will still take general elevators, and some residents may take the courier car when it is idle.

### Clarifying Questions to Ask

- What is the precise primary objective: reduce **resident** wait time, reduce overall crowding/complaints, improve courier throughput, or something else? Whose experience defines "success"?
- How many towers are available, and how similar are they in size, floor count, resident count, and delivery volume? Is there a pre-period of logged data for each?
- How are couriers identified at the point of an elevator call, and how reliable is that identification? Is compliance enforced or voluntary?
- What effect size would the owner consider worth the operational cost, and what wait-time degradation for couriers (or residents) would make the policy unacceptable?
- Are there strong time-of-day and day-of-week patterns (meal-time peaks) that the design must account for?
- Is there a risk of spillover or novelty effects (residents or couriers behaving differently just because the policy is new)?

### What a Strong Answer Covers

- **Metric design**: a clearly named primary metric (e.g. resident wait time per call), guardrail metrics (courier wait time, overall throughput, resident complaints/NPS), and counter-metrics that catch harm.
- **Unit of randomization**: recognition that this is a cluster-level (building-level) treatment, not a user-level one, and the consequences for power.
- **Design choice**: a defensible design, either cluster-randomized across towers or a within-building **switchback** (time-sliced on/off) that controls for between-building heterogeneity, with the trade-offs of each.
- **Sample size / power & MDE**: how the cluster count or switchback period length limits the minimum detectable effect, and how within-cluster correlation inflates variance.
- **Confounders & validity threats**: meal-time seasonality, compliance/non-adherence, spillover, novelty/primacy effects, and how the analysis handles them (e.g. intent-to-treat).
- **Heterogeneity**: peak vs. off-peak, tall vs. short buildings, high- vs. low-delivery towers.
- **Decision rule**: what combination of primary lift and guardrail constraints would lead to a ship / no-ship / iterate decision.

### Follow-up Questions

- The owner has only 8 towers available, and meal-time peaks dominate the signal. Would you still run a between-building test, or switch to a switchback design? Walk through the trade-offs and how you would analyze each.
- Compliance is only ~60% (many couriers ignore the reserved car). How does non-adherence bias your estimate, and how would you report intent-to-treat vs. treatment-on-the-treated effects?
- Suppose resident wait time improves but courier delivery times get much worse, raising delivery-app complaints that hurt the building's reputation. How do you frame the trade-off and recommend a decision?
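The per-call wait-time metric suggested in the constraints (button press to car arrival, built from the controller's call and arrival logs) could be computed roughly as below. This is a minimal sketch under stated assumptions: the `Call`/`Arrival` record shapes and the `wait_times` helper are hypothetical, call direction is ignored, and one arrival is allowed to "serve" several calls. A real controller log that links each call to its dispatched car would give a more exact measure.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Call:
    floor: int
    t: float  # button-press time, seconds since some epoch (hypothetical schema)


@dataclass(frozen=True)
class Arrival:
    car: str
    floor: int
    t: float  # car-arrival time, seconds since the same epoch


def wait_times(
    calls: Iterable[Call],
    arrivals: Iterable[Arrival],
    eligible_cars: Optional[set] = None,
) -> list:
    """Wait per call = first eligible arrival at the call's floor at or after the press.

    Assumptions: direction is ignored and one arrival may serve several calls.
    A call with no later arrival is returned as None (right-censored) rather
    than silently dropped, so it can be counted separately.
    """
    arrival_times = defaultdict(list)  # floor -> sorted arrival timestamps
    for a in arrivals:
        if eligible_cars is None or a.car in eligible_cars:
            arrival_times[a.floor].append(a.t)
    for times in arrival_times.values():
        times.sort()

    waits = []
    for c in calls:
        times = arrival_times.get(c.floor, [])
        i = bisect_left(times, c.t)  # index of first arrival with t >= press time
        waits.append(times[i] - c.t if i < len(times) else None)
    return waits
```

In a treatment tower, resident calls would be scored with `eligible_cars` set to the five general cars (car IDs here are placeholders), while courier calls would be scored against the reserved car; in control towers all six cars are eligible.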
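Because the policy is assigned per tower, power depends far more on the number of towers than on the number of calls. The standard design-effect formula, DEFF = 1 + (m - 1) * ICC (m = calls per tower, ICC = intra-cluster correlation of wait times), makes this concrete. The sketch below is a normal-approximation MDE for a two-arm, equal-allocation, tower-randomized test; the function names and the inputs (SD of wait, ICC, calls per tower) are hypothetical and would come from the pre-period logs.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(calls_per_tower: float, icc: float) -> float:
    """Variance inflation from clustering: DEFF = 1 + (m - 1) * ICC."""
    return 1 + (calls_per_tower - 1) * icc


def mde_cluster(
    towers_per_arm: int,
    calls_per_tower: float,
    sd: float,
    icc: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> float:
    """Approximate minimum detectable difference in mean wait (same units as sd)."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    deff = design_effect(calls_per_tower, icc)
    # Var(difference of arm means) = 2 * sd^2 * DEFF / (towers_per_arm * calls_per_tower)
    se = sqrt(2 * sd**2 * deff / (towers_per_arm * calls_per_tower))
    return (z(1 - alpha / 2) + z(power)) * se
```

As `calls_per_tower` grows, `deff / calls_per_tower` tends to `icc`, so the MDE approaches a floor of roughly `(z_alpha + z_beta) * sqrt(2 * sd**2 * icc / towers_per_arm)`. Collecting more days of data per tower stops helping; only more towers (or a within-tower switchback) lowers it. Re-running with `towers_per_arm=4` is a quick way to see how limiting the 8-tower follow-up is.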
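For the switchback alternative (the policy toggled on and off within the same building), one way to keep meal-time peaks from confounding the comparison is to balance ON/OFF days within each tower and time slot. The sketch below is one such scheme, not the only valid one. The slot labels are hypothetical, and it does not handle carryover (queues that persist across a switch); a common mitigation is discarding a short burn-in at the start of each block, and the analysis would then compare ON vs. OFF blocks within tower and slot with standard errors clustered at the block level.

```python
import random
from typing import Dict, Hashable, Sequence, Tuple


def switchback_schedule(
    towers: Sequence[Hashable],
    n_days: int,
    slots: Sequence[Hashable],
    seed: int = 0,
) -> Dict[Tuple[Hashable, int, Hashable], int]:
    """Randomize the reserved-car policy ON (1) / OFF (0) per (tower, day, slot).

    Within every (tower, slot) pair exactly half of the days are ON, so each
    meal-time slot is compared against itself in the same building.
    """
    if n_days % 2:
        raise ValueError("n_days must be even for an exact ON/OFF balance")
    rng = random.Random(seed)  # seeded so the schedule is reproducible and auditable
    schedule = {}
    for tower in towers:
        for slot in slots:
            arms = [1] * (n_days // 2) + [0] * (n_days // 2)
            rng.shuffle(arms)  # random order of ON/OFF days for this tower-slot
            for day, arm in enumerate(arms):
                schedule[(tower, day, slot)] = arm
    return schedule
```

Example call: `switchback_schedule(["T1", "T2"], 14, ["lunch", "dinner", "off_peak"], seed=1)`. Switching signage and app instructions between slots on the same day may be operationally awkward. If so, the same function can be used with a single whole-day slot.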
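For the ~60% compliance follow-up, the intent-to-treat (ITT) effect compares towers by assignment, regardless of how many couriers actually used the reserved car. It is the effect the owner would get by shipping the policy with the same compliance. A Wald (ratio) estimator rescales ITT by the first stage, i.e. how much assignment increased the share of courier trips on the reserved car. The sketch below works on hypothetical tower-level summaries. One caution: the Wald ratio assumes assignment affects residents only through courier diversion, and that is doubtful here, because residents lose a general car even if no courier complies. Treat the ratio as a rough sensitivity figure rather than a clean treatment-on-the-treated estimate.

```python
from statistics import mean
from typing import Dict, List, Tuple


def itt_and_wald(towers: List[Dict[str, float]]) -> Tuple[float, float]:
    """Tower-level ITT effect and its Wald (ratio) rescaling.

    Each tower dict (hypothetical schema) has:
      assigned -- 1 if randomized to the reserved-car policy, else 0
      outcome  -- mean resident wait per call in the tower (seconds)
      diverted -- share of courier trips that used the reserved car
    """
    treated = [t for t in towers if t["assigned"] == 1]
    control = [t for t in towers if t["assigned"] == 0]
    itt = mean(t["outcome"] for t in treated) - mean(t["outcome"] for t in control)
    # First stage: how much assignment actually moved courier behaviour.
    first_stage = mean(t["diverted"] for t in treated) - mean(t["diverted"] for t in control)
    if first_stage <= 0:
        raise ValueError("assignment did not increase diversion; the ratio is undefined")
    return itt, itt / first_stage
```

With 60% diversion, the ratio is the ITT divided by 0.6, so it is always larger in magnitude than the ITT. Reporting both side by side makes clear how much of the gap between them rests on the compliance assumption.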
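The decision rule can be written down before launch so the ship / no-ship / iterate call is mechanical rather than negotiated after the fact. The sketch below assumes effects and confidence intervals have already been estimated (from the cluster-level or switchback analysis). The `Estimate` type, the `decide` function, and both thresholds are hypothetical placeholders the owner would agree to up front. Other guardrails (overall throughput, resident complaints) would add further checks in the same pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    effect: float   # treatment minus control, seconds
    ci_low: float
    ci_high: float


def decide(
    resident_wait: Estimate,
    courier_wait: Estimate,
    min_resident_reduction_s: float,
    max_courier_increase_s: float,
) -> str:
    """Return 'ship', 'iterate', or 'no-ship' from pre-registered thresholds."""
    # Primary: resident wait must drop significantly AND by a practically meaningful amount.
    primary_win = (
        resident_wait.ci_high < 0
        and -resident_wait.effect >= min_resident_reduction_s
    )
    # Guardrail: even the pessimistic (upper) bound on courier harm must be tolerable.
    guardrail_ok = courier_wait.ci_high <= max_courier_increase_s
    if primary_win and guardrail_ok:
        return "ship"
    if primary_win:
        # Residents benefit but couriers pay too much: redesign (hypothetically,
        # e.g. reserve the car only at meal-time peaks) and re-test.
        return "iterate"
    return "no-ship"
```

Using the upper confidence bound for the courier guardrail is a deliberately conservative choice. The policy ships only if the data rule out an unacceptable courier slowdown, not merely if the point estimate looks fine.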


Related Interview Questions

  • Explain why CTR rises but CVR unchanged - Amazon (medium)
  • How would you test a price increase? - Amazon (medium)
  • How to evaluate adding video ads in a game - Amazon (easy)
  • How would you analyze and test a price increase? - Amazon (easy)
  • How would you evaluate adding video ads? - Amazon (medium)
Amazon
Jun 8, 2026, 12:00 AM
Data Scientist
Technical Screen
Analytics & Experimentation



© 2026 PracHub. All rights reserved.