This question evaluates a data scientist's ability to design a multi-dimensional ride-quality metric framework and to reason about measurement validity, bias, and confounding when diagnostic ratios like eta_ratio deviate from expectations.
You are interviewing for an autonomous-driving metrics team. The team wants to measure passenger experience for completed rides.
Design a metric framework for ride quality. Your answer should go beyond a single KPI and should cover multiple dimensions such as:
Then consider one specific metric:
eta_ratio = actual_ride_time / baseline_eta
where baseline_eta is the ETA from a third-party navigation app for the same origin, destination, and request time.
Suppose eta_ratio is unexpectedly high, suggesting the autonomous-driving ride is much slower than the baseline. How would you investigate whether this reflects a true product problem versus measurement bias, bad metric design, or confounding? Be explicit about metric definitions, segmentation, possible failure modes, and what additional data you would request.