Evaluate Guangzhou performance with limited data

Evaluate autonomous-driving performance transfer from Beijing to Guangzhou with limited local data, covering target population, distribution shift, scenario taxonomy, stratified sampling, importance weighting, partial pooling, uncertainty, rare scenarios, and staged launch decisions.

Q: What difficulty level is this interview question?

This is a medium difficulty Machine Learning question, commonly asked during Technical Screen rounds at WeRide.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at WeRide during technical interviews.

Question

You have built an autonomous-driving evaluation system using a large amount of labeled data from Beijing. The company now wants to operate in Guangzhou. You do not want to rebuild the evaluation pipeline from scratch, and you can collect only a small amount of Guangzhou data.

How would you evaluate whether the autonomous-driving system is likely to perform well in Guangzhou under this limited-data setting?

Constraints & Assumptions

Reuse the Beijing evaluation system where valid, but do not assume transfer automatically.
Treat this as a distribution-shift and limited-label evaluation problem.
Make a go, no-go, or restricted-go recommendation under uncertainty.
Safety-critical tails and uncovered scenarios should be handled separately from average performance.

Clarifying Questions to Ask

What is the initial Guangzhou launch domain: geofence, route types, time of day, weather, and operating mode?
What metrics and thresholds define acceptable performance?
How much Guangzhou data can be collected and labeled?
What Beijing scenario taxonomy and tooling already exist?
What operational risk is acceptable for a staged launch?

Part 1 - Define Target Population And Success Criteria

What exactly are we evaluating in Guangzhou?

What This Part Should Cover

Launch domain, route mix, conditions, target population, metrics, and decision thresholds.
Mean, tail, and scenario-specific safety guardrails.

Part 2 - Assess Transfer From Beijing

How would you assess whether Beijing metrics, tooling, calibration, and thresholds transfer?

What This Part Should Cover

Reuse metric definitions and tooling, but validate calibration and thresholds locally.
Check scenario coverage, support overlap, and whether Beijing labels or failures map to Guangzhou.

Part 3 - Detect Distribution Shift

What shifts would you look for between Beijing and Guangzhou?

What This Part Should Cover

Covariate shift, label shift, concept shift, and support mismatch.
Feature drift tests, domain classifiers, embedding similarity, taxonomy comparison, and scenario-level coverage.
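
The taxonomy comparison and scenario-level coverage check can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes every drive segment already carries a scenario tag from a shared taxonomy, and the tag names and counts are hypothetical. It reports the total variation distance between the two scenario mixes and lists Guangzhou scenarios with no Beijing support.

```python
from collections import Counter

def scenario_shift(source_tags, target_tags, min_source_count=1):
    """Compare scenario-tag frequencies between a source and a target city.

    Returns (total variation distance, target scenarios lacking source support).
    """
    src, tgt = Counter(source_tags), Counter(target_tags)
    n_src, n_tgt = sum(src.values()), sum(tgt.values())
    scenarios = set(src) | set(tgt)
    # Total variation distance: half the L1 distance between the two mixes.
    tvd = 0.5 * sum(abs(src[s] / n_src - tgt[s] / n_tgt) for s in scenarios)
    # Support mismatch: scenarios seen in the target with too little source data.
    unsupported = sorted(s for s in tgt if src[s] < min_source_count)
    return tvd, unsupported

# Hypothetical tag counts, for illustration only.
beijing = ["highway"] * 50 + ["intersection"] * 40 + ["scooter_swarm"] * 10
guangzhou = (["highway"] * 20 + ["intersection"] * 40
             + ["scooter_swarm"] * 30 + ["flooded_road"] * 10)

tvd, unsupported = scenario_shift(beijing, guangzhou)
print(round(tvd, 3), unsupported)
```

A large distance suggests that Beijing averages need reweighting before reuse, while any entry in the unsupported list marks a scenario that reweighting cannot cover.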

Part 4 - Use Small Guangzhou Data Efficiently

How would you sample, combine Beijing and Guangzhou data, and quantify uncertainty?

What This Part Should Cover

Stratified risk-aware sampling, scenario quotas, oversampling rare high-risk cases, per-scenario estimates, reweighting, partial pooling, bootstrap or Bayesian intervals, and conservative bounds.
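
Per-scenario estimates, partial pooling, and reweighting can be combined in a few lines. The sketch below assumes a simple Beta-Binomial style shrinkage: each Guangzhou failure rate is pulled toward the Beijing rate for the same scenario, with `prior_strength` acting as a hypothetical pseudo-count that sets how much Beijing evidence is trusted. The per-scenario rates are then weighted by the expected Guangzhou exposure share. All scenario names and numbers are illustrative.

```python
def pooled_rate(gz_failures, gz_trials, bj_rate, prior_strength):
    """Shrink the Guangzhou failure rate toward the Beijing rate.

    Equivalent to the posterior mean under a Beta prior centered on bj_rate
    with prior_strength pseudo-observations (an assumed modeling choice).
    """
    return (gz_failures + prior_strength * bj_rate) / (gz_trials + prior_strength)

def citywide_rate(strata, prior_strength=20.0):
    """Exposure-weighted failure rate over scenario strata."""
    total_share = sum(s["exposure_share"] for s in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("exposure shares must sum to 1")
    return sum(
        s["exposure_share"]
        * pooled_rate(s["gz_failures"], s["gz_trials"], s["bj_rate"], prior_strength)
        for s in strata
    )

# Hypothetical strata: shares reflect the Guangzhou launch domain, not the sample.
strata = [
    {"name": "intersection", "exposure_share": 0.5,
     "gz_failures": 1, "gz_trials": 30, "bj_rate": 0.02},
    {"name": "highway", "exposure_share": 0.3,
     "gz_failures": 0, "gz_trials": 20, "bj_rate": 0.01},
    {"name": "scooter_swarm", "exposure_share": 0.2,
     "gz_failures": 3, "gz_trials": 30, "bj_rate": 0.04},
]

rate = citywide_rate(strata)
print(round(rate, 4))
```

Weighting by exposure share rather than by sample size matters here because rare high-risk cases are deliberately oversampled. Pooling of this kind is only defensible for scenarios where Beijing and Guangzhou overlap, and the point estimate should still be reported with an interval.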

Part 5 - Handle Guangzhou-Only Rare Scenarios

What changes if Guangzhou contains important scenarios rare or absent in Beijing?

What This Part Should Cover

Treat this as a support problem, not simple reweighting.
Targeted collection, sentinel routes, simulation/log replay, expert review, and prioritization by exposure, severity, and uncertainty.
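
Prioritization by exposure, severity, and uncertainty can be made concrete with a simple score. The sketch below is one possible heuristic, not an established formula: the multiplicative score, the units, the severity scale, and the `1 / sqrt(1 + n)` uncertainty proxy are all assumptions chosen for illustration, as are the scenario names and values.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    exposure: float   # expected encounters per 1,000 km (assumed unit)
    severity: float   # 1 (minor) to 5 (severe), an assumed ordinal scale
    n_observed: int   # labeled Guangzhou examples collected so far

def priority(s: Scenario) -> float:
    """Higher score means collect or review this scenario sooner."""
    # Uncertainty proxy: shrinks as local evidence accumulates.
    uncertainty = 1.0 / (1 + s.n_observed) ** 0.5
    return s.exposure * s.severity * uncertainty

# Hypothetical scenarios and values.
scenarios = [
    Scenario("highway_merge", exposure=30, severity=3, n_observed=99),
    Scenario("scooter_swarm", exposure=40, severity=4, n_observed=15),
    Scenario("flooded_road", exposure=2, severity=5, n_observed=0),
]

ranked = sorted(scenarios, key=priority, reverse=True)
for s in ranked:
    print(s.name, round(priority(s), 1))
```

With these inputs the frequent, moderately observed scooter scenario ranks first, and the never-observed flooded-road scenario outranks the well-covered highway merge despite its low exposure.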

What a Strong Answer Covers

Defines the launch domain before estimating performance.
Reuses prior data only where support overlaps.
Quantifies uncertainty and coverage gaps.
Makes a staged recommendation rather than relying on one citywide average.
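
A staged recommendation can be written down as an explicit rule. The sketch below is a hypothetical policy, not one given in the question: it assumes each scenario has a conservative upper bound on its failure rate and an agreed threshold, and it blocks any scenario that exceeds its threshold or has no local support. The names and numbers are illustrative.

```python
def launch_decision(upper_bounds, thresholds, unsupported):
    """Return ("go" | "restricted-go" | "no-go", scenarios to exclude).

    upper_bounds: scenario -> conservative upper bound on failure rate
    thresholds:   scenario -> maximum acceptable failure rate
    unsupported:  scenarios with no usable evidence in either city
    """
    failing = {s for s, ub in upper_bounds.items() if ub > thresholds[s]}
    blocked = sorted(failing | set(unsupported))
    all_scenarios = set(upper_bounds) | set(unsupported)
    if not blocked:
        return "go", []
    if len(blocked) == len(all_scenarios):
        return "no-go", blocked
    # Some scenarios pass: launch only where the blocked ones can be avoided,
    # for example through geofence, time-of-day, or weather restrictions.
    return "restricted-go", blocked

# Hypothetical inputs.
upper_bounds = {"highway": 0.01, "intersection": 0.03, "scooter_swarm": 0.12}
thresholds = {"highway": 0.02, "intersection": 0.05, "scooter_swarm": 0.05}

decision, blocked = launch_decision(upper_bounds, thresholds, ["flooded_road"])
print(decision, blocked)
```

Using upper bounds rather than point estimates keeps the rule conservative when a scenario has few Guangzhou samples.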

Follow-up Questions

What if Guangzhou has zero observed failures in a tiny sample?
How would you use a domain classifier?
When does reweighting fail?
How would you prioritize scooter-heavy scenes?
What evidence would justify restricted launch?
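
For the zero-observed-failures question, a short calculation shows why a clean small sample is weak evidence. The sketch below assumes independent, identically distributed trials, which real drive segments only approximate. It computes the exact one-sided upper confidence bound on the failure rate after `n` failure-free trials, alongside the familiar "rule of three" approximation. The sample size of 50 is illustrative.

```python
def zero_failure_upper_bound(n, alpha=0.05):
    """Exact one-sided (1 - alpha) upper bound on the failure rate
    after n independent trials with zero failures.

    Solves (1 - p) ** n = alpha for p.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - alpha ** (1.0 / n)

n_clean = 50                      # hypothetical number of failure-free trials
upper = zero_failure_upper_bound(n_clean)
rule_of_three = 3.0 / n_clean     # common approximation to the 95% bound

print(round(upper, 4), rule_of_three)
```

With 50 clean trials the data are still consistent with a failure rate of roughly 6%, so zero observed failures in a tiny sample cannot by itself justify a citywide launch.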
