This question evaluates the ability to assess model transferability across domains, detect and characterize distribution shifts, integrate large-source with small-target datasets, quantify uncertainty, and prioritize rare or emergent edge-case data for evaluation.
You have built an autonomous-driving evaluation system using a large amount of labeled data from Beijing. Now the company wants to operate in Guangzhou. You do not want to rebuild the entire evaluation pipeline from scratch, and you can only collect a small amount of Guangzhou data.
How would you evaluate whether the autonomous-driving system is likely to perform well in Guangzhou under this limited-data setting? Discuss:
Follow-up: if Guangzhou contains important scenarios that are rare or absent in Beijing—for example different road topology, scooter density, weather, driving behavior, or map quality—how should that change your data-collection strategy? Be explicit about what to sample and how to prioritize edge cases.