You have abundant labeled autonomous-driving data from Beijing and have already built an evaluation system there. Now the company wants to assess performance in Guangzhou, but does not want to rebuild the evaluation framework from scratch. You are allowed to collect only a small amount of Guangzhou data.
How would you evaluate whether the autonomous-driving system is likely to perform well in Guangzhou?
Your answer should address:
- how to define the target population and success criteria;
- how to detect distribution shift between Beijing and Guangzhou;
- how to use a small Guangzhou sample efficiently;
- how to estimate uncertainty and make a go or no-go recommendation.
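To illustrate the last point, here is a minimal sketch of one possible way to turn a small Guangzhou sample into a hedged recommendation. It assumes each evaluated scenario yields a binary pass/fail outcome and that the samples are roughly independent; neither assumption comes from the problem statement. The function names, the 95% level, and the three-way decision rule are illustrative choices, not a prescribed method. A Wilson score interval is used because it behaves reasonably at small sample sizes.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def recommend(successes: int, trials: int, required_rate: float) -> str:
    """Hypothetical decision rule: act only when the interval clears the bar."""
    low, high = wilson_interval(successes, trials)
    if low >= required_rate:
        return "go"
    if high < required_rate:
        return "no-go"
    # The interval straddles the requirement: the sample is too small to decide.
    return "collect more data"
```

For example, under these assumptions `recommend(18, 20, 0.9)` returns `"collect more data"`, because 20 samples cannot separate a 90% pass rate from a materially lower one. In practice the pass rate would likely be computed per scenario type and reweighted to the Guangzhou target population rather than pooled.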
Follow-up: suppose Guangzhou contains scenario types that differ materially from Beijing, for example road topology, weather, traffic-agent mix, signage, or local driving behavior. How should that change your data-collection strategy?
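One way to think about the follow-up is as a budget-allocation problem: with a fixed number of Guangzhou samples to collect and label, spend more on scenario types that are both common in Guangzhou and uncertain in outcome. The sketch below assumes a Neyman-style allocation (samples proportional to prevalence times estimated outcome standard deviation) with a per-stratum floor so that rare but novel scenario types are never left unsampled. The stratum names, weights, and standard deviations are hypothetical placeholders, not measured values.

```python
import math


def allocate_budget(strata: dict[str, tuple[float, float]],
                    budget: int,
                    min_per_stratum: int = 0) -> dict[str, int]:
    """Split `budget` samples across strata.

    `strata` maps a scenario type to (estimated prevalence in Guangzhou,
    estimated standard deviation of the pass/fail outcome).
    """
    reserved = min_per_stratum * len(strata)
    if reserved > budget:
        raise ValueError("budget too small for the per-stratum minimum")
    scores = {name: w * s for name, (w, s) in strata.items()}
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one stratum needs positive weight and spread")

    remaining = budget - reserved
    shares = {name: remaining * sc / total for name, sc in scores.items()}
    alloc = {name: min_per_stratum + math.floor(sh) for name, sh in shares.items()}

    # Hand out samples lost to rounding, largest fractional remainder first.
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - math.floor(shares[n]), reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical strata: (prevalence, outcome standard deviation).
example = {
    "dense_two_wheelers": (0.2, 0.5),
    "highway": (0.5, 0.1),
    "heavy_rain": (0.3, 0.4),
}
plan = allocate_budget(example, budget=100, min_per_stratum=10)
```

With these made-up inputs, the common but well-understood highway stratum receives the fewest samples, while the Guangzhou-specific strata receive more. Scenario types with no Beijing counterpart would likely warrant a higher floor, since reweighted Beijing data cannot stand in for them.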