Compare two rare-event detection models statistically | Waymo Interview Question