What difficulty level is this interview question?

This is a medium difficulty Statistics & Math question, commonly asked during Technical Screen rounds at WeRide.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at WeRide during technical interviews.

Test Whether Two Samples Come From the Same Distribution

Quick Overview

Compare two samples from continuous, categorical, or multivariate data using visual diagnostics, parametric and nonparametric tests, effect sizes, multiple-testing control, clustering-aware uncertainty, sample-size imbalance handling, and sampling methods for product evaluation.

You have two datasets, sample A and sample B. They might come from two versions of a system, two cities, or two periods of autonomous-driving operations. The variable of interest may be continuous, skewed, heavy-tailed, or categorical, and the two sample sizes may be very different.

Constraints & Assumptions

Clarify whether the goal is comparing means, proportions, full univariate distributions, or multivariate joint distributions.
Pair statistical tests with effect sizes and practical significance.
Account for multiple testing, dependence, clustering, and sample-size imbalance.
Discuss sampling methods for data collection and offline evaluation.

Clarifying Questions to Ask (Guidance)

What decision will be made from the comparison?
Are observations independent, or are they clustered by user, vehicle, route, city, or time?
Are A and B intended to represent the same target population?
Are we comparing one variable or many features?
Is the concern drift, treatment impact, data quality, or evaluation sampling?

Part 1 - Test Whether Two Samples Match

How would you test whether the two samples come from the same underlying distribution?

What This Part Should Cover (Guidance)

Visual diagnostics for continuous, categorical, and multivariate data.
Parametric tests for means or variances when assumptions hold.
Nonparametric tests for full distribution differences.
Categorical tests and multivariate drift tests.
Permutation tests as a flexible option.
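
As an illustration of the nonparametric and permutation items above, here is a minimal sketch that uses only the Python standard library. It assumes a single continuous variable and independent observations. The function names `ks_statistic` and `permutation_test`, the simulated samples, and the choice of 500 permutations are assumptions made for this example, not part of the original question.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest absolute gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The gap between the ECDFs can only change at observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def permutation_test(a, b, statistic=ks_statistic, n_perm=2000, seed=0):
    """Permutation p-value for H0: the labels A and B are exchangeable."""
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: equal means, different spread, unequal sample sizes.
rng = random.Random(42)
sample_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
sample_b = [rng.gauss(0.0, 3.0) for _ in range(60)]

stat, p_value = permutation_test(sample_a, sample_b, n_perm=500)
print(f"KS statistic = {stat:.3f}, permutation p-value = {p_value:.4f}")
```

The `statistic` argument can be swapped for a difference in means, medians, or a tail quantile, which is what makes the permutation approach flexible. If observations are clustered, shuffling individual rows is no longer valid; one option is to shuffle whole clusters instead.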

Part 2 - Interpret Significance And Handle Practical Issues

How would you interpret statistical significance versus practical significance, multiple testing, dependence, and very different sample sizes?

What This Part Should Cover (Guidance)

Effect sizes, confidence intervals, business thresholds, and tail differences.
Bonferroni or FDR correction for many features.
Clustered bootstrap or mixed models when observations are not independent.
Weighting, subsampling, or target-population checks for size imbalance.
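
To show how the two multiple-testing corrections named above differ when many features are tested at once, here is a small standard-library sketch. The p-values are made up for illustration, and the function names `bonferroni` and `benjamini_hochberg` are chosen for this example; it assumes one p-value per feature and uses the standard Benjamini-Hochberg step-up rule.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for feature i when p_i <= alpha / m (controls family-wise error)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical per-feature p-values from eight separate two-sample tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]

bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print("Bonferroni rejections:", sum(bonf))  # 1
print("BH rejections:", sum(bh))            # 2
```

With these inputs Bonferroni flags only the smallest p-value, while Benjamini-Hochberg flags the two smallest. That is the usual trade-off: FDR control accepts some false discoveries in exchange for more power across many features.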

Part 3 - Discuss Sampling Methods

What sampling methods do you know, and when is each appropriate?

What This Part Should Cover (Guidance)

Simple random, stratified, cluster, systematic, weighted/importance, multistage, reservoir, and convenience sampling.
Which methods are preferred for product or offline evaluation and why.
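
Two of the methods listed above, stratified and reservoir sampling, are easy to sketch with the standard library. The example below is a rough illustration, not a production sampler. The record layout (`id`, `period`), the 10% sampling fraction, and the function names `stratified_sample` and `reservoir_sample` are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum (at least one item each)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))  # without replacement
    return sample


def reservoir_sample(stream, k, seed=0):
    """Algorithm R: uniform sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical trip log: 200 rush-hour trips and 800 off-peak trips.
trips = [
    {"id": i, "period": "rush" if i % 5 == 0 else "off_peak"}
    for i in range(1000)
]

strat = stratified_sample(trips, key=lambda t: t["period"], fraction=0.1)
rush_count = sum(t["period"] == "rush" for t in strat)
print(len(strat), rush_count)  # 100 20

# Generator input: the sampler never needs the stream length up front.
stream_sample = reservoir_sample((t["id"] for t in trips), k=10)
print(len(stream_sample))  # 10
```

Stratifying by period keeps the rush-hour share of the sample equal to its share of the population, which matters when an evaluation set must represent the target population. Reservoir sampling fits logs that are too large or too open-ended to hold in memory.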

What a Strong Answer Covers (Guidance)

Does not claim one universal test solves every distribution-comparison problem.
Chooses tests based on variable type, dimensionality, and the question being asked.
Reports practical impact, not only p-values.
Explains sampling design and target-population representation.
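
One way to report practical impact when observations are clustered is an effect size with a cluster-level bootstrap interval. The sketch below assumes the effect of interest is a difference in means and that clusters (here, hypothetical vehicles) are the independent units. The name `cluster_bootstrap_diff` and the simulated data are illustrative only.

```python
import random


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def cluster_bootstrap_diff(groups_a, groups_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(A) - mean(B), resampling whole clusters.

    groups_a and groups_b map a cluster id to that cluster's observations.
    """
    rng = random.Random(seed)
    clusters_a = list(groups_a.values())
    clusters_b = list(groups_b.values())

    def resampled_mean(clusters):
        # Resample clusters with replacement, keeping each cluster intact.
        picked = rng.choices(clusters, k=len(clusters))
        return _mean(x for cluster in picked for x in cluster)

    diffs = sorted(
        resampled_mean(clusters_a) - resampled_mean(clusters_b)
        for _ in range(n_boot)
    )
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    point = _mean(x for c in clusters_a for x in c) - _mean(
        x for c in clusters_b for x in c
    )
    return point, (lo, hi)


def simulate(rng, n_vehicles, shift):
    """Hypothetical data: each vehicle has its own offset plus per-trip noise."""
    data = {}
    for v in range(n_vehicles):
        vehicle_effect = rng.gauss(0.0, 1.0)
        data[v] = [shift + vehicle_effect + rng.gauss(0.0, 0.5) for _ in range(20)]
    return data


rng = random.Random(7)
version_a = simulate(rng, 15, shift=0.0)
version_b = simulate(rng, 15, shift=0.0)

point, (lo, hi) = cluster_bootstrap_diff(version_a, version_b, n_boot=1000)
print(f"difference = {point:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Resampling rows instead of vehicles would treat 300 correlated trips as 300 independent observations and typically produce an interval that is too narrow. The interval can then be compared with a business threshold rather than with zero alone.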

Follow-up Questions (Guidance)

Why can a t-test miss distribution differences?
When would the Kolmogorov-Smirnov (KS) test be a poor choice?
How would you compare high-dimensional data?
What if one sample is rush-hour trips and the other is all-day trips?
How would you sample rare safety-critical events?
