How do I approach Statistics & Math interview questions?

Statistics & Math questions call for a solid grasp of core concepts plus regular practice. PracHub provides solutions with explanations to help you prepare for Statistics & Math interviews.

What difficulty level is this interview question?

This is a Medium difficulty Statistics & Math question, commonly asked during Technical Screen rounds at Meta.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Meta during technical interviews.

Compare first-score vs all-scores estimators

Quick Overview

This question evaluates statistical estimation and inference skills: estimator definitions, weighting and sampling effects, bias risks when response counts correlate with latent outcomes, intracluster correlation, variance estimation with cluster-robust standard errors, and design-effect calculations.

You have two candidate estimators for survey quality based on the score column over 2025-08-26 to 2025-09-01:

E_first: For each user×survey pair, take the first score in-window; then average across pairs.
E_all: Average across all in-window scores (users with more responses contribute more weight).
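
As a minimal sketch of the two definitions, the Python below computes both estimators on a few made-up rows. The row layout (user_id, survey_id, response_date, score), the toy values, and the inclusive treatment of both window endpoints are assumptions for illustration, not part of the question.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy rows: (user_id, survey_id, response_date, score).
rows = [
    ("u1", "s1", date(2025, 8, 26), 5),
    ("u1", "s1", date(2025, 8, 28), 4),
    ("u1", "s1", date(2025, 8, 30), 3),
    ("u2", "s1", date(2025, 8, 27), 2),
    ("u3", "s1", date(2025, 8, 25), 1),  # before the window, so excluded
    ("u3", "s1", date(2025, 9, 1), 4),
]

# Assumption: both endpoints of the window are inclusive.
WINDOW_START, WINDOW_END = date(2025, 8, 26), date(2025, 9, 1)


def in_window(rows):
    """Keep only rows whose response date falls inside the window."""
    return [r for r in rows if WINDOW_START <= r[2] <= WINDOW_END]


def e_first(rows):
    """Average of the earliest in-window score per (user, survey) pair."""
    first = {}
    for user, survey, _day, score in sorted(in_window(rows), key=lambda r: r[2]):
        first.setdefault((user, survey), score)  # keeps the earliest score only
    return sum(first.values()) / len(first)


def e_all(rows):
    """Plain average over every in-window score."""
    scores = [r[3] for r in in_window(rows)]
    return sum(scores) / len(scores)


def e_all_as_weighted_user_means(rows):
    """Same quantity as e_all, written as a count-weighted average of user means."""
    by_user = defaultdict(list)
    for user, _survey, _day, score in in_window(rows):
        by_user[user].append(score)
    total = sum(len(v) for v in by_user.values())
    return sum((len(v) / total) * (sum(v) / len(v)) for v in by_user.values())


print(e_first(rows), e_all(rows), e_all_as_weighted_user_means(rows))
```

In this toy data the user with three responses carries three fifths of the weight in E_all but only one third of the weight in E_first, which is why the two values differ.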

Answer precisely:

1. Define the target estimand and write E_first and E_all formally. Show that E_all is a weighted average of per-user means with weights proportional to each user’s in-window response count. Under what conditions are E_first and E_all equal in expectation?
2. Discuss bias risks for E_all when the number of responses per user correlates with their latent satisfaction. Provide a concrete example where E_all over- or under-estimates the estimand.
3. Derive large-sample variance estimators for both approaches. For E_all, propose user-cluster-robust standard errors; for E_first, standard IID SEs over user×survey pairs. Explain when cluster-robust SEs are required for E_first.
4. Suppose repeated scores within a user×survey pair follow an exchangeable correlation with intracluster correlation ρ and average cluster size m. Show how the design effect DE = 1 + (m−1)ρ affects effective sample size and confidence intervals for E_all.
5. Recommend which estimator to report by default and when to prefer the alternative. Justify using bias–variance tradeoffs and interpretability.
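
The quantities the prompt asks about can be sketched numerically. The Python below is one possible illustration, not a reference answer. It assumes a single survey, a made-up dataset in which the least satisfied user responds most often, an equal-weight-per-user target for comparison, and a standard linearization (ratio-estimator) form of the user-cluster-robust variance with a G/(G−1) small-sample factor. All function and variable names are hypothetical.

```python
import math

# Hypothetical in-window scores per user for one survey.
# Assumption for illustration: the dissatisfied user (u4) responds far more often.
scores_by_user = {
    "u1": [5],
    "u2": [5],
    "u3": [4],
    "u4": [1, 1, 2, 1, 1, 1],
}


def e_all(clusters):
    """Pooled mean over all scores (users weighted by response count)."""
    total = sum(sum(v) for v in clusters.values())
    n = sum(len(v) for v in clusters.values())
    return total / n


def user_mean_average(clusters):
    """Equal-weight average of per-user means (one possible target estimand)."""
    return sum(sum(v) / len(v) for v in clusters.values()) / len(clusters)


def naive_iid_se(clusters):
    """SE that treats every score as independent, ignoring user clustering."""
    ys = [y for v in clusters.values() for y in v]
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return math.sqrt(var / n)


def cluster_robust_se(clusters):
    """User-cluster-robust SE for the pooled mean via linearization."""
    n = sum(len(v) for v in clusters.values())
    g = len(clusters)
    est = e_all(clusters)
    # Within-user residual sums: sum_i (y_ui - est) = S_u - est * n_u.
    resid_sums = [sum(v) - est * len(v) for v in clusters.values()]
    var = (g / (g - 1)) * sum(r * r for r in resid_sums) / n ** 2
    return math.sqrt(var)


def design_effect(m, rho):
    """DE = 1 + (m - 1) * rho for average cluster size m and ICC rho."""
    return 1 + (m - 1) * rho


def effective_sample_size(n, m, rho):
    """Number of independent observations that n clustered scores are worth."""
    return n / design_effect(m, rho)


print(e_all(scores_by_user), user_mean_average(scores_by_user))
print(naive_iid_se(scores_by_user), cluster_robust_se(scores_by_user))
print(design_effect(4, 0.5), effective_sample_size(100, 4, 0.5))
```

On this toy data E_all is about 2.33 while the equal-weight user average is about 3.79, so the pooled mean under-estimates a per-user target when heavy responders are less satisfied. The cluster-robust SE also exceeds the naive IID SE here. Under the design-effect approximation, confidence-interval half-widths grow by a factor of sqrt(DE) relative to the IID calculation.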
