PracHub

Compute and correct correlation significance inflation

Last updated: Mar 29, 2026

Quick Overview

This question evaluates statistical inference for correlations, multiple testing control (false discovery rate), power and sample-size calculations, and reasoning about confounding and Simpson’s paradox using covariance and partial-correlation concepts.

  • medium
  • Meta
  • Statistics & Math
  • Data Scientist

Company: Meta

Role: Data Scientist

Category: Statistics & Math

Difficulty: medium

Interview Round: Technical Screen

Meta
Oct 13, 2025, 9:49 PM

Sales Outreach Correlation Analysis: Inference, Multiple Testing, Power, and Simpson’s Paradox

Context

You are analyzing sales data to understand relationships between outreach actions and deal outcomes. The tasks below ask you to compute inferential statistics for a correlation, control the false discovery rate across multiple tests, estimate the minimal detectable effect size for a given sample size, and explain a Simpson’s paradox scenario with equations.

Tasks

(a) For n = 3200 deals, the Pearson correlation between call_count in the first 14 days after deal creation and is_won is r = 0.23. Using Fisher's z-transform, compute the 95% confidence interval for the population correlation ρ and the two-sided p-value for H0: ρ = 0. Show the intermediate Fisher z, standard error (SE), z-interval, and back-transform steps.
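A minimal Python sketch of the part (a) calculation, using only the standard library. It assumes the usual large-sample approximation that atanh(r) is roughly normal with SE 1/sqrt(n − 3). Because is_won is binary, r here is a point-biserial correlation, so the approximation is only approximate. The function name fisher_ci_and_p is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci_and_p(r: float, n: int, alpha: float = 0.05):
    """Fisher z-based CI for rho and two-sided p-value for H0: rho = 0.

    Large-sample normal approximation: atanh(r) ~ N(atanh(rho), 1/(n - 3)).
    """
    z = math.atanh(r)                               # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)                     # approximate SE of z
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_lo, z_hi = z - z_crit * se, z + z_crit * se   # interval on the z scale
    r_lo, r_hi = math.tanh(z_lo), math.tanh(z_hi)   # back-transform to the r scale
    stat = z / se                                   # standardized statistic under H0
    # Two-sided p = 2 * (1 - Phi(|stat|)) = erfc(|stat| / sqrt(2)); erfc avoids underflow.
    p = math.erfc(abs(stat) / math.sqrt(2))
    return z, se, (z_lo, z_hi), (r_lo, r_hi), stat, p


z, se, z_ci, r_ci, stat, p = fisher_ci_and_p(0.23, 3200)
print(f"z = {z:.4f}, SE = {se:.5f}")
print(f"z interval = [{z_ci[0]:.4f}, {z_ci[1]:.4f}]")
print(f"95% CI for rho = [{r_ci[0]:.3f}, {r_ci[1]:.3f}]")
print(f"Z statistic = {stat:.2f}, two-sided p = {p:.2e}")
```

With these inputs the sketch should give z ≈ 0.234, SE ≈ 0.0177, a 95% CI for ρ of roughly [0.197, 0.263], and a standardized statistic near 13.2. The p-value is therefore far below 1e-30. The p-value uses math.erfc rather than 1 − Φ(·) because the latter rounds to 0 in floating point at this magnitude.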

(b) You tested m = 24 correlations (different channels/time windows). The sorted p-values are: [0.0004, 0.0010, 0.0040, 0.0090, 0.0120, 0.0190, 0.0260, 0.0310, 0.0410, 0.0530, 0.0610, 0.0740, 0.0810, 0.0940, 0.1100, 0.1300, 0.1700, 0.2100, 0.2700, 0.3400, 0.4100, 0.5500, 0.6800, 0.7900]. Apply the Benjamini–Hochberg procedure at q = 0.10 and state which hypotheses you reject. Show each comparison of the i-th smallest p-value p_(i) against its threshold i × (q/m).
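The step-up rule can be sketched in Python as follows. The name benjamini_hochberg is a hypothetical helper. It finds the largest rank k with p_(k) ≤ k × q/m and rejects every hypothesis ranked 1 through k. This includes any lower-ranked hypotheses that individually missed their own threshold.

```python
def benjamini_hochberg(pvals, q=0.10):
    """Benjamini-Hochberg step-up procedure.

    Returns (k, rejected, rows): k is the largest rank with p_(k) <= k*q/m,
    rejected lists the 0-based input positions of rejected hypotheses,
    and rows holds (rank, p, threshold, passes) for display.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # input positions by ascending p
    k = 0
    rows = []
    for rank, idx in enumerate(order, start=1):
        threshold = rank * q / m
        passes = pvals[idx] <= threshold
        rows.append((rank, pvals[idx], threshold, passes))
        if passes:
            k = rank                                   # step-up: keep the largest passing rank
    rejected = sorted(order[:k])                       # reject all hypotheses ranked 1..k
    return k, rejected, rows


pvals = [0.0004, 0.0010, 0.0040, 0.0090, 0.0120, 0.0190, 0.0260, 0.0310,
         0.0410, 0.0530, 0.0610, 0.0740, 0.0810, 0.0940, 0.1100, 0.1300,
         0.1700, 0.2100, 0.2700, 0.3400, 0.4100, 0.5500, 0.6800, 0.7900]
k, rejected, rows = benjamini_hochberg(pvals, q=0.10)
for rank, p, thr, ok in rows:
    verdict = "reject" if rank <= k else "retain"
    print(f"{rank:2d}  p = {p:.4f} {'<=' if ok else '> '} {thr:.5f}  {verdict}")
print("largest passing rank k =", k)
```

On the listed p-values this should report k = 8. The comparisons are p_(8) = 0.0310 ≤ 8 × 0.10/24 ≈ 0.0333, while p_(9) = 0.0410 > 0.0375, and no higher rank passes. The eight smallest p-values (0.0004 through 0.0310) are therefore rejected. If statsmodels is available, statsmodels.stats.multitest.multipletests(pvals, alpha=0.10, method="fdr_bh") should agree.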

(c) What is the minimal detectable correlation (two-sided, α = 0.05, power = 0.80) for n = 500 using the Fisher z power approximation? Provide the formula and numeric result.
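A Python sketch of the part (c) approximation. It uses the standard Fisher z power formula and ignores the negligible rejection probability in the opposite tail. The formula solves atanh(ρ) × sqrt(n − 3) = z_{1−α/2} + z_{1−β} for ρ, giving ρ_min = tanh((z_{1−α/2} + z_{1−β}) / sqrt(n − 3)). The helper name min_detectable_r is hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_r(n: int, alpha: float = 0.05, power: float = 0.80):
    """Smallest |rho| detectable at two-sided alpha with the given power (Fisher z approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # about 1.960 for alpha = 0.05
    z_beta = nd.inv_cdf(power)            # about 0.842 for power = 0.80
    z_rho = (z_alpha + z_beta) / math.sqrt(n - 3)
    return math.tanh(z_rho), z_rho        # back-transform to the correlation scale


r_min, z_rho = min_detectable_r(500)
print(f"z_rho = {z_rho:.4f}, minimal detectable r = {r_min:.4f}")
```

For n = 500 this gives z_ρ = 2.8016 / sqrt(497) ≈ 0.1257 and ρ_min ≈ 0.125. Under this approximation, correlations smaller in magnitude than about 0.125 would have less than 80% power at this sample size.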

(d) You observe overall corr(discount_rate, is_won) = −0.10, but within each region {East, West, Central} the correlations are {+0.05, +0.04, +0.03}. Explain, with equations, how region-mix imbalance can yield this Simpson’s paradox and how to diagnose it numerically (e.g., weighted covariance decomposition and partial correlation controlling for region).
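One way to make part (d) concrete is the law of total covariance with population weights w_g = n_g/N:

Cov(X, Y) = Σ_g w_g Cov_g(X, Y) + Σ_g w_g (x̄_g − x̄)(ȳ_g − ȳ)

The first (within-region) term is positive when every within-region covariance is positive. The second (between-region) term is negative if regions with higher average discount have lower average win rates. When the second term dominates, the pooled correlation turns negative.

Partial correlation controlling for region equals the correlation of region-demeaned X and Y (the residuals from regressing each on region dummies). That is, it equals Σ_g w_g Cov_g / sqrt(Σ_g w_g Var_g(X) × Σ_g w_g Var_g(Y)).

The Python sketch below checks these quantities on a synthetic, hypothetical dataset. The region profiles, slope, noise level, and sample sizes are invented for illustration and are not the question's data. The magnitudes will therefore differ from −0.10 and {+0.05, +0.04, +0.03}.

```python
import math
import random


def cov(x, y):
    """Population covariance (divide by n) so the decomposition is an exact identity."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def corr(x, y):
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))


def simpson_diagnostics(x, y, g):
    """Split Cov(x, y) into within- and between-group parts and compute the
    partial correlation of x and y controlling for the categorical group g."""
    N = len(x)
    mx, my = sum(x) / N, sum(y) / N
    within = between = 0.0
    per_group = {}
    x_dm, y_dm = [], []  # group-demeaned values = residuals on group dummies
    for k in sorted(set(g)):
        xs = [a for a, gk in zip(x, g) if gk == k]
        ys = [b for b, gk in zip(y, g) if gk == k]
        w = len(xs) / N                              # group weight n_g / N
        mxk, myk = sum(xs) / len(xs), sum(ys) / len(ys)
        within += w * cov(xs, ys)                    # sum_g w_g Cov_g(x, y)
        between += w * (mxk - mx) * (myk - my)       # sum_g w_g (xbar_g - xbar)(ybar_g - ybar)
        per_group[k] = corr(xs, ys)
        x_dm += [a - mxk for a in xs]
        y_dm += [b - myk for b in ys]
    return {
        "overall_corr": corr(x, y),
        "total_cov": cov(x, y),
        "within_cov": within,
        "between_cov": between,
        "per_group_corr": per_group,
        "partial_corr": corr(x_dm, y_dm),            # pooled within-region correlation
    }


# Hypothetical synthetic data (not the question's data): regions with higher
# average discount have lower baseline win rates, while within each region the
# win probability rises with discount.
rng = random.Random(0)
profiles = {"East": (0.10, 0.70), "West": (0.20, 0.50), "Central": (0.30, 0.30)}
discount, is_won, region = [], [], []
for name, (mean_disc, base_win) in profiles.items():
    for _ in range(2000):
        d = rng.gauss(mean_disc, 0.03)
        p_win = min(max(base_win + 2.0 * (d - mean_disc), 0.0), 1.0)
        discount.append(d)
        is_won.append(1.0 if rng.random() < p_win else 0.0)
        region.append(name)

res = simpson_diagnostics(discount, is_won, region)
for key, value in res.items():
    print(key, value)
```

On data like this, the between-region term is strongly negative and outweighs the small positive within-region term. As a result, the pooled correlation is negative while every per-region correlation and the partial correlation are positive.

On the real data, the same diagnostics would show whether region mix explains the sign flip. These are the per-region means of discount_rate and is_won, the two covariance terms, and the partial correlation.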

© 2026 PracHub. All rights reserved.