Implement Python Function for Statistical Test P-Values

Q: Implement Python Function for Statistical Test P-Values

This question evaluates proficiency in statistical hypothesis testing, p-value interpretation, and implementing distribution-based calculations (Z and Student-t) in code, and it falls under the Coding & Algorithms domain for data scientist roles.

Q: What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Roblox.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Roblox during technical interviews.

Question

##### Scenario

You need a utility that calculates p-values for one-sided and two-sided statistical tests.

##### Question

Write a Python function `compute_p_value(stat, dist='z', df=None, alternative='two-sided')` that returns the p-value. Your code should support Z-tests and Student-t tests, and handle 'less', 'greater', and 'two-sided' alternatives.

##### Hints

Use the CDF of the chosen distribution. For one-sided tests, return the CDF (`'less'`) or `1 - CDF` (`'greater'`); for two-sided tests, return `2 * min(CDF, 1 - CDF)`. Libraries such as `scipy.stats` are allowed.
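As a rough illustration of the hint for the Z case only, the sketch below uses `math.erfc` from the standard library rather than `scipy.stats`, so it runs without extra dependencies. The function name `z_p_value` is hypothetical and not part of the question. Computing each tail directly, instead of as `1 - CDF`, is one way to keep very small p-values from rounding to zero.

```python
import math

def z_p_value(z, alternative="two-sided"):
    """Illustrative p-value for a standard normal statistic (hypothetical helper)."""
    # Tail areas computed directly with erfc to avoid cancellation in 1 - CDF.
    lower = 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(Z <= z), the CDF
    upper = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z >= z), i.e. 1 - CDF
    if alternative == "less":
        return lower
    if alternative == "greater":
        return upper
    if alternative == "two-sided":
        # Same as 2 * min(CDF, 1 - CDF) for a symmetric distribution.
        return min(1.0, 2.0 * min(lower, upper))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

print(z_p_value(1.96))             # close to 0.05
print(z_p_value(10.0, "greater"))  # tiny but non-zero
```

The same three-way branch carries over to the t case once a t CDF is available.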

PracHub · Accepted Answer

    def compute_p_value(stat, dist='z', df=None, alternative='two-sided'):
        import math

        def _norm_cdf(z):
            # Standard normal CDF via the error function.
            if z == math.inf:
                return 1.0
            if z == -math.inf:
                return 0.0
            return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

        def _betacf(a, b, x):
            # Continued fraction for the incomplete beta function (Lentz's method).
            MAXIT = 200
            EPS = 3e-14
            FPMIN = 1e-300
            qab = a + b
            qap = a + 1.0
            qam = a - 1.0
            c = 1.0
            d = 1.0 - qab * x / qap
            if abs(d) < FPMIN:
                d = FPMIN
            d = 1.0 / d
            h = d
            for m in range(1, MAXIT + 1):
                m2 = 2 * m
                # Even step of the recurrence.
                aa = m * (b - m) * x / ((qam + m2) * (a + m2))
                d = 1.0 + aa * d
                if abs(d) < FPMIN:
                    d = FPMIN
                c = 1.0 + aa / c
                if abs(c) < FPMIN:
                    c = FPMIN
                d = 1.0 / d
                h *= d * c
                # Odd step of the recurrence.
                aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
                d = 1.0 + aa * d
                if abs(d) < FPMIN:
                    d = FPMIN
                c = 1.0 + aa / c
                if abs(c) < FPMIN:
                    c = FPMIN
                d = 1.0 / d
                delh = d * c
                h *= delh
                if abs(delh - 1.0) < EPS:
                    break
            return h

        def _betainc_reg(a, b, x):
            # Regularized incomplete beta function I_x(a, b).
            if x <= 0.0:
                return 0.0
            if x >= 1.0:
                return 1.0
            ln_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
            bt = math.exp(ln_bt)
            # Use whichever form of the continued fraction converges faster.
            if x < (a + 1.0) / (a + b + 2.0):
                return bt * _betacf(a, b, x) / a
            else:
                return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

        def _t_cdf(t, nu):
            # Student-t CDF with nu degrees of freedom.
            if math.isnan(t):
                return math.nan  # propagate NaN instead of treating it as -inf
            if not math.isfinite(t):
                return 1.0 if t > 0 else 0.0
            x = nu / (nu + t * t)
            a = nu / 2.0
            b = 0.5
            ib = _betainc_reg(a, b, x)  # two-tailed area beyond |t|
            if t >= 0:
                return 1.0 - 0.5 * ib
            else:
                return 0.5 * ib

        # Normalize and validate the arguments.
        d = (dist or 'z').lower()
        alt = (alternative or 'two-sided').lower().replace('_', '-')
        if d not in ('z', 't'):
            raise ValueError('dist must be "z" or "t"')
        if alt not in ('less', 'greater', 'two-sided'):
            raise ValueError('alternative must be "less", "greater", or "two-sided"')

        if d == 'z':
            F = _norm_cdf(float(stat))
        else:
            if df is None or int(df) != df or int(df) <= 0:
                raise ValueError('df must be a positive integer for t distribution')
            F = _t_cdf(float(stat), int(df))

        if alt == 'less':
            p = F
        elif alt == 'greater':
            p = 1.0 - F
        else:
            p = 2.0 * (F if F < 0.5 else 1.0 - F)

        # Clamp against floating-point drift outside [0, 1].
        if p < 0.0:
            p = 0.0
        elif p > 1.0:
            p = 1.0
        return p

Compute the cumulative distribution function (CDF) of the specified distribution at the test statistic. For the normal distribution, use the error function. For the t-distribution with `df` degrees of freedom, use the regularized incomplete beta function, evaluated with a stable continued fraction (Lentz's method). The one-sided p-values are the CDF itself (`'less'`) or its complement (`'greater'`), and the two-sided p-value is `2 * min(CDF, 1 - CDF)`.
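One way to sanity-check a hand-written t CDF is to compare it against the closed forms that exist for 1 and 2 degrees of freedom. The sketch below is a test aid under that assumption, not part of the accepted answer; the names `t_cdf_df1`, `t_cdf_df2`, and `two_sided` are hypothetical.

```python
import math

def t_cdf_df1(t):
    # Student-t with 1 degree of freedom is the standard Cauchy distribution.
    return 0.5 + math.atan(t) / math.pi

def t_cdf_df2(t):
    # Closed form of the Student-t CDF for 2 degrees of freedom.
    return 0.5 + t / (2.0 * math.sqrt(t * t + 2.0))

def two_sided(cdf_value):
    # Two-sided p-value for a symmetric distribution.
    return 2.0 * min(cdf_value, 1.0 - cdf_value)

# 12.706 and 4.303 are the familiar 97.5% critical values for df=1 and df=2,
# so both two-sided p-values should come out near 0.05.
print(two_sided(t_cdf_df1(12.706)))
print(two_sided(t_cdf_df2(4.303)))
```

If `compute_p_value(12.706, dist='t', df=1)` disagrees noticeably with the first printed value, the continued-fraction code is the likely place to look.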
