Statistics for ML
Master ML-relevant statistics: estimators, confidence intervals, hypothesis tests, p-values, multiple comparisons, and common gotchas.
Students can compute and interpret confidence intervals and tests, understand failure modes (p-hacking, confounding), and connect stats to ML evaluation.
Progress: 0/6 tasks
Interview Angles
- Why does a 95% CI have ~95% coverage by design?
- Slicing dashboards can create accidental p-hacking.
FAANG Gotchas
- A CI is for the mean, not individual outcomes.
- A p-value is not P(H0 is true).
Statistics for ML: FAANG-Level Lab
Goal: Confidence intervals, hypothesis testing, and interpretation for ML engineering.
Outcome: You can quantify uncertainty and avoid p-value traps.
Section 1: Estimators (Mean/Variance)
Task 1.1: Unbiased sample variance
Implement sample mean and unbiased sample variance (ddof=1) without calling np.var(..., ddof=1).
- mean = sum(x) / n
- unbiased var = sum((x - mean)^2) / (n - 1)
Explain: Why divide by (n-1) instead of n?
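A minimal sketch of Task 1.1 (function names are my own choice); dividing by n - 1 compensates for the fact that deviations are measured from the sample mean rather than the true mean, which would otherwise bias the variance downward:

```python
import numpy as np

def sample_mean(x):
    """Sample mean: sum(x) / n."""
    x = np.asarray(x, dtype=float)
    return x.sum() / x.size

def unbiased_var(x):
    """Unbiased sample variance: sum((x - mean)^2) / (n - 1)."""
    x = np.asarray(x, dtype=float)
    m = sample_mean(x)
    return ((x - m) ** 2).sum() / (x.size - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(data))   # 5.0
print(unbiased_var(data))  # 32/7, matches np.var(data, ddof=1)
```

You can sanity-check the result against `np.var(data, ddof=1)`, which the task asks you not to call inside your implementation.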
Section 2: Confidence Interval for Mean (Normal approx)
Task 2.1: 95% CI for mean
Compute a 95% CI for the mean using the normal approximation: CI = mean ± z * s/sqrt(n), where z ≈ 1.96.
- Use the unbiased sample std
FAANG gotcha: CI is about the mean, not individual outcomes.
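One possible sketch of Task 2.1, using the formula above (the function name is mine):

```python
import numpy as np

def ci95_mean(x):
    """95% CI for the mean via normal approximation: mean ± 1.96 * s / sqrt(n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = x.mean()
    s = x.std(ddof=1)              # unbiased sample std
    half = 1.96 * s / np.sqrt(n)   # half-width of the interval
    return m - half, m + half

lo, hi = ci95_mean([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(lo, hi)  # interval centered at the sample mean 5.0
```

Note the gotcha: this interval quantifies uncertainty about the *mean*, so individual observations routinely fall outside it.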
Task 2.2: Coverage simulation
Simulate repeated sampling from Normal(mu=0, sigma=1). Estimate how often the 95% CI contains the true mean.
- Run many trials
- Count coverage
Explain: Why isn't coverage exactly 0.95 in finite simulation?
Section 3: Hypothesis Testing (Two-sample test intuition)
Task 3.1: Permutation test for A/B (no scipy)
Given samples A and B, test whether mean(B) - mean(A) is significant via permutation.
- Combine the samples
- Shuffle and split
- Compute the diff distribution
- p-value = fraction of permuted diffs at least as extreme as the observed one (use |diff| for two-sided)
FAANG gotcha: p-value is not P(H0 true).
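The steps above can be sketched as follows (no scipy; function and parameter names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def perm_test(a, b, n_perm=5000):
    """Two-sided permutation test for mean(b) - mean(a)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = b.mean() - a.mean()
    pooled = np.concatenate([a, b])   # combine samples
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)           # shuffle and split
        diff = pooled[len(a):].mean() - pooled[:len(a)].mean()
        if abs(diff) >= abs(observed):  # two-sided comparison
            count += 1
    return count / n_perm
```

The resulting p-value is the probability of seeing a difference at least this extreme *if the null were true*; it is not the probability that the null is true.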
Section 4: Multiple Comparisons (Gotcha)
Task 4.1: Bonferroni correction
If you run m tests at alpha=0.05, Bonferroni uses alpha/m per test.
Compute adjusted alpha for m=20 and explain why this matters in feature slicing / metric dashboards.
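A quick sketch of Task 4.1 (the helper name is mine), including the family-wise error rate that motivates the correction when slicing dashboards into many metrics:

```python
def bonferroni_alpha(alpha, m):
    """Per-test significance level under Bonferroni correction."""
    return alpha / m

m, alpha = 20, 0.05
print(bonferroni_alpha(alpha, m))        # 0.0025

# Without correction, the chance of at least one false positive across
# m independent tests is 1 - (1 - alpha)^m.
fwer_uncorrected = 1 - (1 - alpha) ** m
print(round(fwer_uncorrected, 2))        # ~0.64
```

With 20 uncorrected tests at alpha = 0.05, you expect a false positive roughly two times out of three, which is why dashboards sliced many ways produce spurious "significant" segments.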