EDA Lab + Homework
Build strong intuition for Exploratory Data Analysis (EDA) as it’s used in real ML work: validating data assumptions, finding leakage, discovering slices, and making high-signal plots and summaries.
Students can quickly diagnose dataset issues (missingness, outliers, imbalance, shift), produce defensible summary tables/plots, and articulate next modeling steps.
Interview Angles
- Outliers can be signal (fraud, whales), so don't blindly delete them.
FAANG Gotchas
- “No missing values” can still be wrong if missingness is encoded as `"unknown"`, `-1`, or an empty string.
- Simpson’s paradox: a trend that holds within every slice can reverse when the slices are aggregated.
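The Simpson's paradox gotcha is easier to remember with numbers. The sketch below uses a tiny synthetic table; the counts and the column names (`variant`, `platform`, `users`, `converts`) are invented for illustration and are not part of the lab dataset.

```python
import pandas as pd

# Synthetic counts, invented for illustration (not from the lab dataset).
df = pd.DataFrame({
    "variant":  ["A", "A", "B", "B"],
    "platform": ["mobile", "desktop", "mobile", "desktop"],
    "users":    [100, 900, 900, 100],
    "converts": [10, 270, 108, 35],
})

# Per-slice conversion rate: B beats A on mobile (0.12 vs 0.10)
# and on desktop (0.35 vs 0.30).
by_slice = df.assign(rate=df["converts"] / df["users"]).pivot(
    index="platform", columns="variant", values="rate"
)

# Aggregate conversion rate: A beats B (0.28 vs 0.143), because most of
# A's traffic sits in the high-converting desktop slice.
totals = df.groupby("variant")[["users", "converts"]].sum()
overall = totals["converts"] / totals["users"]

print(by_slice)
print(overall)
```

The reversal comes entirely from the different traffic mix per variant, which is why slice sizes belong next to slice rates in any summary table.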
Python 3 — Notebook
Task 1: Dataset & Setup
EDA Lab + Homework (Student)
Goal: practice high-signal EDA like you would in a FAANG ML interview or on-call investigation.
Rules:
- Work top-to-bottom
- Don't hardcode outputs
- Prefer concise, high-signal plots
1) Sanity Checks — 10 minutes
Task 2: Data grain + schema
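One possible way to approach this task is to state the grain you expect (what one row represents) and then test it. The sketch below is a minimal version under assumptions: the helper names `check_grain` and `schema_table`, the `user_id` key, and the toy frame are all hypothetical stand-ins for the real dataset.

```python
import pandas as pd

def check_grain(df: pd.DataFrame, key_cols: list) -> dict:
    """Report whether key_cols uniquely identify a row (the claimed grain)."""
    n_rows = len(df)
    n_keys = len(df.drop_duplicates(subset=key_cols))
    return {"n_rows": n_rows, "n_unique_keys": n_keys, "is_unique": n_rows == n_keys}

def schema_table(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype and number of distinct non-null values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(),
    })

# Toy frame with a hypothetical key column; replace with the lab dataset.
toy = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "sessions_last_7d": [3, 0, 0, 12],
    "tenure_days": [30, 400, 400, 7],
})

grain = check_grain(toy, ["user_id"])
schema = schema_table(toy)
print(grain)
print(schema)
```

If `is_unique` comes back `False`, the table is either not at the grain you assumed or contains duplicates, and every per-row statistic downstream should be read with that in mind.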
Task 3: Missingness + duplicates
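A sketch of a missingness check that also counts sentinel-encoded gaps such as `"unknown"`, `-1`, and the empty string. The helper name `missingness_report`, the sentinel list, and the toy columns `user_id` and `country` are assumptions for illustration; `-1` in particular can be a legitimate value in some columns, so treat the sentinel list as something to confirm per column.

```python
import numpy as np
import pandas as pd

# Values that often stand in for "missing". This list is an assumption;
# confirm it per column (-1 can be a real value in some features).
SENTINELS = ("unknown", "", -1)

def missingness_report(df: pd.DataFrame, sentinels=SENTINELS) -> pd.DataFrame:
    """Fraction of explicit NaN/None and of sentinel-coded values per column."""
    report = pd.DataFrame({
        "explicit_missing_frac": df.isna().mean(),
        "sentinel_frac": df.isin(list(sentinels)).mean(),
    })
    report["total_frac"] = report.sum(axis=1)
    return report.sort_values("total_frac", ascending=False)

# Toy frame with hypothetical columns; replace with the lab dataset.
toy = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "country": ["US", "unknown", "unknown", ""],
    "avg_session_min": [5.0, np.nan, np.nan, -1.0],
})

report = missingness_report(toy)
n_exact_dupes = int(toy.duplicated().sum())                    # fully identical rows
n_key_dupes = int(toy.duplicated(subset=["user_id"]).sum())    # repeated keys

print(report)
print(n_exact_dupes, n_key_dupes)
```

Separating exact duplicates from key duplicates matters: the first usually points at an ingestion bug, the second at a grain mismatch.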
Task 4: Numeric summaries
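A minimal sketch of a numeric summary that adds tail percentiles, skew, and an outlier fraction to the default `describe()` output. The helper name `numeric_summary` and the synthetic data are assumptions; the 1.5 × IQR rule is one common flagging convention, used here only to count candidates for review, not to delete rows.

```python
import numpy as np
import pandas as pd

def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """describe() plus tail percentiles, skew, and the share of 1.5*IQR outliers."""
    num = df.select_dtypes(include="number")
    summary = num.describe(percentiles=[0.01, 0.5, 0.99]).T
    summary["skew"] = num.skew()

    q1, q3 = num.quantile(0.25), num.quantile(0.75)
    iqr = q3 - q1
    # Flag values outside the IQR fences; flagged rows are reviewed, not dropped.
    outside = (num < q1 - 1.5 * iqr) | (num > q3 + 1.5 * iqr)
    summary["frac_iqr_outlier"] = outside.mean()
    return summary

# Synthetic stand-in data (assumption); replace with the lab dataset.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "sessions_last_7d": rng.poisson(3, size=500),
    "avg_session_min": rng.lognormal(mean=2.0, sigma=1.0, size=500),
})

summary = numeric_summary(toy)
print(summary.round(3))
```

Comparing `mean` with `50%` and checking `99%` against `max` is usually enough to spot a heavy tail before plotting anything.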
Task 5: Plot 2 high-signal distributions
Example: `sessions_last_7d` (skew), `avg_session_min` (heavy tail), `tenure_days` (range)
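One way to plot two distributions side by side, sketched with matplotlib: a raw-scale histogram for a count-like column and a `log1p` histogram for a heavy-tailed one. The helper name `plot_two_distributions` and the synthetic data are assumptions, and the `log1p` transform assumes non-negative values (handle sentinel values such as `-1` first).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_two_distributions(df, raw_col, heavy_tail_col, bins=40):
    """Histogram of raw_col on its raw scale and heavy_tail_col on a log1p scale."""
    fig, (ax_raw, ax_log) = plt.subplots(1, 2, figsize=(10, 3.5))

    ax_raw.hist(df[raw_col].dropna(), bins=bins)
    ax_raw.set(title=f"{raw_col} (raw scale)", xlabel=raw_col, ylabel="rows")

    # log1p compresses the tail and keeps zeros defined; assumes values >= 0.
    ax_log.hist(np.log1p(df[heavy_tail_col].dropna()), bins=bins)
    ax_log.set(
        title=f"{heavy_tail_col} (log1p scale)",
        xlabel=f"log1p({heavy_tail_col})",
        ylabel="rows",
    )

    fig.tight_layout()
    return fig

# Synthetic stand-in data (assumption); replace with the lab dataset.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "sessions_last_7d": rng.poisson(3, size=500),
    "avg_session_min": rng.lognormal(mean=2.0, sigma=1.0, size=500),
})

fig = plot_two_distributions(toy, "sessions_last_7d", "avg_session_min")
```

Labeling the transform on the axis keeps the plot honest: a reader should never have to guess whether a bell shape is on the raw or the log scale.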
Task 6: Label imbalance
Checkpoint: which metric would you choose (accuracy vs F1 vs PR-AUC) and why?
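To make the checkpoint concrete, the sketch below measures label balance and then scores a model that always predicts the majority class. The label name `churned`, the 3% positive rate, and the helper name `label_balance` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def label_balance(y: pd.Series) -> pd.DataFrame:
    """Count and fraction of each label value."""
    counts = y.value_counts()
    return pd.DataFrame({"count": counts, "frac": counts / counts.sum()})

# Synthetic binary label with roughly 3% positives (assumption).
rng = np.random.default_rng(0)
y = pd.Series((rng.random(10_000) < 0.03).astype(int), name="churned")

balance = label_balance(y)
prevalence = y.mean()

# Baseline that always predicts the majority class (0).
y_pred = pd.Series(0, index=y.index)
accuracy = (y_pred == y).mean()                      # equals 1 - prevalence
true_pos = int(((y_pred == 1) & (y == 1)).sum())
recall = true_pos / int((y == 1).sum())              # 0.0: no positive is found

print(balance)
print(f"accuracy={accuracy:.3f}  recall={recall:.3f}  prevalence={prevalence:.3f}")
```

The baseline reaches high accuracy while finding no positives, so its F1 is zero. For comparison, a ranker with no skill has a PR-AUC close to the positive rate, which gives a meaningful floor for that metric on imbalanced labels.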
Task 7: Slice analysis
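A minimal slice table might report, per slice, the row count, the label rate, and the lift over the overall rate, with small slices flagged so they are not over-read. The helper name `slice_table`, the columns `platform` and `churned`, the `min_rows=30` cutoff, and the toy data are all assumptions.

```python
import pandas as pd

def slice_table(df, slice_col, label_col, min_rows=30):
    """Per-slice row count, label rate, lift vs overall rate, and a low-support flag."""
    overall = df[label_col].mean()
    table = (
        df.groupby(slice_col, dropna=False)[label_col]
          .agg(n="count", label_rate="mean")
    )
    table["lift_vs_overall"] = table["label_rate"] / overall
    # Rates from tiny slices are noisy; flag them instead of hiding them.
    table["low_support"] = table["n"] < min_rows
    return table.sort_values("label_rate", ascending=False)

# Toy data with hypothetical columns; replace with the lab dataset.
toy = pd.DataFrame({
    "platform": ["ios"] * 50 + ["android"] * 50 + ["web"] * 5,
    "churned":  [1] * 5 + [0] * 45 + [1] * 20 + [0] * 30 + [1] * 4 + [0] * 1,
})

slices = slice_table(toy, "platform", "churned")
print(slices)
```

Here the `web` slice has the highest label rate but only five rows, so the `low_support` flag is what keeps it from being reported as a finding.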
Task 8: Identify leakage-prone features
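One heuristic for surfacing leakage candidates is to score each numeric feature on its own against the label and look for features that separate the classes almost perfectly. The sketch below computes a single-feature ROC-AUC through the rank-sum (Mann-Whitney U) identity. The helper names, the 0.95 threshold, the label `churned`, and the feature `days_since_cancel_request` are hypothetical; a high score is a prompt to check when the feature is recorded relative to the label, not proof of leakage.

```python
import numpy as np
import pandas as pd

def single_feature_auc(x: pd.Series, y: pd.Series) -> float:
    """ROC-AUC of one numeric feature used alone as a score (Mann-Whitney U form)."""
    mask = x.notna()
    x, y = x[mask], y[mask]
    n_pos, n_neg = int((y == 1).sum()), int((y == 0).sum())
    if n_pos == 0 or n_neg == 0:
        return float("nan")
    ranks = x.rank(method="average")
    u = ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))

def leakage_suspects(df, label_col, threshold=0.95):
    """Numeric features whose direction-agnostic single-feature AUC >= threshold."""
    scores = {}
    for col in df.select_dtypes(include="number").columns:
        if col == label_col:
            continue
        auc = single_feature_auc(df[col], df[label_col])
        scores[col] = max(auc, 1 - auc)  # treat AUC near 0 like AUC near 1
    out = pd.Series(scores, name="single_feature_auc").sort_values(ascending=False)
    return out[out >= threshold]

# Synthetic data (assumption): one unrelated feature, one post-outcome feature.
rng = np.random.default_rng(0)
y = (rng.random(2000) < 0.2).astype(int)
toy = pd.DataFrame({
    "churned": y,
    "tenure_days": rng.integers(1, 1000, size=2000),
    "days_since_cancel_request": np.where(
        y == 1, rng.integers(0, 5, size=2000), rng.integers(30, 400, size=2000)
    ),
})

suspects = leakage_suspects(toy, "churned")
print(suspects)
```

This check only sees numeric columns and one feature at a time, so IDs, timestamps, and free-text fields still need a manual pass over the schema.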