EDA Lab + Homework
Build strong intuition for Exploratory Data Analysis (EDA) as it’s used in real ML work: validating data assumptions, finding leakage, discovering slices, and making high-signal plots and summaries.
Students can quickly diagnose dataset issues (missingness, outliers, imbalance, shift), produce defensible summary tables/plots, and articulate next modeling steps.
Interview Angles
- Outliers can be signal (fraud, whales); don't blindly delete them.
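One way to act on this is to flag outliers and inspect them against the label before deciding anything. The sketch below is a minimal illustration on made-up data: the `amount` and `is_fraud` columns are hypothetical, and the 1.5 × IQR fence is just one common convention, not a rule the lab prescribes.

```python
import pandas as pd

# Hypothetical toy data: transaction amounts with a fraud label.
df = pd.DataFrame({
    "amount": [12.0, 15.5, 9.9, 14.2, 11.8, 13.1, 10.4, 950.0, 1200.0, 12.7],
    "is_fraud": [0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
})

# Tukey fences: flag values more than 1.5 * IQR outside the quartiles.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
df["is_outlier"] = (df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)

# Compare the label rate inside and outside the fences before dropping anything.
fraud_rate_by_flag = df.groupby("is_outlier")["is_fraud"].mean()
print(fraud_rate_by_flag)
```

In this toy frame every flagged row is a fraud case, so deleting the outliers would delete the entire positive class.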
FAANG Gotchas
- “No missing values” can still be wrong if missing is encoded as `"unknown"`, `-1`, or an empty string.
- Simpson’s paradox: slice trends can flip when aggregating.
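Both gotchas are easy to reproduce on small frames. The following is a sketch only: the column names, the sentinel list, and the counts are illustrative and do not come from the lab dataset.

```python
import pandas as pd

# --- Part 1: missingness hidden behind sentinel values (hypothetical columns) ---
users = pd.DataFrame({
    "age": [34, -1, 27, -1, 45],
    "country": ["US", "unknown", "", "DE", "US"],
})
print(users.isna().sum().sum())  # no NaN cells, so the data looks complete

# Assumed sentinel list; the real encodings have to be confirmed per column.
SENTINELS = {"age": [-1], "country": ["unknown", ""]}
cleaned = users.copy()
for col, values in SENTINELS.items():
    # mask() replaces matching cells with a missing value.
    cleaned[col] = cleaned[col].mask(cleaned[col].isin(values))
missing_rate = cleaned.isna().mean()
print(missing_rate)

# --- Part 2: Simpson's paradox on illustrative success counts ---
ab = pd.DataFrame({
    "segment": ["small", "small", "large", "large"],
    "variant": ["A", "B", "A", "B"],
    "successes": [81, 234, 192, 55],
    "trials": [87, 270, 263, 80],
})
ab["rate"] = ab["successes"] / ab["trials"]

# Rate per slice: one row per segment, one column per variant.
per_slice = ab.pivot(index="segment", columns="variant", values="rate")

# Aggregate rate: sum the counts first, then divide.
totals = ab.groupby("variant")[["successes", "trials"]].sum()
overall = totals["successes"] / totals["trials"]

print(per_slice)  # A has the higher rate within each segment
print(overall)    # B has the higher rate once segments are pooled
```

The flip in part 2 comes from the unequal segment mix: variant B gets most of its trials from the easier `small` segment, which lifts its pooled rate.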
Asked At
GoogleGitHub
Python 3 — Notebook
Dataset & Setup
Goal: practice high-signal EDA like you would in a FAANG ML interview or on-call investigation.
Rules:
- Work top-to-bottom
- Don't hardcode outputs
- Prefer concise, high-signal plots
1) Sanity Checks — 10 minutes
Task 1.1: Data grain + schema
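As a starting point, a grain and schema check might look like the sketch below. It runs on a made-up events table, since the lab's dataset is not shown here: the column names and the assumed key `(user_id, event_date)` are hypothetical and need to be swapped for the real ones.

```python
import pandas as pd

# Hypothetical events table; the intended grain is one row per (user_id, event_date).
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "event_date": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-01", "2024-01-03"],
    "amount": ["10.5", "3.0", "7.25", "7.25", "n/a"],
})

KEY = ["user_id", "event_date"]  # assumed grain, to be confirmed against the real data

# Schema: dtype, null count, and cardinality per column in one table.
schema = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "n_null": df.isna().sum(),
    "n_unique": df.nunique(),
})
print(schema)

# Grain: rows sharing a key mean the table is not at the assumed grain.
dupes = df[df.duplicated(subset=KEY, keep=False)]
n_keys = len(df.drop_duplicates(subset=KEY))
print(f"{len(df)} rows, {n_keys} distinct keys")
print(dupes)

# Numeric columns stored as strings: coerce and count what fails to parse.
amount_num = pd.to_numeric(df["amount"], errors="coerce")
n_unparseable = int(amount_num.isna().sum() - df["amount"].isna().sum())
print(f"{n_unparseable} unparseable amount value(s)")
```

If the row count and the distinct-key count differ, either the assumed key is wrong or the table contains duplicates; both are worth resolving before any aggregation.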
2) Distributions + Outliers — 15 minutes
Task 2.1: Numeric summaries
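A compact numeric summary could be built along these lines. This is a sketch on synthetic data: the columns are invented, and `tail_ratio` is a hypothetical helper column added here for illustration, not a standard statistic.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: one heavy-tailed column, one roughly symmetric column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "amount": rng.lognormal(mean=3.0, sigma=1.0, size=1000),
    "age": rng.normal(loc=40, scale=10, size=1000),
    "segment": rng.choice(["a", "b"], size=1000),
})

# Restrict to numeric columns, then extend describe() with tail percentiles.
num = df.select_dtypes(include="number")
summary = num.describe(percentiles=[0.01, 0.5, 0.99]).T

# Extra columns that make skewed or incomplete features stand out.
summary["skew"] = num.skew()
summary["n_missing"] = num.isna().sum()
summary["tail_ratio"] = summary["99%"] / summary["50%"]  # hypothetical heaviness measure

print(summary.round(2))
```

Transposing the `describe()` output puts one feature per row, which tends to be easier to scan and sort when there are many columns.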
Task 2.2: Plot 2 high-signal distributions
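One possible pair of plots is sketched below: a heavy-tailed feature on a log scale, and the same feature split by label. The data is synthetic and the `amount` and `label` columns are hypothetical; which two plots carry the most signal depends on the actual dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: a rare positive class whose amounts run larger on average.
rng = np.random.default_rng(0)
n = 2000
label = rng.random(n) < 0.1
amount = rng.lognormal(mean=np.where(label, 4.0, 3.0), sigma=1.0)
df = pd.DataFrame({"amount": amount, "label": label.astype(int)})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Plot 1: log10 scale so the body and the tail are both visible.
axes[0].hist(np.log10(df["amount"]), bins=40)
axes[0].set_xlabel("log10(amount)")
axes[0].set_ylabel("rows")
axes[0].set_title("amount, log scale")

# Plot 2: same feature split by label, density-normalized so the rare class shows up.
for value, group in df.groupby("label"):
    axes[1].hist(np.log10(group["amount"]), bins=40, density=True,
                 alpha=0.5, label=f"label={value}")
axes[1].set_xlabel("log10(amount)")
axes[1].set_ylabel("density")
axes[1].set_title("amount by label")
axes[1].legend()

fig.tight_layout()
```

Normalizing with `density=True` matters in the second panel: with raw counts, a 10% class would be almost invisible next to the majority.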
4) Leakage + Time — 10 minutes
Task 4.1: Identify leakage-prone features
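A cheap first screen for leakage is to score each feature on its own: a single column that separates the label almost perfectly is usually recorded after the outcome. The sketch below assumes a binary label and uses synthetic data; the feature names, the `single_feature_auc` helper, and the 0.95 cutoff are all illustrative choices, and the screen complements rather than replaces checking when each field is written relative to prediction time.

```python
import numpy as np
import pandas as pd

def single_feature_auc(x: pd.Series, y: pd.Series) -> float:
    """AUC of one feature used alone as a score (Mann-Whitney U formulation)."""
    ranks = x.rank(method="average")  # ties share an average rank
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    u = ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))

# Synthetic data with a hypothetical churn label.
rng = np.random.default_rng(0)
n = 500
y = pd.Series((rng.random(n) < 0.2).astype(int), name="churned")
df = pd.DataFrame({
    "tenure_days": rng.integers(1, 1000, size=n),          # unrelated to the label here
    "support_tickets": rng.poisson(2 + 2 * y.to_numpy()),  # weak, plausible signal
    "cancel_reason_filled": y.to_numpy(),                  # only written after churn: leaky
})

THRESHOLD = 0.95  # assumed cutoff; tune it to the problem

auc = df.apply(lambda col: single_feature_auc(col, y))

# Distance from 0.5 matters, not direction: an AUC near 0 is just as suspicious.
strength = (auc - 0.5).abs() + 0.5
suspects = strength[strength >= THRESHOLD].index.tolist()

print(auc.round(3))
print("review for leakage:", suspects)
```

A flagged feature is a prompt to ask how and when the column is populated, not proof of leakage by itself; some legitimate features are simply very strong.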