S30 AI Lab (www.thes30.com)

EDA Lab + Homework

Easy · 🐍 Python & Data · W1 D4

Build strong intuition for Exploratory Data Analysis (EDA) as it’s used in real ML work: validating data assumptions, finding leakage, discovering slices, and making high-signal plots and summaries.

After this lab, students can quickly diagnose dataset issues (missing values, outliers, class imbalance, distribution shift), produce defensible summary tables and plots, and articulate next modeling steps.

Interview Angles

  • Outliers can be signal (fraud, "whale" customers with unusually high spend): don't blindly delete them.
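
One way to act on this is to flag outliers and compare the label rate inside and outside the flagged group before deciding what to do with them. The sketch below is only an illustration: the `amounts` and `is_fraud` lists are made-up toy data, `iqr_outlier_flags` is a hypothetical helper, and the 1.5 × IQR fence is a common convention, not a rule from this lab.

```python
from statistics import quantiles


def iqr_outlier_flags(values, k=1.5):
    """Mark values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def label_rate(labels):
    """Share of positive labels; 0.0 for an empty group."""
    return sum(labels) / len(labels) if labels else 0.0


# Hypothetical toy data: transaction amounts and a fraud label per row.
amounts = [12, 15, 14, 13, 16, 15, 14, 900, 13, 1200]
is_fraud = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]

flags = iqr_outlier_flags(amounts)

# Keep the rows, but compare the target rate in each group.
rate_flagged = label_rate([y for y, f in zip(is_fraud, flags) if f])
rate_rest = label_rate([y for y, f in zip(is_fraud, flags) if not f])

print(f"flagged rows: {sum(flags)}")
print(f"fraud rate among flagged: {rate_flagged:.2f}, among the rest: {rate_rest:.2f}")
```

If the flagged group carries a very different label rate, as in this toy data, the outliers are probably signal and are better kept (or capped, or log-transformed) than dropped.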

FAANG Gotchas

  • “No missing values” can still be wrong if missing values are encoded as sentinels such as `"unknown"`, `-1`, or an empty string.
  • Simpson’s paradox: a trend that holds within every slice can reverse when the slices are aggregated.
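
The Simpson's paradox gotcha is easiest to see with numbers. The sketch below uses made-up counts for two hypothetical variants (`A`, `B`) across two hypothetical segments (`mobile`, `desktop`): `A` has the higher success rate inside each segment, yet `B` looks better once the segments are pooled, because the variants were exposed to very different segment mixes.

```python
# (variant, segment) -> (successes, trials); illustrative counts only.
counts = {
    ("A", "mobile"): (81, 87),
    ("A", "desktop"): (192, 263),
    ("B", "mobile"): (234, 270),
    ("B", "desktop"): (55, 80),
}

# Success rate inside each slice.
per_slice = {key: s / n for key, (s, n) in counts.items()}

# Success rate after pooling the slices per variant.
overall = {}
for variant in ("A", "B"):
    successes = sum(s for (v, _), (s, _) in counts.items() if v == variant)
    trials = sum(n for (v, _), (_, n) in counts.items() if v == variant)
    overall[variant] = successes / trials

for segment in ("mobile", "desktop"):
    a, b = per_slice[("A", segment)], per_slice[("B", segment)]
    print(f"{segment:8s} A={a:.3f} B={b:.3f}")
print(f"{'overall':8s} A={overall['A']:.3f} B={overall['B']:.3f}")
```

When a sliced view and an aggregate view disagree like this, check how the rows are distributed across slices before trusting either number.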

Asked At

Google, GitHub
Python 3 — Notebook
Dataset & Setup

EDA Lab + Homework (Student)

Goal: practice high-signal EDA like you would in a FAANG ML interview or on-call investigation.

Rules:

  • Work top-to-bottom
  • Don't hardcode outputs
  • Prefer concise, high-signal plots

1) Sanity Checks — 10 minutes

Task 1.1: Data grain + schema

Task 1.2: Missingness + duplicates
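
As a starting point, here is a dependency-free sketch of one possible approach, not the lab's reference solution. It counts missing values per column while treating sentinel encodings as missing, and counts exact duplicate rows. The `rows` data, the `SENTINELS` set, and the `is_missing` helper are all assumptions made for illustration; the right sentinel list depends on the dataset.

```python
from collections import Counter

# Assumption: values that mean "missing" in this toy dataset.
SENTINELS = {None, "", "unknown", -1}

# Hypothetical rows; the last one is an exact copy of the first.
rows = [
    {"user_id": 1, "country": "US", "age": 34},
    {"user_id": 2, "country": "unknown", "age": -1},
    {"user_id": 3, "country": "", "age": 29},
    {"user_id": 1, "country": "US", "age": 34},
]


def is_missing(value):
    """Treat None and sentinel encodings as missing (strings are normalized)."""
    if isinstance(value, str):
        value = value.strip().lower()
    return value in SENTINELS


columns = list(rows[0])
missing_rate = {
    col: sum(is_missing(row[col]) for row in rows) / len(rows) for col in columns
}

# Exact duplicates: count the extra copies of each fully identical row.
row_counts = Counter(tuple(row[col] for col in columns) for row in rows)
n_duplicates = sum(count - 1 for count in row_counts.values())

print(missing_rate)
print(f"duplicate rows: {n_duplicates}")
```

A naive `None` check would report no missing values at all for this data, which is exactly the gotcha about sentinel encodings.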

2) Distributions + Outliers — 15 minutes

Task 2.1: Numeric summaries

Task 2.2: Plot 2 high-signal distributions

3) Target + Slices — 15 minutes

Task 3.1: Label imbalance
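
A minimal sketch of what a label-imbalance check might report, assuming a hypothetical binary `labels` list with 5% positives: class shares, the majority-to-minority ratio, and the accuracy of always predicting the majority class. That last number is the baseline any model has to beat, and it shows why plain accuracy is a weak metric on imbalanced data.

```python
from collections import Counter

# Hypothetical labels: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5

counts = Counter(labels)
total = sum(counts.values())

# Share of each class in the dataset.
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a model that always predicts the most common class.
majority_label, majority_n = counts.most_common(1)[0]
baseline_accuracy = majority_n / total

# How many majority rows exist per minority row.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(f"shares: {shares}")
print(f"majority class: {majority_label}, baseline accuracy: {baseline_accuracy:.2f}")
print(f"imbalance ratio: {imbalance_ratio:.1f}:1")
```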

Task 3.2: Slice analysis

4) Leakage + Time — 10 minutes

Task 4.1: Identify leakage-prone features
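
One heuristic for spotting leakage candidates, sketched here under assumptions, is to ask how well each feature predicts the label on its own: a feature that is near-perfect by itself is often recorded after the outcome. The `rows` data, the column names (`chargeback_filed`, `country`, `is_fraud`), the `single_feature_accuracy` helper, and the 0.99 threshold are all hypothetical. The check only suits low-cardinality features, since an ID-like column scores 1.0 trivially.

```python
from collections import Counter, defaultdict


def single_feature_accuracy(rows, feature, label):
    """Accuracy of predicting `label` with the majority label per feature value."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[label]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(rows)


# Hypothetical rows: `chargeback_filed` is only known after fraud is confirmed.
rows = [
    {"country": "US", "chargeback_filed": 1, "is_fraud": 1},
    {"country": "US", "chargeback_filed": 0, "is_fraud": 0},
    {"country": "DE", "chargeback_filed": 0, "is_fraud": 0},
    {"country": "DE", "chargeback_filed": 1, "is_fraud": 1},
    {"country": "US", "chargeback_filed": 0, "is_fraud": 0},
    {"country": "DE", "chargeback_filed": 0, "is_fraud": 0},
]

features = ["country", "chargeback_filed"]
scores = {f: single_feature_accuracy(rows, f, "is_fraud") for f in features}

# Assumption: 0.99 is an arbitrary "too good to be true" threshold.
suspicious = [f for f in features if scores[f] >= 0.99]

print(scores)
print(f"leakage candidates: {suspicious}")
```

A high score is a prompt to ask when the field is populated relative to prediction time, not proof of leakage by itself.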
