S30 AI Lab (www.thes30.com)
#3

EDA Lab + Homework

Easy | 🐍 Python & Data | W1 D5

Build strong intuition for Exploratory Data Analysis (EDA) as it’s used in real ML work: validating data assumptions, finding leakage, discovering slices, and making high-signal plots and summaries.

By the end, students can quickly diagnose dataset issues (missingness, outliers, class imbalance, distribution shift), produce defensible summary tables and plots, and articulate the next modeling steps.

Interview Angles

  • Outliers can be signal (fraud, whales); don't delete them blindly.
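
One way to act on this is to flag outliers rather than drop them, so they stay available for inspection and as a feature. The sketch below uses Tukey's 1.5 × IQR rule on a made-up `amount` column; the data, the column name, and the choice of rule are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 995 ordinary purchases plus 5 very large "whale" orders.
amount = pd.Series(
    np.concatenate([rng.gamma(2.0, 20.0, size=995),
                    [5_000, 7_500, 9_000, 12_000, 20_000]]),
    name="amount",
)

# Tukey's rule: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = amount.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (amount < q1 - 1.5 * iqr) | (amount > q3 + 1.5 * iqr)

# Keep the rows and add a flag instead of deleting anything.
df = pd.DataFrame({"amount": amount, "is_outlier": is_outlier})

# Compare the flagged group with the rest before deciding what to do.
print(df.groupby("is_outlier")["amount"].agg(["count", "median", "sum"]))
```

If the flagged rows carry a large share of the total, as whales usually do, deleting them would change the business picture, not just clean the data.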

FAANG Gotchas

  • A “no missing values” report can still be wrong if missingness is encoded as a sentinel such as `"unknown"`, `-1`, or an empty string.
  • Simpson’s paradox: a trend that holds within every slice can reverse once the slices are aggregated.
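
The first gotcha can be checked with a small sketch like the one below. The frame, its column names, and the per-column sentinel lists are assumptions; in a real dataset the sentinels come from the data dictionary or from inspecting value counts.

```python
import pandas as pd

# Hypothetical frame where missingness hides behind sentinel values.
df = pd.DataFrame({
    "country": ["US", "unknown", "DE", "", "US"],
    "age": [34, -1, 27, 45, -1],
    "plan": ["pro", "free", None, "free", "pro"],
})

# A naive check only sees real None/NaN values.
naive_missing = df.isna().sum()

# Assumed sentinel values per column (illustrative, not exhaustive).
sentinels = {"country": ["unknown", ""], "age": [-1]}

cleaned = df.copy()
for col, values in sentinels.items():
    # mask() replaces entries where the condition is True with NaN.
    cleaned[col] = cleaned[col].mask(cleaned[col].isin(values))

true_missing = cleaned.isna().sum()
print(pd.DataFrame({"naive": naive_missing, "with_sentinels": true_missing}))
```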
Python 3 — Notebook
1. Dataset & Setup

EDA Lab + Homework (Student)

Goal: practice high-signal EDA like you would in a FAANG ML interview or on-call investigation.

Rules:

  • Work top-to-bottom
  • Don't hardcode outputs
  • Prefer concise, high-signal plots

1) Sanity Checks — 10 minutes
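
The lab's dataset is not shown here, so the sketch below builds a synthetic stand-in and runs a few cheap sanity checks on it. The columns `sessions_last_7d`, `avg_session_min`, and `tenure_days` come from the lab text; `user_id`, the `churned` label, and every distribution are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1_000

# Synthetic stand-in dataset (all distributions are assumptions).
df = pd.DataFrame({
    "user_id": np.arange(n),                                         # hypothetical key
    "sessions_last_7d": rng.poisson(3, size=n),                      # skewed counts
    "avg_session_min": rng.lognormal(mean=2.0, sigma=0.8, size=n),   # heavy tail
    "tenure_days": rng.integers(0, 1_500, size=n),                   # wide range
    "churned": rng.binomial(1, 0.08, size=n),                        # hypothetical label
})

# First look: size, types, and a few rows.
print(df.shape)
print(df.dtypes)
print(df.head())

# Cheap invariants that should hold before any modeling.
assert df["user_id"].is_unique
assert df["sessions_last_7d"].ge(0).all()
assert df["avg_session_min"].gt(0).all()
assert set(df["churned"].unique()) <= {0, 1}
```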

2. Data grain + schema
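
Checking the grain means confirming what one row represents, which usually comes down to finding the column set that uniquely identifies a row. The following is a minimal sketch on a tiny hypothetical frame; `user_id`, `snapshot_date`, and the helper `is_unique_key` are illustrative names, not part of the lab.

```python
import pandas as pd

# Hypothetical frame: one row per user per weekly snapshot.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "snapshot_date": ["2024-01-01", "2024-01-08", "2024-01-01", "2024-01-01"],
    "sessions_last_7d": [4, 2, 0, 7],
})

# Schema fix: dates often arrive as strings and need an explicit parse.
df["snapshot_date"] = pd.to_datetime(df["snapshot_date"])


def is_unique_key(frame: pd.DataFrame, cols: list) -> bool:
    """Return True if the given columns uniquely identify every row."""
    return not frame.duplicated(subset=cols).any()


user_is_key = is_unique_key(df, ["user_id"])
user_date_is_key = is_unique_key(df, ["user_id", "snapshot_date"])

print("user_id alone:", user_is_key)
print("user_id + snapshot_date:", user_date_is_key)
print(df.dtypes)
```

Here the grain is user per snapshot, not user, which matters for train/test splitting: splitting by row would put the same user on both sides.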
3. Missingness + duplicates
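
A minimal sketch of this step, assuming a small hypothetical frame keyed by `user_id`: it reports the per-column missing rate and counts both exact duplicate rows and duplicate keys.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values and one repeated row.
df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 4],
    "sessions_last_7d": [3, np.nan, np.nan, 5, 1],
    "avg_session_min": [12.5, 8.0, 8.0, np.nan, 20.1],
})

# Fraction of missing values per column, worst first.
missing_rate = df.isna().mean().sort_values(ascending=False)

# Exact duplicate rows (pandas treats NaN as equal to NaN here).
n_exact_dupes = int(df.duplicated().sum())

# Duplicate keys: rows that repeat an earlier user_id.
n_key_dupes = int(df.duplicated(subset=["user_id"]).sum())

print(missing_rate)
print("exact duplicate rows:", n_exact_dupes)
print("duplicate user_id rows:", n_key_dupes)
```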
4. Numeric summaries
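
One possible summary table is sketched below on synthetic data (the distributions are assumptions). It extends `describe()` with tail percentiles and skewness, since the mean and standard deviation alone hide heavy tails.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 1_000

# Synthetic stand-in for the lab's numeric columns (distributions assumed).
df = pd.DataFrame({
    "sessions_last_7d": rng.poisson(3, size=n),
    "avg_session_min": rng.lognormal(mean=2.0, sigma=0.8, size=n),
    "tenure_days": rng.integers(0, 1_500, size=n),
})

num_cols = df.select_dtypes("number").columns

# One row per feature: count, mean, std, min, 1%, 50%, 99%, max.
summary = df[num_cols].describe(percentiles=[0.01, 0.5, 0.99]).T

# Skewness and a simple tail ratio make heavy tails easy to spot.
summary["skew"] = df[num_cols].skew()
summary["p99_over_median"] = summary["99%"] / summary["50%"]

print(summary.round(2))
```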
5. Plot 2 high-signal distributions

Example: sessions_last_7d (skew), avg_session_min (heavy tail), tenure_days (range)
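
A possible pair of plots for the first two examples, sketched with matplotlib on synthetic data (the distributions are assumptions): integer bins for the count feature, and log-spaced bins with a log x-axis for the heavy-tailed one.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 1_000

# Synthetic stand-in columns (distributions assumed).
df = pd.DataFrame({
    "sessions_last_7d": rng.poisson(3, size=n),
    "avg_session_min": rng.lognormal(mean=2.0, sigma=0.8, size=n),
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Count feature: one bin per integer value so the skew is not blurred.
sessions = df["sessions_last_7d"]
axes[0].hist(sessions, bins=np.arange(sessions.max() + 2) - 0.5)
axes[0].set_title("sessions_last_7d (skew)")
axes[0].set_xlabel("sessions")
axes[0].set_ylabel("users")

# Heavy-tailed feature: log-spaced bins plus a log x-axis.
minutes = df["avg_session_min"]
log_bins = np.logspace(np.log10(minutes.min()), np.log10(minutes.max()), 40)
axes[1].hist(minutes, bins=log_bins)
axes[1].set_xscale("log")
axes[1].set_title("avg_session_min (heavy tail, log x)")
axes[1].set_xlabel("minutes")

fig.tight_layout()
# In a notebook the figure renders at the end of the cell.
```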

6. Label imbalance

Checkpoint: which metric would you choose (accuracy vs F1 vs PR-AUC) and why?
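
To ground the checkpoint, the sketch below scores a trivial "always predict the majority class" baseline on a synthetic label with roughly 5% positives. The label name `churned` and the positive rate are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical binary label with about 5% positives.
y = pd.Series(rng.binomial(1, 0.05, size=2_000), name="churned")

pos_rate = y.mean()
print(y.value_counts(normalize=True))

# Trivial baseline: always predict the majority class (0).
y_pred = np.zeros(len(y), dtype=int)

accuracy = (y_pred == y).mean()

# F1 from its definition: 2*TP / (2*TP + FP + FN).
tp = int(((y_pred == 1) & (y == 1)).sum())
fp = int(((y_pred == 1) & (y == 0)).sum())
fn = int(((y_pred == 0) & (y == 1)).sum())
denom = 2 * tp + fp + fn
f1 = 2 * tp / denom if denom else 0.0

print(f"positive rate: {pos_rate:.3f}")
print(f"baseline accuracy: {accuracy:.3f}")
print(f"baseline F1: {f1:.3f}")
```

The baseline reaches high accuracy while finding no positives, which is why accuracy is a weak choice here. For PR-AUC, a useful reference point is that a random scorer lands near the positive rate in expectation, so a model should be judged against that rather than against 0.5.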

7. Slice analysis
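
Slice analysis compares a metric within groups against the aggregate, which is also where Simpson's paradox shows up. The counts below are constructed by hand to make the reversal visible; `segment`, `variant`, and all numbers are hypothetical.

```python
import pandas as pd

# Hand-built counts: variant A has lower churn in every segment,
# but most A users sit in the high-churn "new" segment.
counts = pd.DataFrame({
    "segment": ["new", "new", "loyal", "loyal"],
    "variant": ["A", "B", "A", "B"],
    "users": [80, 20, 20, 80],
    "churned": [32, 9, 1, 8],
})

# Churn rate within each slice.
by_slice = (
    counts.assign(rate=counts["churned"] / counts["users"])
    .pivot(index="segment", columns="variant", values="rate")
)

# Churn rate after aggregating over segments.
overall = counts.groupby("variant")[["users", "churned"]].sum()
overall["rate"] = overall["churned"] / overall["users"]

print(by_slice)
print(overall)
```

Within each segment A beats B (5% vs 10%, 40% vs 45%), yet in aggregate A looks worse (33% vs 17%) because the segment mix differs between variants. Reporting slice sizes next to slice rates makes this kind of reversal easier to catch.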
8. Identify leakage-prone features
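
One common heuristic, sketched here under assumptions, is to score each feature on its own and treat near-perfect separation as suspicious. The data is synthetic, `cancel_ticket_count` is a made-up column that is only populated after churn, `single_feature_auc` is an illustrative helper, and the 0.95 threshold is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 2_000
churned = rng.binomial(1, 0.1, size=n)

df = pd.DataFrame({
    "tenure_days": rng.integers(0, 1_500, size=n),
    "sessions_last_7d": rng.poisson(3, size=n),
    # Hypothetical leaky column: non-zero only for users who already churned.
    "cancel_ticket_count": churned * rng.integers(1, 4, size=n),
    "churned": churned,
})


def single_feature_auc(x: pd.Series, y: pd.Series) -> float:
    """ROC AUC of one feature used directly as a score (rank-sum formula)."""
    ranks = x.rank()  # average ranks handle ties
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    u = ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


features = [c for c in df.columns if c != "churned"]
aucs = pd.Series({c: single_feature_auc(df[c], df["churned"]) for c in features})

# Fold around 0.5 so strongly inverse features are flagged too.
report = aucs.sub(0.5).abs().add(0.5).sort_values(ascending=False)
suspects = report[report > 0.95]  # arbitrary threshold, an assumption

print(report.round(3))
print("leakage suspects:", list(suspects.index))
```

A high single-feature score is only a prompt to investigate. The decisive question is whether the value would be known at prediction time, which the numbers alone cannot answer.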