S30 AI Lab — www.thes30.com
Lab #5 — Calculus & Gradients

Easy · Math for ML · W2 D2

Become fluent with derivatives/gradients used in ML: chain rule, Jacobians, Hessians, and checking gradients numerically.

Students can derive and implement gradients for common losses, and debug gradient bugs using finite differences.


Tasks

1. Finite Differences (Gradient Checking)
2. Chain Rule in Code
3. Logistic Regression (Core Interview Gradient)
4. Jacobian/Hessian Intuition

Interview Angles

  • How would you debug an autodiff bug in production?
  • What does the Hessian tell you about optimization and curvature?

FAANG Gotchas

  • Most gradient bugs are shape bugs (broadcasting) or missing constants.

Asked At

Google · GitHub

Python 3 — Notebook

Dataset & Setup

Calculus & Gradients — FAANG-Level Lab

Goal: Implement and verify gradients like an ML engineer.

Key idea: If you can't gradient-check it, you don't really trust it.
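
The lab's setup cell isn't shown here; a minimal sketch of what it might contain, assuming NumPy. The names (`n`, `d`, `X`, `y`, `w_true`) are illustrative, not prescribed by the lab, but the shapes match the conventions used in the tasks below (`X` is `(n, d)`, `w` is `(d,)`).

```python
import numpy as np

# Illustrative synthetic regression dataset for the gradient tasks below.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))                 # features, shape (n, d)
w_true = rng.normal(size=d)                 # ground-truth weights, shape (d,)
y = X @ w_true + 0.1 * rng.normal(size=n)   # noisy targets, shape (n,)
```

Keeping `w` one-dimensional from the start avoids the broadcasting bugs the "FAANG Gotchas" section warns about.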


Section 1 — Finite Differences (Gradient Checking)

Task 1.1: Implement numerical gradient for scalar f(w)

  • Use the central difference: (f(w + eps*e_i) - f(w - eps*e_i)) / (2*eps)

Explain: Why is central difference more accurate than forward difference?

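
The central-difference recipe above can be sketched as follows (a minimal NumPy version; the name `numerical_grad` matches the hint in Section 4, but the implementation details are one reasonable choice, not the lab's official solution):

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    """Central-difference gradient of a scalar function f at w (shape (d,))."""
    w = np.asarray(w, dtype=float)
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        # central difference: O(eps^2) truncation error, vs O(eps) for forward
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# sanity check: f(w) = sum(w^2) has gradient 2w
w = np.array([1.0, -2.0, 3.0])
g = numerical_grad(lambda v: np.sum(v ** 2), w)
```

On the accuracy question: Taylor-expanding f(w ± eps·e_i) shows the odd error terms cancel in the central difference, leaving an O(eps²) error, while the forward difference keeps an O(eps) term.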

Section 2 — Chain Rule in Code

Task 2.1: Gradient of MSE for linear model

Model: y_hat = Xw
Loss: L(w) = (1/n) * sum_i (y_hat_i - y_i)^2

  • Let r = Xw - y
  • grad = (2/n) * X^T r

FAANG gotcha: shape mismatches; keep w as (d,) and X as (n,d).
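
A sketch of the loss, its analytic gradient, and a finite-difference check, under the shape conventions above (`w` as `(d,)`, `X` as `(n, d)`). The random data and helper names are illustrative:

```python
import numpy as np

def mse_loss(w, X, y):
    """L(w) = (1/n) * sum((Xw - y)^2)."""
    r = X @ w - y
    return np.mean(r ** 2)

def mse_grad(w, X, y):
    # chain rule: dL/dw = (2/n) * X^T (Xw - y)
    r = X @ w - y                           # residual, shape (n,)
    return (2.0 / X.shape[0]) * (X.T @ r)   # shape (d,)

# gradient-check against central differences on random data
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
w = rng.normal(size=4)

h = 1e-5
num = np.zeros_like(w)
for i in range(w.size):
    e = np.zeros_like(w)
    e[i] = h
    num[i] = (mse_loss(w + e, X, y) - mse_loss(w - e, X, y)) / (2 * h)

max_err = np.max(np.abs(mse_grad(w, X, y) - num))
```

If `max_err` is large, suspect a shape bug first: a `(d, 1)` weight vector broadcasting against a `(n,)` target silently produces an `(n, n)` residual.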


Section 3 — Logistic Regression (Core Interview Gradient)

Task 3.1: Binary cross-entropy gradient

Given labels y in {0, 1}:
p = sigmoid(Xw)
Loss = -(1/n) * sum(y log p + (1-y) log(1-p))

  • sigmoid(z) = 1 / (1 + exp(-z))
  • grad = (1/n) * X^T (p - y)
  • Add numerical stability for the logs (clip p).
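
The three hints above can be sketched in one cell (the clipping threshold `1e-12` and the random test data are illustrative choices, not mandated by the lab):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, X, y, eps=1e-12):
    p = np.clip(sigmoid(X @ w), eps, 1 - eps)  # clip so log never sees 0 or 1
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_grad(w, X, y):
    # the sigmoid and log derivatives cancel, leaving (1/n) * X^T (p - y)
    p = sigmoid(X @ w)
    return (X.T @ (p - y)) / X.shape[0]

# gradient-check against central differences
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = (rng.random(60) < 0.5).astype(float)
w = rng.normal(size=3)

h = 1e-5
num = np.zeros_like(w)
for i in range(w.size):
    e = np.zeros_like(w)
    e[i] = h
    num[i] = (bce_loss(w + e, X, y) - bce_loss(w - e, X, y)) / (2 * h)

analytic = bce_grad(w, X, y)
```

The cancellation is the interview point: d/dz of log(sigmoid(z)) is 1 - sigmoid(z), so the messy per-term derivatives collapse to the simple p - y form.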

Section 4 — Jacobian/Hessian Intuition

Task 4.1: Compute Hessian of f(w)=sum(w^2) numerically

  • The Hessian of sum(w^2) is 2I.
  • Apply numerical_grad to each component of the gradient.

This is mainly about shape thinking: Hessian is (d,d).
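
Following the hint, a numerical Hessian can be built by differencing `numerical_grad` itself (a minimal sketch; row `i` of `H` is the central difference of the whole gradient along coordinate `i`, which gives the (d, d) shape the task asks for):

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    """Central-difference gradient of scalar f at w (shape (d,))."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

def numerical_hessian(f, w, eps=1e-5):
    """H[i, j] = d^2 f / (dw_i dw_j), estimated by differencing the gradient."""
    d = w.size
    H = np.zeros((d, d))
    for i in range(d):
        e = np.zeros_like(w)
        e[i] = eps
        # row i: central difference of the full gradient along coordinate i
        H[i] = (numerical_grad(f, w + e) - numerical_grad(f, w - e)) / (2 * eps)
    return H

w = np.array([0.5, -1.0, 2.0])
H = numerical_hessian(lambda v: np.sum(v ** 2), w)
```

For f(w) = sum(w^2) the result should be close to 2I, confirming both the math and the (d, d) shape reasoning. Nested differencing amplifies round-off, so expect looser tolerances than a plain gradient check.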

