S30 AI Lab — www.thes30.com
Lab #5 — Calculus & Gradients

Easy · Math for ML · W2 D2

Become fluent with derivatives/gradients used in ML: chain rule, Jacobians, Hessians, and checking gradients numerically.

Students can derive and implement gradients for common losses, and debug gradient bugs using finite differences.


Tasks

1. Finite Differences (Gradient Checking)
2. Chain Rule in Code
3. Logistic Regression (Core Interview Gradient)
4. Jacobian/Hessian Intuition

Interview Angles

  • How would you debug an autodiff bug in production?
  • What does the Hessian tell you about optimization and curvature?

FAANG Gotchas

  • Most gradient bugs are shape bugs (broadcasting) or missing constants.

Asked At

Google · GitHub

Python 3 — Notebook

Dataset & Setup

Calculus & Gradients — FAANG-Level Lab

Goal: Implement and verify gradients like an ML engineer.

Key idea: If you can't gradient-check it, you don't really trust it.
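
The lab's setup cell isn't shown here; a minimal sketch of what it might contain, assuming NumPy. The names (`n`, `d`, `X`, `y`, `w_true`) are illustrative, not prescribed by the lab, but the shapes match the conventions used in the tasks below (`X` is `(n, d)`, `w` is `(d,)`).

```python
import numpy as np

# Illustrative synthetic regression dataset for the gradient tasks below.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))                 # features, shape (n, d)
w_true = rng.normal(size=d)                 # ground-truth weights, shape (d,)
y = X @ w_true + 0.1 * rng.normal(size=n)   # noisy targets, shape (n,)
```

Keeping `w` one-dimensional from the start avoids the broadcasting bugs the "FAANG Gotchas" section warns about.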


Section 1 — Finite Differences (Gradient Checking)

Task 1.1: Implement numerical gradient for scalar f(w)

  • Use the central difference: (f(w + eps*e_i) - f(w - eps*e_i)) / (2*eps)

Explain: Why is central difference more accurate than forward difference?

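
The central-difference recipe above can be sketched as follows (a minimal NumPy version; the name `numerical_grad` matches the hint in Section 4, but the implementation details are one reasonable choice, not the lab's official solution):

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    """Central-difference gradient of a scalar function f at w (shape (d,))."""
    w = np.asarray(w, dtype=float)
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        # central difference: O(eps^2) truncation error, vs O(eps) for forward
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# sanity check: f(w) = sum(w^2) has gradient 2w
w = np.array([1.0, -2.0, 3.0])
g = numerical_grad(lambda v: np.sum(v ** 2), w)
```

On the accuracy question: Taylor-expanding f(w ± eps·e_i) shows the odd error terms cancel in the central difference, leaving an O(eps²) error, while the forward difference keeps an O(eps) term.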

Section 2 — Chain Rule in Code

Task 2.1: Gradient of MSE for linear model

Model: y_hat = Xw
Loss: L(w) = (1/n) * sum_i (y_hat_i - y_i)^2

  • Let r = Xw - y
  • grad = (2/n) * X^T r

FAANG gotcha: shape mismatches; keep w as (d,) and X as (n,d).
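
A sketch of the loss, its analytic gradient, and a finite-difference check, under the shape conventions above (`w` as `(d,)`, `X` as `(n, d)`). The random data and helper names are illustrative:

```python
import numpy as np

def mse_loss(w, X, y):
    """L(w) = (1/n) * sum((Xw - y)^2)."""
    r = X @ w - y
    return np.mean(r ** 2)

def mse_grad(w, X, y):
    # chain rule: dL/dw = (2/n) * X^T (Xw - y)
    r = X @ w - y                           # residual, shape (n,)
    return (2.0 / X.shape[0]) * (X.T @ r)   # shape (d,)

# gradient-check against central differences on random data
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
w = rng.normal(size=4)

h = 1e-5
num = np.zeros_like(w)
for i in range(w.size):
    e = np.zeros_like(w)
    e[i] = h
    num[i] = (mse_loss(w + e, X, y) - mse_loss(w - e, X, y)) / (2 * h)

max_err = np.max(np.abs(mse_grad(w, X, y) - num))
```

If `max_err` is large, suspect a shape bug first: a `(d, 1)` weight vector broadcasting against a `(n,)` target silently produces an `(n, n)` residual.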


Section 3 — Logistic Regression (Core Interview Gradient)

Task 3.1: Binary cross-entropy gradient

Given labels y in {0, 1}:
p = sigmoid(Xw)
Loss = -(1/n) * sum(y log p + (1-y) log(1-p))

  • sigmoid(z) = 1 / (1 + exp(-z))
  • grad = (1/n) * X^T (p - y)
  • Add numerical stability for the logs (clip p).
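
The three hints above can be sketched in one cell (the clipping threshold `1e-12` and the random test data are illustrative choices, not mandated by the lab):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, X, y, eps=1e-12):
    p = np.clip(sigmoid(X @ w), eps, 1 - eps)  # clip so log never sees 0 or 1
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_grad(w, X, y):
    # the sigmoid and log derivatives cancel, leaving (1/n) * X^T (p - y)
    p = sigmoid(X @ w)
    return (X.T @ (p - y)) / X.shape[0]

# gradient-check against central differences
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = (rng.random(60) < 0.5).astype(float)
w = rng.normal(size=3)

h = 1e-5
num = np.zeros_like(w)
for i in range(w.size):
    e = np.zeros_like(w)
    e[i] = h
    num[i] = (bce_loss(w + e, X, y) - bce_loss(w - e, X, y)) / (2 * h)

analytic = bce_grad(w, X, y)
```

The cancellation is the interview point: d/dz of log(sigmoid(z)) is 1 - sigmoid(z), so the messy per-term derivatives collapse to the simple p - y form.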

Section 4 — Jacobian/Hessian Intuition

Task 4.1: Compute Hessian of f(w)=sum(w^2) numerically

  • The Hessian of sum(w^2) is 2I.
  • Apply numerical_grad to each component of the gradient.

This is mainly about shape thinking: Hessian is (d,d).
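
Following the hint, a numerical Hessian can be built by differencing `numerical_grad` itself (a minimal sketch; row `i` of `H` is the central difference of the whole gradient along coordinate `i`, which gives the (d, d) shape the task asks for):

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    """Central-difference gradient of scalar f at w (shape (d,))."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

def numerical_hessian(f, w, eps=1e-5):
    """H[i, j] = d^2 f / (dw_i dw_j), estimated by differencing the gradient."""
    d = w.size
    H = np.zeros((d, d))
    for i in range(d):
        e = np.zeros_like(w)
        e[i] = eps
        # row i: central difference of the full gradient along coordinate i
        H[i] = (numerical_grad(f, w + e) - numerical_grad(f, w - e)) / (2 * eps)
    return H

w = np.array([0.5, -1.0, 2.0])
H = numerical_hessian(lambda v: np.sum(v ** 2), w)
```

For f(w) = sum(w^2) the result should be close to 2I, confirming both the math and the (d, d) shape reasoning. Nested differencing amplifies round-off, so expect looser tolerances than a plain gradient check.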

