S30 AI Lab (www.thes30.com) · Lab #9

Logistic Regression Gradient & Convexity

Medium · Classical ML · W3 D1


Derive and implement logistic regression *correctly* (loss, gradient, Hessian intuition) and explain why the loss is convex.

Students can implement vectorized logistic regression, gradient-check it, and communicate convexity + optimization reasoning at FAANG interview depth.


Tasks

1. Sigmoid + Stable Loss
2. Gradient (Derive → Implement → Check)
3. Convexity & Hessian Intuition
4. Optimization (Bonus)

Interview Angles

  • How do you compute `log(1 + exp(z))` stably?
  • Why is Newton fast but expensive?
  • What breaks if data is perfectly separable?

FAANG Gotchas

  • Most bugs are missing `1/n` constants or shape/broadcast mistakes.

Asked At

  • Google (Deep Learning, Research teams)
  • Amazon (AWS ML, Applied Science)
  • Meta (Computer Vision, Ranking)
  • Microsoft (Azure ML, Research)
  • Netflix (Content classification)
  • Apple (Device ML, Core ML)
Python 3 — Notebook
Dataset & Setup

Logistic Regression Gradient & Convexity — Student Lab

Complete all TODOs. This lab is math-first and stability-first.

Section 0 — Synthetic Dataset

We’ll create a binary classification dataset with both separable and non-separable regimes.
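One minimal way to build such a dataset (a sketch; the function name, dimensions, and seed are illustrative choices, not the lab's official setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(n=200, d=5, separable=False):
    """Synthetic binary classification data.

    separable=True labels points by the sign of a true linear score,
    giving a perfectly separable problem; otherwise labels are sampled
    from the logistic model, so the classes overlap."""
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    score = X @ w_true
    if separable:
        y = (score > 0).astype(float)
    else:
        p = 1.0 / (1.0 + np.exp(-score))        # P(y=1 | x) under the true model
        y = (rng.random(n) < p).astype(float)
    return X, y
```

The `separable` flag matters later: it is exactly the regime where logistic regression's weights diverge (see the Section 4 gotcha).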

Synthetic dataset (first two features)


Section 1 — Sigmoid + Stable Loss

Task 1.1: Stable sigmoid

Sigmoid function curve

Explain: Why does sigmoid saturate for large |z| and what does that do to gradients?
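A stable implementation sketch (one common trick; exponentiate only `-|z|`, which is always safe):

```python
import numpy as np

def sigmoid(z):
    """Numerically stable sigmoid.

    With t = exp(-|z|), which always lies in (0, 1], nothing overflows:
      z >= 0:  sigma(z) = 1 / (1 + t)
      z <  0:  sigma(z) = t / (1 + t)"""
    z = np.asarray(z, dtype=float)
    t = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + t), t / (1.0 + t))
```

Note how for large |z| the output pins to exactly 0.0 or 1.0 in floating point: that saturation is what drives p(1-p), and hence the gradient, toward zero.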


Task 1.2: Stable binary cross-entropy loss

We use labels y in {0,1}.

Loss per example: -y log(p) - (1-y) log(1-p) where p=sigmoid(z).

Binary cross-entropy loss curves

Interview Angle: explain a stable form of log-loss.
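One stable form works directly from logits: with p = sigmoid(z), the per-example loss algebraically simplifies to log(1 + exp(z)) - y*z, and `np.logaddexp(0, z)` evaluates the first term without ever overflowing. A sketch (function name is my choice):

```python
import numpy as np

def bce_loss(z, y):
    """Mean binary cross-entropy from logits z, labels y in {0,1}.

    -y*log(p) - (1-y)*log(1-p) with p = sigmoid(z) simplifies to
    log(1 + exp(z)) - y*z; logaddexp computes it without overflow."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.logaddexp(0.0, z) - y * z))
```

Computing from logits also avoids the log(0) that appears if you first squash to p and p rounds to exactly 0 or 1.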


Section 2 — Gradient (Derive → Implement → Check)

Task 2.1: Derive the gradient (write in markdown)

Show that for loss averaged over n examples: grad = X^T (p - y) / n

Checkpoint: What is the shape of grad?
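One compact way to organize the derivation (chain rule through z = Xw, using the logit form of the loss):

```latex
\ell_i(w) = \log\!\big(1 + e^{z_i}\big) - y_i z_i,
\qquad z_i = x_i^\top w, \quad p_i = \sigma(z_i)

\frac{\partial \ell_i}{\partial z_i} = \sigma(z_i) - y_i = p_i - y_i

\nabla_w \ell_i = (p_i - y_i)\, x_i

\nabla_w L = \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)\, x_i
           = \frac{1}{n}\, X^\top (p - y)
```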

Task 2.2: Implement loss and gradient

FAANG gotcha: ensure y is 0/1, not -1/+1.
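A vectorized sketch (the function name and signature are my choice), reusing the stable sigmoid and logaddexp forms from Section 1:

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Mean log-loss and its gradient; grad = X^T (p - y) / n.

    Assumes y is 0/1 (convert -1/+1 labels with y = (y + 1) / 2 first)."""
    n = X.shape[0]
    z = X @ w
    t = np.exp(-np.abs(z))                         # stable sigmoid pieces
    p = np.where(z >= 0, 1.0 / (1.0 + t), t / (1.0 + t))
    loss = np.mean(np.logaddexp(0.0, z) - y * z)   # stable log-loss from logits
    grad = X.T @ (p - y) / n                       # forgetting the 1/n is the classic bug
    return loss, grad
```

The gradient has shape (d,), matching w; if you get (d, 1) or (n,), a broadcast went wrong somewhere.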


Task 2.3: Numerical gradient check (finite differences)

Compare your analytic gradient to a central-difference numerical gradient.

Checkpoint: Why can an eps that is too small make the check worse?
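A central-difference checker sketch (loop-per-coordinate; names are my own, and `f` is any scalar loss closure over w):

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    """Central-difference gradient of a scalar function f at w.

    eps too large -> O(eps^2) truncation error dominates;
    eps too small -> catastrophic cancellation in f(w+e) - f(w-e)."""
    g = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        e = np.zeros_like(w, dtype=float)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2.0 * eps)
    return g
```

To compare against the analytic gradient, use a relative error such as `norm(ga - gn) / (norm(ga) + norm(gn))` rather than an absolute threshold.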


Section 3 — Convexity & Hessian Intuition

Task 3.1: Hessian-vector product (HVP)

For logistic regression: H = (1/n) X^T S X where S = diag(p(1-p)).

Implement HVP: compute H@v without building full H explicitly.

  • s = p * (1 - p)
  • compute X @ v, scale elementwise by s, apply X^T, then divide by n
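The steps above can be sketched as (function name is my choice):

```python
import numpy as np

def hvp(w, X, v):
    """H @ v for H = (1/n) X^T diag(p*(1-p)) X, without forming H.

    Two matrix-vector products: O(n*d) time and O(n) extra memory,
    versus O(n*d^2) to materialize H."""
    n = X.shape[0]
    z = X @ w
    t = np.exp(-np.abs(z))                       # stable sigmoid pieces
    p = np.where(z >= 0, 1.0 / (1.0 + t), t / (1.0 + t))
    s = p * (1.0 - p)
    return X.T @ (s * (X @ v)) / n
```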

Task 3.2: Empirical PSD check

Check that v^T H v >= 0 for random v (PSD).

Explain: Why does PSD imply convexity?
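A quick empirical check using the HVP (a sketch; names and tolerance are my choices, and `hvp_fn` is any closure returning H @ v):

```python
import numpy as np

rng = np.random.default_rng(0)

def psd_check(hvp_fn, d, trials=100, tol=-1e-10):
    """Empirically check v^T H v >= 0 for random directions v.

    hvp_fn(v) should return H @ v; tol absorbs floating-point round-off."""
    return all(v @ hvp_fn(v) >= tol
               for v in rng.normal(size=(trials, d)))
```

This is evidence, not proof: random probing can miss a thin negative direction, whereas the s = p(1-p) >= 0 argument certifies PSD for every v.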


Section 4 — Optimization (Bonus)

Task 4.1: One step of GD vs Newton (conceptual)

Implement one gradient descent step. (Newton step is optional.)

FAANG gotcha: perfectly separable data can push weights to infinity; explain why.

Update rules: GD vs Newton

Gradient descent update rule:

w' = w - η ∇L(w)

Where:

  • w = current weights
  • w' = updated weights after one optimization step
  • η = learning rate (step size)
  • ∇L(w) = gradient of the loss with respect to w

Newton (Hessian) step notation:

w' = w - H(w)^{-1} ∇L(w)

Where:

  • H(w) = ∇²L(w) = Hessian matrix (second derivatives)
  • H(w)^{-1} ∇L(w) rescales the gradient using local curvature

Interpretation: gradient descent follows only the slope; the Newton step rescales the slope by local curvature, taking larger steps along flat directions and smaller steps along sharply curved ones.
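Both updates as one-liners (a sketch; function names are my choice, and for the Newton step you would pass the Hessian, e.g. built from the HVP or the (1/n) X^T S X formula):

```python
import numpy as np

def gd_step(w, grad, lr=0.1):
    """w' = w - eta * grad: move a fixed fraction down the slope."""
    return w - lr * grad

def newton_step(w, grad, H):
    """w' = w - H^{-1} grad: solve the linear system rather than inverting H."""
    return w - np.linalg.solve(H, grad)
```

On an exactly quadratic loss the Newton step lands on the minimizer in one move, which is the cleanest way to see what curvature rescaling buys.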

