S30 AI Lab · www.thes30.com
#30

Activation Functions & Initialization

Hard · 🧠 Deep Learning · W7 D1


1. Tasks
2. Activations
3. Initialization schemes
4. Forward signal propagation across depth
5. Backward gradient propagation (toy)

Python 3 — Notebook

Activation Functions & Initialization — Student Lab

Focus: implement activations + init schemes and empirically verify signal/gradient propagation across depth.

Section 0 — Setup

We'll work with synthetic Gaussian inputs so we can isolate activation/init effects.
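A minimal setup sketch for this section; the batch size and width below are illustrative choices of mine, not values prescribed by the lab:

```python
import numpy as np

# Illustrative sizes (assumptions, not prescribed by the lab).
batch_size, fan_in = 512, 256

rng = np.random.default_rng(0)
X0 = rng.standard_normal((batch_size, fan_in))  # synthetic Gaussian inputs

# By construction the input is roughly zero-mean, unit-std,
# so any drift in scale later comes from the activation/init choice.
print(f"mean={X0.mean():+.4f}, std={X0.std():.4f}")
```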

Section 1 — Activations

Task 1.1

Implement ReLU, tanh, and GELU (approx).

Task 1.2

Compare output mean/std on standard normal input.
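A sketch covering both tasks; the GELU tanh approximation follows the standard formula, and the function names are my own choices:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def tanh(x):
    return np.tanh(x)

def gelu(x):
    # tanh approximation: 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3)))
    c = np.sqrt(2.0 / np.pi)
    return 0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x**3)))

# Task 1.2: mean/std of each activation on standard normal input
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
for name, f in [("relu", relu), ("tanh", tanh), ("gelu", gelu)]:
    a = f(x)
    print(f"{name:>5}: mean={a.mean():+.3f}  std={a.std():.3f}")
```

On standard normal input, ReLU's output mean should land near 1/√(2π) ≈ 0.40 (it discards the negative half), while tanh stays near zero mean with a reduced std.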

[Figure: tanh activation curve]

[Figure: GELU activation curve]

Optional concept — Softplus

softplus(v) = log(1 + e^v)

Why it is used:

  • smooth approximation of ReLU
  • safe to evaluate for large |v| when written in a numerically stable form
  • useful in logistic-loss derivations

Behavior:

  • if v is very large positive, log(1 + e^v) ≈ v
  • if v is very large negative, log(1 + e^v) ≈ 0

So Softplus behaves like max(0, v) but smoothly.
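A numerically stable sketch, using the identity log(1 + e^v) = max(v, 0) + log1p(e^(-|v|)) so the exponential never overflows:

```python
import numpy as np

def softplus(v):
    # Stable form: max(v, 0) + log1p(exp(-|v|)); exp only sees non-positive inputs
    v = np.asarray(v, dtype=float)
    return np.maximum(v, 0.0) + np.log1p(np.exp(-np.abs(v)))

print(softplus(1000.0))   # behaves like v for large positive input
print(softplus(-1000.0))  # behaves like 0 for large negative input
print(softplus(0.0))      # log(2) ~ 0.6931
```

A naive `np.log(1 + np.exp(v))` overflows already around v ≈ 710 in float64; the rearranged form does not.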


Section 2 — Initialization schemes

We'll initialize weight matrices for a linear layer: Y = X W (no bias).

Task 2.1

Implement:

  • naive normal init with std=1
  • Xavier normal init
  • He normal init

Return a weight matrix of shape (fan_in, fan_out).
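A sketch of the three schemes; passing a shared NumPy `Generator` is an implementation choice of mine, not something the lab prescribes:

```python
import numpy as np

def naive_init(fan_in, fan_out, rng):
    # N(0, 1): variance independent of layer width
    return rng.normal(0.0, 1.0, size=(fan_in, fan_out))

def xavier_init(fan_in, fan_out, rng):
    # Glorot/Xavier normal: std = sqrt(2 / (fan_in + fan_out))
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng):
    # He normal (suited to ReLU): std = sqrt(2 / fan_in)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = he_init(256, 256, rng)
print(W.shape, W.std())  # std should sit near sqrt(2/256) ~ 0.088
```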


Section 3 — Forward signal propagation across depth

We simulate an L-layer network: X_{l+1} = act(X_l W_l)

Task 3.1

Write `simulate_forward(X0, L, init_fn, act_fn)` returning stats per layer.

We care about:

  • mean/std of activations
  • for ReLU: fraction of zeros
  • for tanh: saturation (|a| > 0.95) and average local derivative

Task 3.2

Compare naive vs Xavier/He for depth L=50 using both ReLU and tanh.
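One possible shape for `simulate_forward`, plus the naive-vs-He comparison under ReLU at depth 50. The stats-dict keys, the square-layer simplification, and the helper inits are my own assumptions:

```python
import numpy as np

def naive_init(fan_in, fan_out, rng):
    return rng.normal(0.0, 1.0, size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng):
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def simulate_forward(X0, L, init_fn, act_fn, seed=0):
    """X_{l+1} = act(X_l W_l); returns one stats dict per layer."""
    rng = np.random.default_rng(seed)
    stats, X = [], X0
    for _ in range(L):
        W = init_fn(X.shape[1], X.shape[1], rng)  # square layers for simplicity
        X = act_fn(X @ W)
        stats.append({
            "mean": float(X.mean()),
            "std": float(X.std()),
            "frac_zero": float((X == 0).mean()),                 # relevant for ReLU
            "frac_saturated": float((np.abs(X) > 0.95).mean()),  # relevant for tanh
        })
    return stats

relu = lambda z: np.maximum(0.0, z)
X0 = np.random.default_rng(1).standard_normal((256, 128))
for name, init in [("naive", naive_init), ("he", he_init)]:
    s = simulate_forward(X0, 50, init, relu)
    print(f"{name:>5}: layer-50 std = {s[-1]['std']:.3e}")
```

With std=1 weights the pre-activation variance is multiplied by roughly fan_in/2 per ReLU layer, so the naive run blows up by dozens of orders of magnitude, while He stays O(1).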


Section 4 — Backward gradient propagation (toy)

We estimate gradient flow using a simple scalar loss:

  • Forward: X_{l+1} = act(X_l W_l)
  • Loss: mean(X_L)
  • Backward (approx): propagate gradients using local Jacobians

This is not a full autodiff engine; it's a controlled experiment to see gradient norms explode/vanish.

Task 4.1

Implement activation derivatives for ReLU and tanh.

Task 4.2

Simulate gradient norms across depth for different init schemes.
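A sketch covering both tasks: the two derivatives, then a toy backward pass that caches pre-activations on the way forward and multiplies the local Jacobians on the way back. The loop structure, helper init, and names are my own:

```python
import numpy as np

def relu_grad(z):
    # d/dz max(0, z): 1 where z > 0, else 0
    return (z > 0).astype(z.dtype)

def tanh_grad(z):
    # d/dz tanh(z) = 1 - tanh(z)^2
    return 1.0 - np.tanh(z) ** 2

def he_init(fan_in, fan_out, rng):
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def gradient_norms(X0, L, init_fn, act, act_grad, seed=0):
    """Forward X_{l+1} = act(X_l W_l) with loss = mean(X_L);
    return the gradient norm w.r.t. each layer's input, deepest first."""
    rng = np.random.default_rng(seed)
    X, Zs, Ws = X0, [], []
    for _ in range(L):
        W = init_fn(X.shape[1], X.shape[1], rng)
        Z = X @ W
        Zs.append(Z)
        Ws.append(W)
        X = act(Z)
    g = np.full_like(X, 1.0 / X.size)  # d(mean(X_L)) / dX_L
    norms = []
    for Z, W in zip(reversed(Zs), reversed(Ws)):
        g = (g * act_grad(Z)) @ W.T    # chain rule: through act, then the matmul
        norms.append(float(np.linalg.norm(g)))
    return norms

X0 = np.random.default_rng(1).standard_normal((64, 64))
norms = gradient_norms(X0, 30, he_init, np.tanh, tanh_grad)
print(f"norm at top layer: {norms[0]:.2e}, at input: {norms[-1]:.2e}")
```

Swapping in the naive std=1 init here drives tanh deep into saturation, so `tanh_grad` is near zero almost everywhere and the norms collapse toward the input.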
