
Help fund future projects: https://www.patreon.com/3blue1brown
An equally valuable form of support is to simply share some of the videos.
Special thanks to these supporters: http://3b1b.co/nn3-thanks

This one is a bit more symbol-heavy, and that’s actually the point. The goal here is to represent in somewhat more formal terms the intuition for how backpropagation works in part 3 of the series, hopefully providing some connection between that video and other texts/code that you come across later.

For more on backpropagation:
http://neuralnetworksanddeeplearning.com/chap2.html
https://github.com/mnielsen/neural-networks-and-deep-learning
http://colah.github.io/posts/2015-08-Backprop/

Music by Vincent Rubinetti:
https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

——————
Video timeline
0:00 – Introduction
0:38 – The Chain Rule in networks
3:56 – Computing relevant derivatives
4:45 – What do the derivatives mean?
5:39 – Sensitivity to weights/biases
6:42 – Layers with additional neurons
9:13 – Recap
——————

3blue1brown is a channel about animating math, in all senses of the word animate. And you know the drill with YouTube, if you want to stay posted on new videos, subscribe, and click the bell to receive notifications (if you’re into that): http://3b1b.co/subscribe

If you are new to this channel and want to see more, a good place to start is this playlist: http://3b1b.co/recommended

Various social media stuffs:
Website: https://www.3blue1brown.com
Patreon: https://patreon.com/3blue1brown
Reddit: https://www.reddit.com/r/3Blue1Brown

3Blue1Brown says:

1) In other resources and in implementations, you'd typically see these formulas in a more compact vectorized form, which carries the extra mental burden of parsing the Hadamard product and thinking through why the transpose of the weight matrix is used, but the underlying substance is all the same.
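That vectorized form can be sketched in a few lines of NumPy. This is a hedged illustration, not the video's exact setup: the layer sizes, the random values, and the quadratic cost C = Σ(a^L − y)² are assumptions made for the demo. The `*` operator is the Hadamard (elementwise) product, and the transposed weight matrix carries each layer's error back one layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Tiny illustrative network: 3 inputs -> 4 hidden -> 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))
x = rng.standard_normal((3, 1))
y = rng.standard_normal((2, 1))

# Forward pass.
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

# Backward pass, vectorized. With cost C = sum((a^L - y)^2):
# delta^L = 2(a^L - y) * sigma'(z^L)           (* = Hadamard product)
delta2 = 2.0 * (a2 - y) * sigmoid_prime(z2)
# delta^l = (W^{l+1})^T delta^{l+1} * sigma'(z^l)
delta1 = (W2.T @ delta2) * sigmoid_prime(z1)

# Gradients of C with respect to each layer's weights and biases.
grad_W2, grad_b2 = delta2 @ a1.T, delta2
grad_W1, grad_b1 = delta1 @ x.T, delta1
```

A quick sanity check is to perturb a single weight and compare the finite-difference slope of the cost against the corresponding entry of `grad_W1`; the two should agree closely.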

2) Backpropagation is really one instance of a more general technique called "reverse-mode differentiation" for computing derivatives of functions represented in some kind of directed-graph form.
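One way to see point 2 concretely is a minimal reverse-mode sketch over an explicit expression graph. Everything here is invented for illustration (the `Node` class, the operations, and the example function f(x, y) = x·y + sin(x)): each node records its parents together with the local derivative of the node with respect to that parent, and `backward` sweeps adjoints from the output toward the inputs, exactly what backprop does layer by layer.

```python
import math

class Node:
    """A value in the expression graph, with an accumulated gradient."""
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # parents: list of (parent_node, d(self)/d(parent)) pairs
        self.parents = parents

    def backward(self, seed=1.0):
        # Accumulate the incoming adjoint, then push it upstream,
        # scaled by each local derivative (the chain rule).
        # Note: this simple recursion is only correct for tree-shaped
        # graphs whose shared nodes are leaves, as in this example.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

def mul(a, b):
    return Node(a.value * b.value, [(a, b.value), (b, a.value)])

def add(a, b):
    return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])

def sin(a):
    return Node(math.sin(a.value), [(a, math.cos(a.value))])

# f(x, y) = x*y + sin(x), evaluated at x = 2, y = 3.
x, y = Node(2.0), Node(3.0)
f = add(mul(x, y), sin(x))
f.backward()
# By hand: df/dx = y + cos(x), df/dy = x.
print(x.grad, y.grad)
```

Real autodiff libraries do the same sweep over arbitrary DAGs, topologically ordered so each node's adjoint is complete before it is propagated further.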

Noel Gomariz says:

what a beautiful series to get started on the subject, thanks <3

Hariharan Nair says:

You are a god

Anto Vrdoljak says:

It really can't get any better than this. Awesome! This is truly the peak of learning methodology and didactics!

Luna Peragine says:

You explained this in a way that was unbelievably easy to understand, THANK YOU!!!

Brauggi the bold says:

Ok, so it turns out backpropagation is just a really complicated way of saying "compute the partial derivatives". Why do you need to say anything more than that?

Is chapter 5 coming?

Nicholas Kryger-Nelson says:

I would love it if anyone could answer my question: does this mean that if you have a network with many layers, the partial derivative of the cost with respect to a parameter in the first layer (like w^1) would be a super long expression that builds on the chain rule?

Ronnie43 says:

Okay, but do you take the derivative of the summation in the cost function? Because before it was just 2(a^(L) – y).

Neil Lyons says:

Great video, thank you.

Ministry of good Ideas says:

could someone tell me where the "automatic differentiation" part is in this backpropagation?
Is it just the decomposing of the gradient layer by layer?

Linsu Han says:

It's good to visualize the vanishing gradient problem here at 4:30:
– if the activation is sigmoid, its derivative ranges over [0, 0.25]
– if the activation is ReLU, its derivative is either 1 or 0
With enough sigmoid activations stacked on top of each other, you can see that the gradient is bounded above by (0.25)^n -> 0.
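The bound in this comment is easy to check numerically. A small sketch (the depths chosen are arbitrary): since the sigmoid derivative peaks at exactly 0.25 (at z = 0), a chain of n such factors can never exceed 0.25^n, even in the most favorable case.

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# Product of sigmoid derivatives along a chain of n layers.
# Each factor is at most 0.25 (the maximum of sigma', at z = 0),
# so the product is bounded by 0.25**n and shrinks geometrically.
for n in [1, 5, 10, 20]:
    upper_bound = 0.25 ** n
    # Even at the most favorable point, z = 0 everywhere,
    # the product only reaches the bound exactly.
    product = sigmoid_prime(np.zeros(n)).prod()
    print(n, product, upper_bound)
```

At depth 20 the bound is already below 1e-12, which is the vanishing-gradient problem in one line.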

Vivek Arunachala says:

This is definitely the best video and the best channel, with perfect explanations and insights into how neural networks function. I would like to request more videos related to neural networks, GANs, and CNNs, covering how they work with your awesome visualizations. I would definitely watch more videos on neural networks.

Ahmed Syed says:

thank you, this gave amazing intuition into the algorithm

Yonit Lev says:

thank you SO MUCH for publishing this video! The illustrations, the explanation, and your calm voice made it so easy to understand! I first heard of backprop yesterday and the professor didn't go into the math of the cost function derivative one step back from the output layer; your explanation really helped me get a better sense of what's happening here. I cannot thank you enough – bless you!

Егор Абросимов says:

If I ever believe in any god, the ideas you share would be somewhere among the reasons

Ciro García says:

I didn't understand a thing, but I'll come back when I actually have the math knowledge to follow along!
