THE FUTURE IS HERE

Deriving the Ultimate Neural Network Architecture from Scratch #SoME3

#transformers #chatgpt

Join me on a deep dive to understand the most successful neural network ever invented: the transformer. Transformers, originally invented for natural language translation, are now everywhere. They have rapidly taken over the world of machine learning (and the world more generally) and are now used for almost every kind of application, not least of which is ChatGPT.

In this video I take a more constructive approach to explaining the transformer: starting from a simple convolutional neural network, I step through each change needed to arrive at the transformer, along with the motivation behind it.

*By “from scratch” I mean “from a comprehensive mastery of the intricacies of convolutional neural network training dynamics”. Here is a refresher on CNNs: https://www.youtube.com/watch?v=8iIdWHjleIs

Chapters:
00:00 Intro
01:13 CNNs for text
05:28 Pairwise Convolutions
07:54 Self-Attention
13:39 Optimizations