How do pre-trained models work?
I have written articles on multi-label classification, image segmentation, and my first deep learning hackathon. In those articles, I mentioned that I used a pre-trained model (Resnet34 in most cases) and that it is generally a good idea to start with a pre-trained model rather than train from scratch.
In this article, I will explain in detail why that is, and in the process help you understand most of the code snippets. We will also look at the results of one technique that helps improve the performance of our model.
Enjoy the article and help me share it.
The first thing we want to understand is how a neural network works. All of us know that a neural network is a collection of neurons and activation functions. The first set of neurons is called the input layer, the last set is called the output layer and the middle ones are called hidden layers.
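To make the input/hidden/output vocabulary concrete, here is a minimal, framework-free sketch of a forward pass through a network with 2 inputs, one hidden layer of 3 neurons using a ReLU activation, and 1 output. The weights are made-up numbers for illustration only. In a real network they are learned during training.

```python
def relu(x):
    """Activation function: passes positive values through and zeroes out negatives."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each neuron computes a weighted sum of all inputs plus a bias."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output.
hidden_w = [[0.5, -1.0], [1.0, 1.0], [-0.5, 0.25]]
hidden_b = [0.0, 0.5, 0.0]
output_w = [[1.0, 2.0, -1.0]]
output_b = [0.1]

x = [2.0, 1.0]                                     # input layer
h = dense(x, hidden_w, hidden_b, activation=relu)  # hidden layer
y = dense(h, output_w, output_b)                   # output layer
print(h, y)
```

Two of the three hidden neurons output 0 here because their weighted sums are not positive, which is exactly what the ReLU activation is for.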
When we train a neural network, its initial layers learn to identify very simple things, say a straight line or a slanted one. Something really basic.
As we go deeper into the network, the layers can identify more and more sophisticated things.
Layer 2 can identify shapes like squares or circles.
Layer 3 can identify intricate patterns.
And finally, the deepest layers can identify things like dog faces. These things can be identified because training has set the weights of our model to values that respond to these patterns.
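A minimal sketch of what "identifying a straight line" can mean: a small convolution filter slid across an image. The 2x2 kernel below is hand-picked to respond to a vertical dark-to-bright edge, and the 4x4 image is made up. In a trained network the first-layer filters are learned rather than hand-written, though they often end up resembling edge detectors of this kind.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation (what deep learning libraries call a convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out


# A 4x4 image: dark left half (0) and bright right half (1), i.e. one vertical edge.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-made vertical-edge detector: responds where brightness increases from left to right.
vertical_edge = [[-1, 1],
                 [-1, 1]]

response = conv2d(image, vertical_edge)
for row in response:
    print(row)  # each row is [0.0, 2.0, 0.0]: the filter fires only at the edge
```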
Resnet34 is one such model. It has been trained on the ImageNet dataset to classify images into 1000 categories. Now think about this: if you want to build a classifier, any classifier, its initial layers are going to detect slanted lines no matter what you are classifying. It is really the final layers, which learn to identify the more sophisticated, task-specific features, that need training.
Hence what we do is take Resnet34 and add a few more layers on top of it. Let’s take a look at the corresponding code snippets to understand the idea and the code together.
Initially, we train only the added layers, because their weights are initialized to random values. The layers of Resnet34 are frozen and undergo no training.
Once we’ve trained the new layers a little, we can unfreeze the layers of Resnet34. We then find a suitable learning rate and train the whole model.
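The freeze-then-unfreeze schedule can be sketched without any framework. The `ParamGroup` class, the weights, and the gradients below are hypothetical stand-ins: the "body" plays the role of the pretrained Resnet34 layers and the "head" plays the newly added, randomly initialized layers. In fastai the two stages correspond to `learn.freeze()` and `learn.unfreeze()`; in plain PyTorch you would set `requires_grad = False` on the body's parameters.

```python
import random


class ParamGroup:
    """A named group of weights plus a flag saying whether training may update them."""

    def __init__(self, name, weights, trainable=True):
        self.name = name
        self.weights = list(weights)
        self.trainable = trainable


def sgd_step(groups, grads, lr):
    """Plain SGD update (w <- w - lr * grad) that skips frozen groups."""
    for group in groups:
        if not group.trainable:
            continue  # frozen: weights stay exactly as they were
        group.weights = [w - lr * g for w, g in zip(group.weights, grads[group.name])]


random.seed(0)
# Hypothetical stand-in for the pretrained body (in practice, Resnet34's ImageNet weights).
body = ParamGroup("body", [0.8, -0.3, 0.5])
# The newly added head starts from small random values.
head = ParamGroup("head", [random.uniform(-0.1, 0.1) for _ in range(2)])
model = [body, head]
head_init = list(head.weights)

# Hypothetical gradients; a real framework computes these with backpropagation.
grads = {"body": [0.1, 0.1, 0.1], "head": [0.5, -0.5]}

# Stage 1: freeze the body and train only the new head.
body.trainable = False
sgd_step(model, grads, lr=0.1)
body_after_stage1 = list(body.weights)  # unchanged: [0.8, -0.3, 0.5]
head_after_stage1 = list(head.weights)  # moved away from head_init

# Stage 2: unfreeze the body and train the whole model.
body.trainable = True
sgd_step(model, grads, lr=0.01)
print(body.weights)  # roughly [0.799, -0.301, 0.499]
```

Freezing only changes which weights the optimizer may update; the forward pass still runs through the frozen layers.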
Our learning rate plot is as follows:
We choose a learning rate just before the point where the loss starts to rise (1e-4 here). The other option, and the one I have used, is to select a slice of learning rates, from 1e-6 to 1e-4.
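A plot like this comes from a learning rate range test: train for a short while, increase the learning rate exponentially after every mini-batch, and record the loss. The sketch below shows that schedule (the defaults, 1e-7 to 10 over 100 steps, are chosen to match what I understand fastai v1's `lr_find` uses) and one common rule of thumb for reading the curve: take the learning rate at the lowest loss and divide it by 10. The `(lr, loss)` readings are invented, and the divide-by-10 rule is an assumption, not the method used in this article, where the value was read off the plot.

```python
def lr_range_schedule(start_lr=1e-7, end_lr=10.0, num_steps=100):
    """Learning rates that grow exponentially from start_lr to end_lr, one per mini-batch."""
    ratio = (end_lr / start_lr) ** (1 / (num_steps - 1))
    return [start_lr * ratio ** i for i in range(num_steps)]


def pick_lr(lrs, losses, divisor=10.0):
    """Rule of thumb: LR at the lowest loss, divided by `divisor`, i.e. safely before the curve rises."""
    i_min = min(range(len(losses)), key=losses.__getitem__)
    return lrs[i_min] / divisor


schedule = lr_range_schedule()
# Made-up (lr, loss) readings standing in for a real range-test run.
lrs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
losses = [0.90, 0.88, 0.70, 0.45, 0.60, 2.50]
print(pick_lr(lrs, losses))  # about 1e-4
```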
This means that if our network had only 3 layers (or 3 groups of layers), the first would train at a learning rate of 1e-6, the second at 1e-5, and the last at 1e-4. In our model, the layers of Resnet34 don’t require much training and can train at a lower learning rate, while the newly added layers need a higher learning rate. Hence the slice.
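A small sketch of how a slice can be turned into one learning rate per layer group, spreading the values geometrically so that neighboring groups differ by the same factor. `spread_lrs` is a hypothetical helper that mirrors, as far as I know, what fastai v1 does when you pass `max_lr=slice(1e-6, 1e-4)` to `fit_one_cycle`; it is not fastai's own code.

```python
def spread_lrs(lr_slice, n_groups):
    """Spread learning rates geometrically from lr_slice.start (first group) to lr_slice.stop (last group)."""
    lo, hi = lr_slice.start, lr_slice.stop
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n_groups - 1))  # constant ratio between neighboring groups
    return [lo * step ** i for i in range(n_groups)]


lrs = spread_lrs(slice(1e-6, 1e-4), n_groups=3)
print(lrs)  # approximately [1e-06, 1e-05, 0.0001]
```

A geometric rather than linear spread keeps the early groups at genuinely small rates: a linear spread from 1e-6 to 1e-4 would put the middle group at about 5e-5, barely below the head.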
This concept of training different parts of a neural network at different learning rates is called discriminative learning rates (or discriminative fine-tuning), and it is a relatively new idea in deep learning.
We repeat this process of unfreezing layers, finding the learning rate, and training some more until our training loss is lower than our validation loss, while making sure the validation loss is not increasing, which would mean we are overfitting.
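The stopping rule above can be written as a small check over per-epoch losses. `diagnose` is a hypothetical helper, and `patience` (how many consecutive epochs of rising validation loss count as overfitting) is my assumption, not something the article specifies.

```python
def diagnose(train_losses, val_losses, patience=2):
    """Rough fit check from per-epoch losses (a rule of thumb, not a formal test)."""
    if train_losses[-1] >= val_losses[-1]:
        return "underfitting: keep training"
    recent = val_losses[-(patience + 1):]
    rising = len(recent) == patience + 1 and all(b > a for a, b in zip(recent, recent[1:]))
    if rising:
        return f"overfitting: validation loss rose for {patience} epochs in a row"
    return "ok: training loss is below validation loss and validation loss is not rising"


print(diagnose([0.9, 0.7], [0.8, 0.6]))                             # underfitting
print(diagnose([0.5, 0.3, 0.2], [0.45, 0.40, 0.38]))                # ok
print(diagnose([0.3, 0.2, 0.1, 0.05], [0.35, 0.36, 0.40, 0.45]))    # overfitting
```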
Improving the performance
One trick to improve the performance of a model is to first train it on lower-resolution images (size = 128) and then use those weights as the initial values for training on higher-resolution images. This is often called progressive resizing. I’ve done this in my notebook, and the performance of my model increased by a good 2%.
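Why can weights trained on 128-pixel images be reused at a higher resolution? A convolution's kernel size does not depend on the input size, and ResNet-style networks pool the final feature map down to a fixed size before the classifier (torchvision's ResNet uses `nn.AdaptiveAvgPool2d((1, 1))` for this). The toy 1D sketch below, with made-up weights and signals standing in for images, shows the same weights producing a single output for inputs of two different lengths.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation; the kernel's size does not depend on the signal's length."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(len(signal) - k + 1)]


def global_avg_pool(features):
    """Collapse a feature map of any length to a single number."""
    return sum(features) / len(features)


def tiny_cnn(signal, kernel, head_weight):
    """Conv -> ReLU -> global average pool -> linear head."""
    feats = [max(0.0, v) for v in conv1d(signal, kernel)]
    return head_weight * global_avg_pool(feats)


kernel = [-1.0, 1.0]  # hypothetical weights, reused unchanged at both resolutions
head_weight = 2.0

low_res = [0, 0, 1, 1]                # stand-in for a 128 px image
high_res = [0, 0, 0, 0, 1, 1, 1, 1]   # same pattern, twice the samples
print(tiny_cnn(low_res, kernel, head_weight))   # about 0.667
print(tiny_cnn(high_res, kernel, head_weight))  # about 0.286
```

The two outputs differ, which is why the model is trained further at the new size rather than used as-is; the point is that the shapes line up, so the low-resolution weights are a valid starting point.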
Now an increase from 92% to 94% may not sound like a big deal, but if we are dealing with medical applications, we want to be as accurate as possible. And it’s these small tricks that separate good models from competition-winning models. If you know any such tricks, mention them in the comments section below.
If you liked this article, give it at least 50 claps :p
How do pretrained models work? was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.