This post is part of a series:
Here is the corresponding Jupyter Notebook for this post:
In the previous post, we left off at the point where we wanted to implement the backpropagation algorithm in code.
See slide 1
So, this is what we are going to do in this post. And this is actually pretty easy now because all we have to do is implement the equations from the slide.
So, in the Jupyter Notebook, we first load the Iris flower data set and then select the 3 flowers that we have been working with the whole time.
See code cell 2 in the Jupyter Notebook
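To give a rough idea, here is a minimal sketch of what this cell might look like, assuming the data set is loaded via scikit-learn; the row indices are placeholders that pick one flower of each species:

```python
import numpy as np
from sklearn.datasets import load_iris

# load the Iris flower data set
iris = load_iris()

# select the 3 flowers we have been working with
# (the row indices are placeholders: one flower per species)
flower_indices = [0, 50, 100]
data = iris.data[flower_indices]       # shape: (3, 4)
targets = iris.target[flower_indices]  # e.g. [0, 1, 2]
```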
Then, we create our input matrix “x” and label matrix “y” (we also determine N, which we will need later to calculate the MSE).
See code cell 3 in the Jupyter Notebook
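A sketch of this step, assuming the labels are one-hot encoded (the variables “data” and “targets” come from the sketch above):

```python
# input matrix: one row per flower, one column per feature
x = data

# label matrix: one-hot encoded species, one row per flower
y = np.eye(3)[targets]

# number of training examples, needed later for the MSE
N = x.shape[0]
```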
Then, we define our activation function, namely the sigmoid function.
See code cell 4 in the Jupyter Notebook
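In code, this is just a one-liner:

```python
def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))
```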
After that, we specify the learning rate and the number of nodes in each layer.
See code cell 5 in the Jupyter Notebook
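A possible version of this cell, where the concrete values are only placeholders:

```python
# hyperparameters (the values here are placeholders)
learning_rate = 0.5

n_input_nodes = 4    # one node per feature of the Iris data
n_hidden_nodes = 2   # a free choice; 2 is just an example
n_output_nodes = 3   # one node per species
```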
In the cell after that, we randomly initialize our two weight matrices.
See code cell 6 in the Jupyter Notebook
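Sketched in code, where the shapes assume that a layer is computed as x @ weights:

```python
# randomly initialize the two weight matrices
hidden_weights = np.random.uniform(-1, 1, (n_input_nodes, n_hidden_nodes))
output_weights = np.random.uniform(-1, 1, (n_hidden_nodes, n_output_nodes))
```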
And then, finally, we run the feedforward and backpropagation algorithms and execute one gradient descent step.
See slide 2 and code cell 7 in the Jupyter Notebook
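The following is a sketch of how this cell could look. It assumes a standard two-layer network with sigmoid activations in both layers and the MSE as loss; constant factors like 2/N in the error terms are assumed to be absorbed into the learning rate, and the exact form of the error terms depends on the equations on the slide:

```python
# --- feedforward ---
hidden_layer_inputs = x @ hidden_weights
hidden_layer_outputs = sigmoid(hidden_layer_inputs)

output_layer_inputs = hidden_layer_outputs @ output_weights
output_layer_outputs = sigmoid(output_layer_inputs)

# --- backpropagation ---
# error terms, using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
output_error = ((output_layer_outputs - y)
                * output_layer_outputs * (1 - output_layer_outputs))
hidden_error = ((output_error @ output_weights.T)
                * hidden_layer_outputs * (1 - hidden_layer_outputs))

# gradients of the loss with respect to the two weight matrices
output_weights_gradient = hidden_layer_outputs.T @ output_error
hidden_weights_gradient = x.T @ hidden_error

# --- one gradient descent step ---
output_weights -= learning_rate * output_weights_gradient
hidden_weights -= learning_rate * hidden_weights_gradient
```

Note that both gradients are computed before either weight matrix is updated; otherwise, the hidden error would already be based on the updated output weights.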
After that, we calculate the MSE (the “output_layer_outputs” are still based on our initial, random weights).
See slide 3 and code cell 8 in the Jupyter Notebook
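Assuming the MSE is defined as the sum of squared errors divided by N, this cell might simply be:

```python
# mean squared error over all N training examples
# (output_layer_outputs still comes from the initial, random weights)
mse = np.sum((y - output_layer_outputs) ** 2) / N
print(mse)
```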
So now, let’s see if the gradient descent step worked. If we run the feedforward algorithm again (now with the updated weights), the MSE should be somewhat lower.
See code cells 9-10 in the Jupyter Notebook
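Sketched in code, reusing the names from above:

```python
# feedforward again, this time with the updated weights
hidden_layer_outputs = sigmoid(x @ hidden_weights)
output_layer_outputs = sigmoid(hidden_layer_outputs @ output_weights)

# recompute the MSE; it should be somewhat lower than before
mse_after_step = np.sum((y - output_layer_outputs) ** 2) / N
print(mse_after_step)
```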
And, as you can see, it is indeed somewhat lower.
But, as you can also see, the improvement is only very small. So, we need to run the feedforward and backpropagation algorithms many more times. The question now is: How many times should we run them?
One approach might be to say: let’s just run them until the MSE is zero. But, as we have seen in one of the earlier posts, the gradient descent algorithm only gives us an approximation of the minimum, not an exact value. So, we might never reach an MSE of zero.
So, this raises the question: When should we stop training our neural net? And this will be the topic of the next post.