This post is part of a series:
In the previous post, we left off with the observation that to speed up our code, we have to vectorize it using a Python library called NumPy. And that’s what we are going to do in this post.
Output of one Neuron with NumPy
So first, in order to make use of NumPy, we need to import it at the beginning of our code.
See code cell 1 in Jupyter Notebook 1
And now, let’s start with vectorizing the code for determining the output of one neuron.
See code cells 8-9 in Jupyter Notebook 1
First, we need to transform the Python lists “inputs” and “weights” into NumPy arrays so that they actually represent vectors (and therefore we can then take advantage of the NumPy functionality). And then, we simply need to replace our “weighted_sum” function (see cell 7) with the NumPy function that calculates the dot product, namely “np.dot”.
And that’s because, as said in the previous post, the weighted sum and the dot product are equivalent.
And, as you can see, this will give us the same results that we got before where we implemented the artificial neuron in pure Python (code cells 3-7). This time, however, we didn’t need a for-loop to accomplish that.
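As a small sketch of what those two cells do (the input and weight values here are hypothetical, not the ones from the notebook), the whole neuron boils down to one call to “np.dot”:

```python
import numpy as np

# hypothetical example values (not the ones from the notebook)
inputs = np.array([0.5, 1.0, -0.3])
weights = np.array([0.8, -0.2, 0.4])

# np.dot replaces the hand-written weighted_sum loop
weighted_sum = np.dot(inputs, weights)

# step activation, assuming a threshold of 0
output = 1 if weighted_sum > 0 else 0
```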
So, let’s now vectorize the code for determining the output of the whole neural net. To do that, let’s first recall what the “determine_layer_outputs” function (code cell 12) does, given what we now know about the dot product.
See slides 1-6
Namely, we simply loop over all the examples and then for each example we loop over all the weights. And then, for each iteration we want to calculate the dot product of the respective vectors. And the result of that dot product we then put into our step function.
So basically, what the function does, for the case where we don’t use the activation function (so for the case where we just determine the inputs of the layer), is to calculate all the possible dot products of the “list_of_inputs” and the “list_of_weights”.
And as it turns out, there is actually a linear algebra operation that does exactly the same thing. And it’s called a matrix multiplication. It is, however, defined slightly differently than our “determine_layer_outputs” function.
See slide 7
Namely, instead of using the row vector in the second list of lists (or in linear algebra terms: in the second matrix), it uses the column vector to calculate the dot product. So, in order to take advantage of the matrix multiplication, we have to transform our weight matrix in such a way that the rows become the columns.
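NumPy’s “.T” attribute does exactly this row-to-column swap. Here is a quick sketch with a hypothetical weight matrix:

```python
import numpy as np

# hypothetical weight matrix stored row-wise (one row per neuron)
weights = np.array([[0.9, 0.8, -1.0, -1.0],
                    [0.2, -0.5, 0.7, 0.3]])

# .T turns the rows into columns, so the matrix fits the
# matrix-multiplication convention
weights_transposed = weights.T

print(weights.shape)             # (2, 4)
print(weights_transposed.shape)  # (4, 2)
```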
And I think the reason the matrix multiplication is defined this way is that the, so to say, “intersection” of the two vectors now indicates where the result of the dot product is stored in the resulting matrix.
See slide 8
So, for example, if you calculate the dot product of the first row of the input matrix and the second column of the weight matrix, then the result of that dot product is also in the first row and second column of the resulting matrix.
Let’s look at another example:
See slide 9
If you calculate the dot product of the third row and the first column, then the result is in the third row and first column. However, if you do the calculations like we did with our “determine_layer_outputs” function, then you don’t get such a nice representation.
This way of defining the matrix multiplication, however, leads to a certain condition that needs to be fulfilled if you want to multiply two matrices. Namely, the number of columns in the first matrix has to match the number of rows in the second matrix. Otherwise, you can’t calculate the dot product because the number of elements doesn’t match up.
See slide 10
Here, you would need to calculate “5.8*0.9 + 2.7*0.8 + 5.1*(-1) + 1.9*(-1) + 6.2*?”. So, the “6.2” wouldn’t have a corresponding element in the column vector of the second matrix. And therefore, you can’t calculate the dot product of those two vectors. And, accordingly, you can’t do the matrix multiplication.
And that’s why the number of columns in the first matrix has to match the number of rows in the second matrix. And an easy way to remember this condition is to write down the dimensions of the matrices.
See slide 11
So, the input matrix is a 3x4 matrix. It has 3 rows and 4 columns. And the weight matrix is a 4x2 matrix. It has 4 rows and 2 columns. And now, to check whether you can do a matrix multiplication with those two matrices or not, you simply look at the inner two numbers.
See slide 12
If they match up, then you can do the matrix multiplication. And the nice thing now is that the outer two numbers represent the dimensions of the resulting matrix which has 3 rows and 2 columns.
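In NumPy, you can check this rule directly via the “shape” attribute (the matrices here are just hypothetical random ones of the dimensions from the slide):

```python
import numpy as np

# hypothetical 3x4 input matrix and 4x2 weight matrix
X = np.random.rand(3, 4)
W = np.random.rand(4, 2)

# inner numbers match (4 == 4), so the product is defined;
# the outer numbers give the shape of the result: 3x2
result = np.dot(X, W)
print(result.shape)  # (3, 2)
```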
See slide 13
And now, let’s look again at the case where the input matrix has 5 columns.
See slide 14
Here, the inner numbers don’t match up which means you can’t do a matrix multiplication with those two matrices.
From-to Representation of the Weight Matrix
So, that’s how the matrix multiplication works. And now, to make use of it in our code, we have to rewrite our weight matrices in such a way that the rows will become the columns.
See code cell 16 in Jupyter Notebook 1 (compared to cell 11)
And in my opinion, this is the better way of representing the weight matrices. To explain why, let’s look again at this slide:
See slide 15
And now, let’s rewrite the weight matrix so that the rows become the columns.
See slide 16
By doing that, we can now interpret the values in the matrix as the weights going from a certain node in the previous layer to a certain node in the following layer.
So, for example, let’s look at the weight that goes from node 3 in the input layer to node 1 in the hidden layer.
See slide 17
In the weight matrix, the value for this weight is in row 3 and column 1. So, it is the weight that goes “from” node 3 in the previous layer “to” node 1 in the following layer.
Here is another example weight:
See slide 18
And this “from-to” representation is what I meant in a previous post when I said that the underlying structure of the weight matrices is also easy to remember.
And that’s because all you have to do is look at the neural net and check the number of nodes from which the weights are coming and check the number of nodes to which the weights are going. So, in this case, the weights go from 4 nodes to 2 nodes. Hence, the weight matrix is a 4x2 matrix.
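As a small sketch of the “from-to” format (the weight values are hypothetical), indexing the matrix directly tells you which connection a weight belongs to:

```python
import numpy as np

# hypothetical "from-to" weight matrix: 4 nodes in the previous
# layer, 2 nodes in the following layer -> shape (4, 2)
W = np.array([[0.9,  0.2],
              [0.8, -0.5],
              [-1.0, 0.7],
              [-1.0, 0.3]])

# entry [2, 0] is the weight going *from* node 3 *to* node 1
# (NumPy uses 0-based indexing, so row 3 / column 1 of the slide)
w_from3_to1 = W[2, 0]
```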
Okay, so now we can finally replace the “determine_layer_outputs” function.
See code cell 12 in Jupyter Notebook 1
And since this function basically represents a matrix multiplication (namely for the case where we don’t use the activation function), we can simply replace it with the NumPy function that executes a matrix multiplication.
And actually, there are two such functions. One is called “np.matmul”. And the other is, somewhat surprisingly, also “np.dot”. For 1D and 2D arrays, both functions behave the same (they only differ for higher-dimensional arrays and for scalars). And since most tutorials use “np.dot”, let’s also use that.
See code cell 18 in Jupyter Notebook 1
So, we simply replace our “determine_layer_outputs” function with the NumPy function “np.dot”. And, as you can see, for the hidden layer inputs, we get the same result as before (see code cell 13).
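Sketched out, that replacement looks like this (the input rows are hypothetical example values, and the weight matrix is a made-up “from-to” matrix, not the notebook’s actual weights):

```python
import numpy as np

# hypothetical input matrix: 3 examples with 4 features each
X = np.array([[5.8, 2.7, 5.1, 1.9],
              [6.2, 2.2, 4.5, 1.5],
              [5.0, 3.6, 1.4, 0.2]])

# hypothetical weight matrix in "from-to" format (4 -> 2 nodes)
W1 = np.array([[0.9,  0.2],
               [0.8, -0.5],
               [-1.0, 0.7],
               [-1.0, 0.3]])

# one matrix multiplication replaces the three nested for-loops
hidden_layer_inputs = np.dot(X, W1)
print(hidden_layer_inputs.shape)  # (3, 2)
```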
And with that, we have vectorized our function and we got rid of the three for-loops. So now, let’s see how fast the NumPy implementation is compared to our pure Python implementation. After all, that’s the reason why we vectorized our code.
See code cell 6 in Jupyter Notebook 2
And as you can see, NumPy is much, much faster than the pure Python code. It even looks like NumPy doesn’t get slower as the matrices get bigger. But that’s only because the difference is so big. If you only depict the NumPy line, then you can see that it also increases as the matrices get bigger.
See code cell 7 in Jupyter Notebook 2
But for the biggest matrix it only needs slightly above 1ms. Whereas, the pure Python implementation needed more than 3.5 seconds (see code cell 6 again).
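If you want to reproduce a comparison like that yourself, a minimal sketch with “timeit” could look like this (the matrix size and the triple-loop implementation are stand-ins, not the notebook’s exact benchmark code):

```python
import timeit
import numpy as np

n = 100  # hypothetical matrix size
A = np.random.rand(n, n)
B = np.random.rand(n, n)
A_list, B_list = A.tolist(), B.tolist()

def python_matmul(a, b):
    # the kind of triple loop our "determine_layer_outputs"
    # function used under the hood
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

t_python = timeit.timeit(lambda: python_matmul(A_list, B_list), number=1)
t_numpy = timeit.timeit(lambda: np.dot(A, B), number=1)
print(f"pure Python: {t_python:.4f}s, NumPy: {t_numpy:.6f}s")
```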
And this gap in terms of speed (which only keeps increasing, by the way) is, in my opinion, the main reason why linear algebra is used in deep learning. At least I haven’t seen any tutorial yet that said something like: “Okay, what the dot product does is, it tells us how similar the two vectors are in terms of the direction they are pointing in. And this is what it means in the context of deep learning.”
Output of the Neural Network with NumPy
So now, let’s use NumPy to determine the output for the whole neural net.
See code cells 16-19 in Jupyter Notebook 1
Here, we need to adjust our “step_function” (code cell 17). And that’s because in the previous post, we defined it in such a way that it deals with a single number. But now, we can take advantage of NumPy and simply element-wise compare the whole array to our threshold (so each element in the array respectively gets compared to the threshold in just one step). And since that comparison gives us an array of Booleans, we then transform it into integers. That way, we again have our ones and zeros.
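A sketch of such a vectorized step function (assuming a threshold of 0 and a strict comparison; the example layer-input values are hypothetical):

```python
import numpy as np

def step_function(x, threshold=0):
    # element-wise comparison gives an array of booleans;
    # astype(int) turns them into ones and zeros
    return (x > threshold).astype(int)

# hypothetical layer inputs
layer_inputs = np.array([[0.38, -1.2],
                         [2.05,  0.0]])
print(step_function(layer_inputs))  # [[1 0]
                                    #  [1 0]]
```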
And then, similar as before (see code cell 13-14), we can determine the hidden layer inputs and outputs as well as the output layer inputs and outputs. Only this time, obviously, we don’t use our “determine_layer_outputs” function but instead we simply use “np.dot”. And, as you can see, we get the same results as before.
So, this is now finally how we can implement the feedforward algorithm in code. And I know, I did three entire posts for basically just these four lines of code, but I simply wanted to make sure that you can understand every little detail of what’s going on in these lines.
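Put together, those four lines look like this (the matrices here are hypothetical random stand-ins for the notebook’s actual inputs and weights, and the step function assumes a threshold of 0):

```python
import numpy as np

def step_function(x, threshold=0):
    return (x > threshold).astype(int)

# hypothetical matrices: 3 examples, 4 input features,
# 2 hidden nodes, 2 output nodes
X = np.random.rand(3, 4)
W1 = np.random.rand(4, 2) - 0.5
W2 = np.random.rand(2, 2) - 0.5

# the complete feedforward pass in four lines
hidden_layer_inputs = np.dot(X, W1)
hidden_layer_outputs = step_function(hidden_layer_inputs)
output_layer_inputs = np.dot(hidden_layer_outputs, W2)
output_layer_outputs = step_function(output_layer_inputs)

print(output_layer_outputs.shape)  # (3, 2)
```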
And to really make sure that that’s the case, let’s visualize them.
See slide 19
Here is our neural net (again depicted upright) as well as the code for the feedforward algorithm. So, what we then do in the first line of code is to multiply the input matrix (X) with weight matrix 1 (W1).
Side note: Standard notation for matrices is to use capital letters.
See slide 20
And the result of this matrix multiplication is the hidden layer input matrix (Hin).
See slide 21
And by doing this calculation, we, so to say, move in the neural net from the input values to the inputs of the nodes in the hidden layer.
Then, in the next line of code, we put our Hin into the step function to determine the hidden layer outputs (Hout).
See slide 22
And by doing that we, so to say, move through the nodes in the neural net (from the inputs of the hidden layer to the outputs of the hidden layer).
Then, we basically repeat those steps. So, in the next line of code we multiply Hout with weight matrix 2 (W2) to move from the hidden layer outputs to the output layer inputs.
See slide 23
And then, finally, we move again through the nodes by putting Oin into our step function.
See slide 24
And with that, we have determined the output layer outputs (Oout). And those are the decisions of our neural net. So, we have finally, fully answered the first question of how the deep learning algorithm makes a decision.
See slide 25
Non-Commutativity of Matrix Multiplications
And now, before I end this post, I want to make you aware of another important aspect that you need to consider when implementing the feedforward algorithm. Namely, that matrix multiplications are non-commutative.
And this means that if you have a matrix A and a matrix B, then multiplying matrix A with matrix B is not the same as multiplying matrix B with matrix A.
See code cells 20-24 in Jupyter Notebook 1
And that’s because you are using completely different values to calculate all the individual dot products of those matrix multiplications.
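You can see this with two small hypothetical matrices (not the ones from the notebook):

```python
import numpy as np

# two hypothetical 2x2 matrices
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

AB = np.dot(A, B)
BA = np.dot(B, A)

print(AB)  # [[2 1]
           #  [4 3]]
print(BA)  # [[3 4]
           #  [1 2]]
```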
So, the order of the matrices that you are multiplying is really important. And what this means when implementing the feedforward algorithm is that you have to be careful in what order you put the inputs and weights into the “np.dot” function.
I mean, in our case for example, it wouldn’t hurt you that much if you accidentally switched up the order because the function would simply throw an error.
See code cell 25 in Jupyter Notebook 1
And that’s because we want to multiply a 4x2 matrix with a 3x4 matrix. So, the condition that the “inner values” have to match up is not fulfilled.
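A quick sketch of that error case (with hypothetical random matrices of the same dimensions as in the notebook):

```python
import numpy as np

W = np.random.rand(4, 2)   # hypothetical weight matrix
X = np.random.rand(3, 4)   # hypothetical input matrix

# wrong order: (4, 2) x (3, 4) -> the inner numbers (2 and 3)
# don't match, so NumPy raises a ValueError
try:
    np.dot(W, X)
except ValueError as e:
    print("ValueError:", e)

# correct order: (3, 4) x (4, 2) -> (3, 2)
print(np.dot(X, W).shape)  # (3, 2)
```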
Side note: You could get around this by using transposes of the matrices, but I wouldn’t suggest doing that because this would make things unnecessarily complicated.
See code cell 26 in Jupyter Notebook 1
So, for this particular scenario, the non-commutative property of the matrix multiplication is actually not that problematic. But let’s say we had two square matrices of the same size.
See slide 26
Here, we have two 2x2 matrices which we could multiply in any order. So, we wouldn’t run into an error in our code that might help us in identifying that we have done something wrong.
If we (correctly) multiply the input matrix with the weight matrix, then we would get these hidden layer inputs and outputs:
See slide 27
However, if we (incorrectly) multiply the weight matrix with the input matrix, then we would get these hidden layer inputs and outputs.
See slide 28
So, as expected, we get completely different results.
And also, if you look at what we actually multiply and add up to get the values in slide 28, then you see that those values actually don’t make any sense. So, for instance, let’s look at the value in row 1 and column 1 of the hidden layer inputs matrix.
See slide 29
In order to calculate that, we would calculate the dot product of the first row of the weight matrix and the first column of the input matrix.
See slide 30
And looking at the neural net, we can see that this calculation doesn’t make any sense at all. Namely, we would multiply x1 of the first example with the weight that goes from node 1 to node 1. And to that we would add the multiplication of x1 of the second example with the weight that goes from node 1 to node 2. So, we are multiplying and adding things together (from the neural net) that don’t make any sense. And the neural net, consequently, would never be able to make correct decisions.
So, I think, the best way to not fall into this trap, is to first make sure that your data and weights are represented in the appropriate way in the matrices.
See slide 31
And what I mean by that is that the data should be stored in such a way that the columns represent the features and the rows represent the different examples or instances. This condition is fulfilled most of the time because that’s how the data is typically stored in available data sets. The more important thing is to make sure that the weights are stored in the “from-to” format.
If those two things are fulfilled, then you can simply look at the diagram of the neural net and, based on that, implement the feedforward algorithm. And that’s because, here too, the inputs come first and then the weights. So, you know that you have to multiply the inputs with the weights (and not the other way around).
So, just as a word of caution, when looking at code examples, always make sure you understand how the data and weights are represented in the matrices.
And with that, we have now fully covered the first question of how the neural net makes a decision.
See slide 32
So now, we can start to tackle the second question. Namely, how do we determine the weights of the neural net so that it is actually able to make correct predictions? And this will be the topic of the next post.