Basics of Deep Learning p.4: Implementing the Feedforward Algorithm in pure Python cont'd
1/12/2020
This post is part of a series:
Here are the corresponding Jupyter Notebooks for this post:
In the previous post, we left off at the point where we wanted to create the “determine_layer_outputs” function. And that’s what we are going to do now.

Determine Layer Output

To do that, let’s look again at this slide: See slide 1

So, what we want the function to do is to use the two lists of lists at the bottom to create the list of lists at the top. And to better understand how the function is supposed to work, let’s also depict the hidden layer inputs, i.e. the values that go into the nodes. See slide 2

Now, let’s first see how we determine the input value of the first node. See slide 3

To determine it, we calculate the weighted sum of the inputs of the first example and the weights that go to the first node. See slide 4

Then, to get the output of this node, we put this weighted sum into our step function. See slide 5

And since the value of the weighted sum is above 2, we get a 1 as the output for this node. In a similar way, we can then determine the output of the second node. See slide 6

Now, we are done with the first example, and we simply repeat those steps for the other examples as well. See slides 7-10

So basically, we loop over our inputs and then over all the weights. And for each combination, we use our “weighted_sum” function and our “step_function”, which we created in the previous post, to determine the outputs of the layer. See code cell 10 of Jupyter Notebook 1

The “activation_function” parameter is just there so that we can see what the layer inputs look like. This way, we can compare the numbers determined in the code (see code cell 11) with the numbers that we have in our slide (see slide 11). And, as you can see, they are the same. So, the function seems to be working. So now, let’s determine the output of the whole neural net.
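To make the loop structure concrete, here is a minimal sketch of what the functions described above might look like. It is not the code from the notebook: the names mirror the post, but the data layout (one sublist of feature values per example, one sublist of weights per node), the threshold of 2 in “step_function” and the boolean “activation_function” flag are assumptions. “feedforward” is a hypothetical helper that simply feeds each layer’s outputs into the next layer, and the numbers are made up.

```python
def weighted_sum(inputs, weights):
    """Sum of x_i * w_i for one example and the weights going into one node."""
    total = 0
    for x, w in zip(inputs, weights):  # loop over the features
        total += x * w
    return total


def step_function(value, threshold=2):
    """Return 1 if the weighted sum exceeds the threshold, else 0.

    The threshold of 2 is an assumption based on the example in the text.
    """
    return 1 if value > threshold else 0


def determine_layer_outputs(inputs, weights, activation_function=True):
    """Return a list of lists with one row per example and one column per node.

    inputs:  list of examples, each a list of feature values
    weights: list of nodes, each a list with one weight per feature
    activation_function: if False, return the raw weighted sums (the layer inputs)
    """
    layer_outputs = []
    for example in inputs:              # loop over the examples
        example_outputs = []
        for node_weights in weights:    # loop over the nodes of the layer
            value = weighted_sum(example, node_weights)
            if activation_function:
                value = step_function(value)
            example_outputs.append(value)
        layer_outputs.append(example_outputs)
    return layer_outputs


def feedforward(inputs, weights_per_layer):
    """Hypothetical helper: pass the outputs of each layer on to the next one."""
    outputs = inputs
    for weights in weights_per_layer:
        outputs = determine_layer_outputs(outputs, weights)
    return outputs


inputs = [[1, 2], [0, 1]]               # 2 examples, 2 features
hidden_weights = [[1, 1], [0.5, 0.5]]   # 2 hidden nodes
output_weights = [[3, 0], [0, 3], [1, 1]]  # 3 output nodes

print(determine_layer_outputs(inputs, hidden_weights, activation_function=False))
# [[3, 1.5], [1, 0.5]]
print(determine_layer_outputs(inputs, hidden_weights))
# [[1, 0], [0, 0]]
print(feedforward(inputs, [hidden_weights, output_weights]))
# [[1, 0, 0], [0, 0, 0]]
```

Setting “activation_function=False” returns the layer inputs, which is what makes the comparison with the slide possible; with the flag on, every value is passed through the step function.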
See code cell 12 of Jupyter Notebook 1

And, as you can see, the neural net predicts the first flower to be an Iris setosa (since there is a 1 in the first column and a 0 in the other two). It predicts the second flower to be an Iris versicolor and the third to be an Iris virginica. And now, let’s see what those flowers actually are. See code cell 13 of Jupyter Notebook 1

So, as you can see, our neural net predicted all three flowers correctly. And with that, we have basically answered the question of how the neural net makes a decision. See slide 12

So now, we could start to tackle the second question: how do we actually determine the right parameters for the algorithm? In other words, how do we determine the right weights so that the neural net is able to make correct predictions? But, before we get to that, I would like to point out a big problem with the code that we have so far.

Limitations of code

Namely, if we look at the “determine_layer_outputs” function (code cell 10), we can see that there are two for-loops. And the “weighted_sum” function (see code cell 5), which is called inside them, contains another for-loop. So, in total, there are three nested for-loops. And the total number of iterations that we have to perform in order to determine the output of just one layer is the number of examples times the number of nodes in the layer times the number of features: See slide 13

This means that the total number of iterations grows extremely fast. And for an interpreted and dynamically typed language like Python, this is a really big problem in terms of speed. That’s because on every iteration of a for-loop, Python has to perform a number of checks, for example on the type of each variable involved, and this produces a little overhead every time. So, the higher the total number of iterations (in other words, the more examples we have, the more nodes there are in a layer or the more features there are in the data set), the more apparent the slowness of Python becomes.
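The formula from slide 13 can be checked with a small sketch. The function below is a hypothetical, instrumented copy of the three nested loops (it is not from the notebooks): instead of computing anything, it counts how often the innermost step runs. The sizes are illustrative; 150 examples and 4 features match the Iris data set, while the 10 nodes are an arbitrary choice.

```python
def count_iterations(inputs, weights):
    """Count the innermost multiply-add steps of the nested-loop layer computation."""
    iterations = 0
    for example in inputs:                       # one pass per example
        for node_weights in weights:             # one pass per node in the layer
            for _ in zip(example, node_weights):  # one pass per feature
                iterations += 1
    return iterations


n_examples, n_nodes, n_features = 150, 10, 4
inputs = [[1.0] * n_features for _ in range(n_examples)]
weights = [[0.5] * n_features for _ in range(n_nodes)]

print(count_iterations(inputs, weights))  # 6000
print(n_examples * n_nodes * n_features)  # 6000
```

If all three dimensions were 400 instead, the innermost step would run 400 × 400 × 400 = 64,000,000 times for a single layer.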
And to see the impact of that, I have created a chart that shows the average run time of our “determine_layer_outputs” function for different sizes of lists of lists. See code cell 5 of Jupyter Notebook 2

On the x-axis, the dimensions of the lists of lists are depicted. So, the smallest list of lists, for example, is a list with 10 sublists where each sublist has 10 elements. The largest one has 400 sublists with 400 elements each. On the y-axis, the average run time in seconds is depicted. And, as you can see, it grows very rapidly as the list of lists gets bigger and bigger. (Strictly speaking, the growth is polynomial rather than exponential: if the number of examples, nodes and features all equal n, the function performs n × n × n = n³ iterations.) For the 400x400 list, the function already needed more than 3.5 seconds to run. And a 400x400 list is actually not that big, especially in the context of deep learning where you might work with millions or tens of millions of examples. In that case, the function could run for hours or even days. And remember, the function only determines the outputs of one layer and not of the whole neural net, which can consist of many layers. On top of that, as we will see in later posts, in order to ultimately find the right weights for our neural net, we also need to run the feedforward algorithm many, many times. So, clearly this function is not a practical approach for implementing the feedforward algorithm.

So, how do we solve this speed problem?

Dot Product

Well, to answer that question, let’s have a closer look at what our artificial neuron actually does. And in particular, let’s look at the first part of the neuron. See slide 14

As we have mentioned many times before, this is just a weighted sum. See slide 15

So, it’s not some unique operation that is only done by our artificial neuron. In fact, the way such a weighted sum is defined is exactly the same as the definition of the dot product, an operation that takes two vectors of equal length and returns a single number.
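Written out, with $\mathbf{x}$ holding the inputs of one example and $\mathbf{w}$ the weights going into one node (symbols chosen here for illustration), the dot product is the sum of the element-wise products, which is exactly the weighted sum the neuron computes. The worked numbers are made up:

```latex
\mathbf{x} \cdot \mathbf{w} = \sum_{i=1}^{n} x_i w_i = x_1 w_1 + x_2 w_2 + \dots + x_n w_n

% Example: \mathbf{x} = (1, 2, 3), \mathbf{w} = (4, 5, 6)
\mathbf{x} \cdot \mathbf{w} = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 4 + 10 + 18 = 32
```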
See slide 16

So, if those two vectors contain our x-values and w-values, then the way we calculate the dot product is exactly the same as the way we calculate the weighted sum. See slide 17

So, the weighted sum and the dot product are equivalent in this case. And the reason why that’s important is that the dot product is a fundamental operation of linear algebra.

Linear Algebra

And linear algebra, in turn, is a branch of mathematics that is needed in many different areas besides deep learning, for example computer graphics, cryptography, ranking websites, social network analysis and many more.
So accordingly, people have created libraries that are highly optimized to perform linear algebra operations as fast and as accurately as possible. See slide 18

One of them is called BLAS, which stands for “basic linear algebra subprograms”. Another one, which builds on BLAS, is called LAPACK, which simply stands for “linear algebra package”. So, those optimized algorithms are going to give us some speed improvements. But on top of that, those libraries are written in C and Fortran, which are low-level languages. See slide 19

And such low-level languages are normally much, much faster than high-level languages like Python. So, this is another source of speed improvement. A third source comes from something called SIMD, which stands for “single instruction, multiple data”. See slide 20

This is a concept that takes advantage of the architecture of modern CPUs, namely it focuses on processing operations in parallel. So, for example, let’s say you want to create a new list by multiplying the respective elements of two lists. See slide 21

To do that in pure Python, we would need a for-loop. In the context of SIMD, however, we can simply multiply “list_1” with “list_2” in a single expression, and this will lead to the same result. (In plain Python, multiplying two lists like this would actually raise an error; the notation only works with a library that supports it, such as NumPy.) So, with SIMD we give the computer just a single instruction. But since the variables contain multiple data points, the computer can do those multiplications in parallel, which obviously speeds up this operation. And expressing the code in this way is called vectorization. See slide 22

That’s because we express the for-loop logic with vectors (vector in this case simply means “a list of numbers”). So, this is another concept that will give us some speed improvements. So now, we have seen three concepts that could improve the performance of our code. And a Python library that takes advantage of all three concepts is called NumPy. See slide 23

And to make use of it, we now have to vectorize our code.
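As a rough illustration of the idea on slide 21, here is a sketch that puts an explicit Python loop next to NumPy’s vectorized form. The example values are made up, and the sketch only shows that both forms give the same result; it does not measure speed, and whether SIMD instructions are actually used depends on the NumPy build and the CPU.

```python
import numpy as np

list_1 = [1, 2, 3, 4]
list_2 = [10, 20, 30, 40]

# Pure Python: one multiplication per loop iteration
products_loop = []
for a, b in zip(list_1, list_2):
    products_loop.append(a * b)

# Vectorized: a single expression on whole arrays; NumPy runs the loop in compiled code
array_1 = np.array(list_1)
array_2 = np.array(list_2)
products_vectorized = array_1 * array_2

print(products_loop)                 # [10, 40, 90, 160]
print(products_vectorized.tolist())  # [10, 40, 90, 160]

# Summing the element-wise products gives the weighted sum, i.e. the dot product
print(sum(products_loop))            # 300
print(np.dot(array_1, array_2))      # 300
```

The last two lines tie this back to the neuron: “np.dot” computes the same weighted sum as the loop, but in a single call.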
And that’s what we are going to do in the next post. 