This post is part of a series:
In the previous post, we saw that there are two essential questions to consider when thinking about any supervised machine learning algorithm. First: How does the algorithm make a decision? And second: How do you determine the right parameters for the algorithm?
See slide 1
We also saw that the first question relies less heavily on math and is more about the conceptual idea behind the algorithm, whereas the second question is somewhat more math-heavy.
And now in this post, we are going to answer the first question with respect to the deep learning algorithm.
See slide 2
The Human Brain as an Inspiration
So, how does the deep learning algorithm make a decision and what is the conceptual idea behind it?
Well, the idea is pretty straightforward. Namely, it’s that we as humans make decisions all the time and we use our brain to do that. So, the idea simply is then: Okay, let’s try to somehow emulate the brain.
See slide 3
The problem with this idea, however, is that the brain is extremely complex and, still today, we don’t really know how it works in detail. And on top of that, the basics of deep learning were invented in the 1940s. And clearly, back then, they knew even less about the brain.
Consequently, the deep learning algorithm must be an extreme simplification of the brain. So, you should think of deep learning as being inspired by the brain, rather than being an actual representation of it.
Keeping that in mind: How does the brain generally work?
Well, the basic anatomical unit of the brain is the neuron. It’s a cell that looks basically like this:
See slide 4
At the front, it has something called dendrites, then there is a soma or cell body and, at the end, there is an axon with axon-terminals. And all it can really do is create an electrical signal. So, it is basically like a switch.
See slide 5
It’s either on and fires, or it is off and doesn’t fire.
And of such neurons, we have billions in our brain. And they are organized in a network. Hence, the term neural network.
See slide 6
And here is how they interact: an individual neuron can receive the electrical signals from all the other neurons that are connected to its dendrites. Then, all those signals are added up in the soma. And if they exceed a certain threshold, then this particular neuron is triggered to create an electrical signal itself. This signal is then sent through the axon and the axon-terminals to all the other neurons that this particular neuron is connected to. And they in turn then, process this signal in the same way.
That’s how neurons generally interact. But how do they now make a decision? It turns out that different areas in the brain, so different sub-networks of neurons, are responsible for executing specific tasks.
So, let’s use the brain of a bird as an example.
See slide 7
Here, let’s say that the neurons at the front receive some sensory input. So, they get activated based on what the bird sees. And let’s say that the single neuron at the end is responsible for making the bird peck at whatever it sees. So, if it is firing, the bird is going to peck; otherwise, it won’t.
See slide 8
And again, normally it would be a whole sub-network of neurons but let’s just say for now that this one neuron is enough.
So, let’s say the bird sees a specific type of berry.
See slide 9
In this case, certain neurons at the front get activated, which in turn activate some of their subsequent neurons. And those then, in turn, activate some of their subsequent neurons. And then finally, the last neuron fires, so that the bird pecks at this type of berry.
So now, let’s say it sees a different type of berry.
See slide 10
Here, other neurons at the front get activated, which leads to a different combination of neurons being on or off. But in this case, too, the last neuron fires and the bird also pecks at this kind of berry.
So, different specific combinations of active and inactive neurons can lead to the same decision. It’s not just one particular set of active neurons that makes the bird peck at something.
And now, let’s also look at a counter example.
See slide 11
If the bird sees, for instance, little stones, then a particular set of neurons fires which leaves the last neuron inactive. And hence, the bird doesn’t peck at the stones.
Learning and Synapses
So, to conclude, the neural net makes decisions, in this case whether to peck at something or not, based on which specific set of neurons is active. But this raises the question: How did it come about that the bird makes the right decisions in all those different situations (slides 9-11)? In other words, how did it come about that the right combinations of neurons were activated?
One answer might be that those decisions are already hard-wired into the brain from the moment the bird is born. But this would, of course, mean that the bird couldn’t adapt to a changing environment. For example, maybe it moves to a new territory where there is a new kind of berry that might be a potential additional food supply.
See slide 12
If the bird’s brain isn’t already hard-wired to peck at that berry, then the bird couldn’t take advantage of it. So clearly, building the capacity for learning into the brain is highly beneficial for the bird’s survival. With it, the bird can adapt to its environment by learning what the right decisions in certain circumstances are.
See slide 13
So, how is that learning process accomplished?
That’s what synapses are for.
See slide 14
The dendrites and axon-terminals aren’t actually directly connected to each other; instead, they are connected via such synapses. And depending on the strength of such a synaptic connection, the electrical signal going from one neuron to the other either gets amplified if the connection is strong or weakened if the connection is weak. And this obviously affects whether the threshold level in the soma of a particular neuron is exceeded or not.
Furthermore, those synapses can also be inhibitory. That means, if a neuron is connected to another neuron via an inhibitory synapse, then by firing, this neuron can reduce the total amount of electrical signal that reaches the succeeding neuron. And thereby, a neuron can actively inhibit another neuron from firing.
So, the process of learning is about changing the strength of those synaptic connections in order to influence which neurons are active or inactive and, through that, in turn influence what the final decision will be.
And how this learning process exactly occurs in the brain is not really important for us. It’s only important to know that by adjusting the strength of those synaptic connections, we can influence the final decision.
Mathematical Representation of a Neuron
So, at a very high level, this is how neurons and the brain basically work. And, again, this is an extreme over-simplification, but this is, so to say, the essence of it. And this is also what serves as the conceptual idea behind the deep learning algorithm.
See slide 15
So now, let’s start to implement such a neural network in a mathematical way. And since the fundamental unit of the neural net is one neuron, that’s what we are going to model.
See slide 16
To recap, the neuron first receives signals through its dendrites. For the artificial neuron, those signals are either our input data or, if the neuron is located somewhere deeper in the network, simply the signals that it receives from other neurons.
See slide 17
Then, those signals are added up in the soma. So, we also simply add them up.
See slide 18
And now, if the sum of those signals crosses a certain threshold level, then the neuron fires. And we are going to represent this in our artificial neuron by putting the sum into a so-called activation function.
See slide 19
And in this case, the function is a step function. So, if the sum is smaller than a certain value, in this case two, then the function returns a 0. If the sum is greater than or equal to two, then the function returns a 1. And this 1 and 0 is our way of representing whether the neuron is firing or not.
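As a small sketch (using the threshold of two from this example), such a step activation function could be written like this:

```python
def step_activation(total, threshold=2.0):
    # Returns 1 if the summed input reaches the threshold (the neuron
    # "fires"), and 0 otherwise.
    return 1 if total >= threshold else 0

print(step_activation(2.6))  # prints 1: the neuron fires
print(step_activation(1.5))  # prints 0: the neuron stays off
```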
And finally, to represent the synapses, which amplify or weaken the incoming signals, we are simply going to multiply each input with an individual weight.
See slide 20
So, if the weight is, for example, a number between 0 and 1, then the input gets smaller. If it is above 1, then it gets bigger. And if it is a negative number (i.e. representing an inhibitory synapse), then the input becomes negative and the neuron is less likely to reach the threshold level because the sum gets reduced. So basically, the first part of the neuron (up until the sigma) is just a weighted sum.
So, that’s how we are going to mathematically model the biological neuron. And now, let’s go through some examples to see how it works.
See slide 21
Here, the inputs have the values 2, 0.5 and 1. And the weights are 0.9, 1 and 0.3. We multiply each input with its respective weight and then add up those values, which gives us a total of 2.6. Then, we put this 2.6 into our activation function. And since this value is above two, the function returns a 1. So, our artificial neuron outputs a 1, meaning it is firing.
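This whole computation, weighted sum plus step activation, can be sketched in a few lines of Python (the function name is just illustrative):

```python
def artificial_neuron(inputs, weights, threshold=2.0):
    # Weighted sum: each input signal is scaled by its synaptic weight.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Step activation: output 1 (firing) if the threshold is reached.
    return 1 if weighted_sum >= threshold else 0

# The example from the slide: 2*0.9 + 0.5*1 + 1*0.3 = 2.6, which is >= 2.
print(artificial_neuron([2, 0.5, 1], [0.9, 1, 0.3]))  # prints 1: firing
```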
See slide 22
And here is an example where the artificial neuron is not firing.
See slide 23
Artificial Neural Network
So, that’s how our artificial neuron works.
See slide 24
And now, to really indicate that this is one functional unit, we can combine the two circles into one and just assume that the converging lines mean to add up the incoming values.
See slide 25
And by using many of such neurons, we can now create a whole artificial neural network.
See slide 26
And here, just for the sake of clarity, I am going to remove the arrows and the lines indicating our activation function.
See slide 27
This way, the neural net is a little bit less cluttered. And this is also how neural nets are generally depicted. You just have to keep in mind that the nodes contain an activation function and that the information flows from left to right.
Okay, that being said, as you can see in the diagram, neural nets are organized in layers.
See slide 28
The first layer is called the input layer. And often times, you will see that this layer is also represented using circles. But I prefer to visualize it this way, just to make clear that those aren’t actually neurons that receive some input and create some output. They are just some input values. The last layer is called the output layer and all layers between those two are called hidden layers.
And if there is more than one hidden layer in the neural net, as is the case here, then it is called a deep neural network, hence the term deep learning. One other thing to notice is that each neuron in one layer is connected to all the neurons in the succeeding layer.
And now, we can use this artificial neural net just like the biological neural net to make decisions.
See slide 29
To do that, we simply feed our input data into the first layer. Then, for each node, the data is processed the way we saw before with the single artificial neuron. After that, the output of these nodes is used as the input for the next layer. This way, we keep going through the layers until we reach the end of the neural network and get our, in this case, binary yes-no decision.
And this whole process is called feedforward because we are feeding our input data forward through the network to get our decision.
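As a rough sketch, such a feedforward pass through a small fully connected network could look like this. Note that the weights here are made up purely for illustration; in a real network, they would be learned:

```python
def step(total, threshold=2.0):
    # Step activation, as before: 1 if the threshold is reached, else 0.
    return 1 if total >= threshold else 0

def feedforward(inputs, layers):
    # `layers` is a list of weight matrices; layers[k][j] holds the
    # incoming weights of neuron j in layer k. Each layer's outputs
    # become the next layer's inputs.
    activations = inputs
    for layer in layers:
        activations = [
            step(sum(x * w for x, w in zip(activations, neuron_weights)))
            for neuron_weights in layer
        ]
    return activations

# A made-up toy network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_layer = [[0.9, 1.0, 0.3], [1.5, 0.2, 0.4]]
output_layer = [[1.0, 1.2]]
print(feedforward([2, 0.5, 1], [hidden_layer, output_layer]))  # prints [1]
```

Both hidden neurons reach the threshold for this input, and their combined weighted signal then makes the output neuron fire, giving the final yes-decision.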
See slide 30
And with that, we have finally answered the first question of how the neural net makes a decision.
See slide 31
Next, let’s see how this feedforward process can be implemented in code. That will be the topic of the next post.