You can find the accompanying illustrations for this blog post in this MS OneNote ("1.b.ii Random Forest").
As the name suggests, a random forest is made up of decision trees. To create a decision tree, we feed our training data, which in this case has a binary label, into the decision tree algorithm. We then use the resulting tree to classify new, unseen examples from the test data. So, let's say in this case that we take example 1 and the tree predicts its label to be 0.
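To make that concrete, here is a minimal Python sketch of a tree classifying an example. The tree, its thresholds, and the test example are all made up for illustration; this hand-built structure stands in for whatever the decision tree algorithm would have learned from the training data.

```python
# A toy decision tree, hand-built as nested dicts (a stand-in for a
# learned tree): each inner node tests one feature against a threshold;
# leaves hold the predicted binary label.
tree = {
    "feature": 0, "threshold": 2.5,
    "left":  {"label": 0},                      # feature 0 <= 2.5
    "right": {"feature": 1, "threshold": 1.0,   # feature 0 >  2.5
              "left":  {"label": 1},
              "right": {"label": 0}},
}

def predict(node, example):
    """Walk the tree until we reach a leaf and return its label."""
    while "label" not in node:
        if example[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]

# Classify a hypothetical "example 1" from the test data.
example_1 = [2.0, 0.5]
print(predict(tree, example_1))  # -> 0
```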
Now, the idea behind the random forest algorithm is simple: instead of having just one decision tree, let's have many different decision trees. The question then is: how do we actually create different trees from the same training data set? The answer is that we introduce some randomness into our data.
The first way we can do that is called bootstrapping: we create different bootstrapped data sets.
To create a bootstrapped data set, we randomly pick an example from our training data set and put it into the bootstrapped data set. Then we put it back into the training data set, so to speak, and again randomly pick an example. We repeat this process until the bootstrapped data set has as many examples as we want. This is known as sampling with replacement.
And because we always put the example back, the bootstrapped data set can contain duplicates. For instance, in bootstrapped data set 1, example 3 appears twice.
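The sampling-with-replacement procedure above can be sketched in a few lines of Python. The training data here is a made-up list of labeled examples; `random.choice` does the "pick, then put back" step for us.

```python
import random

def bootstrap(training_data, n=None):
    """Sample n examples uniformly *with replacement* (defaulting to
    the original size), so duplicates are possible and some examples
    may be left out entirely."""
    n = len(training_data) if n is None else n
    return [random.choice(training_data) for _ in range(n)]

random.seed(0)  # fixed seed, just so the illustration is reproducible
training_data = [("example 1", 0), ("example 2", 1),
                 ("example 3", 1), ("example 4", 0)]
print(bootstrap(training_data))
```

Each call produces a different bootstrapped data set, which is exactly what we need to grow different trees.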
Random Subspace Method
So, that's the first way to introduce randomness into our data. The other works via the features of the data set and is called the random subspace method.
What we do here is: whenever we create a node in the tree, instead of considering all the features of the data set for the potential splits, we only consider a random subset of them. So for example, instead of considering all 3 features of our data set, we randomly select just two of them. And we do that anew every time we create a node.
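As a small sketch of that step (the function name and feature names are made up for illustration): at each node we draw a random subset of the features, without replacement, and only those are candidates for the split.

```python
import random

def candidate_features(all_features, k):
    """Pick k features at random (without replacement) to consider
    for the split at one node of the tree."""
    return random.sample(all_features, k)

random.seed(0)  # fixed seed, just so the illustration is reproducible
all_features = ["feature 1", "feature 2", "feature 3"]
# This would be called once per node created, here with 2 of the 3 features:
print(candidate_features(all_features, 2))
```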
Random Forest continued
So, that's how we introduce randomness into our data and thereby create different decision trees. We then use those trees to classify our examples. For instance, we put example 1 into each of the trees to get their individual predictions, and whichever prediction appears most often becomes the final prediction of the random forest. This is known as majority voting.
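The majority vote can be sketched like this. The three "trees" here are just stub functions returning fixed labels, standing in for real trained trees; `Counter.most_common` picks the winning label.

```python
from collections import Counter

def forest_predict(trees, example):
    """Ask every tree for its prediction, then return the label that
    appears most often (the majority vote)."""
    votes = [tree(example) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical trees, stubbed as simple functions: two vote 0, one votes 1.
trees = [lambda x: 0, lambda x: 1, lambda x: 0]
print(forest_predict(trees, "example 1"))  # -> 0
```

Because each tree saw a different bootstrapped data set and different feature subsets, the trees disagree in different places, and the vote averages their errors out.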