This post is part of a series.
So, what exactly is machine learning (ML)?
Well, as the name suggests, machine learning is about machines that learn. Rephrasing it that way raises two obvious questions: “How do they learn?” and “What do they learn?” These are the two questions I’m trying to answer in this post, at least at a very high level.
See slide 1
And because abstract concepts like ML are much easier to understand with an example, I am going to answer these questions using the well-known Iris flower data set. In fact, it is so well-known that it even has its own Wikipedia article.
But instead of just relying on the data set alone to explain how ML works, I am going to give it some context by introducing a made-up scenario. And that way, I hope, one can get some understanding of how ML could be applied to solve a “real-world problem”.
See slide 2
Imagine you are a flower grower who specializes in Iris flowers. You grow three different types of flowers. And even though they are different types, they look extremely similar to each other, which means that you have to be a real expert to tell them apart.
Let’s say you have a large field with thousands of these flowers, and let’s also say that you can’t have separate fields for each type because they don’t grow well in monocultures. So, they have to be mixed up with each other.
And then, when it’s time for the harvest, you bring in some seasonal workers to pick the flowers and sort them. But they are obviously not Iris flower experts, which means that you have to train them to distinguish the different types of flowers before they can start to work.
The problem with this approach, however, is that every season there are new workers, so you have to train them anew every single time. And that is just a very repetitive and inefficient way of doing things. So, you try to devise a simpler approach for them to distinguish the different flowers.
That is the problem you are trying to solve. How can you get the workers to sort the flowers without having to train them? And since you’ve heard about that thing called ML, you randomly pick 50 flowers of each type yourself and then gather some data on these flowers.
And this is the first part of the answer to the question of HOW machines learn: they need some data.
See slide 3
Just like humans can learn from experiences, machines can learn from data. But data is a relatively ambiguous term. So, what does it actually look like in the context of ML?
In most cases, you can think of data simply as a table.
See slide 4
The rows of such a table represent individual examples or instances of your particular subject of interest. So, in our case we are interested in Iris flowers, hence each row is one specific flower. And the columns of such a table always contain different characteristics or so-called “features” which basically describe your subject of interest.
So, in our scenario, the data that you gathered contains information about the sepal length and width and petal length and width. The sepals are the bigger outer leaves of the blossom and the petals are the smaller inner leaves of the blossom. And you simply measured their length and width and then recorded those numbers.
And you did that because, as the Iris flower expert, you suspect that these features are the most useful for distinguishing the three different types of flowers. Furthermore, as the expert, you were also able to determine what species each particular flower was. You stored that information in the last column, which is a special kind of column. It’s called a label.
And going back to our initial questions, the label is part of the answer to WHAT machines are supposed to learn. If you have such a label in your data, like the species column, then the specific type of machine learning is called supervised learning.
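To make the idea of "data as a table" a bit more concrete, here is a minimal sketch in Python. The column names are my own assumption for illustration, and the two rows are the measurements I quote later in this post. It only shows how each row splits into features and a label.

```python
# Each row is one flower: four feature columns plus the label column.
# The column names are assumptions made for this sketch.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
LABEL = "species"

flowers = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4,
     "petal_width": 0.2, "species": "Iris-setosa"},
    {"sepal_length": 5.7, "sepal_width": 2.8, "petal_length": 4.5,
     "petal_width": 1.3, "species": "Iris-versicolor"},
]


def split_features_and_label(row):
    """Separate the measurements (features) from the species (label)."""
    features = [row[name] for name in FEATURES]
    return features, row[LABEL]


for row in flowers:
    features, label = split_features_and_label(row)
    print(features, "->", label)
```

In supervised learning, the features are what the algorithm gets to see for a new flower, and the label is what it is supposed to predict.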
See slide 5
And this can be further broken down into 2 sub-categories, one of which is classification.
See slide 6
Classification simply means that the label can only take one of a fixed number of discrete classes or categories. So, in our scenario, for instance, we only have 3 different types of flowers.
And now, here is what supervised learning tries to do: Suppose you go back to your field and you pick a new flower.
See slide 7
The question that supervised learning tries to answer is: What species is that particular flower? And ideally it should say: “A flower with these specific values is an Iris-versicolor.”
See slide 8
So, more generally speaking, the goal of supervised learning is to predict the label of a specific example solely based on its features.
And the way that it achieves that is with the help of an algorithm. And looking back at our initial questions, this is the second element of HOW machines learn.
See slide 9
The word “algorithm” might sound very technical and complicated, but it’s basically just a certain number of predefined steps that transform an input into an output.
See slide 10
An everyday analogy would be a cooking recipe. The ingredients are the input and the final dish is the output. And the steps described in the recipe are the actual algorithm.
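In code, the same idea could look like the following tiny sketch. The function name and the numbers are made up for illustration. It just follows two fixed steps to turn a list of measurements into their average.

```python
def mean_sepal_length(sepal_lengths):
    """A tiny algorithm: fixed steps that turn an input list into one number."""
    total = 0.0
    for length in sepal_lengths:        # step 1: add up all measurements
        total += length
    return total / len(sepal_lengths)   # step 2: divide by how many there are


print(mean_sepal_length([5.0, 6.0, 7.0]))
```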
So, an algorithm in itself is something pretty trivial. But what makes ML algorithms so powerful, at least in my opinion, is that they take data as input and they output a decision. And what the broader implications of that might be, I will discuss in the last post of this series.
How Machine Learning works
But for now, it’s enough to know that we have all the elements we need to make our machine learn. Namely, we have some data, in this case it is labeled data because we are doing a supervised learning task, and we have an algorithm.
So, how does ML work, at least at a high level?
Well, you simply show the algorithm some examples.
See slide 11
And you say: Okay, a flower with a sepal length of 5.1, a sepal width of 3.5, a petal length of 1.4 and a petal width of 0.2, that’s an Iris-setosa. A flower with a sepal length of 5.7, a sepal width of 2.8, a petal length of 4.5 and a petal width of 1.3, that’s an Iris-versicolor. And you do that for all the examples that you have.
And that way the algorithm learns to detect patterns in the data, and it adjusts its parameters accordingly. And once it is done with adjusting, or in ML terms, once it is trained, it is able to determine what species a specific flower is, given only its sepal length and width and its petal length and width.
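As a rough sketch of what "adjusting parameters" can mean, here is a very simple classifier in Python. It is not the algorithm this series covers, just a nearest-centroid approach I picked because it fits in a few lines. The first setosa row and the first versicolor row are the examples from above, and the remaining rows are illustrative values added for this sketch. The "parameters" it learns are simply the average measurements per species.

```python
import math

# Labeled examples: (sepal length, sepal width, petal length, petal width), species.
# Rows other than the two quoted in the post are illustrative values.
training_data = [
    ((5.1, 3.5, 1.4, 0.2), "Iris-setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Iris-setosa"),
    ((5.7, 2.8, 4.5, 1.3), "Iris-versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "Iris-versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Iris-virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Iris-virginica"),
]


def train(examples):
    """Learn one parameter vector per species: the average of its examples."""
    grouped = {}
    for features, species in examples:
        grouped.setdefault(species, []).append(features)
    return {
        species: tuple(sum(column) / len(rows) for column in zip(*rows))
        for species, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the species whose average flower is closest to the given one."""
    return min(centroids, key=lambda species: math.dist(centroids[species], features))


centroids = train(training_data)
print(predict(centroids, (5.0, 3.4, 1.5, 0.2)))
```

Once `train` has run, `predict` needs nothing but the four measurements of a new flower, which is the whole point of the scenario.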
Conclusion of the Example Scenario
And this is what you as a flower grower wanted to achieve. Because now, the workers who harvest the flowers don’t have to know anything about Iris flowers.
See slide 12
All they have to do is measure the leaves and then type those measurements into the algorithm, for example in the form of an app. And then the algorithm tells them what type of flower they picked. This way they can sort the flowers without any expertise in Iris flowers whatsoever. That means you don’t have to train them anymore, which is the goal you were trying to accomplish in the first place.
And this concludes our scenario, and hopefully it demonstrated how ML generally works. Since this post should just provide a high-level overview, in the next post I am going to talk about how one specific machine learning algorithm works in detail. And this algorithm is going to be a decision tree.
But before I end this post, I want to go back to our initial questions and I want to fill in the missing terms.
See slide 13
So first, the other type of supervised learning is called regression.
See slide 14
In contrast to classification, you are not trying to predict a certain class out of a fixed number of classes, but you are trying to predict a numeric value.
See slide 15
Let’s say you have a data set that contains information about houses. The features, among others, might be the square footage of the living space, the square footage of the whole lot, or the number of bedrooms. And the label is the actual price of that house.
And here again, you show the algorithm a bunch of examples. So you say, a house like the house in the first row has a price of $221,900, and so on. And then, once the algorithm is trained, it will be able to predict the price of a house solely by looking at its features.
And if you are a real estate agent, this might be useful for determining how much you should be willing to pay for a specific house, at least based on this analysis.
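Here is a minimal regression sketch in Python, assuming just one feature (living space in square feet) and a straight-line relationship fitted with ordinary least squares. The house data is made up and deliberately lies exactly on a line so the result is easy to check by hand. Real data would be noisy and would have more features.

```python
# Hypothetical examples: (living space in square feet, price in dollars).
houses = [
    (1000, 200_000),
    (1500, 275_000),
    (2000, 350_000),
    (2500, 425_000),
]


def fit_line(examples):
    """Ordinary least squares for one feature: price = slope * sqft + intercept."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    variance = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(slope, intercept, sqft):
    """Predict a price (a number, not a class) from the living space."""
    return slope * sqft + intercept


slope, intercept = fit_line(houses)
print(predict_price(slope, intercept, 1800))
```

The difference from classification is visible in the return value: the prediction is a number on a continuous scale instead of one of a few categories.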
So, that’s regression. Next, the other type of learning is called unsupervised learning.
See slide 16
And the difference from supervised learning is that, here, you don’t have a label in your data set.
See slide 17
So, in our Iris flower example, there would be no “Species” column and only the sepal and petal length and width columns would be left. How this difference changes the goal of unsupervised learning compared to the goal of supervised learning is probably most easily explained by looking at the following diagrams.
See slide 18
Here, I created two scatter plots with the petal width on the x-axis and the petal length on the y-axis. And, as you may know, each dot in such a diagram represents one specific instance of your data, so one specific flower in this case. The data depicted in those diagrams is the same in both cases. So, the dots are all in the exact same places.
The only difference is that in the left graph we have labels. The green dots are Iris-setosa, the orange ones are Iris-versicolor and the blue dots are Iris-virginica. So, the left graph represents supervised learning. And in the right graph we have the unsupervised learning case where there are no labels which is why all the dots have the same color.
And what the “unsupervised learning” algorithm now tries to do is to identify groups of flowers that seem to be similar to each other and that might fall into the same class. And as you can see, there is a clear separation between the left cluster of dots and the right cluster of dots. So, the flowers in each of these clusters seem to have something in common, and therefore they can probably be put into the same category. The algorithm, however, doesn’t know what those categories actually are. It just knows which dots belong to the same category.
And as we know from the supervised learning graph, the flowers in the left cluster do indeed all belong to the same class, namely they are all Iris-setosa flowers. But what you can also see in the left graph is that there is no clear separation between the versicolor and virginica flowers. So, the “unsupervised learning” algorithm will probably have difficulty detecting that the right cluster contains two different categories and not just one, which is what you might think when you only look at the right graph.
So, to summarize, in the case of unsupervised learning you don’t have any labels and, in a sense, the algorithm tries to derive such labels from the data itself.
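To illustrate, here is a small sketch of one common clustering algorithm, k-means, in Python. The post doesn't commit to a specific algorithm, so this choice is mine. The points are illustrative (petal width, petal length) pairs without any species column, and the starting centroids are simply the first and last point so that the run is reproducible.

```python
import math

# Illustrative (petal width, petal length) measurements, with no labels.
points = [
    (0.2, 1.4), (0.2, 1.3), (0.3, 1.5), (0.2, 1.6),
    (1.3, 4.5), (1.5, 4.7), (1.9, 5.1), (2.1, 5.9),
]


def k_means(points, initial_centroids, iterations=10):
    """Group points into clusters by alternating two simple steps."""
    centroids = list(initial_centroids)
    assignments = []
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
            for point in points
        ]
        # Step 2: move each centroid to the mean of the points assigned to it.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids


assignments, centroids = k_means(points, [points[0], points[-1]])
print(assignments)
```

The output is just a cluster number per flower. The algorithm has no idea that cluster 0 might be called Iris-setosa, and with two clusters it lumps everything else together, much like the right-hand graph suggests.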
Now, we can get to the last type of ML and it is called reinforcement learning.
See slide 19
This is somewhat different from supervised and unsupervised learning which is why I depicted it a little bit apart from the other two.
See slide 20
Here, the algorithm interacts with its environment and based on its actions it receives rewards or punishments. And then, with the help of this feedback, it learns the optimal behavior for that environment. So, every time you see an algorithm play a game for example, there is probably some type of reinforcement learning involved.
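As a rough illustration, here is a sketch of tabular Q-learning, one well-known reinforcement learning algorithm, in Python. The environment is a made-up corridor with five positions: reaching the right end gives a reward, reaching the left end gives a punishment. All names and numbers are assumptions for this sketch.

```python
import random

N_STATES = 5                     # positions 0..4 in a made-up corridor
START = 2                        # every episode starts in the middle
ACTIONS = (-1, +1)               # step left or step right
REWARDS = {0: -1.0, 4: +1.0}     # punishment at the left end, reward at the right


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values (Q-values) purely from trial-and-error feedback."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(1, N_STATES - 1) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        while state not in REWARDS:            # an episode ends at either end
            action = rng.choice(ACTIONS)       # explore by acting at random
            next_state = state + action
            reward = REWARDS.get(next_state, 0.0)
            if next_state in REWARDS:
                future = 0.0                   # nothing follows a terminal state
            else:
                future = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward "reward now + discounted best future".
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """For each position, pick the action with the highest learned value."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, N_STATES - 1)}


q_values = train()
print(greedy_policy(q_values))
```

Nobody tells the algorithm that "right" is the correct direction. It only ever sees the rewards and punishments, and the learned policy ends up stepping right from every position.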
See slide 21
So, those are the different types of ML that exist and the implication of that is that you will need different kinds of data and algorithms to solve the respective problems.
And this concludes my high-level overview of what ML is. As I said before, in the next post I am going to talk about decision trees as an example of a specific ML algorithm.