Introduction to Classification

In this section we're going to take an in-depth look at classification and how it's used in machine learning. This is an exciting topic for me because I kind of view classification as the launching pad machine learning used to really make a name for itself. I'm not 100% sure what year it was, but I think it was sometime around the 1990s when the United States Postal Service started classifying handwritten digits from binary images for automatic zip code reading. I don't want to go as far as saying it was the very first, but it was definitely one of the first large-scale machine learning tasks used for automation.

Like regression, classification is a form of supervised machine learning, so it still relies on labeled training data to make predictions. But unlike a regression model, which predicts a continuous value, a classification model predicts the class or subgroup a target most likely belongs to, often by estimating a probability for each class. Generally speaking, classification algorithms share the same basic idea: they create a decision boundary (for linear models, a hyperplane) that acts as an imaginary dividing line between groups of data points. Some models work best when splitting the data into two classes, which is binary classification, while others can handle many classes, which is multiclass classification. And since each algorithm is a little bit different, we'll discuss how they determine their boundaries, along with whether they work best with binary or multiclass classification.
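
To make the idea of probabilities and a dividing boundary a bit more concrete, here is a minimal sketch of a binary classifier written in plain Python. It is not taken from this guide: the tiny dataset, the function names (`train`, `predict_proba`, `predict`), and the learning rate and epoch count are all assumptions chosen for illustration. The technique is logistic regression trained with gradient descent, which is just one of many classification algorithms.

```python
import math

# Hypothetical labeled training data: two features per point, class 0 or 1.
X = [(1.0, 1.5), (1.5, 1.0), (2.0, 2.0), (5.0, 5.5), (5.5, 5.0), (6.0, 6.0)]
y = [0, 0, 0, 1, 1, 1]


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.1, epochs=5000):
    """Fit weights w and bias b with batch gradient descent on the log loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x1, x2), label in zip(X, y):
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = p - label  # gradient of the log loss with respect to z
            grad_w[0] += err * x1
            grad_w[1] += err * x2
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(point, w, b):
    """Estimated probability that the point belongs to class 1."""
    return sigmoid(w[0] * point[0] + w[1] * point[1] + b)


def predict(point, w, b):
    """Pick class 1 when the probability is at least 0.5, otherwise class 0."""
    return 1 if predict_proba(point, w, b) >= 0.5 else 0


w, b = train(X, y)

# The boundary is the line where w[0]*x1 + w[1]*x2 + b == 0,
# which is exactly where the predicted probability equals 0.5.
for point in [(1.0, 1.0), (6.0, 5.0)]:
    print(point, round(predict_proba(point, w, b), 3), predict(point, w, b))
```

With two features the boundary is a straight line; with more features the same equation describes a hyperplane. A binary model like this can also be stretched to multiclass problems, for example by training one model per class and picking the class with the highest probability.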

I've already given you one example of how classification has been used, but beyond text recognition, classification can be used in the medical field to make a diagnosis based on symptoms or to classify different types of cancer based on gene expression. You'll often find it in fraud detection based on account activity. And then the most common, or at least the most well-known, is document classification for things like email filtering.

And that brings us to the end of this guide, so I will see you in the next one.