WEBVTT
1
00:00:03.210 --> 00:00:04.530
In this guide,
2
00:00:04.530 --> 00:00:07.700
we're gonna be covering our first classification technique,
3
00:00:07.700 --> 00:00:09.203
which is Naive Bayes.
4
00:00:10.220 --> 00:00:13.240
When I was first learning about the Bayes classifier,
5
00:00:13.240 --> 00:00:15.880
I was slightly shocked or maybe confused
6
00:00:15.880 --> 00:00:18.150
when I found out that the Bayes classifiers
7
00:00:18.150 --> 00:00:20.123
aren't necessarily Bayesian.
8
00:00:20.970 --> 00:00:23.230
In fact, Bayesian methods
9
00:00:23.230 --> 00:00:27.360
weren't even really named after Thomas Bayes or his theorem.
10
00:00:27.360 --> 00:00:30.780
It was actually a statistician named Ronald Fisher,
11
00:00:30.780 --> 00:00:34.090
who mockingly referred to the inverse probability method
12
00:00:34.090 --> 00:00:36.010
as Bayesian.
13
00:00:36.010 --> 00:00:38.993
And for whatever reason, the name stuck.
14
00:00:41.020 --> 00:00:43.390
Now Naive Bayes is an algorithm
15
00:00:43.390 --> 00:00:46.190
that's based on modeling the joint distribution
16
00:00:46.190 --> 00:00:50.000
and belongs to a group of probabilistic classifiers
17
00:00:50.000 --> 00:00:53.810
that use Bayes' theorem along with the naive assumption
18
00:00:53.810 --> 00:00:57.410
that every feature is independent of the others, given the class.
19
00:00:57.410 --> 00:01:00.900
So in order for us to understand how Naive Bayes works,
20
00:01:00.900 --> 00:01:04.003
it's pretty important to understand Bayes' theorem.
21
00:01:05.870 --> 00:01:08.390
If you're not familiar with Bayes' theorem,
22
00:01:08.390 --> 00:01:10.270
it's essentially an application
23
00:01:10.270 --> 00:01:12.940
of the chain rule of probability.
24
00:01:12.940 --> 00:01:16.550
And it's just set up in a way to give us a really easy way
25
00:01:16.550 --> 00:01:19.143
of determining posterior probabilities.
26
00:01:20.260 --> 00:01:23.110
Or in other words, it's a simplified way
27
00:01:23.110 --> 00:01:26.740
to estimate the likelihood of an unknown event occurring
28
00:01:26.740 --> 00:01:30.163
based on the probabilities of other known events.
29
00:01:31.310 --> 00:01:33.700
The concept of posterior probability
30
00:01:33.700 --> 00:01:35.810
can be a little confusing.
31
00:01:35.810 --> 00:01:39.020
But it really clicked for me when I started looking at it,
32
00:01:39.020 --> 00:01:40.663
like it's a game of Jeopardy.
33
00:01:41.830 --> 00:01:44.030
For us to use Bayes' theorem,
34
00:01:44.030 --> 00:01:46.380
We need to know all of the probabilities
35
00:01:46.380 --> 00:01:48.600
that make up the answer.
36
00:01:48.600 --> 00:01:50.860
Then that will allow us to work backwards,
37
00:01:50.860 --> 00:01:52.513
to figure out the question.
38
00:01:53.560 --> 00:01:56.010
And mathematically, Bayes' theorem
39
00:01:56.010 --> 00:01:58.503
is represented by the following equation.
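The on-screen formula isn't captured in the transcript, so here is the standard statement. Starting from the chain rule, $P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$, dividing through by $P(B)$ gives Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where $P(A \mid B)$ is the posterior, $P(B \mid A)$ is the likelihood, $P(A)$ is the prior, and $P(B)$ is the evidence.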
40
00:02:01.940 --> 00:02:04.310
To use a really simple example,
41
00:02:04.310 --> 00:02:07.120
let's say a family is having twins,
42
00:02:07.120 --> 00:02:09.233
and we learn that at least one of the children is a girl.
43
00:02:10.500 --> 00:02:13.350
The question is, what is the probability
44
00:02:13.350 --> 00:02:15.540
of the couple having twin girls
45
00:02:15.540 --> 00:02:17.573
given that at least one of them is a girl?
46
00:02:19.730 --> 00:02:21.880
Using Bayes' theorem, we'll say that
47
00:02:21.880 --> 00:02:24.590
the probability of having two girls
48
00:02:24.590 --> 00:02:27.310
given at least one girl was born
49
00:02:27.310 --> 00:02:31.470
is equal to the probability of having two girls
50
00:02:31.470 --> 00:02:34.530
multiplied by the probability of having at least one girl,
51
00:02:34.530 --> 00:02:36.093
given both children are girls.
52
00:02:37.420 --> 00:02:40.470
And all of that is divided by the probability
53
00:02:40.470 --> 00:02:42.563
of at least one girl being born.
54
00:02:44.180 --> 00:02:46.240
Right away we know that the probability
55
00:02:46.240 --> 00:02:50.300
of the family having at least one girl, given both are girls,
56
00:02:50.300 --> 00:02:53.283
has to be 100%, or one.
57
00:02:54.580 --> 00:02:58.680
We also know that there's only four baby combinations,
58
00:02:58.680 --> 00:03:03.453
girl girl, girl boy, boy girl, and boy boy.
59
00:03:04.870 --> 00:03:06.540
So looking at that,
60
00:03:06.540 --> 00:03:10.120
we know that the probability of having two girls
61
00:03:10.120 --> 00:03:12.380
is one out of four.
62
00:03:12.380 --> 00:03:15.420
And the probability of having at least one girl
63
00:03:15.420 --> 00:03:16.923
is three out of four.
64
00:03:18.080 --> 00:03:21.830
So the overall probability of the family having two girls,
65
00:03:21.830 --> 00:03:24.140
knowing at least one of them is a girl,
66
00:03:24.140 --> 00:03:27.933
is one out of three, or right around 33%.
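To see the arithmetic in one place, plugging those values into Bayes' theorem gives:

$$P(\text{two girls} \mid \text{at least one girl}) = \frac{1 \times \frac{1}{4}}{\frac{3}{4}} = \frac{1}{3} \approx 33\%$$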
67
00:03:29.600 --> 00:03:32.730
Now for us as machine learning developers,
68
00:03:32.730 --> 00:03:35.053
the math isn't the most important part.
69
00:03:35.900 --> 00:03:37.900
To me, it's much more valuable
70
00:03:37.900 --> 00:03:41.050
to have an understanding of the core concept.
71
00:03:41.050 --> 00:03:43.870
Because having the ability to apply your knowledge
72
00:03:43.870 --> 00:03:46.313
will take you way further in the industry.
73
00:03:48.020 --> 00:03:51.070
Now, in terms of what Scikit-learn offers,
74
00:03:51.070 --> 00:03:53.573
there are five different Naive Bayes methods, each with its own class in scikit-learn:
75
00:03:55.220 --> 00:03:56.773
Gaussian Naive Bayes,
76
00:03:58.380 --> 00:04:00.223
Multinomial Naive Bayes,
77
00:04:02.830 --> 00:04:04.453
Complement Naive Bayes,
78
00:04:06.808 --> 00:04:08.313
Bernoulli Naive Bayes,
79
00:04:10.980 --> 00:04:13.163
and Categorical Naive Bayes.
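For reference, those five variants live in scikit-learn's naive_bayes module:

```python
from sklearn.naive_bayes import (
    GaussianNB,     # continuous features, assumed normally distributed per class
    MultinomialNB,  # count features, e.g. word counts in text
    ComplementNB,   # multinomial variant better suited to imbalanced classes
    BernoulliNB,    # binary/boolean features
    CategoricalNB,  # discrete categorical features
)
```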
80
00:04:14.940 --> 00:04:15.830
In a couple of guides,
81
00:04:15.830 --> 00:04:18.570
we'll be doing a more in-depth example,
82
00:04:18.570 --> 00:04:19.430
but for now,
83
00:04:19.430 --> 00:04:21.973
let's go through some of the basic functionality.
84
00:04:23.170 --> 00:04:25.050
I've done the importing already.
85
00:04:25.050 --> 00:04:27.590
And the newest tool that we'll be applying,
86
00:04:27.590 --> 00:04:31.453
is the Gaussian Naive Bayes, or GaussianNB, class.
87
00:04:33.340 --> 00:04:38.340
And what we have for the data is a 12-by-5 matrix,
88
00:04:38.490 --> 00:04:42.160
where each row represents a student,
89
00:04:42.160 --> 00:04:45.720
and column one represents exam one grades,
90
00:04:45.720 --> 00:04:47.700
column two exam two,
91
00:04:47.700 --> 00:04:49.870
column three is exam three,
92
00:04:49.870 --> 00:04:52.773
and column four, our final exam grades.
93
00:04:53.640 --> 00:04:56.960
Then in the fifth and final column, instead of a course grade,
94
00:04:56.960 --> 00:04:59.273
there's either a zero or one.
95
00:05:00.350 --> 00:05:04.410
So any student who had a course grade below 70%,
96
00:05:04.410 --> 00:05:06.250
was given a zero,
97
00:05:06.250 --> 00:05:10.093
and any student at or above a 70 was assigned a one.
98
00:05:11.980 --> 00:05:13.500
In terms of our variables,
99
00:05:13.500 --> 00:05:17.120
the feature variable is made up of all of the exam scores
100
00:05:18.320 --> 00:05:20.120
and the target variable contains
101
00:05:20.120 --> 00:05:22.503
the final column of the grade matrix.
102
00:05:23.420 --> 00:05:27.530
And even though the sample we're working with is pretty tiny,
103
00:05:27.530 --> 00:05:29.470
as a formality, I decided to add
104
00:05:29.470 --> 00:05:31.683
the train and test split anyway.
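The actual grade matrix isn't shown in the transcript, so here is a minimal sketch with made-up scores that matches the shape described: twelve students, four exam columns, and a 0/1 label for a course grade below or at/above 70.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical exam scores (exam 1, exam 2, exam 3, final exam); the
# numbers are invented purely for illustration.
X = np.array([
    [88, 92, 79, 85], [54, 61, 58, 49], [91, 89, 94, 90],
    [71, 74, 68, 73], [45, 52, 49, 55], [78, 81, 75, 80],
    [95, 97, 92, 96], [60, 58, 63, 57], [83, 79, 86, 82],
    [50, 47, 55, 52], [72, 70, 74, 69], [64, 68, 61, 66],
])
# 0 = course grade below 70, 1 = course grade at or above 70
y = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0])

# A formality with only 12 samples, but it mirrors the usual workflow
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```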
105
00:05:33.500 --> 00:05:35.720
Now you probably already noticed
106
00:05:35.720 --> 00:05:37.550
that the setup for Naive Bayes
107
00:05:37.550 --> 00:05:40.023
is pretty similar to a regression model.
108
00:05:42.050 --> 00:05:45.270
Instead of a regressor, we have a classifier,
109
00:05:45.270 --> 00:05:48.340
but we still use the fit function and training data
110
00:05:48.340 --> 00:05:49.543
to generate a model.
111
00:05:50.600 --> 00:05:52.330
We can still use the score function
112
00:05:52.330 --> 00:05:54.163
to check the accuracy of the model.
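A sketch of that fit-and-score step, continuing from the split above:

```python
from sklearn.naive_bayes import GaussianNB

clf = GaussianNB()
clf.fit(X_train, y_train)  # learns per-class feature means and variances

# Mean accuracy on the held-out students; expect it to swing a lot
# with a sample this small.
print(clf.score(X_test, y_test))
```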
113
00:05:55.960 --> 00:05:58.830
And again, because our sample is so small,
114
00:05:58.830 --> 00:06:00.930
our results will probably be different,
115
00:06:00.930 --> 00:06:03.530
so keep that in mind as you're working through this.
116
00:06:04.640 --> 00:06:05.580
All right.
117
00:06:05.580 --> 00:06:08.550
The prediction function works the same as well,
118
00:06:08.550 --> 00:06:11.350
but this time it'll predict the class an input
119
00:06:11.350 --> 00:06:13.430
most likely belongs to,
120
00:06:13.430 --> 00:06:15.463
instead of some numerical value.
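For example, with a hypothetical new student's four exam scores (again, made-up numbers):

```python
# One new student's scores for exams 1-3 and the final exam
new_student = [[62, 58, 65, 60]]

# Returns the predicted class label, e.g. [0] for the below-70 group
print(clf.predict(new_student))
```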
121
00:06:17.090 --> 00:06:20.370
And for me, the student is in the class of students
122
00:06:20.370 --> 00:06:23.563
whose course grade was below 70%.
123
00:06:24.950 --> 00:06:27.983
And finally, we can check the class probabilities.
124
00:06:31.860 --> 00:06:34.530
So there is a 78% chance
125
00:06:34.530 --> 00:06:37.010
the student belongs to the class of students
126
00:06:37.010 --> 00:06:40.320
who received a grade lower than 70,
127
00:06:40.320 --> 00:06:43.060
and a 22% chance they belong to the class
128
00:06:43.060 --> 00:06:45.283
where students scored above a 70.
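That check uses predict_proba, which returns one probability per class:

```python
# Column 0 is the below-70 class, column 1 the at-or-above-70 class;
# the narrator's run produced roughly [[0.78, 0.22]], but exact values
# depend on the data and the split.
print(clf.predict_proba(new_student))
```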
129
00:06:47.250 --> 00:06:50.260
And like I said, we'll be doing more of this later on,
130
00:06:50.260 --> 00:06:53.513
but those are really the basics of how Naive Bayes works.
131
00:06:55.500 --> 00:06:59.400
Overall, the Naive Bayes classifier can be a great tool
132
00:06:59.400 --> 00:07:02.503
that offers a simple solution for classification modeling.
133
00:07:03.940 --> 00:07:06.730
And one of the reasons why people like to use it
134
00:07:06.730 --> 00:07:08.960
is that it doesn't need as much training data
135
00:07:08.960 --> 00:07:10.973
as other classification methods.
136
00:07:11.910 --> 00:07:15.550
It can also handle continuous and discrete data.
137
00:07:15.550 --> 00:07:19.340
It's highly scalable with a large number of features,
138
00:07:19.340 --> 00:07:22.763
and it's even fast enough for real-time predictions.
139
00:07:25.290 --> 00:07:27.330
And so with all that being said,
140
00:07:27.330 --> 00:07:28.720
I will wrap this guide up
141
00:07:28.720 --> 00:07:30.517
and I will see you in the next one.