WEBVTT
1
00:00:03.460 --> 00:00:05.410
Since this is gonna be the last guide
2
00:00:05.410 --> 00:00:07.160
on k-nearest neighbors,
3
00:00:07.160 --> 00:00:08.530
I thought it would be a good time
4
00:00:08.530 --> 00:00:11.423
to talk about the exclusive or problem.
5
00:00:12.470 --> 00:00:14.430
Without going too deep into it,
6
00:00:14.430 --> 00:00:18.810
exclusive or is a logical operation that only produces
7
00:00:18.810 --> 00:00:22.574
an output of true when the inputs are different.
8
00:00:22.574 --> 00:00:26.250
It can also be expressed as a Venn diagram,
9
00:00:26.250 --> 00:00:29.600
where we're able to see that the output is true when one input is true
10
00:00:29.600 --> 00:00:30.993
and the other is false.
11
00:00:32.379 --> 00:00:35.210
To break it down a little bit further,
12
00:00:35.210 --> 00:00:38.023
we're gonna use a diagram to help explain everything.
13
00:00:39.050 --> 00:00:42.080
And as you can see, the diagram is made up
14
00:00:42.080 --> 00:00:44.780
of three basic parts.
15
00:00:44.780 --> 00:00:49.780
We have inputs A and B, followed by a logic gate,
16
00:00:49.780 --> 00:00:53.853
and then an output stating A is exclusively OR'd with B.
17
00:00:56.550 --> 00:00:59.500
And A exclusively OR'd with B
18
00:00:59.500 --> 00:01:02.080
can be expressed by a truth table
19
00:01:02.080 --> 00:01:05.753
that shows all of the possible input-output combinations.
20
00:01:07.490 --> 00:01:11.240
The only time we get a true output is when one
21
00:01:11.240 --> 00:01:14.910
and only one of the inputs is true.
22
00:01:14.910 --> 00:01:17.153
Otherwise, the output is false.
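As a quick aside, here's that truth table checked in Python; this snippet isn't from the video, it's just a minimal illustration:

```python
# XOR is true only when exactly one input is true.
for a in (False, True):
    for b in (False, True):
        print(a, b, a != b)  # for booleans, != behaves as exclusive or
```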
23
00:01:18.690 --> 00:01:20.860
As you probably expected,
24
00:01:20.860 --> 00:01:23.410
this relates back to what we're working on as well.
25
00:01:24.430 --> 00:01:28.270
Input A and B are our feature variables,
26
00:01:28.270 --> 00:01:31.650
the logic gate is the classification algorithm,
27
00:01:31.650 --> 00:01:34.153
and the output is the class label.
28
00:01:36.340 --> 00:01:38.110
The real issue we run into
29
00:01:38.110 --> 00:01:40.570
happens when we start looking at it graphically.
30
00:01:41.650 --> 00:01:44.470
When we start applying the logic gate rules,
31
00:01:44.470 --> 00:01:48.600
we know that if both feature variables are one or higher,
32
00:01:48.600 --> 00:01:53.433
then the output is false, or it belongs to class zero.
33
00:01:54.860 --> 00:01:59.140
We also know that if both variables are less than one,
34
00:01:59.140 --> 00:02:02.163
the output will also belong to the zero class.
35
00:02:03.260 --> 00:02:06.360
Then if one of the feature variables has a value of one
36
00:02:06.360 --> 00:02:09.970
or higher and the other feature variable has a value
37
00:02:09.970 --> 00:02:12.653
lower than one, the output is true.
38
00:02:14.200 --> 00:02:17.340
It might not be super obvious, but you'll see the issue
39
00:02:17.340 --> 00:02:19.673
when we try to graph a decision boundary.
40
00:02:20.560 --> 00:02:22.560
No matter how you adjust it,
41
00:02:22.560 --> 00:02:24.793
it's impossible to get it to work.
42
00:02:25.970 --> 00:02:29.380
And here lies the exclusive or problem.
43
00:02:29.380 --> 00:02:32.090
The inability of a linear classifier,
44
00:02:32.090 --> 00:02:35.300
but more specifically, a single-layer perceptron,
45
00:02:35.300 --> 00:02:39.420
to predict the outputs of exclusive or logic gates
46
00:02:39.420 --> 00:02:41.433
given two binary inputs.
47
00:02:42.980 --> 00:02:44.740
And while this proves to be a shortcoming
48
00:02:44.740 --> 00:02:47.540
for linear classifiers, it also highlights
49
00:02:47.540 --> 00:02:50.630
one of the advantages of nonlinear classifiers,
50
00:02:50.630 --> 00:02:52.283
like k-nearest neighbors.
51
00:02:53.210 --> 00:02:56.810
As we've discussed, k-nearest neighbors doesn't rely on
52
00:02:56.810 --> 00:03:00.220
establishing a decision boundary and saying that everything
53
00:03:00.220 --> 00:03:03.110
on this side of the plane belongs to class A
54
00:03:03.110 --> 00:03:05.813
and anything on the other side is class B.
55
00:03:06.820 --> 00:03:09.990
Instead, it uses distance-based relationships
56
00:03:09.990 --> 00:03:13.600
to predict a class, which proves to be very helpful
57
00:03:13.600 --> 00:03:14.763
in this situation.
58
00:03:16.970 --> 00:03:19.190
This example is gonna be really similar
59
00:03:19.190 --> 00:03:20.760
to what we did in the last guide,
60
00:03:20.760 --> 00:03:23.363
so I'm just gonna run through the code with you.
61
00:03:24.410 --> 00:03:27.473
The first thing I did was create a sample dataset,
62
00:03:28.640 --> 00:03:32.280
and I did that by using NumPy's randn method
63
00:03:32.280 --> 00:03:34.373
from the RandomState class.
64
00:03:35.720 --> 00:03:37.940
And by using the randn method,
65
00:03:37.940 --> 00:03:42.560
we're going to get back a matrix with 400 rows, two columns,
66
00:03:42.560 --> 00:03:45.423
and all of the data is normally distributed.
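A minimal sketch of that dataset setup; the seed value here is an assumption, since the video doesn't show it:

```python
import numpy as np

# Seeded RandomState so the sample is reproducible (seed is an assumption).
rng = np.random.RandomState(0)

# 400 rows, 2 columns of normally distributed values.
X = rng.randn(400, 2)
```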
67
00:03:48.130 --> 00:03:50.150
I also created a Y variable,
68
00:03:50.150 --> 00:03:53.640
and that contains all of the class labels.
69
00:03:53.640 --> 00:03:58.223
For that, I used a new NumPy function called logical_xor.
70
00:04:00.210 --> 00:04:03.700
Looking at the documentation, it says the function computes
71
00:04:03.700 --> 00:04:08.700
the truth value of x1 XOR x2, element-wise.
72
00:04:10.630 --> 00:04:13.283
And I'll explain what that means in just a second.
73
00:04:14.890 --> 00:04:18.230
Now, going back to our code, pretty much like we did
74
00:04:18.230 --> 00:04:21.170
in the last guide, I said the first column
75
00:04:21.170 --> 00:04:24.580
of the feature variable will indicate the X coordinate
76
00:04:24.580 --> 00:04:27.610
and the second column will be the Y coordinate.
77
00:04:27.610 --> 00:04:30.850
And then I said, any value above zero
78
00:04:30.850 --> 00:04:33.590
is part of the true class.
79
00:04:33.590 --> 00:04:35.780
Let me just run this cell really quick
80
00:04:35.780 --> 00:04:38.360
and I'll show you what the Y array looks like
81
00:04:38.360 --> 00:04:39.803
compared to the X array.
82
00:04:46.180 --> 00:04:48.230
So this goes back to the truth table
83
00:04:48.230 --> 00:04:51.230
that we talked about earlier, where a true result
84
00:04:51.230 --> 00:04:55.193
was returned if one and only one of the features was true.
85
00:04:56.850 --> 00:04:59.430
Well, this is pretty much the same thing,
86
00:04:59.430 --> 00:05:02.350
only we're using zero as our threshold.
87
00:05:02.350 --> 00:05:05.230
So if both features are above zero,
88
00:05:05.230 --> 00:05:07.250
then the result is false.
89
00:05:07.250 --> 00:05:11.170
But if one feature is above zero and the other is below,
90
00:05:11.170 --> 00:05:13.790
then the result is going to be true.
91
00:05:13.790 --> 00:05:17.023
So that's pretty much what the logical_xor function does.
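In code, that label step might look like this, following the column convention described above:

```python
# True when exactly one of the two features is above zero.
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)
```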
92
00:05:22.500 --> 00:05:24.613
Okay, moving down the line.
93
00:05:25.790 --> 00:05:28.710
Same as before, I used the min and max functions
94
00:05:28.710 --> 00:05:30.963
to create the parameters for the mesh grid.
95
00:05:32.490 --> 00:05:35.850
Then I used those parameters in the actual mesh grid
96
00:05:35.850 --> 00:05:38.453
and used a step of .1 again.
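A sketch of those two steps, assuming a padding of 1 around the data (the exact padding isn't stated in the video):

```python
# Mesh grid parameters from the feature mins and maxes.
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1

# Build the mesh grid with a step of 0.1.
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))
```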
97
00:05:40.130 --> 00:05:43.793
Next, I created a classifier and then fit the model.
98
00:05:44.880 --> 00:05:48.580
Then, for the Z variable, I used the predict function,
99
00:05:48.580 --> 00:05:52.310
and inside the parentheses, I used NumPy's c_ function
100
00:05:52.310 --> 00:05:56.543
to concatenate the xx and yy variables column-wise,
101
00:05:57.730 --> 00:06:01.690
and the ravel function to smash xx and yy
102
00:06:01.690 --> 00:06:05.160
into a one-dimensional array so they could actually be used
103
00:06:05.160 --> 00:06:06.543
by the predict function.
104
00:06:08.380 --> 00:06:10.700
Then, after the class predictions are made,
105
00:06:10.700 --> 00:06:13.630
I did a reassignment and reshaped Z
106
00:06:13.630 --> 00:06:15.593
to match the shape of xx.
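Roughly, those steps look like this; the n_neighbors value is an assumption, since the video doesn't state it here:

```python
from sklearn.neighbors import KNeighborsClassifier

# Fit k-nearest neighbors on the XOR data (k=5 is an assumption).
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, y)

# Flatten the grid arrays, stack them column-wise with np.c_,
# predict a class for every grid point, then reshape to match xx.
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
```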
107
00:06:16.800 --> 00:06:19.560
For the scatter plot, I used the first feature variable
108
00:06:19.560 --> 00:06:24.560
column for the X axis and the second column for the Y axis.
109
00:06:24.790 --> 00:06:27.100
I set the color equal to Y,
110
00:06:27.100 --> 00:06:30.230
which makes it so every point isn't the same color,
111
00:06:30.230 --> 00:06:31.730
and then I used the color map
112
00:06:31.730 --> 00:06:33.913
to change the colors to blue and red.
113
00:06:36.070 --> 00:06:38.250
For the contour plot, it's exactly like we did
114
00:06:38.250 --> 00:06:42.990
in the last guide, where we used xx, yy, and Z,
115
00:06:42.990 --> 00:06:45.730
and then used the color map for blue and red,
116
00:06:45.730 --> 00:06:48.403
and we changed the alpha to .5.
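A sketch of the plotting calls; the specific colormap is an assumption (any blue/red map works):

```python
import matplotlib.pyplot as plt

# Filled contour for the decision regions, points colored by class.
plt.contourf(xx, yy, Z, cmap=plt.cm.RdBu, alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdBu)
plt.show()
```

Drawing the contour first just keeps the scatter points visible on top.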
117
00:06:50.590 --> 00:06:52.403
And when we run the whole thing,
118
00:06:54.220 --> 00:06:56.240
it looks like k-nearest neighbors
119
00:06:56.240 --> 00:06:59.173
was able to handle the exclusive or problem.
120
00:07:00.420 --> 00:07:02.770
And that really goes back to the idea that,
121
00:07:02.770 --> 00:07:06.130
unlike a linear classifier, which makes location-based
122
00:07:06.130 --> 00:07:09.120
decisions in reference to the hyperplane,
123
00:07:09.120 --> 00:07:12.230
k-nearest neighbors uses localized distances
124
00:07:12.230 --> 00:07:14.173
to make classification decisions.
125
00:07:15.670 --> 00:07:19.430
Anyway, this is a topic that's gonna be brought up again,
126
00:07:19.430 --> 00:07:21.410
but for now, I'm gonna wrap it up,
127
00:07:21.410 --> 00:07:23.160
and I will see you in the next one.