Overview of the XOR Problem

WEBVTT

1
00:00:03.460 --> 00:00:05.410
Since this is gonna be the last guide

2
00:00:05.410 --> 00:00:07.160
on k-nearest neighbors,

3
00:00:07.160 --> 00:00:08.530
I thought it would be a good time

4
00:00:08.530 --> 00:00:11.423
to talk about the exclusive or problem.

5
00:00:12.470 --> 00:00:14.430
Without going too deep into it,

6
00:00:14.430 --> 00:00:18.810
exclusive or is a logical operation that only produces

7
00:00:18.810 --> 00:00:22.574
an output of true when the inputs are different.

8
00:00:22.574 --> 00:00:26.250
It can also be expressed as a Venn diagram,

9
00:00:26.250 --> 00:00:29.600
where we're able to see that if one input is true,

10
00:00:29.600 --> 00:00:30.993
the other is false.

11
00:00:32.379 --> 00:00:35.210
To break it down a little bit further,

12
00:00:35.210 --> 00:00:38.023
we're gonna use a diagram to help explain everything.

13
00:00:39.050 --> 00:00:42.080
And as you can see, the diagram is made up

14
00:00:42.080 --> 00:00:44.780
of three basic parts.

15
00:00:44.780 --> 00:00:49.780
We have inputs A and B, followed by a logic gate,

16
00:00:49.780 --> 00:00:53.853
and then an output stating A is exclusive-or'd with B.

17
00:00:56.550 --> 00:00:59.500
And A is exclusive-or'd with B

18
00:00:59.500 --> 00:01:02.080
can be expressed by a truth table

19
00:01:02.080 --> 00:01:05.753
that shows all of the possible input-output combinations.

20
00:01:07.490 --> 00:01:11.240
The only time we get a true output happens when one

21
00:01:11.240 --> 00:01:14.910
and only one of the inputs is true.

22
00:01:14.910 --> 00:01:17.153
Otherwise, the output is false.

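To make that truth table concrete, here's a minimal sketch (not part of the video's code) using Python's ^ operator, which performs XOR on booleans:

```python
# Print every input combination for A XOR B.
for a in (False, True):
    for b in (False, True):
        print(a, b, a ^ b)

# False False False
# False True True
# True False True
# True True False
```
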
23
00:01:18.690 --> 00:01:20.860
As you probably expected,

24
00:01:20.860 --> 00:01:23.410
this relates back to what we're working on as well.

25
00:01:24.430 --> 00:01:28.270
Input A and B are our feature variables,

26
00:01:28.270 --> 00:01:31.650
the logic gate is the classification algorithm,

27
00:01:31.650 --> 00:01:34.153
and the output is the class label.

28
00:01:36.340 --> 00:01:38.110
The real issue we run into

29
00:01:38.110 --> 00:01:40.570
happens when we start looking at it graphically.

30
00:01:41.650 --> 00:01:44.470
When we start applying the logic gate rules,

31
00:01:44.470 --> 00:01:48.600
we know that if both feature variables are one or higher,

32
00:01:48.600 --> 00:01:53.433
then the output is false, or it belongs to class zero.

33
00:01:54.860 --> 00:01:59.140
We also know that if both variables are less than one,

34
00:01:59.140 --> 00:02:02.163
the output will also belong to the zero class.

35
00:02:03.260 --> 00:02:06.360
Then if one of the feature variables has a value of one

36
00:02:06.360 --> 00:02:09.970
or higher and the other feature variable has a value

37
00:02:09.970 --> 00:02:12.653
lower than one, the output is true.

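As a rough sketch of that rule (the function and sample points below are hypothetical, just to illustrate the mapping):

```python
def xor_class(a, b, threshold=1.0):
    # Class 1 (true) only when exactly one feature meets the threshold.
    return int((a >= threshold) != (b >= threshold))

# Hypothetical sample points:
print(xor_class(1.5, 2.0))  # 0 -- both features are one or higher
print(xor_class(0.3, 0.8))  # 0 -- both features are below one
print(xor_class(1.5, 0.3))  # 1 -- exactly one feature is one or higher
```
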
38
00:02:14.200 --> 00:02:17.340
It might not be super obvious, but you'll see the issue

39
00:02:17.340 --> 00:02:19.673
when we try to graph a decision boundary.

40
00:02:20.560 --> 00:02:22.560
No matter how you adjust it,

41
00:02:22.560 --> 00:02:24.793
it's impossible to get it to work.

42
00:02:25.970 --> 00:02:29.380
And here lies the exclusive or problem.

43
00:02:29.380 --> 00:02:32.090
The inability of a linear classifier,

44
00:02:32.090 --> 00:02:35.300
but more specifically, a single-layer perceptron,

45
00:02:35.300 --> 00:02:39.420
to predict the outputs of exclusive or logic gates

46
00:02:39.420 --> 00:02:41.433
given two binary inputs.

47
00:02:42.980 --> 00:02:44.740
And while this proves to be a shortcoming

48
00:02:44.740 --> 00:02:47.540
for linear classifiers, it also highlights

49
00:02:47.540 --> 00:02:50.630
one of the advantages of nonlinear classifiers,

50
00:02:50.630 --> 00:02:52.283
like k-nearest neighbors.

51
00:02:53.210 --> 00:02:56.810
As we've discussed, k-nearest neighbors doesn't rely on

52
00:02:56.810 --> 00:03:00.220
establishing a decision boundary and saying that everything

53
00:03:00.220 --> 00:03:03.110
on this side of the plane belongs to class A

54
00:03:03.110 --> 00:03:05.813
and anything on the other side is class B.

55
00:03:06.820 --> 00:03:09.990
Instead, it uses distance-based relationships

56
00:03:09.990 --> 00:03:13.600
to predict a class, which proves to be very helpful

57
00:03:13.600 --> 00:03:14.763
in this situation.

58
00:03:16.970 --> 00:03:19.190
This example is gonna be really similar

59
00:03:19.190 --> 00:03:20.760
to what we did in the last guide,

60
00:03:20.760 --> 00:03:23.363
so I'm just gonna run through the code with you.

61
00:03:24.410 --> 00:03:27.473
The first thing I did was create a sample dataset,

62
00:03:28.640 --> 00:03:32.280
and I did that by using NumPy's randn method

63
00:03:32.280 --> 00:03:34.373
from the RandomState class.

64
00:03:35.720 --> 00:03:37.940
And by using the randn method,

65
00:03:37.940 --> 00:03:42.560
we're going to get back a matrix with 400 rows, two columns,

66
00:03:42.560 --> 00:03:45.423
and all of the data is normally distributed.

67
00:03:48.130 --> 00:03:50.150
I also created a Y variable,

68
00:03:50.150 --> 00:03:53.640
and that contains all of the class labels.

69
00:03:53.640 --> 00:03:58.223
For that, I used a new NumPy function called logical_xor.

70
00:04:00.210 --> 00:04:03.700
Looking at the documentation, it says the function computes

71
00:04:03.700 --> 00:04:08.700
the truth value of x1 XOR x2, element-wise.

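To see what element-wise means in practice, a quick example of NumPy's documented behavior:

```python
import numpy as np

x1 = np.array([True, True, False, False])
x2 = np.array([True, False, True, False])

# XOR is applied to each pair of positions independently.
print(np.logical_xor(x1, x2))  # [False  True  True False]
```
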
72
00:04:10.630 --> 00:04:13.283
And I'll explain what that means in just a second.

73
00:04:14.890 --> 00:04:18.230
Now, going back to our code, pretty much like we did

74
00:04:18.230 --> 00:04:21.170
in the last guide, I said the first column

75
00:04:21.170 --> 00:04:24.580
of the feature variable will indicate the X coordinate

76
00:04:24.580 --> 00:04:27.610
and the second column will be the Y coordinate.

77
00:04:27.610 --> 00:04:30.850
And then I said, any value above zero

78
00:04:30.850 --> 00:04:33.590
is part of the true class.

79
00:04:33.590 --> 00:04:35.780
Let me just run this cell really quick

80
00:04:35.780 --> 00:04:38.360
and I'll show you what the Y array looks like

81
00:04:38.360 --> 00:04:39.803
compared to the X array.

82
00:04:46.180 --> 00:04:48.230
So this goes back to the truth table

83
00:04:48.230 --> 00:04:51.230
that we talked about earlier, where a true result

84
00:04:51.230 --> 00:04:55.193
was returned if one and only one of the features was true.

85
00:04:56.850 --> 00:04:59.430
Well, this is pretty much the same thing,

86
00:04:59.430 --> 00:05:02.350
only we're using zero as our threshold.

87
00:05:02.350 --> 00:05:05.230
So if both features are above zero,

88
00:05:05.230 --> 00:05:07.250
then the result is false.

89
00:05:07.250 --> 00:05:11.170
But if one feature is above zero and the other is below,

90
00:05:11.170 --> 00:05:13.790
then the result is going to be true.

91
00:05:13.790 --> 00:05:17.023
So that's pretty much what the logical_xor function does.

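Putting those pieces together, the dataset setup probably looks something like this sketch (the seed value is an assumption on my part):

```python
import numpy as np

rng = np.random.RandomState(1)  # seed chosen arbitrarily here
X = rng.randn(400, 2)           # 400 rows, 2 columns, normally distributed

# True when one and only one feature is above the zero threshold.
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)
y = y.astype(int)               # class labels 0 and 1
```
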
92
00:05:22.500 --> 00:05:24.613
Okay, moving down the line.

93
00:05:25.790 --> 00:05:28.710
Same as before, I use the min and max function

94
00:05:28.710 --> 00:05:30.963
to create the parameters for the mesh grid.

95
00:05:32.490 --> 00:05:35.850
Then I used those parameters in the actual mesh grid

96
00:05:35.850 --> 00:05:38.453
and used a step of .1 again.

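Continuing from the sketch above, the min/max and mesh-grid steps might look roughly like this (the padding of 1 unit is an assumption):

```python
# Grid bounds taken from the data, with a little padding around the edges.
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1

# Coordinate matrices covering the whole plane at a step of .1.
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))
```
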
97
00:05:40.130 --> 00:05:43.793
Next, I created a classifier and then fit the model.

98
00:05:44.880 --> 00:05:48.580
Then, for the Z variable, I used the predict function,

99
00:05:48.580 --> 00:05:52.310
and inside the parentheses, I used NumPy's c_ object

100
00:05:52.310 --> 00:05:56.543
to transpose and concatenate the xx and yy variables,

101
00:05:57.730 --> 00:06:01.690
and the ravel function to smash xx and yy

102
00:06:01.690 --> 00:06:05.160
into a one-dimensional array so they could actually be used

103
00:06:05.160 --> 00:06:06.543
by the predict function.

104
00:06:08.380 --> 00:06:10.700
Then, after the class predictions are made,

105
00:06:10.700 --> 00:06:13.630
I did a reassignment and reshaped Z

106
00:06:13.630 --> 00:06:15.593
to match the shape of xx.

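Assuming scikit-learn's KNeighborsClassifier (the n_neighbors value is a guess), that classify-every-grid-point step could be sketched as:

```python
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=5)  # n_neighbors=5 is an assumption
clf.fit(X, y)

# ravel() flattens xx and yy to 1-D, and np.c_ pairs them up column-wise,
# giving predict() one (x, y) row per grid point.
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

# Reshape the flat predictions back to the grid's shape for plotting.
Z = Z.reshape(xx.shape)
```
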
107
00:06:16.800 --> 00:06:19.560
For the scatter plot, I used the first feature variable

108
00:06:19.560 --> 00:06:24.560
column for the X axis and the second column for the Y axis.

109
00:06:24.790 --> 00:06:27.100
I set the color equal to Y,

110
00:06:27.100 --> 00:06:30.230
which makes it so every point isn't the same color,

111
00:06:30.230 --> 00:06:31.730
and then I used the color map

112
00:06:31.730 --> 00:06:33.913
to change the colors to blue and red.

113
00:06:36.070 --> 00:06:38.250
For the contour plot, it's exactly like we did

114
00:06:38.250 --> 00:06:42.990
in the last guide, where we used xx, yy, and Z,

115
00:06:42.990 --> 00:06:45.730
and then used the color map for blue and red,

116
00:06:45.730 --> 00:06:48.403
and we changed the alpha to .5.

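The plotting calls described might look roughly like this (the exact blue/red color map is an assumption):

```python
import matplotlib.pyplot as plt

# Shaded decision regions; alpha=.5 keeps the points visible underneath.
plt.contourf(xx, yy, Z, cmap=plt.cm.bwr, alpha=0.5)

# One dot per sample: first column on the x axis, second on the y axis,
# colored by class label y.
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.bwr)
plt.show()
```
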
117
00:06:50.590 --> 00:06:52.403
And when we run the whole thing,

118
00:06:54.220 --> 00:06:56.240
it looks like k-nearest neighbors

119
00:06:56.240 --> 00:06:59.173
was able to handle the exclusive or problem.

120
00:07:00.420 --> 00:07:02.770
And that really goes back to the idea that,

121
00:07:02.770 --> 00:07:06.130
unlike a linear classifier, which makes location-based

122
00:07:06.130 --> 00:07:09.120
decisions in reference to the hyperplane,

123
00:07:09.120 --> 00:07:12.230
k-nearest neighbors uses localized distances

124
00:07:12.230 --> 00:07:14.173
to make classification decisions.

125
00:07:15.670 --> 00:07:19.430
Anyway, this is a topic that's gonna be brought up again,

126
00:07:19.430 --> 00:07:21.410
but for now, I'm gonna wrap it up,

127
00:07:21.410 --> 00:07:23.160
and I will see you in the next one.