Overview of Perceptron Concepts
Guide Tasks
  • Read Tutorial
  • Watch Guide Video

WEBVTT

1
00:00:03.960 --> 00:00:04.980
In this guide,

2
00:00:04.980 --> 00:00:07.550
we're gonna be talking about one of the oldest algorithms

3
00:00:07.550 --> 00:00:11.360
used in machine learning, which is the "Perceptron."

4
00:00:11.360 --> 00:00:12.960
As a standalone algorithm,

5
00:00:12.960 --> 00:00:15.890
the perceptron is considered to be a discriminative model

6
00:00:15.890 --> 00:00:18.970
that we can use for binary classification.

7
00:00:18.970 --> 00:00:22.590
And it's similar to linear regression and logistic regression

8
00:00:22.590 --> 00:00:25.083
because it's also a type of linear classifier.

9
00:00:26.450 --> 00:00:28.200
So unlike k-nearest neighbor

10
00:00:28.200 --> 00:00:30.120
that makes classification decisions

11
00:00:30.120 --> 00:00:32.330
based on localized distance,

12
00:00:32.330 --> 00:00:34.260
a perceptron will make its decision

13
00:00:34.260 --> 00:00:36.580
based on the value of linear combinations

14
00:00:36.580 --> 00:00:37.943
of the feature values.

15
00:00:39.330 --> 00:00:42.810
All of that's pretty normal for a classification algorithm.

16
00:00:42.810 --> 00:00:45.070
But what makes the perceptron unique

17
00:00:45.070 --> 00:00:46.550
is that it's also considered

18
00:00:46.550 --> 00:00:49.243
to be the simplest form of a neural network.

19
00:00:50.720 --> 00:00:53.550
I'm gonna do my best to keep the history lesson short,

20
00:00:53.550 --> 00:00:57.410
but the perceptron dates all the way back to 1958.

21
00:00:57.410 --> 00:01:00.500
It was originally a machine that was intended to be used

22
00:01:00.500 --> 00:01:03.313
by the United States Navy for image recognition.

23
00:01:04.180 --> 00:01:05.230
At its inception,

24
00:01:05.230 --> 00:01:08.880
there were also incredibly high hopes for the perceptron.

25
00:01:08.880 --> 00:01:09.713
In fact,

26
00:01:09.713 --> 00:01:11.870
the original creator went as far as saying

27
00:01:11.870 --> 00:01:14.457
the perceptron was, and I quote,

28
00:01:14.457 --> 00:01:16.907
"The embryo of an electronic computer

29
00:01:16.907 --> 00:01:19.437
"that the Navy expects will be able to walk,

30
00:01:19.437 --> 00:01:23.637
"talk, see, write, reproduce itself,

31
00:01:23.637 --> 00:01:26.137
"and be conscious of its own existence."

32
00:01:27.260 --> 00:01:30.320
And whether that's fortunate or unfortunate,

33
00:01:30.320 --> 00:01:33.180
it was soon discovered that the perceptron couldn't be trained

34
00:01:33.180 --> 00:01:35.800
to recognize different types of patterns.

35
00:01:35.800 --> 00:01:38.883
Most notably, it struggled with the XOR function.

36
00:01:39.920 --> 00:01:42.170
And it was really due to those limitations

37
00:01:42.170 --> 00:01:44.480
that the perceptron was essentially relegated

38
00:01:44.480 --> 00:01:47.093
to be nothing more than a linear classifier.

39
00:01:47.950 --> 00:01:49.280
But over time,

40
00:01:49.280 --> 00:01:51.070
eventually the perceptron expanded

41
00:01:51.070 --> 00:01:53.860
beyond a single layer to multiple layers.

42
00:01:53.860 --> 00:01:54.970
And at that point,

43
00:01:54.970 --> 00:01:58.393
it was finally able to overcome a lot of those shortcomings.

44
00:01:59.520 --> 00:02:00.353
For the time being,

45
00:02:00.353 --> 00:02:01.850
we're not gonna go into detail

46
00:02:01.850 --> 00:02:04.160
about the multi-layer perceptron.

47
00:02:04.160 --> 00:02:05.890
But the last bit of information

48
00:02:05.890 --> 00:02:07.530
I would like to cover in this guide

49
00:02:07.530 --> 00:02:09.610
is the evolution of the perceptron

50
00:02:09.610 --> 00:02:11.833
and how those changes look over time.

51
00:02:13.070 --> 00:02:14.620
So to start from the beginning,

52
00:02:14.620 --> 00:02:16.240
the foundation of the perceptron

53
00:02:16.240 --> 00:02:19.410
was originally designed to evaluate Boolean functions

54
00:02:19.410 --> 00:02:21.653
through the use of logical operations.

55
00:02:22.600 --> 00:02:25.360
And overall, the model was incredibly simple,

56
00:02:25.360 --> 00:02:28.210
consisting of just two binary on/off inputs

57
00:02:28.210 --> 00:02:30.003
and one binary output.

58
00:02:31.060 --> 00:02:32.550
The general idea was that

59
00:02:32.550 --> 00:02:34.290
depending on the number of connections

60
00:02:34.290 --> 00:02:35.950
coming from each input,

61
00:02:35.950 --> 00:02:38.100
different logical expressions could be used

62
00:02:38.100 --> 00:02:39.173
to yield a result.

63
00:02:40.380 --> 00:02:43.500
If we work under the condition that the activation threshold

64
00:02:43.500 --> 00:02:45.730
or decision boundary for neuron C

65
00:02:45.730 --> 00:02:48.300
is greater than or equal to two,

66
00:02:48.300 --> 00:02:51.063
we can assume a variety of cases to be true.

67
00:02:52.370 --> 00:02:57.370
In example one, neurons A and B both have one connection.

68
00:02:58.380 --> 00:03:01.400
So neuron C will reach its threshold

69
00:03:01.400 --> 00:03:05.243
if and only if neuron A and B fire.

70
00:03:07.030 --> 00:03:08.990
When we go to the second example,

71
00:03:08.990 --> 00:03:12.820
neurons A and B both have two connections.

72
00:03:12.820 --> 00:03:14.520
That means neuron C

73
00:03:14.520 --> 00:03:17.730
will produce a signal if neuron A or B,

74
00:03:17.730 --> 00:03:19.293
or both are activated.

75
00:03:20.600 --> 00:03:22.650
Then the third example is a little different,

76
00:03:22.650 --> 00:03:26.340
because it's based on something called inhibitory control,

77
00:03:26.340 --> 00:03:28.120
which you can kind of think of

78
00:03:28.120 --> 00:03:30.253
as just being a negative signal.

79
00:03:31.840 --> 00:03:32.980
So in this case,

80
00:03:32.980 --> 00:03:37.350
neuron C is activated only if neuron A is active,

81
00:03:37.350 --> 00:03:39.533
and if neuron B is inactive.
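
To make those three cases concrete, here is a minimal Python sketch of that threshold-of-two neuron. The function name and the way the inhibitory input is modeled (as a veto on the output) are my own assumptions, not code from the guide.

    def neuron_c(a, b, connections_a, connections_b, b_inhibitory=False):
        # Neuron C fires (returns 1) only when the incoming signal reaches the threshold of 2
        if b_inhibitory and b == 1:
            return 0  # an inhibitory input acts as a negative signal and blocks the output
        return 1 if (a * connections_a + b * connections_b) >= 2 else 0

    for a in (0, 1):
        for b in (0, 1):
            print(a, b,
                  neuron_c(a, b, 1, 1),                     # example 1: fires only if A and B fire
                  neuron_c(a, b, 2, 2),                     # example 2: fires if A or B, or both, fire
                  neuron_c(a, b, 2, 0, b_inhibitory=True))  # example 3: fires only if A is on and B is off
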

82
00:03:40.960 --> 00:03:43.070
Now to build on that we have something called

83
00:03:43.070 --> 00:03:45.883
the linear threshold unit, or LTU.

84
00:03:46.778 --> 00:03:49.840
The LTU is almost identical to what we just went through,

85
00:03:49.840 --> 00:03:53.080
but with a few important modifications.

86
00:03:53.080 --> 00:03:54.560
The biggest difference is that

87
00:03:54.560 --> 00:03:57.370
instead of using a binary on/off switch,

88
00:03:57.370 --> 00:03:59.793
the LTU's input connections are weighted.

89
00:04:00.660 --> 00:04:02.600
So when you take a look at the diagram,

90
00:04:02.600 --> 00:04:04.930
you can see that we have three input connections

91
00:04:04.930 --> 00:04:08.400
named X1, X2 and X3.

92
00:04:08.400 --> 00:04:11.940
And each input is assigned an arbitrary weight,

93
00:04:11.940 --> 00:04:14.770
W1, W2 and W3,

94
00:04:14.770 --> 00:04:17.183
which can either be positive or negative.

95
00:04:18.070 --> 00:04:21.470
Then all three weighted inputs connect to the LTU,

96
00:04:21.470 --> 00:04:22.913
which provides an output.

97
00:04:24.130 --> 00:04:25.520
Just to do a quick example,

98
00:04:25.520 --> 00:04:29.870
let's say the LTU receives an input from X1 and X2,

99
00:04:29.870 --> 00:04:31.940
but not X3.

100
00:04:31.940 --> 00:04:34.200
The first thing to happen is the LTU

101
00:04:34.200 --> 00:04:35.950
will calculate the weighted sum

102
00:04:35.950 --> 00:04:38.090
of the two incoming products,

103
00:04:38.090 --> 00:04:39.160
which in this case,

104
00:04:39.160 --> 00:04:43.040
is just X1 times W1

105
00:04:43.040 --> 00:04:45.253
plus X2 times W2.

106
00:04:46.110 --> 00:04:47.900
And depending on how far you went in math

107
00:04:47.900 --> 00:04:49.290
or how good your memory is,

108
00:04:49.290 --> 00:04:51.300
you can also think of the weighted sum

109
00:04:51.300 --> 00:04:53.530
in terms of a dot product,

110
00:04:53.530 --> 00:04:56.620
because the weighted sum is going to equal the dot product

111
00:04:56.620 --> 00:04:59.620
between the feature vector and weight vector

112
00:04:59.620 --> 00:05:03.350
which also happens to be the same as the matrix product,

113
00:05:03.350 --> 00:05:04.650
where the feature vector

114
00:05:04.650 --> 00:05:07.673
is multiplied by the transposed weight vector.
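
As a quick sketch of that equivalence, the weighted sum, the dot product, and the matrix product with the transposed weight vector all give the same value. The numbers below are made up for illustration, not values from the guide.

    import numpy as np

    x = np.array([1.0, 1.0, 0.0])    # X1 and X2 firing, X3 silent (assumed values)
    w = np.array([0.4, 0.7, -0.2])   # arbitrary weights W1, W2, W3

    weighted_sum = x[0] * w[0] + x[1] * w[1]   # sum of the two incoming products
    dot_product = np.dot(x, w)                 # the same value as a dot product
    matrix_product = x @ w.T                   # the same value as a matrix product with w transposed

    print(weighted_sum, dot_product, matrix_product)   # all three agree
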

115
00:05:09.300 --> 00:05:11.120
Now moving on to the final step,

116
00:05:11.120 --> 00:05:13.590
the LTU determines what the output should be

117
00:05:13.590 --> 00:05:15.190
by applying a step function

118
00:05:15.190 --> 00:05:17.610
called the Heaviside step function,

119
00:05:17.610 --> 00:05:19.370
which is really just a simple discontinuous function

120
00:05:19.370 --> 00:05:21.553
that converts the weighted sum into a binary output.

121
00:05:22.860 --> 00:05:24.080
Basically how it works

122
00:05:24.080 --> 00:05:26.940
is that the function will produce a value of zero

123
00:05:26.940 --> 00:05:29.880
until it hits some predetermined threshold,

124
00:05:29.880 --> 00:05:33.233
and then anything after that produces a value of one.
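
A minimal version of that step function might look like the following; the threshold of zero here is just an assumption for illustration.

    def heaviside(weighted_sum, threshold=0.0):
        # produces 0 until the weighted sum hits the threshold, and 1 from there on
        return 1 if weighted_sum >= threshold else 0

    print(heaviside(-0.3))   # 0
    print(heaviside(1.1))    # 1
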

125
00:05:34.200 --> 00:05:37.560
Now when we carry over the functionality of the LTU,

126
00:05:37.560 --> 00:05:39.933
it finally leads us into the perceptron.

127
00:05:41.070 --> 00:05:41.903
Structurally,

128
00:05:41.903 --> 00:05:43.710
the perceptron is almost identical

129
00:05:43.710 --> 00:05:46.580
to the LTU model that we just went through.

130
00:05:46.580 --> 00:05:49.810
And the only significant modification we need to talk about

131
00:05:49.810 --> 00:05:51.263
is the bias neuron.

132
00:05:52.350 --> 00:05:55.920
At first the bias neuron might seem a little pointless,

133
00:05:55.920 --> 00:05:57.970
but the primary reason for having it

134
00:05:57.970 --> 00:06:01.223
is to generate a constant output value of one.

135
00:06:02.490 --> 00:06:03.890
The general idea is that

136
00:06:03.890 --> 00:06:06.450
if there's a constant baseline stimulus,

137
00:06:06.450 --> 00:06:09.130
it can be used to manipulate the activation function

138
00:06:09.130 --> 00:06:12.900
just enough to help generate the required output value,

139
00:06:12.900 --> 00:06:14.740
ensuring all of the information

140
00:06:14.740 --> 00:06:17.663
continues feeding forward through the entire network.

141
00:06:18.780 --> 00:06:21.340
Then once we're through that first input layer,

142
00:06:21.340 --> 00:06:25.190
the perceptron works exactly the same way as the LTU.

143
00:06:25.190 --> 00:06:27.230
It calculates the weighted sum,

144
00:06:27.230 --> 00:06:29.793
and then applies the Heaviside step function.
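
Putting the bias neuron, the weighted sum, and the step function together, a single forward pass could be sketched like this. The weights and bias value are invented for the example.

    import numpy as np

    def perceptron_output(x, w, bias_weight):
        # the bias neuron always emits 1, so its contribution is 1 * bias_weight
        weighted_sum = np.dot(x, w) + 1 * bias_weight
        return 1 if weighted_sum >= 0 else 0   # Heaviside step function

    print(perceptron_output(np.array([1.0, 0.0]), np.array([0.6, -0.4]), -0.5))   # 1
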

145
00:06:31.700 --> 00:06:33.970
We're not gonna spend too much time on the code

146
00:06:33.970 --> 00:06:36.180
because there aren't really any major differences

147
00:06:36.180 --> 00:06:38.090
we need to address.

148
00:06:38.090 --> 00:06:39.390
Instead, what I think we should do,

149
00:06:39.390 --> 00:06:41.310
is spend just a little bit of time

150
00:06:41.310 --> 00:06:44.380
talking about a few of the more important parameters,

151
00:06:44.380 --> 00:06:47.610
but also circle back to the XOR problem.

152
00:06:47.610 --> 00:06:50.730
So we can start planting some proverbial seeds,

153
00:06:50.730 --> 00:06:52.840
which will hopefully make the explanation

154
00:06:52.840 --> 00:06:56.220
a little easier to understand when we get to neural networks

155
00:06:56.220 --> 00:06:57.590
and eventually discuss

156
00:06:57.590 --> 00:07:00.253
how they're able to handle a nonlinear function.

157
00:07:02.450 --> 00:07:04.440
So the first chunk of code is obviously

158
00:07:04.440 --> 00:07:06.940
for all the importing that we're gonna need to do.

159
00:07:08.660 --> 00:07:09.493
Then below it,

160
00:07:09.493 --> 00:07:11.530
I used the make_classification function

161
00:07:11.530 --> 00:07:14.900
from scikit-learn to make a binary-class dataset

162
00:07:14.900 --> 00:07:17.710
that we're gonna be using for the example.

163
00:07:17.710 --> 00:07:19.450
The implementation of the perceptron

164
00:07:19.450 --> 00:07:23.280
is pretty much identical to most of the other algorithms.

165
00:07:23.280 --> 00:07:24.700
And for the first run through,

166
00:07:24.700 --> 00:07:26.560
we're just gonna use the default settings

167
00:07:26.560 --> 00:07:27.883
for every parameter.

168
00:07:29.200 --> 00:07:30.033
Then after that,

169
00:07:30.033 --> 00:07:32.670
we're just making a mesh grid and getting everything set up

170
00:07:32.670 --> 00:07:34.393
to make an easy to read visual.
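
The notebook itself isn't shown in this transcript, so here is a rough reconstruction of the setup being described. The sample size, random_state, and plotting details are assumptions rather than the instructor's actual code.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import Perceptron
    from sklearn.model_selection import train_test_split

    # Binary-class dataset for the example
    X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                               n_redundant=0, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    classifier = Perceptron()          # default settings for every parameter
    classifier.fit(X_train, y_train)
    print(classifier.score(X_test, y_test))

    # Mesh grid so the decision boundary is easy to read on a plot
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
    Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, alpha=0.4)   # lighter points: training data
    plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test)                 # darker points: testing data
    plt.show()
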

171
00:07:36.300 --> 00:07:38.150
Now, I'm gonna go ahead and run this.

172
00:07:41.910 --> 00:07:45.040
And for the most part, the model did a pretty good job,

173
00:07:45.040 --> 00:07:48.810
giving us an accuracy score of 91%.

174
00:07:48.810 --> 00:07:50.890
This part might be a little difficult to see,

175
00:07:50.890 --> 00:07:53.800
but all the observations that are a little more transparent

176
00:07:53.800 --> 00:07:55.720
represent the training data,

177
00:07:55.720 --> 00:07:58.703
and the darker observations are from the testing data.

178
00:07:59.890 --> 00:08:02.610
And not that it matters all that much in this example,

179
00:08:02.610 --> 00:08:04.980
but it kinda looks like the red class

180
00:08:04.980 --> 00:08:06.530
was incorrectly classified

181
00:08:06.530 --> 00:08:08.573
a little bit more than the blue class.

182
00:08:09.630 --> 00:08:12.600
So if this was a model that we were actually working on,

183
00:08:12.600 --> 00:08:14.560
right now would definitely be a good time

184
00:08:14.560 --> 00:08:16.420
to implement a confusion matrix,

185
00:08:16.420 --> 00:08:18.700
and then possibly do some more digging,

186
00:08:18.700 --> 00:08:20.900
just to see what's happening.

187
00:08:20.900 --> 00:08:23.210
But since we're not gonna be doing that in this guide,

188
00:08:23.210 --> 00:08:26.120
let's keep moving along and talk about a couple parameters

189
00:08:26.120 --> 00:08:28.170
that we can use to adjust the perceptron.

190
00:08:29.800 --> 00:08:32.540
Originally, there were a few others I considered covering,

191
00:08:32.540 --> 00:08:35.670
but the more I thought about it, the less it made sense.

192
00:08:35.670 --> 00:08:37.660
So I ended up just going with the two

193
00:08:37.660 --> 00:08:40.400
that I thought you should absolutely be aware of.

194
00:08:40.400 --> 00:08:43.203
And those are max_iter and eta0.

195
00:08:45.360 --> 00:08:47.350
max_iter is pretty easy to explain,

196
00:08:47.350 --> 00:08:48.820
because it's just the parameter

197
00:08:48.820 --> 00:08:50.920
that allows us to set the number of iterations

198
00:08:50.920 --> 00:08:52.183
over the training data.

199
00:08:53.350 --> 00:08:55.410
eta0 is a little more confusing,

200
00:08:55.410 --> 00:08:57.440
because it's one of the smaller components

201
00:08:57.440 --> 00:08:59.953
that make up the training rule for the perceptron.

202
00:09:01.220 --> 00:09:03.370
We talked about weights a little bit already,

203
00:09:03.370 --> 00:09:04.840
but it's during the training

204
00:09:04.840 --> 00:09:07.520
when those weights are actually determined.

205
00:09:07.520 --> 00:09:10.910
So to give you a really simplified overview of how it works,

206
00:09:10.910 --> 00:09:13.700
we can use an equation where an updated weight

207
00:09:13.700 --> 00:09:17.780
is equal to the previous weight plus the change in weight.

208
00:09:17.780 --> 00:09:20.260
And then to figure out what the change in weight is equal to

209
00:09:20.260 --> 00:09:21.620
there's a second equation

210
00:09:21.620 --> 00:09:25.080
that states delta W is equal to the target value

211
00:09:25.080 --> 00:09:26.890
minus the predicted value,

212
00:09:26.890 --> 00:09:30.250
multiplied by the feature variable and learning rate,

213
00:09:30.250 --> 00:09:31.240
and for right now,

214
00:09:31.240 --> 00:09:33.680
the learning rate is really the only thing we care about,

215
00:09:33.680 --> 00:09:35.210
because that's what we're adjusting

216
00:09:35.210 --> 00:09:37.183
when we use the eta0 parameter.
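
Written out as a tiny function, that training rule might look like this; the sample numbers are made up.

    def update_weight(old_weight, target, predicted, feature_value, eta0=1.0):
        # delta_w = learning rate * (target - predicted) * feature value
        delta_w = eta0 * (target - predicted) * feature_value
        return old_weight + delta_w

    print(update_weight(0.5, target=1, predicted=0, feature_value=2.0))             # 2.5
    print(update_weight(0.5, target=1, predicted=0, feature_value=2.0, eta0=0.1))   # 0.7
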

217
00:09:39.700 --> 00:09:41.550
So to get back to the code,

218
00:09:41.550 --> 00:09:44.820
let's start off in the console and pass in classifier

219
00:09:45.670 --> 00:09:50.203
followed by dot, n underscore, iter underscore,

220
00:09:52.400 --> 00:09:55.340
and what we get back are the actual number of iterations

221
00:09:55.340 --> 00:09:57.513
needed to reach the stopping criteria.
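
In code, that console check is just the fitted model's n_iter_ attribute:

    print(classifier.n_iter_)   # 11 in the run shown in the video
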

222
00:09:59.890 --> 00:10:01.833
So moving back to the perceptron,

223
00:10:04.660 --> 00:10:07.470
we're gonna start by passing in max_iter,

224
00:10:07.470 --> 00:10:09.630
and since the number of iterations it takes

225
00:10:09.630 --> 00:10:11.940
to get to the stopping point is 11,

226
00:10:11.940 --> 00:10:13.690
let's go ahead and start with that.

227
00:10:14.750 --> 00:10:16.800
Now we're gonna go ahead and run it again

228
00:10:17.800 --> 00:10:21.060
and we get back the exact same accuracy score.

229
00:10:21.060 --> 00:10:22.710
But we also get a warning,

230
00:10:22.710 --> 00:10:25.560
letting us know that the algorithm didn't fully converge

231
00:10:25.560 --> 00:10:26.923
on a decision boundary.

232
00:10:28.320 --> 00:10:31.563
So if we take it down to 10, and run it again,

233
00:10:32.880 --> 00:10:35.130
we see a massive change in the boundary

234
00:10:35.130 --> 00:10:37.490
and the accuracy score falls all the way down

235
00:10:37.490 --> 00:10:38.773
to 0.87.
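
Re-running the model with an explicit max_iter, roughly as described above and reusing the earlier data, might look like this:

    classifier = Perceptron(max_iter=11)    # same score as before, plus a ConvergenceWarning
    classifier.fit(X_train, y_train)
    print(classifier.score(X_test, y_test))

    classifier = Perceptron(max_iter=10)    # one pass short: the boundary shifts and the score drops to about 0.87
    classifier.fit(X_train, y_train)
    print(classifier.score(X_test, y_test))
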

236
00:10:39.950 --> 00:10:40.783
Ultimately,

237
00:10:40.783 --> 00:10:41.760
your goal should be to reduce

238
00:10:41.760 --> 00:10:43.720
the number of iterations

239
00:10:43.720 --> 00:10:47.010
just enough to save you time and processing power,

240
00:10:47.010 --> 00:10:49.010
but not so low that you're running the risk

241
00:10:49.010 --> 00:10:51.283
of the algorithm not reaching convergence.

242
00:10:52.700 --> 00:10:54.360
Now, I'm gonna go ahead and set the number

243
00:10:54.360 --> 00:10:56.220
of iterations to 100.

244
00:10:56.220 --> 00:10:57.290
And then run it again,

245
00:10:57.290 --> 00:10:59.290
just to get us back to where we started.

246
00:11:01.350 --> 00:11:03.380
I'm pretty sure I forgot to mention this.

247
00:11:03.380 --> 00:11:06.180
But the default value for the eta0 parameter

248
00:11:06.180 --> 00:11:07.940
is actually one,

249
00:11:07.940 --> 00:11:09.600
which if you think about it,

250
00:11:09.600 --> 00:11:12.280
essentially makes the learning rate a non-factor,

251
00:11:12.280 --> 00:11:16.020
since any number multiplied by one is itself.

252
00:11:16.020 --> 00:11:17.390
So to make a change to that,

253
00:11:17.390 --> 00:11:19.723
let's go ahead and pass in eta0,

254
00:11:21.020 --> 00:11:23.133
followed by some really big number.

255
00:11:25.700 --> 00:11:27.020
After we run it again,

256
00:11:27.020 --> 00:11:30.320
it looks like everything stayed exactly the same.

257
00:11:30.320 --> 00:11:32.220
But now if we go in the opposite direction,

258
00:11:32.220 --> 00:11:33.370
and change this to 0.1,

259
00:11:35.810 --> 00:11:38.140
we end up seeing a very tiny improvement

260
00:11:38.140 --> 00:11:39.583
in the accuracy score.
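
And the equivalent experiment with the learning rate, again reusing the earlier setup; the specific values here are assumptions:

    classifier = Perceptron(max_iter=100, eta0=1000)   # a really big learning rate
    classifier.fit(X_train, y_train)
    print(classifier.score(X_test, y_test))            # essentially unchanged

    classifier = Perceptron(max_iter=100, eta0=0.1)    # a much smaller learning rate
    classifier.fit(X_train, y_train)
    print(classifier.score(X_test, y_test))            # a very slight improvement in the video's run
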

261
00:11:41.010 --> 00:11:42.990
And I think we're pretty much done with that.

262
00:11:42.990 --> 00:11:45.910
So I'm gonna go ahead and switch over to the other file,

263
00:11:45.910 --> 00:11:49.450
and we're gonna do a quick run through of why the perceptron

264
00:11:49.450 --> 00:11:51.713
is unable to handle the XOR problem.

265
00:11:57.860 --> 00:11:58.850
So right away,

266
00:11:58.850 --> 00:12:02.020
it's pretty easy to see that if we split the data in half,

267
00:12:02.020 --> 00:12:05.070
all the data on the right is incorrectly classified,

268
00:12:05.070 --> 00:12:07.470
and while it might be marginally better,

269
00:12:07.470 --> 00:12:10.523
the model is essentially no better than a 50-50 guess.
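
A quick way to see that for yourself is to fit a single perceptron on the four XOR points. This sketch is my own reconstruction, not the file from the video.

    import numpy as np
    from sklearn.linear_model import Perceptron

    X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y_xor = np.array([0, 1, 1, 0])       # XOR truth table

    clf = Perceptron(max_iter=1000)
    clf.fit(X_xor, y_xor)
    print(clf.score(X_xor, y_xor))       # no single line gets all four points right, so the score never beats 0.75
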

270
00:12:11.830 --> 00:12:14.730
And the reason behind that relates back to a couple concepts

271
00:12:14.730 --> 00:12:16.240
we already talked about,

272
00:12:16.240 --> 00:12:19.033
which have to do with the logic table and convergence.

273
00:12:20.640 --> 00:12:22.040
I'm gonna make this the last topic

274
00:12:22.040 --> 00:12:23.620
we talk about in the guide,

275
00:12:23.620 --> 00:12:24.870
and as a heads up,

276
00:12:24.870 --> 00:12:27.990
I'm really only giving you an abbreviated version.

277
00:12:27.990 --> 00:12:29.940
Because as simple as these concepts are,

278
00:12:29.940 --> 00:12:31.740
it gets surprisingly complicated

279
00:12:31.740 --> 00:12:34.160
once we start applying different theories,

280
00:12:34.160 --> 00:12:35.670
and then having to use proofs

281
00:12:35.670 --> 00:12:37.953
to show why all those theories are true.

282
00:12:39.260 --> 00:12:42.410
So instead I think the most sensible way to approach it

283
00:12:42.410 --> 00:12:44.830
is by addressing the fact that the perceptron

284
00:12:44.830 --> 00:12:47.790
is really just another linear classifier.

285
00:12:47.790 --> 00:12:49.680
And because of that, like all the others,

286
00:12:49.680 --> 00:12:51.270
it tries to separate classes

287
00:12:51.270 --> 00:12:53.683
by using the single most effective boundary.

288
00:12:55.390 --> 00:12:57.400
Now, this is the point where it would start

289
00:12:57.400 --> 00:12:58.900
to get a little more complicated

290
00:12:58.900 --> 00:13:02.370
if we actually had to prove the assertion to be true.

291
00:13:02.370 --> 00:13:04.710
But in order to calculate a boundary,

292
00:13:04.710 --> 00:13:08.030
the series being used has to be convergent.

293
00:13:08.030 --> 00:13:10.440
In turn, proving the existence of a limit,

294
00:13:10.440 --> 00:13:12.613
which can then be used as the boundary.

295
00:13:13.890 --> 00:13:16.070
We're gonna fast forward just a little bit,

296
00:13:16.070 --> 00:13:18.070
and use the graph of the logic table

297
00:13:18.070 --> 00:13:20.680
to apply all the information that we just went through

298
00:13:20.680 --> 00:13:23.260
to try and make better sense of this.

299
00:13:23.260 --> 00:13:25.520
So we know that no matter what we do,

300
00:13:25.520 --> 00:13:26.940
we're gonna end up with a boundary

301
00:13:26.940 --> 00:13:29.963
that incorrectly classifies about half the observations.

302
00:13:30.920 --> 00:13:32.660
And due to the fact that convergence

303
00:13:32.660 --> 00:13:33.960
and the existence of a limit

304
00:13:33.960 --> 00:13:36.160
are required to establish a boundary,

305
00:13:36.160 --> 00:13:38.250
it really cuts down on the number of options

306
00:13:38.250 --> 00:13:39.800
we have to work with.

307
00:13:39.800 --> 00:13:41.120
But fortunately for us

308
00:13:41.120 --> 00:13:43.670
and all of the neural networks around the world,

309
00:13:43.670 --> 00:13:45.790
a workaround does exist,

310
00:13:45.790 --> 00:13:47.650
which is accomplished by creating

311
00:13:47.650 --> 00:13:50.500
or layering on a second boundary.

312
00:13:50.500 --> 00:13:52.040
So once it's added on,

313
00:13:52.040 --> 00:13:53.950
everything below the first boundary

314
00:13:53.950 --> 00:13:57.530
and above the second boundary belongs to the blue class,

315
00:13:57.530 --> 00:14:00.343
which leaves everything in between to the red class.
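
As a rough illustration of that layering idea, here are two hand-picked thresholds stacked together. The specific boundary lines and the color-to-class mapping are assumptions chosen to fit the XOR points.

    def two_boundary_classifier(x1, x2):
        below_first = (x1 + x2) < 0.5     # first boundary: x1 + x2 = 0.5
        above_second = (x1 + x2) > 1.5    # second boundary: x1 + x2 = 1.5
        # below the first line or above the second line -> blue class (0),
        # everything in between -> red class (1)
        return 0 if (below_first or above_second) else 1

    for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(point, two_boundary_classifier(*point))   # matches the XOR truth table
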

316
00:14:01.420 --> 00:14:02.810
When we compare our new graph

317
00:14:02.810 --> 00:14:04.300
to the k-nearest neighbor model

318
00:14:04.300 --> 00:14:06.320
that we built in the XOR guide,

319
00:14:06.320 --> 00:14:09.110
the two are starting to look more and more alike,

320
00:14:09.110 --> 00:14:10.430
with the obvious exception

321
00:14:10.430 --> 00:14:12.600
of what's going on near the origin,

322
00:14:12.600 --> 00:14:14.120
which is a whole other topic

323
00:14:14.120 --> 00:14:16.420
that we're gonna be saving for a future guide.

324
00:14:17.520 --> 00:14:18.353
But for now,

325
00:14:18.353 --> 00:14:20.800
I think this is gonna be a good stopping point.

326
00:14:20.800 --> 00:14:23.883
So I'll wrap things up and I will see you in the next guide.