In this guide, we're going to be talking about a common way of categorizing classification algorithms, and that's by dividing them between generative and discriminative models.

When push comes to shove, it doesn't really matter what classifier we're talking about. Whether it's multi-class, binary, generative, or discriminative, they all work towards the same goal of grouping observations by establishing a decision boundary. What really differentiates classification algorithms are the steps they take to get to the result.
Before we get into some of the finer points, an example of a generative model that we've already covered is Naive Bayes. And like Naive Bayes, generative models are generally fairly simple to implement and usually quick to run. Because of their efficiency, they can also scale really well when you're working with a large data set. They also don't need much training data, which gives them another advantage.
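To make the "simple to implement and quick to run" point concrete, here's a rough sketch of a count-based Naive Bayes classifier. Training is just a single pass of counting, which is why it's so fast and scales well. The `fur`/`meows` features and the toy examples are made up for illustration; they aren't from the guide.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(examples):
    """examples: list of (feature_dict, label). Fitting is one counting pass."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)  # (label, feature name) -> value counts
    for features, label in examples:
        label_counts[label] += 1
        for name, value in features.items():
            feature_counts[(label, name)][value] += 1
    return label_counts, feature_counts

def predict(label_counts, feature_counts, features):
    total = sum(label_counts.values())
    best_label, best_score = None, 0.0
    for label, count in label_counts.items():
        score = count / total  # the class prior P(y)
        for name, value in features.items():
            seen = feature_counts[(label, name)]
            # Laplace smoothing (assuming ~2 values per feature in this sketch)
            # so an unseen value doesn't zero out the whole product
            score *= (seen[value] + 1) / (count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ({"fur": "short", "meows": "yes"}, "cat"),
    ({"fur": "long", "meows": "yes"}, "cat"),
    ({"fur": "short", "meows": "no"}, "dog"),
    ({"fur": "long", "meows": "no"}, "dog"),
]
model = fit_naive_bayes(training)
print(predict(*model, {"fur": "long", "meows": "no"}))  # → dog
```

Notice there's no iterative optimization loop anywhere; that's the efficiency advantage mentioned above.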
Now, to get a bit more technical, generative models work by trying to actually model how the data was generated, and use that to categorize a new observation. On the other hand, discriminative models don't really care how the data was generated; they just want to categorize a new observation.
To help explain this a little bit better, here's a pretty good analogy. I have two nephews, Carter and Jackson. One afternoon, my sister called to ask if she could drop them off at my place for a couple of hours. While they were over, they spent most of their time playing with Myles, my cat, and Sam, my dog.

Later in the evening, when they were back home, Carter and Jackson were reading a book with my sister when they came across a picture of a dog. So my sister asked both Carter and Jackson whether it was a picture of Sammy or Myles.

Carter, the generative classifier, loves to draw. So he grabbed his box of crayons and, based on what he remembered, drew a picture of both Sam and Myles. He compared his drawings to the picture in the book and decided the picture was probably a dog, just like Sammy. Jackson, on the other hand, is only a year old and can't draw yet. But he was still able to figure out that the picture in the book was a dog, based strictly on his observations.

So while both of my nephews were able to successfully determine the type of animal, the way they came up with their answers turned out to be completely different.
Now let's say we're given this training set. What a discriminative model is going to do is try to separate the two classes by using a straight line. The first iteration might use a boundary that looks something like this, but as the parameters are optimized and more iterations are run, it will begin to look more and more like this.
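That iterative process can be sketched with a perceptron, one of the simplest discriminative learners: it starts from an arbitrary line and nudges the parameters each pass until the classes are separated. This is a minimal illustration, not the specific algorithm from the guide, and the 2-D points are made up.

```python
def train_perceptron(points, labels, epochs=20, lr=0.1):
    """points: list of (x1, x2); labels: +1 or -1. Returns (w1, w2, b)."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            # Misclassified point? Shift the boundary toward it.
            if y * (w1 * x1 + w2 * x2 + b) <= 0:
                w1 += lr * y * x1
                w2 += lr * y * x2
                b += lr * y
    return w1, w2, b

def classify(params, point):
    w1, w2, b = params
    return 1 if w1 * point[0] + w2 * point[1] + b > 0 else -1

# Say -1 is "cat" and +1 is "dog" in the guide's running example.
points = [(1.0, 1.0), (1.5, 0.5), (4.0, 4.0), (4.5, 3.5)]
labels = [-1, -1, 1, 1]
params = train_perceptron(points, labels)
```

Each early boundary misclassifies some points; after enough iterations the line settles where it separates the two groups, which mirrors the "first iteration versus final boundary" picture described above.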
In contrast, rather than looking at both classes and trying to figure out how to separate them, a generative model will look at one class, like the cat training set, and try to build a model encapsulating all of its features. Then, once it has the first model figured out, it moves over to the second class and tries to build a model of what a dog might look like. So if a new observation comes in and, based on its features, falls within this boundary, it will be classified as a dog.
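A rough sketch of that one-class-at-a-time approach: fit a simple Gaussian model to each class separately, then classify a new observation by asking which class's model makes it more likely. The `weight`/`ear length` features and the numbers are invented for illustration.

```python
import math

def fit_class_model(samples):
    """Fit a per-feature mean and variance to one class's samples."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    variances = [
        sum((s[d] - means[d]) ** 2 for s in samples) / n + 1e-6  # avoid zero variance
        for d in range(dims)
    ]
    return means, variances

def log_likelihood(model, x):
    """Log probability of observation x under a diagonal Gaussian model."""
    means, variances = model
    total = 0.0
    for xi, mu, var in zip(x, means, variances):
        total += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
    return total

# Features: (weight in kg, ear length in cm) -- made-up toy data
cats = [(4.0, 6.0), (3.5, 6.5), (4.5, 5.5)]
dogs = [(20.0, 10.0), (25.0, 12.0), (22.0, 11.0)]
cat_model = fit_class_model(cats)  # built from cats alone
dog_model = fit_class_model(dogs)  # built from dogs alone

def most_likely_class(x):
    return "cat" if log_likelihood(cat_model, x) > log_likelihood(dog_model, x) else "dog"
```

Note that each class model is built without ever looking at the other class; the "boundary" only emerges when the two models are compared at prediction time.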
Internally, what a discriminative model is attempting to do is learn the probability of Y given X directly, where Y is the class label and X represents all of the features. On the other hand, a generative algorithm tries to learn the probability of X given Y, which makes sense when you think about the analogy we just used. Before any observations came in, Carter already had a cat model and a dog model built in his own head. So when a new observation came in, he already knew what each class label should look like, and all he had to do was compare the features.
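The two learning targets can be written side by side. This is just standard Bayes' rule, not anything specific to this guide: it's how a generative model's P(X|Y), combined with a class prior P(Y), recovers the P(Y|X) that classification ultimately needs.

```latex
% Discriminative models learn the posterior directly:
%   P(Y \mid X)
% Generative models learn P(X \mid Y) and the prior P(Y),
% then classify via Bayes' rule:
P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)} \propto P(X \mid Y)\, P(Y)
```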
Now, I know this wasn't the most exciting guide, but it's an aspect of machine learning that you definitely need to be aware of. And as we introduce new algorithms throughout the course, we'll break down how each of them works to determine whether they're generative or discriminative. But for now, I'll wrap things up, and I'll see you in the next guide.