Overview of Linear Regression

The first form of regression analysis that we're going to talk about is linear regression. This is probably the most well-known type of regression, and its purpose is to model the relationship between independent and dependent variables in order to find a trend based on historical data. In other words, when we use linear regression, we're fitting a line to a set of data points to help make predictions.

There are a few different techniques that we can use to calculate the placement of the regression line, but in scikit-learn, it's determined by using a technique called ordinary least squares, or OLS for short. It works by taking the error, or vertical distance, between each point and the line, squaring it, and then summing all the squared values. By finding the line with the smallest sum, it minimizes the overall squared error and provides us with a line that best represents the data.
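To make the idea of "smallest sum of squared errors" concrete, here is a minimal sketch in plain Python for the single-variable case. The data and the helper names `fit_ols` and `sum_squared_errors` are made up for illustration; this is not scikit-learn's internal code, just the standard closed-form solution for one input variable.

```python
def fit_ols(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS solution for a single input variable.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances between each point and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: hours studied vs. exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 61.0, 70.0, 74.0]

slope, intercept = fit_ols(hours, scores)
best = sum_squared_errors(hours, scores, slope, intercept)

# Any other line, such as one with a slightly steeper slope, gives a larger sum.
worse = sum_squared_errors(hours, scores, slope + 0.5, intercept)
print(slope, intercept, best, worse)
```

The comparison at the end is the whole point of OLS: the fitted line is the one where that sum can't be made any smaller.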

And if you're interested in any of the other methods or the math behind ordinary least squares, you are more than welcome to do some of your own research. But thankfully, Python provides the tools to do all the calculations for us.
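As a rough sketch of what that looks like in practice, the example below fits a line with scikit-learn's `LinearRegression` class. The square footage and price numbers are invented for illustration and happen to be perfectly linear, so the fit is exact; real data would be noisier.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: square footage (input) and sale price (output).
# scikit-learn expects the inputs as a 2D array, one row per sample.
square_feet = np.array([[800], [1000], [1200], [1500], [2000]])
price = np.array([160000, 200000, 240000, 300000, 400000])

# Fit the regression line using ordinary least squares.
model = LinearRegression()
model.fit(square_feet, price)

# The fitted line is: price = coef_ * square_feet + intercept_
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)

# Use the line to predict the price of an 1,800 square foot home.
predicted = model.predict(np.array([[1800]]))
print("predicted price:", predicted[0])
```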

To finish the guide up, let's just go through some of the more common linear regression examples, like square footage and real estate pricing, time in a job and overall salary, body mass and heart disease, or hours spent studying and exam grades. There are plenty of linear regression examples in the natural sciences as well. If you're familiar with Ohm's law, current and voltage have a linear relationship for a fixed resistance. And when distance and time have a linear relationship, we call that uniform or constant velocity.

There are also some really handy math tricks we can use to transform nonlinear relationships into linear relationships. Let's say we're working with a graph that is obviously not linear, and we're given its data points. If the data follows a power law, all we have to do is take the natural log of both the input and output, and then when we graph it, the result becomes linear.
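Here is a small sketch of that log trick. The guide's original data points aren't reproduced here, so this assumes made-up data that follows the power law y = 2x³. Taking the natural log of both sides gives ln(y) = ln(2) + 3·ln(x), which is a straight line in ln(x) with a slope equal to the exponent.

```python
import math

# Hypothetical nonlinear data following y = 2 * x**3.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x ** 3 for x in xs]

# Take the natural log of both the input and the output.
log_xs = [math.log(x) for x in xs]
log_ys = [math.log(y) for y in ys]

# Fit a least-squares line to the transformed points.
n = len(log_xs)
mean_x = sum(log_xs) / n
mean_y = sum(log_ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_xs, log_ys)) / sum(
    (x - mean_x) ** 2 for x in log_xs
)
intercept = mean_y - slope * mean_x

# The slope recovers the exponent, and exp(intercept) recovers the coefficient.
exponent = slope
coefficient = math.exp(intercept)
print(exponent, coefficient)
```

If the underlying relationship were exponential rather than a power law, you would log only the output instead of both sides.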

And I think that about wraps it up for now, so I will see you in the next guide.