Linear Regression: The Mathematics you should know!
Note: This article assumes that you have some prior knowledge of linear algebra.
What is Linear Regression?
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the ‘outcome variable’) and one or more independent variables (often called ‘predictors’, ‘covariates’, or ‘features’). (source: wikipedia)
Linear regression is the simplest form of regression: we fit a straight line to the relationship between two variables. This line is often called the best-fit line (or the trend line in finance). Using this best-fit line, we can estimate or predict further values (we will see this in detail). First of all, you should know what dependent and independent variables are. The best way is to take an example,
y = 2x + 1
Here, y changes whenever x changes, but x is not affected by changes in the value of y.
If x takes the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, we can determine the values of y by putting each value of x into the equation, one by one. For example, if we put the value 5 in the equation,
y = 2(5) + 1 = 10 + 1 = 11
Hence, we get y = 11 when x is 5, and we can do the same for every other value. Therefore, x is the independent variable and y is the dependent variable.
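The substitution above can be sketched in a few lines of Python. The function name `y_of` is hypothetical and only serves this illustration.

```python
def y_of(x):
    """Dependent variable for the example line y = 2x + 1."""
    return 2 * x + 1


# Independent variable: the ten x values used in the text.
xs = list(range(1, 11))

# Dependent variable: obtained by putting each x into the equation.
ys = [y_of(x) for x in xs]

for x, y in zip(xs, ys):
    print(f"x = {x:2d} -> y = {y}")
```

Changing `xs` changes `ys`, but nothing in the code lets `ys` change `xs`, which is the sense in which x is independent and y is dependent.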
In machine learning, we conventionally denote the variable to be predicted as y.
Here we have taken Celsius and Fahrenheit values, and our goal is to predict the Fahrenheit values from the given Celsius values. Therefore, the Celsius values are independent whereas the Fahrenheit values are dependent. For the Celsius value ‘1’, the Fahrenheit value is ‘33.8’. There is some sort of relationship between these values, but what is it, and how can we find it? First of all, we have to decide which regression algorithm to apply, and for that we will plot these values,
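The plotting step might look like the following sketch, which uses matplotlib. The ten Celsius/Fahrenheit pairs are an assumption standing in for the article's data: they are the standard conversions for Celsius 1 through 10, and the output file name is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt

# Assumed sample data: Celsius 1..10 and the matching Fahrenheit values.
celsius = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fahrenheit = [33.8, 35.6, 37.4, 39.2, 41.0, 42.8, 44.6, 46.4, 48.2, 50.0]

fig, ax = plt.subplots()
ax.scatter(celsius, fahrenheit)  # one point per (Celsius, Fahrenheit) pair
ax.set_xlabel("Celsius")
ax.set_ylabel("Fahrenheit")
ax.set_title("Celsius vs. Fahrenheit")

# Save the figure to disk; in an interactive session plt.show() would work too.
fig.savefig("celsius_vs_fahrenheit.png")
```

If the points fall along a straight line, a linear model is a reasonable choice.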
From the graph, it is pretty clear that there is a linear relationship in the data, and that’s why we will use linear regression. In many articles, you may have seen that we can do this very easily with scikit-learn: just import LinearRegression, fit the data, and it’s done! But in this article, we will first see the mathematics and then we will make our own linear regression algorithm. Let’s get started with the mathematics!
The Formula of Linear regression is,
y = mx + c … Eqn 1.1
where ‘m’ is the slope and ‘c’ is the Y-intercept. Every non-vertical line has a slope, calculated as the ratio of the vertical change to the horizontal change and often described as “rise over run” (in layman’s terms, how steep the line is). The Y-intercept is the point where the line crosses the Y-axis.
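To make “rise over run” concrete, here is a minimal sketch that recovers the slope and Y-intercept of a line from two points on it. The function name `slope_and_intercept` is hypothetical, and the two points are taken from the earlier example line y = 2x + 1.

```python
def slope_and_intercept(p1, p2):
    """Return (m, c) for the line through two points (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    c = y1 - m * x1            # value of y where the line meets x = 0
    return m, c


# Two points on y = 2x + 1: x = 1 gives y = 3, x = 5 gives y = 11.
m, c = slope_and_intercept((1, 3), (5, 11))
print(f"slope = {m}, Y-intercept = {c}")
```

With exact points two are enough; linear regression addresses the more general case where we want one line that best fits many points.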
But we don’t know these terms! All we know is x (the Celsius values) and y (the Fahrenheit values). Once we find these terms, we can determine the value of y for any given x. This is where linear regression helps us: it determines the slope and the Y-intercept.
The formula for finding the Y-intercept and slope is,
(X′X)⁻¹ X′y
This formula looks confusing, but if you have some prior knowledge of linear algebra you can definitely understand it.
Here, X′ is the transpose of the matrix X, and the power -1 means the inverse of the product X′X. We then multiply this inverse by X′y. For the Y-intercept to be part of the result, X needs a column of ones alongside the column of x values. After all this calculation, we get a matrix that contains the slope and Y-intercept of our data. So, let us code this and let Python do the calculations for us!
We will fit the data as follows,
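What follows is a minimal NumPy sketch of this calculation, not a definitive implementation. The function name `fit_line`, the layout of X (a column of ones followed by the x values), and the ten sample pairs (standard conversions for Celsius 1 through 10) are all assumptions made for illustration.

```python
import numpy as np


def fit_line(x, y):
    """Return (slope, intercept) computed as (X'X)^-1 X'y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Design matrix X: a column of ones (for the intercept) next to the x values.
    X = np.column_stack([np.ones_like(x), x])

    # Inverse of X'X, multiplied by X'y, as described above.
    coefficients = np.linalg.inv(X.T @ X) @ X.T @ y

    # The order of the coefficients follows the order of the columns of X.
    intercept, slope = coefficients
    return slope, intercept


# Assumed sample data: Celsius 1..10 and the matching Fahrenheit values.
celsius = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fahrenheit = [33.8, 35.6, 37.4, 39.2, 41.0, 42.8, 44.6, 46.4, 48.2, 50.0]

slope, intercept = fit_line(celsius, fahrenheit)
print(f"slope = {slope:.2f}, Y-intercept = {intercept:.2f}")
```

The explicit inverse mirrors the formula for readability. For larger or poorly conditioned problems, `np.linalg.lstsq(X, y, rcond=None)` is generally the more numerically robust choice.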
The output is a slope of 1.8 and a Y-intercept of 32.
Therefore, we can write Equation 1.1 as,
y = 1.8x + 32
Now, for a Celsius value of 100, the Fahrenheit value will be,
y = 1.8*100 + 32
y = 180 + 32
y = 212
Therefore, for Celsius value ‘100’ the Fahrenheit value is ‘212’. You can even check it on Google,
You can also check that our slope and Y-intercept exactly match the actual formula for converting Celsius to Fahrenheit, F = 1.8C + 32.
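This comparison can also be done in code. The sketch below checks the fitted line against the standard conversion formula for a few temperatures; the function names and the chosen test temperatures are illustrative assumptions.

```python
import math


def fitted_line(celsius, slope=1.8, intercept=32.0):
    """Prediction from the fitted line y = 1.8x + 32."""
    return slope * celsius + intercept


def celsius_to_fahrenheit(celsius):
    """Standard conversion formula: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


# Compare the two formulas on a handful of temperatures.
for c in (-40, 0, 37, 100):
    predicted = fitted_line(c)
    actual = celsius_to_fahrenheit(c)
    assert math.isclose(predicted, actual), (c, predicted, actual)
    print(f"{c} C -> fitted {predicted:.1f} F, formula {actual:.1f} F")
```

`math.isclose` is used instead of `==` because floating-point arithmetic can introduce tiny rounding differences between the two expressions.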