Linear Regression (“Best-Fitting Line”)
Studying linear relationships between variables is a common statistical task, and the method used for it is known as linear regression. Linear regression is most commonly used to analyze observational data with a dependent variable (also known as the “outcome” or “response” variable, usually denoted y) and one or more independent variables (also known as “explanatory” or “predictor” variables, usually denoted x). It describes the relationship between the variables, which, when graphed on x- and y-coordinates, produces a best-fitting line. When a straight line is fitted to a set of data points, we can measure the effect of a single independent variable by analyzing the slope of that line: the slope is the change in y associated with a one-unit change in x. We can also use the line to predict trends in the data.

We use a scatter plot to examine the relationship in our data. A scatter plot is a graph of plotted points that shows how two sets of data relate to one another. Linear regression gives the best-fitting line, which represents, or predicts, the value of the dependent variable given a known value of the independent variable. If the data points in a scatter plot lie close to a line, the line is a good fit to the data. If they do not lie close to any single line, the best-fitting line is the one that lies closer to the points, taken together, than any other line (in ordinary least squares, the line that minimizes the sum of squared vertical distances to the points). The line drawn is known as the line of means: it shows the mean of all the values of y corresponding to a known value of x.

There is also a concept called the correlation coefficient, which measures how well trends in the predicted values follow trends in the actual past values; that is, how well the predicted values fit real-life data. In general a correlation coefficient can range from -1 to 1, but when it compares the predictions of a least-squares line with the actual values it is a number between 0 and 1. If the correlation coefficient is 0 or very low, there is little or no linear relationship between the actual values and the predicted values.
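
As a rough illustration of the best-fitting line and the correlation coefficient described above, the following Python sketch fits a line to a small data set and then compares the predicted values with the actual ones. It assumes ordinary least squares as the fitting method, which the text does not name explicitly. The data values and the helper names `fit_line` and `correlation` are made up for this example.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correlation(a, b):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical observations: x is the independent variable, y the dependent one.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)

# Predicted value of y for each known x, read off the fitted line.
predicted = [slope * x + intercept for x in xs]

# How closely the predicted values follow the actual values.
r = correlation(ys, predicted)

print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")
# prints: slope=0.60 intercept=2.20 r=0.77
```

Here the slope of 0.6 says that y rises by about 0.6 for each one-unit increase in x, and a correlation of about 0.77 between predicted and actual values suggests a moderately good fit for this made-up data.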