Simple Linear Regression


A simple linear regression model describes a linear relationship between two variables x=(x1, x2,..., xn) and y=(y1, y2,..., yn) as:
 y=a+bx+e,  where a and b are unknown parameters and e=(e1, e2,..., en) is the error term.

The goal of simple linear regression is to use the observed values of x and y to estimate the unknown parameters a and b, under some conditions on the error term e.
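As an illustration (in Python rather than one of the packages linked below), the ordinary least squares estimates of a and b can be computed from the closed-form formulas b = Sxy/Sxx and a = ybar - b*xbar. The data values here are made up for demonstration only.

```python
# Fit y = a + b*x by ordinary least squares using the closed-form solution:
#   b = Sxy / Sxx   (sum of cross-deviations over sum of squared x-deviations)
#   a = ybar - b * xbar
def fit_ols(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx          # estimated slope
    a = ybar - b * xbar    # estimated intercept
    return a, b

# Hypothetical example data (roughly following y = 1 + 2x plus noise)
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_ols(x, y)
```

Any statistical package performs this same computation (typically via a more general matrix formulation); the formulas above are the special case for a single predictor.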


The assumptions of the simple linear regression model are the same as the assumptions on the errors in Linear Models, namely:
1. e1, e2,..., en are random and independent,
2. e1, e2,..., en  all have mean 0,
3. e1, e2,..., en  all have the same variance (homoscedasticity),
4. e1, e2,..., en  are normally distributed.


Residual: The difference between the observed value and the value predicted by the regression equation (observed minus predicted).
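A short sketch of the residual computation, using hypothetical data and coefficient estimates (the values of a and b below are assumed to be OLS estimates for these data). A useful check: when the model includes an intercept, the OLS residuals sum to (numerically) zero.

```python
# Residuals: e_i = y_i - yhat_i, where yhat_i = a + b*x_i is the fitted value.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = 1.09, 1.97  # assumed OLS estimates for these illustrative data
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
# Property of OLS with an intercept: sum(residuals) is zero up to rounding.
```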

Outlier: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its value on the predictor variables. An outlier may indicate a sample peculiarity, a data entry error, or some other problem.

Leverage: An observation with an extreme value on a predictor variable is a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. High leverage points can have a great amount of effect on the estimate of regression coefficients.

Influence: An observation is said to be influential if removing the observation substantially changes the estimate of the regression coefficients.  Influence can be thought of as the product of leverage and outlierness.

Cook's distance (or Cook's D): A measure that combines the information of leverage and residual of the observation.
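The leverage and Cook's distance definitions above have simple closed forms in the one-predictor case, sketched below in Python with made-up data (the last observation is given an extreme x value so that it stands out). The formulas used are the standard ones: h_i = 1/n + (x_i - xbar)^2/Sxx for leverage, and D_i = e_i^2/(p*s^2) * h_i/(1-h_i)^2 with p = 2 estimated parameters for Cook's distance.

```python
# Leverage and Cook's distance for simple linear regression:
#   h_i = 1/n + (x_i - xbar)^2 / Sxx
#   D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2,  with p = 2 parameters
x = [1, 2, 3, 4, 5, 10]                  # last point: extreme x (high leverage)
y = [3.0, 5.1, 6.9, 9.2, 11.0, 30.0]     # hypothetical illustrative data
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = sxy / sxx
a = ybar - b * xbar
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

p = 2                                            # intercept + slope
s2 = sum(e ** 2 for e in resid) / (n - p)        # residual variance estimate
leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
cooks_d = [e ** 2 / (p * s2) * h / (1 - h) ** 2
           for e, h in zip(resid, leverage)]
# The leverages always sum to p (here 2); the extreme-x point has both the
# largest leverage and, for these data, the largest Cook's distance.
```

A common rule of thumb flags observations with D_i greater than 4/n (or 1) for closer inspection; the packages linked below report these diagnostics directly.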


Click on one of the links below to see how to perform a simple linear regression with the package of your choice.

R SAS Minitab