# Simple Linear Regression

A simple linear regression model describes a linear relationship between two variables $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ as $y = a + bx + e$, where $a$ and $b$ are unknown parameters and $e = (e_1, e_2, \dots, e_n)$ is the error term.
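The text does not say how the unknown parameters are estimated. A minimal sketch, assuming the usual ordinary least squares estimates (slope = S_xy / S_xx, intercept = mean of y minus slope times mean of x); the function name and the sample data are made up for illustration:

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Return least-squares estimates (a, b) for the model y = a + b*x + e."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-products of x and y.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx          # estimated slope
    a = y_bar - b * x_bar    # estimated intercept
    return a, b

# Illustrative data only (not from the original page).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_simple_linear_regression(x, y)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

The fitted line always passes through the point (mean of x, mean of y), which is why the intercept follows directly from the slope.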

Assumptions about the error term $e$:

- $e_1, e_2, \dots, e_n$ are random and independent,
- $e_1, e_2, \dots, e_n$ all have mean 0,
- $e_1, e_2, \dots, e_n$ all have the same variance (homoscedasticity),
- $e_1, e_2, \dots, e_n$ are normally distributed.

**Residual**: The difference between the actual, observed value and the predicted value (based on the regression equation), that is, $y_i - \hat{y}_i$ for observation $i$.

**Outlier**: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its value on the predictor variable. An outlier may indicate a sample peculiarity, a data entry error, or some other problem.
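The two definitions above can be sketched in code. This example computes residuals as observed minus predicted, then scales them by the residual standard error so that "large" has a common yardstick. The cutoff of 2 is a common rule of thumb and an assumption here, not something the text prescribes; the data and function names are illustrative.

```python
from math import sqrt
from statistics import mean

def fit(x, y):
    """Least-squares intercept and slope for y = a + b*x + e."""
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    return y_bar - b * x_bar, b

def scaled_residuals(x, y):
    """Return (residuals, residuals divided by the residual standard error)."""
    a, b = fit(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Two parameters were estimated, so n - 2 degrees of freedom remain.
    s = sqrt(sum(r ** 2 for r in residuals) / (len(x) - 2))
    return residuals, [r / s for r in residuals]

# Illustrative data: y is exactly 2*x except for the fifth observation.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0, 4.0, 6.0, 8.0, 20.0, 12.0, 14.0, 16.0]
residuals, scaled = scaled_residuals(x, y)

# Assumed rule of thumb: flag observations more than 2 standard errors away.
flagged = [i for i, z in enumerate(scaled) if abs(z) > 2]
print("flagged observation indices:", flagged)
```

Note that a single outlier also inflates the residual standard error, which makes this simple scaling conservative; leverage-adjusted (studentized) residuals are a common refinement.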

**Leverage**: An observation with an extreme value on a predictor variable is a point with high leverage. Leverage measures how far an observation's value of the independent variable deviates from the mean of that variable. High-leverage points can have a large effect on the estimates of the regression coefficients.
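For simple linear regression, leverage has a closed form: h_i = 1/n + (x_i - mean of x)^2 / S_xx. The sketch below computes it; the threshold 2p/n with p = 2 parameters is a widely used rule of thumb and is an assumption here, as are the function name and the data.

```python
from statistics import mean

def leverages(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / S_xx for each observation."""
    n = len(x)
    x_bar = mean(x)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]

# Illustrative data: the last x value lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages(x)

# Assumed rule of thumb: high leverage if h_i > 2p/n, with p = 2 parameters.
threshold = 2 * 2 / len(x)
high_leverage = [i for i, hi in enumerate(h) if hi > threshold]
print("leverages:", [round(hi, 3) for hi in h])
print("high-leverage indices:", high_leverage)
```

Leverage depends only on the predictor values, not on y, and the leverages always sum to the number of estimated parameters (2 here).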

**Influence**: An observation is said to be influential if removing the observation substantially changes the estimate of the regression coefficients. Influence can be thought of as the product of leverage and outlierness.

**Cook's distance** (or Cook's D): A measure that combines an observation's leverage and residual into a single number summarizing its influence on the fitted regression.
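As a rough illustration of how leverage and residual combine, the sketch below uses the standard formula D_i = (r_i^2 / (p s^2)) * h_i / (1 - h_i)^2 with p = 2 parameters, and checks it against the deletion-based view of influence (refit without observation i and measure how much the fitted values move). The data and function names are illustrative assumptions.

```python
from statistics import mean

def fit(x, y):
    """Least-squares intercept and slope for y = a + b*x + e."""
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    return y_bar - b * x_bar, b

def residual_variance(x, y, a, b, p=2):
    """Residual mean square with n - p degrees of freedom."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (len(x) - p)

def cooks_distance(x, y, p=2):
    """Cook's D from residuals and leverages."""
    n, x_bar = len(x), mean(x)
    a, b = fit(x, y)
    s2 = residual_variance(x, y, a, b, p)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    result = []
    for xi, yi in zip(x, y):
        r = yi - (a + b * xi)                  # residual
        h = 1 / n + (xi - x_bar) ** 2 / s_xx   # leverage
        result.append(r ** 2 / (p * s2) * h / (1 - h) ** 2)
    return result

def cooks_distance_by_deletion(x, y, i, p=2):
    """Cook's D for observation i, obtained by refitting without it."""
    a, b = fit(x, y)
    s2 = residual_variance(x, y, a, b, p)
    a_i, b_i = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    # Total squared change in the fitted values at all original x points.
    shift = sum(((a + b * xj) - (a_i + b_i * xj)) ** 2 for xj in x)
    return shift / (p * s2)

# Illustrative data: the last point has an extreme x and falls off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 15.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 20.0]
d = cooks_distance(x, y)
most_influential = max(range(len(d)), key=d.__getitem__)
print("Cook's D:", [round(di, 3) for di in d])
print("most influential index:", most_influential)
```

The two functions agree up to rounding, which is why Cook's D can be computed for every observation without refitting the model n times.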

## Applications

R | SAS | Minitab