# Multiple Linear Regression

Multiple linear regression models the relationship between a dependent variable $y = (y_1, y_2, \dots, y_n)$ and $p$ independent variables $x_1, x_2, \dots, x_p$, where $x_i = (x_{i1}, x_{i2}, \dots, x_{in})$, $i = 1, \dots, p$, for $p > 1$, as

$$y = a + b_1 x_1 + b_2 x_2 + \dots + b_p x_p + e,$$

where $e = (e_1, e_2, \dots, e_n)$ is the error vector, and the intercept $a$ and the coefficients $b_1, b_2, \dots, b_p$ are unknown parameters to be estimated. The terminology for the regression diagnostics mirrors that of simple linear regression, with just a few exceptions.
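As an illustration of how the parameters $a, b_1, \dots, b_p$ might be estimated, here is a minimal sketch using ordinary least squares in NumPy. The function name `fit_linear_model` and the toy data are assumptions made for this example. The data are generated without noise from $y = 1 + 2x_1 - 3x_2$, so the fit should recover those values.

```python
import numpy as np

def fit_linear_model(X, y):
    """Return (a, b) minimising the sum of squared errors of y = a + X @ b + e."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept a is estimated together with b.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Hypothetical toy data: n = 6 observations, p = 2 predictors (one per column).
X = np.array([[0, 1], [1, 0], [2, 2], [3, 1], [4, 3], [5, 2]], dtype=float)
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]

a, b = fit_linear_model(X, y)
print("intercept:", a)
print("slopes:", b)
```

Each column of `X` plays the role of one vector $x_i$ in the model above. With real data the errors are not zero, and the estimates will only approximate the underlying parameters.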

The errors are assumed to satisfy the following conditions:

- $e_1, e_2, \dots, e_n$ are random and independent,
- $e_1, e_2, \dots, e_n$ all have mean 0,
- $e_1, e_2, \dots, e_n$ all have the same variance (homoscedasticity),
- $e_1, e_2, \dots, e_n$ are normally distributed.

**Residual**: The difference between the actual, observed value and the predicted value (based on the regression equation).

**Outlier**: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity, a data entry error, or some other problem.
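The two definitions above can be sketched in a few lines of NumPy. The data are hypothetical, with one response value deliberately shifted. The cutoff of 2 on the scaled residual is an assumed rule of thumb for this example, not something the text prescribes.

```python
import numpy as np

# Hypothetical toy data: n = 10 observations, p = 2 predictors.
x1 = np.arange(10, dtype=float)
x2 = np.array([1, 0, 2, 1, 1, 0, 2, 1, 3, 2], dtype=float)
y = 1 + 2 * x1 + x2
y[4] += 8  # plant one unusual response value

# Least-squares fit of y = a + b1*x1 + b2*x2.
design = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Residual: observed value minus predicted value.
residuals = y - design @ coef

# Residual standard error with n - (p + 1) degrees of freedom.
n, k = design.shape
s = np.sqrt(residuals @ residuals / (n - k))

# Assumed rule of thumb: flag observations whose scaled residual exceeds 2.
scaled = residuals / s
outliers = np.flatnonzero(np.abs(scaled) > 2)
print("flagged observations:", outliers)
```

Dividing by the residual standard error puts residuals on a common scale, so the cutoff does not depend on the units of the dependent variable.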

**Leverage**: An observation with an extreme value on a predictor variable is a point with high leverage. Leverage is a measure of how far an observation's values on the independent variables deviate from their means. High-leverage points can have a large effect on the estimates of the regression coefficients.
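Leverage is commonly computed as the diagonal of the hat matrix $H = X(X^TX)^{-1}X^T$, where $X$ is the design matrix including the intercept column. The sketch below uses that standard formula, which the text above does not state. The data are hypothetical, and the cutoff $2(p+1)/n$ is an assumed rule of thumb.

```python
import numpy as np

# Hypothetical predictors: the last observation has an extreme x1 value.
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
x2 = np.array([2, 1, 3, 2, 4, 3, 5, 4, 6, 5], dtype=float)
design = np.column_stack([np.ones_like(x1), x1, x2])

# With a reduced QR factorisation design = Q R, the hat matrix is H = Q Q^T,
# so its diagonal is the row-wise sum of squares of Q.
q, _ = np.linalg.qr(design)
leverage = np.sum(q**2, axis=1)

# Assumed rule of thumb: leverage above 2 * (p + 1) / n counts as high.
n, k = design.shape
threshold = 2 * k / n
high_leverage = np.flatnonzero(leverage > threshold)
print("leverage:", np.round(leverage, 3))
print("high-leverage observations:", high_leverage)
```

Leverage depends only on the predictor values, not on the dependent variable, so it can be inspected before the model is fitted.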

**Influence**: An observation is said to be influential if removing it substantially changes the estimates of the regression coefficients. Influence can be thought of as the product of leverage and outlierness.

**Cook's distance** (or Cook's D): A measure of influence that combines information on the leverage and the residual of an observation.
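One common textbook form of Cook's distance for observation $i$ is $D_i = \dfrac{e_i^2}{(p+1)\,s^2} \cdot \dfrac{h_i}{(1-h_i)^2}$, where $e_i$ is the residual, $h_i$ the leverage, and $s^2$ the residual variance estimate. The text above does not give this formula. The sketch below assumes it, and the function name `cooks_distance` and the toy data are hypothetical.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation of the model y = a + X @ b + e."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    n, k = design.shape  # k = p + 1 estimated parameters

    # Residuals from the least-squares fit.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef

    # Leverage: diagonal of the hat matrix, obtained from a reduced QR.
    q, _ = np.linalg.qr(design)
    leverage = np.sum(q**2, axis=1)

    # Residual variance estimate with n - k degrees of freedom.
    s2 = residuals @ residuals / (n - k)
    return residuals**2 / (k * s2) * leverage / (1 - leverage) ** 2

# Hypothetical data: the last observation has an extreme x1 value and a
# response that is shifted away from the pattern of the other points.
X = np.column_stack([
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 30],
    [2, 1, 3, 2, 4, 3, 5, 4, 6, 5],
]).astype(float)
y = 1 + 2 * X[:, 0] - X[:, 1]
y[9] += 5

d = cooks_distance(X, y)
most_influential = int(np.argmax(d))
print("Cook's D:", np.round(d, 3))
print("most influential observation:", most_influential)
```

A point with high leverage pulls the fitted surface toward itself, so its residual can look modest even when it changes the coefficients a great deal. Cook's distance accounts for this by weighting the squared residual by $h_i / (1-h_i)^2$.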
