Simple Linear Regression
A simple linear regression model describes a linear relationship between two variables, a predictor x=(x1, x2,..., xn) and a response y=(y1, y2,..., yn), as:
y=a+bx+e, where a (the intercept) and b (the slope) are unknown parameters and e=(e1, e2,..., en) is the error term.
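The text does not say how a and b are estimated. A common choice is ordinary least squares, which the following minimal sketch assumes. The function name and the data are made up for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return least-squares estimates (a, b) for the model y = a + b*x + e."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx             # slope
    a = y_mean - b * x_mean   # intercept
    return a, b


# Illustrative data, invented for this sketch.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = fit_simple_linear_regression(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")  # prints: a = 2.200, b = 0.600
```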
Residual: The difference between the actual, observed value and the predicted value (based on the regression equation), conventionally taken as observed minus predicted.
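As a small sketch of this definition, the residuals can be computed once coefficient estimates are available. The coefficient values and data below are hypothetical, and the sign convention (observed minus predicted) is the usual one, assumed here.

```python
# Hypothetical coefficient estimates and made-up data, for illustration only.
a, b = 2.2, 0.6
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Predicted values from the regression equation y = a + b*x.
predicted = [a + b * xi for xi in x]

# Sign convention assumed here: observed minus predicted.
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for xi, ri in zip(x, residuals):
    print(f"x = {xi:.0f}, residual = {ri:+.2f}")
```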
Outlier: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its value of the predictor variable. An outlier may indicate a sample peculiarity, a data entry error, or some other problem.
Leverage: An observation with an extreme value on a predictor variable is a point with high leverage. Leverage is a measure of how far an observation's value of the independent variable deviates from the mean of that variable. High-leverage points can have a large effect on the estimates of the regression coefficients.
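The text gives no formula for leverage. For simple linear regression with an intercept, it is commonly quantified by the diagonal of the hat matrix, h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The sketch below assumes that definition, with a made-up function name and data.

```python
def leverages(x):
    """Hat-matrix diagonal for a simple linear regression with an intercept."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    # Each leverage grows with the squared distance of x_i from the mean of x.
    return [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]


# Made-up predictor values; the last one lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
h = leverages(x)
for xi, hi in zip(x, h):
    print(f"x = {xi:>4.1f}, leverage = {hi:.2f}")
```

In this example the observation at x = 10 has the largest leverage, and the leverages sum to 2, the number of estimated coefficients.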
Influence: An observation is said to be influential if removing the observation substantially changes the estimate of the regression coefficients. Influence can be thought of as the product of leverage and outlierness.
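The removal idea can be checked directly by refitting the model with each observation left out in turn and comparing the slope with the full-data slope. This is only a sketch: it assumes a least-squares fit, and the helper name and data are invented for illustration.

```python
def fit(x, y):
    """Least-squares estimates (a, b) for y = a + b*x (assumed fitting method)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_mean - b * x_mean, b


# Made-up data: the last point has an extreme x value and breaks the trend.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 9.0]

_, b_full = fit(x, y)

# Refit with observation i removed and record how much the slope moves.
slope_changes = []
for i in range(len(x)):
    x_rest = x[:i] + x[i + 1:]
    y_rest = y[:i] + y[i + 1:]
    _, b_rest = fit(x_rest, y_rest)
    slope_changes.append(abs(b_rest - b_full))

for i, change in enumerate(slope_changes):
    print(f"observation {i}: |change in slope| = {change:.3f}")
```

With this data, dropping the last observation moves the slope far more than dropping any other, which is what marks it as influential.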
Cook's distance (or Cook's D): A measure of influence that combines information on the leverage and the residual of an observation.
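The text does not state the formula. A commonly used form is D_i = (e_i^2 / (p * s^2)) * h_i / (1 - h_i)^2, where e_i is the residual, h_i the leverage, p the number of coefficients (2 here), and s^2 the residual mean square. The sketch below assumes that form and a least-squares fit; the function name and data are made up.

```python
def cooks_distances(x, y):
    """Cook's D for each observation in a simple linear regression.

    Assumes a least-squares fit with an intercept, n > 2, no perfect fit,
    and no leverage equal to 1 (either would cause a division by zero).
    """
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean

    # Residuals (observed minus predicted) and leverages (hat-matrix diagonal).
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    h = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    # Residual mean square: sum of squared residuals over n - p.
    s2 = sum(ei ** 2 for ei in e) / (n - p)

    return [(ei ** 2 / (p * s2)) * hi / (1.0 - hi) ** 2 for ei, hi in zip(e, h)]


# Made-up data: the last point has an extreme x value and breaks the trend.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 9.0]
d = cooks_distances(x, y)
for i, di in enumerate(d):
    print(f"observation {i}: Cook's D = {di:.3f}")
```

In this example the last observation has by far the largest Cook's D even though its residual is not the largest, because its high leverage pulls the fitted line toward it.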
Applications
R | SAS | Minitab