Multiple Linear Regression

Introduction

The multiple linear regression model describes the relationship between a dependent variable y = (y₁, y₂, ..., yₙ) and p > 1 independent variables x₁, x₂, ..., xₚ, where xᵢ = (xᵢ₁, xᵢ₂, ..., xᵢₙ) for i = 1, ..., p, as y = a + b₁x₁ + b₂x₂ + ... + bₚxₚ + e. Here e = (e₁, e₂, ..., eₙ) is the vector of error terms, and a, b₁, b₂, ..., bₚ are unknown parameters to be estimated. The terminology for regression diagnostics mirrors that of simple linear regression, with just a few exceptions.
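As a minimal sketch, assuming NumPy is available (this page does not prescribe any software), the parameters a, b₁, ..., bₚ can be estimated by ordinary least squares: place a column of ones (for the intercept a) next to the predictors and solve the least-squares problem. The function name `fit_mlr` and the layout of `X` (one row per observation, column i holding xᵢ) are illustrative choices, not part of any package.

```python
import numpy as np

def fit_mlr(X, y):
    """Estimate a, b_1, ..., b_p in y = a + b_1 x_1 + ... + b_p x_p + e by least squares.

    X: array of shape (n, p); column i holds the predictor x_i.
    y: array of shape (n,).
    Returns (a, b), where b has shape (p,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a leading column of ones carries the intercept a.
    design = np.column_stack([np.ones(len(y)), X])
    # Least-squares solution of design @ coef ~= y.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Usage: noise-free data generated from y = 1 + 2 x1 - 3 x2,
# so the fit should recover a = 1 and b = (2, -3) up to rounding.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]
a, b = fit_mlr(X, y)
print(a, b)
```

With real data the error term e is not zero, so the printed values are estimates subject to sampling error rather than exact recoveries.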

Assumptions

The assumptions of a multiple linear regression model are the same as the assumptions on the errors in Linear Models, namely:

1. e₁, e₂,..., e_n are random and independent,

2. e₁, e₂,..., e_n all have mean 0,

3. e₁, e₂,..., e_n all have the same variance (homoscedasticity),

4. e₁, e₂,..., e_n are normally distributed.
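These assumptions concern the unobservable errors, so in practice they are examined through the residuals. As one crude, informal illustration (a Goldfeld-Quandt-style variance ratio without the accompanying F-test, and not a method prescribed by this page), the sketch below sorts observations by fitted value and compares the residual variance in the upper half with that in the lower half; a ratio far from 1 hints that assumption 3 may be violated. NumPy and the function name `variance_ratio` are assumptions of the sketch.

```python
import numpy as np

def variance_ratio(X, y):
    """Residual variance in the upper half of fitted values / lower half.

    A crude, informal check of constant error variance (assumption 3);
    values far from 1 suggest heteroscedasticity.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    resid = y - fitted
    # Order residuals by fitted value and split them into two halves.
    order = np.argsort(fitted)
    half = len(y) // 2
    low, high = resid[order[:half]], resid[order[-half:]]
    return np.var(high) / np.var(low)

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=2000)
# Errors with constant variance: assumption 3 holds.
y_const = 2 + 3 * x + rng.normal(scale=1.0, size=x.size)
# Error standard deviation grows with x: assumption 3 is violated.
y_grow = 2 + 3 * x + rng.normal(scale=0.5 * x)
print(variance_ratio(x, y_const))  # close to 1
print(variance_ratio(x, y_grow))   # noticeably larger than 1
```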

Terminology

Residual: The difference between the actual, observed value of the dependent variable and the value predicted by the regression equation (observed minus predicted).

Outlier: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may reflect a peculiarity of the sample, a data entry error, or some other problem.
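To make the residual and outlier definitions concrete, here is a minimal sketch assuming NumPy. It computes the residuals (observed minus predicted) and the internally studentized residuals rᵢ = eᵢ / (s·√(1 − hᵢᵢ)), where eᵢ is the residual, hᵢᵢ is the leverage of observation i (see the Leverage entry) and s² is the residual mean square. Flagging |rᵢ| > 3 is a common rule of thumb rather than a rule from this page; the cutoff and the function name `studentized_residuals` are illustrative.

```python
import numpy as np

def studentized_residuals(X, y):
    """Return raw residuals (y - y_hat) and internally studentized residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]  # number of estimated parameters, p + 1
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef  # observed minus predicted
    # Leverages: with design = QR (reduced QR), h_ii is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(design)
    h = np.sum(Q**2, axis=1)
    s2 = resid @ resid / (n - k)  # residual mean square
    return resid, resid / np.sqrt(s2 * (1 - h))

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=40)
y[7] += 6.0  # plant an unusual response value at observation 7
resid, r = studentized_residuals(X, y)
print(np.flatnonzero(np.abs(r) > 3))  # candidate outliers
```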

Leverage: An observation with an extreme value on one or more predictor variables is a point with high leverage. Leverage measures how far an observation's values of the independent variables lie from their means. High-leverage points can have a large effect on the estimates of the regression coefficients.
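In multiple regression the leverage of observation i is usually taken to be hᵢᵢ, the i-th diagonal element of the hat matrix H = D(DᵀD)⁻¹Dᵀ, where D is the design matrix (a column of ones followed by the predictors). The sketch below, assuming NumPy, computes the leverages; they depend only on the predictors and always sum to p + 1. The cutoff 2(p + 1)/n is a frequently quoted rule of thumb, and the function name `leverages` is illustrative.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix for a model with an intercept.

    Depends only on the predictors, not on y.
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])
    # With design = QR (reduced QR), H = Q Q^T, so h_ii is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(design)
    return np.sum(Q**2, axis=1)

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
X[0] = [6.0, -6.0]  # an observation far from the predictor means
h = leverages(X)
n, p = X.shape
print(np.flatnonzero(h > 2 * (p + 1) / n))  # high-leverage observations
```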

Influence: An observation is said to be influential if removing it substantially changes the estimates of the regression coefficients. Influence can be thought of as the product of leverage and outlierness.
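The definition of influence suggests a direct, if brute-force, computation: refit the model with each observation left out and record how far each coefficient moves (these differences are often called DFBETA). The sketch below, assuming NumPy, does exactly that; closed-form formulas exist that give the same differences without n refits. The function name `coef_changes` is hypothetical. The example plants one point that has both high leverage and a large residual, matching the "leverage times outlierness" intuition.

```python
import numpy as np

def coef_changes(X, y):
    """Change in each coefficient when observation i is deleted.

    Row i holds (full-data estimate) - (estimate without observation i),
    with columns ordered as (a, b_1, ..., b_p).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    full, *_ = np.linalg.lstsq(design, y, rcond=None)
    changes = np.empty((n, design.shape[1]))
    for i in range(n):
        keep = np.arange(n) != i  # drop observation i
        coef_i, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        changes[i] = full - coef_i
    return changes

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=25)
y = 1 + 0.5 * x + rng.normal(scale=0.3, size=25)
# Observation 0: extreme x (high leverage) and far off the line (large residual).
x[0], y[0] = 25.0, 0.0
d = coef_changes(x, y)
print(np.argmax(np.abs(d[:, 1])))  # observation that moves the slope most
```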

Cook's distance (or Cook's D): A measure of influence that combines the observation's leverage and its residual.
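One common form of Cook's distance makes that combination explicit: Dᵢ = eᵢ²·hᵢᵢ / (k·s²·(1 − hᵢᵢ)²), where eᵢ is the residual, hᵢᵢ the leverage, k = p + 1 the number of estimated parameters (including a) and s² the residual mean square. The NumPy sketch below implements this formula; NumPy and the name `cooks_distance` are assumptions, and the cutoff 4/n used to flag points is only one frequently quoted rule of thumb (Dᵢ > 1 is another).

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance D_i for each observation of a model with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]  # p + 1 estimated parameters
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    e = y - design @ coef  # residuals
    Q, _ = np.linalg.qr(design)
    h = np.sum(Q**2, axis=1)  # leverages
    s2 = e @ e / (n - k)  # residual mean square
    # A large residual and a large leverage both push D_i up.
    return e**2 * h / (k * s2 * (1 - h) ** 2)

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 2))
y = 2 + X @ np.array([1.0, 1.0]) + rng.normal(scale=0.5, size=30)
X[0] = [4.0, 4.0]  # high leverage
y[0] = -8.0        # and far from the fitted surface
D = cooks_distance(X, y)
print(np.flatnonzero(D > 4 / len(y)))  # candidate influential observations
```

The same value can be obtained by deleting observation i, refitting, and measuring the coefficient change (b − b₍ᵢ₎) in the metric DᵀD scaled by k·s²; the closed form above just avoids the refit.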

Applications

Click on one of the links below to see how to perform a multiple linear regression in the chosen package.

R

SAS

Minitab
