Simple Linear Regression


A simple linear regression model describes a linear relationship between two variables x=(x1, x2,..., xn) and y=(y1, y2,..., yn) as:
 y=a+bx+e,  where a and b are unknown parameters and e=(e1, e2,..., en) is the error term.

The goal of simple linear regression is to use the observed values of x and y to estimate the unknown parameters a and b, under some conditions on the error term e.
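As an illustration (in Python rather than one of the packages linked below), the ordinary least squares estimates of a and b can be computed from the closed-form formulas b = Sxy/Sxx and a = ybar - b*xbar. The data values here are made up for demonstration only.

```python
# Fit y = a + b*x by ordinary least squares using the closed-form solution:
#   b = Sxy / Sxx   (sum of cross-deviations over sum of squared x-deviations)
#   a = ybar - b * xbar
def fit_ols(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx          # estimated slope
    a = ybar - b * xbar    # estimated intercept
    return a, b

# Hypothetical example data (roughly following y = 1 + 2x plus noise)
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_ols(x, y)
```

Any statistical package performs this same computation (typically via a more general matrix formulation); the formulas above are the special case for a single predictor.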


The assumptions of the simple linear regression model are the same as the assumptions on the errors in Linear Models, namely:
1. e1, e2,..., en are random and independent,
2. e1, e2,..., en  all have mean 0,
3. e1, e2,..., en  all have the same variance (homoscedasticity),
4. e1, e2,..., en  are normally distributed.


Residual: The difference between the observed value and the value predicted by the regression equation (observed minus predicted).
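A short sketch of the residual computation, using hypothetical data and coefficient estimates (the values of a and b below are assumed to be OLS estimates for these data). A useful check: when the model includes an intercept, the OLS residuals sum to (numerically) zero.

```python
# Residuals: e_i = y_i - yhat_i, where yhat_i = a + b*x_i is the fitted value.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = 1.09, 1.97  # assumed OLS estimates for these illustrative data
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
# Property of OLS with an intercept: sum(residuals) is zero up to rounding.
```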

Outlier: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its value on the predictor variables. An outlier may indicate a sample peculiarity, a data entry error, or some other problem.

Leverage: An observation with an extreme value on a predictor variable is a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. High leverage points can have a great amount of effect on the estimate of regression coefficients.

Influence: An observation is said to be influential if removing the observation substantially changes the estimate of the regression coefficients.  Influence can be thought of as the product of leverage and outlierness.

Cook's distance (or Cook's D): A measure that combines the information of leverage and residual of the observation.
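The leverage and Cook's distance definitions above have simple closed forms in the one-predictor case, sketched below in Python with made-up data (the last observation is given an extreme x value so that it stands out). The formulas used are the standard ones: h_i = 1/n + (x_i - xbar)^2/Sxx for leverage, and D_i = e_i^2/(p*s^2) * h_i/(1-h_i)^2 with p = 2 estimated parameters for Cook's distance.

```python
# Leverage and Cook's distance for simple linear regression:
#   h_i = 1/n + (x_i - xbar)^2 / Sxx
#   D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2,  with p = 2 parameters
x = [1, 2, 3, 4, 5, 10]                  # last point: extreme x (high leverage)
y = [3.0, 5.1, 6.9, 9.2, 11.0, 30.0]     # hypothetical illustrative data
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = sxy / sxx
a = ybar - b * xbar
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

p = 2                                            # intercept + slope
s2 = sum(e ** 2 for e in resid) / (n - p)        # residual variance estimate
leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
cooks_d = [e ** 2 / (p * s2) * h / (1 - h) ** 2
           for e, h in zip(resid, leverage)]
# The leverages always sum to p (here 2); the extreme-x point has both the
# largest leverage and, for these data, the largest Cook's distance.
```

A common rule of thumb flags observations with D_i greater than 4/n (or 1) for closer inspection; the packages linked below report these diagnostics directly.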


Click on one of the links below to see how to perform a simple linear regression with the package of your choice.

R SAS Minitab