Simple Linear Regression with R

Data description:

Crickets make their chirping sound by sliding one wing cover very rapidly back and forth over the other. It is believed that there is a linear relationship between temperature and the frequency at which crickets chirp. The file "crickets.txt" contains the temperatures (Temperature) and the chirp frequencies (ChirpsPerSeconds) of 15 randomly selected crickets.

Analysis:

>Cric=read.table(file="http://ramanujan.math.trinity.edu/ekwessi/misc/crickets.txt", header=TRUE)     # Load the data into the workspace and name it "Cric"

>head(Cric)                  #Display the first lines of the data set.

  Observation ChirpsPerSeconds Temperature
1           1             20.0        88.9
2           2             16.0        71.6
3           3             19.8        93.3
4           4             18.4        84.3
5           5             17.1        80.6
6           6             15.5        75.2

>x=Cric$ChirpsPerSeconds           # Store the second column (chirp frequency) as x
>y=Cric$Temperature                # Store the third column (temperature) as y
>plot(x, y, main="Linear regression", xlab="Chirps per second", ylab="Temperature", col="red")    # Scatter plot of y against x

>sreg=lm(y~x)                       # Fit the simple linear regression of y on x and save it as "sreg"
>summary(sreg)                      # Display the main summary statistics of the regression

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.5041 -1.9044  0.4589  2.7562  5.0222 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  24.8401    10.0227   2.478   0.0277 *  
x             3.3158     0.5989   5.536 9.61e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.814 on 13 degrees of freedom
Multiple R-squared:  0.7022, Adjusted R-squared:  0.6793 
F-statistic: 30.65 on 1 and 13 DF,  p-value: 9.606e-05
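
The coefficient table above comes from the closed-form least-squares formulas for the line y = a + bx: b = Sxy/Sxx, a = mean(y) - b*mean(x), with standard errors derived from the residual standard error s. The sketch below reproduces that table by hand and compares it with lm(). To stay self-contained it uses only the six rows printed by head(Cric), not the full data file, so its numbers will not match summary(sreg); the names Sxx, Sxy, se_a, se_b and manual are illustrative choices, not part of the original analysis.

```r
# Subset only: the six rows printed by head(Cric) above.
# Results will differ from summary(sreg), which uses the full data set.
x <- c(20.0, 16.0, 19.8, 18.4, 17.1, 15.5)   # ChirpsPerSeconds
y <- c(88.9, 71.6, 93.3, 84.3, 80.6, 75.2)   # Temperature
n <- length(x)

# Closed-form least-squares estimates for the line y = a + b*x
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
b <- Sxy / Sxx
a <- mean(y) - b * mean(x)

# Residual standard error, standard errors, t values and two-sided p-values
res <- y - (a + b * x)
s <- sqrt(sum(res^2) / (n - 2))
se_a <- s * sqrt(1 / n + mean(x)^2 / Sxx)
se_b <- s / sqrt(Sxx)
t_vals <- c(a, b) / c(se_a, se_b)
p_vals <- 2 * pt(-abs(t_vals), df = n - 2)

manual <- cbind(Estimate = c(a, b), `Std. Error` = c(se_a, se_b),
                `t value` = t_vals, `Pr(>|t|)` = p_vals)
rownames(manual) <- c("(Intercept)", "x")

fit <- lm(y ~ x)
print(manual)                 # hand-computed coefficient table
print(coef(summary(fit)))     # the same table as computed by lm()
```

Each t value is simply the estimate divided by its standard error, which is why, in the full-data output, 3.3158 / 0.5989 gives 5.536; in simple regression the F-statistic is the square of the slope's t value (5.536² ≈ 30.65).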

>abline(sreg)                # Draw the fitted regression line on the scatter plot

>op=par(mfrow=c(2,2))              # Dividing the plot window into four frames
>plot(sreg)                        # Regression plots and diagnostics
>par(op)                           # Restore the previous graphical settings

Interpretation of the results:

  1. The scatter plot shows a linear trend: as the chirp frequency increases, so does the temperature. This suggests a linear relationship between the two variables.
  2. The summary of the regression indicates that the least-squares line fitting the scatter plot has equation y = 24.8401 + 3.3158x, that is, y = a + bx with intercept a = 24.8401 and slope b = 3.3158.
  3. The small p-values (0.0277 for the intercept and 9.61e-05 for the slope) indicate that both estimates are significantly different from zero at the 5% level.
  4. The coefficient of determination, R² = 0.7022, indicates that about 70% of the variation observed in Temperature is explained by its linear relationship with chirp frequency. The adjusted R², which accounts for the number of predictors, is 0.6793.
  5. The diagnostic plots suggest a minor departure from normality, and they also flag observations 2 and 11 as potentially problematic for the model. The question is whether these points should be dropped (for example, because of an error in data collection) or kept, in which case a robust regression should be considered. A common rule of thumb is to check whether their Cook's distance exceeds 4/n, where n is the number of observations in the data set (here 4/15 ≈ 0.27).
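
Item 4 rests on two formulas: R² = 1 - SSE/SST and, with one predictor, adjusted R² = 1 - (1 - R²)(n - 1)/(n - 2). Plugging in the full-data values (R² = 0.7022, n = 15) gives 1 - 0.2978 × 14/13 ≈ 0.6793, matching the output above. The sketch below checks both formulas against summary(), again on the six rows shown by head(Cric) only (an assumption made to keep it self-contained, so the values differ from the full analysis). It also confirms that in simple regression R² equals the squared correlation between x and y.

```r
# Subset only: the six rows printed by head(Cric); not the full data set
x <- c(20.0, 16.0, 19.8, 18.4, 17.1, 15.5)   # ChirpsPerSeconds
y <- c(88.9, 71.6, 93.3, 84.3, 80.6, 75.2)   # Temperature
fit <- lm(y ~ x)
n <- length(y)

sse <- sum(residuals(fit)^2)                 # residual (unexplained) sum of squares
sst <- sum((y - mean(y))^2)                  # total sum of squares
r2 <- 1 - sse / sst                          # coefficient of determination
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - 2)   # adjusted for one predictor

cat("R-squared:", r2, " adjusted R-squared:", adj_r2, "\n")
cat("squared correlation:", cor(x, y)^2, "\n")   # equals R-squared in simple regression
```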

>cd<-cooks.distance(sreg)        # Compute the Cook's distance of each observation and save it as cd
>Cric1<-cbind(Cric,cd)           # Add the column "cd" to the original data set and save the result as Cric1
>Cric1[cd>4/15,]                 # Display the observations whose Cook's distance is deemed large, i.e. greater than 4/n = 4/15

[1] Observation      ChirpsPerSeconds Temperature      cd              
<0 rows> (or 0-length row.names)
 
We conclude that no observation has a Cook's distance above 4/15, so the deviations observed in the diagnostic plots are probably minor.
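
For readers who want to see what cooks.distance() computes, Cook's distance for observation i combines its residual e_i and its leverage h_ii: D_i = e_i² / (p s²) × h_ii / (1 - h_ii)², where p is the number of estimated coefficients (here 2) and s² is the residual variance. The sketch below computes this by hand and compares it with cooks.distance(), then applies the 4/n rule of thumb. As in the earlier checks, it assumes only the six rows printed by head(Cric), so which points (if any) get flagged says nothing about the full data set; cd_manual is an illustrative name.

```r
# Subset only: the six rows printed by head(Cric); not the full data set
x <- c(20.0, 16.0, 19.8, 18.4, 17.1, 15.5)   # ChirpsPerSeconds
y <- c(88.9, 71.6, 93.3, 84.3, 80.6, 75.2)   # Temperature
fit <- lm(y ~ x)
n <- length(y)
p <- length(coef(fit))                  # number of estimated coefficients (a and b)

e <- residuals(fit)
h <- hatvalues(fit)                     # leverages h_ii
s2 <- sum(e^2) / (n - p)                # residual variance estimate
cd_manual <- e^2 / (p * s2) * h / (1 - h)^2

print(round(cbind(manual = cd_manual, builtin = cooks.distance(fit)), 4))
which(cd_manual > 4 / n)                # observations flagged by the 4/n rule of thumb
```

A point can have a large residual yet a small Cook's distance if its leverage is low, which is why the rule is applied to Cook's distance rather than to the residuals alone.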