Non-linear Regression

Data description

The data set "LocoPlay.txt" located at "http://ramanujan.math.trinity.edu/ekwessi/misc/LocoPlay.txt represents the age at which animal develops Locomotion(x) and the age at which they develop Play(y).

 

# uploading the data into the R-workspace
> CW=read.table(file="http://ramanujan.math.trinity.edu/ekwessi/misc/Locoplay.txt",header=T)
# viewing the first lines of the dataset
>head(CW)                    

 

Analysis
 
     Species                  Locomotion  Play
1   HomoSapiens        360                90
2   GorillaGorilla         165              105
3   FelisCatus               21                21
4   CanussFamiliaris     23               26
5   RattusNorvegians    11               14
6  TurdusMerula          18                28
 

Plot the data to guess the adequate model
>plot(LP[,2],LP[,3],main="An example of Logarithmic regression"xlab="Locomotion"ylab="Play",col="red")

               

 
We first perfom a fit without linearization as the graph suggests a model of the form y=a*x^b, where a and b are two parameters to be estimated directly from R, using the function "nls". This function requires starting values for a and b. Note that there are self-starting models, see for example 

> y=LP$Locomotion                            # Renaming the variables
> x=LP$Play                                  # Renaming the variables
> logreg=nls(y~a*x^b,stat=list(a=1,b=0.2), data=LP)
> summary(logreg)                                                         # Summary of the nonlinear regression

 
Formula: y ~ a * x^b
 
Parameters:
  Estimate Std. Error t value Pr(>|t|)   
a 12.07893    5.11064   2.363  0.04236 * 
b  0.38509    0.08681   4.436  0.00163 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 19.73 on 9 degrees of freedom
 
Number of iterations to convergence: 12 
Achieved convergence tolerance: 2.379e-066
 

>n=seq(1.400.0.01)
>m=12.0789*n^0.38509

> lines(n,m,type="l",col="blue", lwd=2)                  #Fitting the model into the data

 
We now perfom the regression, assuming linearity between log(x) and log(y). Note the model chosen, y=a*x^b is linearizable.

# Using nonlinear regression to find the regression coefficients a and b
>lreg2=nls(y~L/(1+exp(a+b*x)),start=list(a=5,b=-1,L=1044),data=CW)
>logreg2=lm(log(y)~log(x), data=LP)
#We  find a and b with linearization # y=a*x^b->log(y)=log(a)+blog(x)
#In this case, the real value of a will be the exponetial of the value #extimated from logreg2, while b remains the same
>summary(logreg2)                                  # summary of the nonlinear regression

 
Call:
lm(formula = log(y) ~ log(x), data = LP)
 
Residuals:
     Min       1Q   Median       3Q      Max 
-0.48933 -0.30836  0.02253  0.27588  0.51897 
 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.6893     0.4142   4.079 0.002763 ** 
log(x)        0.5606     0.1070   5.238 0.000536 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 0.3845 on 9 degrees of freedom
Multiple R-squared:  0.753, Adjusted R-squared:  0.7256 
F-statistic: 27.44 on 1 and 9 DF,  p-value: 0.0005361
 

>k=exp(logreg2$coef[1])*n^logreg2$coef[2]
>lines(n,k,type="l",col="red") # Fitting the model into the data
>legend(230,80,c("With linearization","without Linearization"),fill=c("red","blue"))

 

We compare the two model using the AIC (Akaike's information Criterion)

> AIC(logreg,logreg2)      # Comparing the two models using their AIC.

 
             df       AIC
logreg   3        100.61227
logreg2  3         13.97931
 
 

The second model has a the smallest AIC, thus will be better at predicting Play time using Locomotion .