Wilcoxon-Signed-Rank Test with R
Data Description
In an annual survey, government and private sector workers were matched as closely as possible (type of job, education, experience) and the salaries of the matched pairs were obtained.
We would like to determine whether the federal pay scale is commensurate with private sector salaries. To this end, we will test the null hypothesis that there is no difference between the pay scales versus the alternative that the private sector pay scale is greater.
The dataset "pay.txt", located at "http://ramanujan.math.trinity.edu/ekwessi/misc/pay.txt" (source: J.T. McClave and G. Benson (1978)), contains 12 such matched pairs of salaries (in dollars).
Analysis
The Wilcoxon signed-rank test is performed in R using the command "wilcox.test" from the package "stats", which is loaded by default.
>pay<-read.table(file="http://ramanujan.math.trinity.edu/ekwessi/misc/pay.txt",header=TRUE) # Loading the data into the R workspace
>head(pay) # Displaying the first 6 pairs
  Pair Government Private
1    1      11750   12500
2    2      20900   22300
3    3      14800   14500
4    4      29900   32300
5    5      21500   20800
6    6      18400   19200
>x<-pay$Private # Renaming the variables
>y<-pay$Government
>wilcox.test(x,y,alternative="greater",paired=TRUE,exact=FALSE,correct=FALSE) # Wilcoxon signed-rank test without continuity correction
Wilcoxon signed rank test
data: x and y
V = 62.5, p-value = 0.03258
alternative hypothesis: true location shift is greater than 0
>wilcox.test(x,y,alternative="greater",paired=TRUE,exact=FALSE,correct=TRUE) # Wilcoxon signed-rank test with continuity correction
Wilcoxon signed rank test with continuity correction
data: x and y
V = 62.5, p-value = 0.03554
alternative hypothesis: true location shift is greater than 0
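The statistic V reported above is the sum of the ranks of the positive differences x - y, where the ranks are taken over the absolute differences. The sketch below recomputes it by hand. It is only an illustration under stated assumptions: it uses the six pairs printed by head(pay) rather than all 12 pairs, so its value differs from V = 62.5, and the variable names (government, private, d, r, V, V_builtin) are chosen here for the example.

```r
# First six matched pairs, copied from the head(pay) output above
# (an illustrative subset, not the full dataset of 12 pairs)
government <- c(11750, 20900, 14800, 29900, 21500, 18400)
private    <- c(12500, 22300, 14500, 32300, 20800, 19200)

d <- private - government   # paired differences
d <- d[d != 0]              # zero differences are discarded
r <- rank(abs(d))           # ranks of |d|; ties would receive average ranks
V <- sum(r[d > 0])          # sum of the ranks of the positive differences
V

# The same statistic as computed by wilcox.test on these six pairs
V_builtin <- unname(wilcox.test(private, government, paired = TRUE,
                                alternative = "greater")$statistic)
V_builtin
```

For these six pairs both computations give V = 18: the positive differences 750, 800, 1400 and 2400 have ranks 3, 4, 5 and 6.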
Interpretation of the results:
The p-values for this test (0.03258 without and 0.03554 with continuity correction) are both small (i.e., < 0.05), which indicates statistically significant evidence that government workers receive less in pay than those in the private sector.
Remark:
Note that if we used a paired t-test to answer the question, we would get
>t.test(x,y,alternative="greater",paired=TRUE) # Paired t-test
Paired t-test
data: x and y
t = 2.0673, df = 11, p-value = 0.03153
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
86.9921 Inf
sample estimates:
mean of the differences
662.5
Note that the p-value (0.03153) is still small, but the results are not valid, as the Anderson-Darling test rejects the assumption of normality for both x and y.
>library(nortest) # ad.test is provided by the package "nortest"
>ad.test(x)
Anderson-Darling normality test
data: x
A = 0.8254, p-value = 0.023
>ad.test(y)
Anderson-Darling normality test
data: y
A = 0.949, p-value = 0.0108
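Since the paired t-test is computed from the differences x - y, it can also be informative to examine the normality of the differences themselves. The sketch below is a hedged illustration, not part of the original analysis. It uses only the six pairs printed by head(pay), and it swaps in the Shapiro-Wilk test (shapiro.test, from the package "stats") in place of ad.test so that no additional package is needed. With so few observations the outcome is illustrative only.

```r
# First six matched pairs, copied from the head(pay) output above
# (an illustrative subset, not the full dataset of 12 pairs)
government <- c(11750, 20900, 14800, 29900, 21500, 18400)
private    <- c(12500, 22300, 14500, 32300, 20800, 19200)

d <- private - government   # the paired t-test is based on these differences

# Shapiro-Wilk normality test on the differences
# (used here instead of Anderson-Darling; this is a substitution)
sw <- shapiro.test(d)
sw$statistic                # W statistic
sw$p.value                  # a small p-value would cast doubt on normality
```

On the full dataset, the same check would be applied to x - y, with x and y as defined earlier.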
Going Further:
Check all the options of the command wilcox.test with help(wilcox.test)
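One of the options described in that help page is conf.int, which asks wilcox.test to also return an estimate of the (pseudo)median of the differences and a confidence interval for it. The sketch below shows the call under stated assumptions: it again uses only the six pairs printed by head(pay), so the numbers do not correspond to the full analysis, and the object name res is chosen here for illustration.

```r
# First six matched pairs, copied from the head(pay) output above
# (an illustrative subset, not the full dataset of 12 pairs)
government <- c(11750, 20900, 14800, 29900, 21500, 18400)
private    <- c(12500, 22300, 14500, 32300, 20800, 19200)

# Request a confidence interval in addition to the test itself
res <- wilcox.test(private, government, paired = TRUE,
                   alternative = "greater",
                   conf.int = TRUE, conf.level = 0.95)

res$estimate   # estimated (pseudo)median of the paired differences
res$conf.int   # one-sided interval; the upper limit is Inf for "greater"
```

Because the alternative is one-sided, the interval has only a finite lower limit, just like the confidence interval in the paired t-test output above.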