Evaluate null distribution function of Mann-Whitney (Wilcoxon) rank sum statistic through simulation
For simplicity, I assume there are no ties between the two samples.
Background
Rank sum statistic can be performed using the Mann–Whitney statistic. Let
$$ U=\sum_{i=1}^{m} \sum_{j=1}^{n} \phi\left(X_{i}, Y_{j}\right) $$
where
$$ \phi\left(X_{i}, Y_{j}\right)=\left\{\begin{array}{ll}{1} & {\text { if } X_{i} \lt Y_{j},} \\ {0} & {\text { otherwise. }}\end{array}\right. $$
The statistic U counts the number of "X before Y" predecessors. It is easy to show that
$$ W=U+\frac{n(n+1)}{2}. $$
CDF Simulation
Calculate F(t) for Mann-Whitney sum rank statistic W
# CDF for W statistic simulation, return F(t)
W_CDF <- function(t, m, n){
count <- 0
for (i in 1:10000) {
W = sample(m+n, n)
if (sum(W) < t) count = count+1
}
cdf_value = count/10000
return (cdf_value)
}
Plot CDF using simulated data
Use mtcars
as sample data, nrow(mtcars)
: 32 observations, sum(mtcars$am == "1")
: we have 13 with manual transmission and 19 with automatic transmission.
If there were really no difference in the displacement for automatic and manual transmissions, then the ranks of the displacement would be assigned randomly.
set.seed(1)
library(ggplot2)
# Suppose the ranks of the displacement would be assigned randomly. To simulate that, we imagine that
# we have 32 possible ranks, and we sample 13 of them to form the ranks of displacements for the manual transmissions.
# manualRanks <- sample(32, 13)
# The automatic ranks are the ones that are not in manualRanks. We need to pull them out using which(!(1:32 %in% manualRanks))
# autoRanks <- which(!(1:32 %in% manualRanks))
# The W value is equal to the sum of the ranks for the smaller random sample minus the minimum such sum, namely n(n+1)/2=13*14/2=91
# W_sim <- sum(manualRanks)-91
W_sim <- c()
simPermuTest <- function(iter) {
for (i in 1:iter) {
manualRanks <- sample(32, 13)
autoRanks <- which(!(1:32 %in% manualRanks))
W_sim <- c(W_sim, sum(manualRanks)-91)
}
x<-seq(1, iter, 1)
y<-seq(1,iter*3,3)
dev.new()
plot(x, y, type = NULL)
hist(W_sim, freq = F)
lines(density(W_sim), col="blue")
lines(density(W_real), col="red")
dev.new()
plot(ecdf(W_sim))
}
simPermuTest(10000)
We get:
Reference
- [1] Statistics 240: Nonparametric and Robust Methods, Lecture Notes Chapter 3, Philip B. Stark
- [2] Statistics 240: Nonparametric and Robust Methods, Lecture Notes Chapter 4, Philip B. Stark
- [3] Foundations of Statistics with R
- [4] Mann-Whitney Simulation
- [5] Nonparametric Statistical Methods, third edition, Myles Hollander, Douglas A. Wolfe, Eric Chicken, John Wiley & Sons, 2013