Evaluate null distribution function of Mann-Whitney (Wilcoxon) rank sum statistic through simulation

For simplicity, I assume there are no ties between the two samples.

Background

Rank sum statistic can be performed using the Mann–Whitney statistic. Let

$$ U=\sum_{i=1}^{m} \sum_{j=1}^{n} \phi\left(X_{i}, Y_{j}\right) $$

where

$$ \phi\left(X_{i}, Y_{j}\right)=\left\{\begin{array}{ll}{1} & {\text { if } X_{i} \lt Y_{j},} \\ {0} & {\text { otherwise. }}\end{array}\right. $$

The statistic U counts the number of "X before Y" predecessors. It is easy to show that

$$ W=U+\frac{n(n+1)}{2}. $$

CDF Simulation

Calculate F(t) for Mann-Whitney sum rank statistic W

# CDF for W statistic simulation, return F(t)
W_CDF <- function(t, m, n){
  count <- 0
  for (i in 1:10000) {
    W = sample(m+n, n)
    if (sum(W) < t) count = count+1
  }
  cdf_value = count/10000
  return (cdf_value)
}

Plot CDF using simulated data

Use mtcars as sample data, nrow(mtcars): 32 observations, sum(mtcars$am == "1"): we have 13 with manual transmission and 19 with automatic transmission. If there were really no difference in the displacement for automatic and manual transmissions, then the ranks of the displacement would be assigned randomly.

set.seed(1)
library(ggplot2)
#  Suppose the ranks of the displacement would be assigned randomly. To simulate that, we imagine that 
#   we have 32 possible ranks, and we sample 13 of them to form the ranks of displacements for the manual transmissions.
# manualRanks <- sample(32, 13)
# The automatic ranks are the ones that are not in manualRanks. We need to pull them out using which(!(1:32 %in% manualRanks))
# autoRanks <- which(!(1:32 %in% manualRanks))
# The W value is equal to the sum of the ranks for the smaller random sample minus the minimum such sum, namely n(n+1)/2=13*14/2=91
# W_sim <- sum(manualRanks)-91

W_sim <- c()

simPermuTest <- function(iter) {
  for (i in 1:iter) {
    manualRanks <- sample(32, 13)
    autoRanks <- which(!(1:32 %in% manualRanks))
    W_sim <- c(W_sim, sum(manualRanks)-91)
}
  x<-seq(1, iter, 1)
  y<-seq(1,iter*3,3)
  dev.new()
  plot(x, y, type = NULL)
  hist(W_sim, freq = F)
  lines(density(W_sim), col="blue")
  lines(density(W_real), col="red")
  dev.new()
  plot(ecdf(W_sim))
 }
simPermuTest(10000)

We get:

Reference