Non-Parametric Tests in IronPython QuickStart Sample

Illustrates how to perform non-parametric tests like the Wilcoxon-Mann-Whitney test and the Kruskal-Wallis test in IronPython.

This sample is also available in: C#, Visual Basic, F#.

Overview

This QuickStart sample demonstrates how to perform non-parametric statistical hypothesis tests using Numerics.NET. Non-parametric tests make fewer assumptions about the underlying data distributions compared to their parametric counterparts.

The sample shows three common non-parametric tests:

The Mann-Whitney (Wilcoxon) rank sum test compares two independent samples to determine if they come from the same distribution. The example uses real data from a study of oyster DNA and protein variation.
The Kruskal-Wallis test extends the Mann-Whitney test to three or more groups. The sample demonstrates this using investment fund growth data from the NIST Engineering Statistics Handbook.
The runs test checks for randomness in a sequence by analyzing its runs (unbroken stretches of identical consecutive values). The example uses a sequence of binary outcomes (male or female) to illustrate this test.
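The two rank-based statistics above can be sketched in plain Python to show roughly what the test objects compute. This is a minimal illustration under stated assumptions, not the Numerics.NET implementation. The helper names `average_ranks`, `mann_whitney_u`, and `kruskal_wallis_h` are hypothetical. Tied values receive the mean of the ranks they span, and the Kruskal-Wallis statistic omits the correction for ties. Numerics.NET may report a different but equivalent form of the Mann-Whitney statistic (for example, a rank sum instead of U), so these values need not match the sample's printed output.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the last sorted position j whose value ties with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; use their mean.
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x: its rank sum minus the smallest possible rank sum."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2.0


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, without the correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        # Each group's ranks occupy a contiguous slice of the pooled ranks.
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


# Data copied from the sample code (oyster study and NIST fund growth).
dna = [-0.005, 0.116, -0.006, 0.095, 0.053, 0.003]
protein = [-0.005, 0.016, 0.041, 0.016, 0.066, 0.163, 0.004,
           0.049, 0.006, 0.058, -0.002, 0.015, 0.044, 0.024]
a = [4.2, 4.6, 3.9, 4.0]
b = [3.3, 2.4, 2.6, 3.8, 2.8]
c = [1.9, 2.4, 2.1, 2.7, 1.8]
d = [3.5, 3.1, 3.7, 4.1, 4.4]

u_dna = mann_whitney_u(dna, protein)
h = kruskal_wallis_h(a, b, c, d)
print("U (DNA sample): %.1f" % u_dna)
print("H (fund growth): %.3f" % h)
```

For this data, the sketch gives U = 39.5 for the DNA sample. The protein sample then has U = 44.5, because the two U values always sum to the product of the sample sizes (6 × 14 = 84). For the fund-growth data, it gives H ≈ 13.678.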

For each test, the sample shows how to:

Create the test object with different data input formats
Access the test statistic and p-value
Make decisions using different significance levels
Interpret the results
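The decision step reduces to comparing the p-value with the chosen significance level. The sketch below assumes the common convention of rejecting the null hypothesis when the p-value is strictly below the significance level. The function name `reject_null` and the p-value 0.03 are hypothetical illustrations. They are not part of the Numerics.NET API and are not output of the sample.

```python
def reject_null(p_value, significance_level=0.05):
    """Return True when the null hypothesis should be rejected at the given level."""
    if not 0.0 < significance_level < 1.0:
        raise ValueError("significance level must lie strictly between 0 and 1")
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    return p_value < significance_level


# A hypothetical p-value, used only to show how the decision depends on the level.
p = 0.03
for level in (0.05, 0.01):
    decision = "yes" if reject_null(p, level) else "no"
    print("Significance level %.2f -> reject null hypothesis? %s" % (level, decision))
```

A smaller significance level demands stronger evidence. The same p-value can therefore lead to rejection at 0.05 but not at 0.01, which is why the Mann-Whitney part of the sample reports the decision at both levels.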

The code includes detailed comments explaining each step and the statistical concepts involved.

The code

import numerics

from System import Array

from Extreme.Mathematics import *
from Extreme.Statistics import *
from Extreme.Statistics.Tests import *

# Demonstrates how to use non-parametric hypothesis tests
# like the Mann-Whitney (Wilcoxon) rank sum test, the
# Kruskal-Wallis test, and the runs test.

#
# Mann-Whitney test
#

print "Mann-Whitney Test"

# The Mann-Whitney test compares two samples to see if they were
# drawn from the same distribution.

# We use an example from McDonald et al. (1996), who compared
# the geographic variation in oyster DNA to the variation in
# proteins. A significant difference in the samples would suggest
# that natural selection played a role in the oyster diversification.

# There are two ways to create a test with multiple samples.
            
# The first is to put all the data in one variable,
# and use a second variable to group the data in the first.
print "\nUsing grouping variable:"

values = NumericalVariable(Array[float]([ \
    -0.005, 0.116,-0.006, 0.095, 0.053, 0.003, \
    -0.005, 0.016, 0.041, 0.016, 0.066, 0.163, \
    0.004, 0.049, 0.006, 0.058, -0.002, 0.015, \
    0.044, 0.024 ]))

DNA = 1
Protein = 2

groups = CategoricalVariable([ \
    DNA, DNA, DNA, DNA, DNA, DNA, \
    Protein, Protein, Protein, Protein, Protein, Protein, \
    Protein, Protein, Protein, Protein, Protein, Protein, \
    Protein, Protein ])

# With this data, we can create the test:
mw = MannWhitneyTest(values, groups)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(mw.Statistic)
print "P-value:        {0:.4f}".format(mw.PValue)

# The significance level is the default value of 0.05:
print "Significance level:     {0:.2f}".format(mw.SignificanceLevel)
# We can now print whether the null hypothesis is rejected:
print "Reject null hypothesis?", "yes" if mw.Reject() else "no"

# We can make the same decision at the 0.01 significance level by
# passing the significance level explicitly to the Reject method:
print "Significance level:     {0:.2f}".format(0.01)
print "Reject null hypothesis?", "yes" if mw.Reject(0.01) else "no"


# The second method is to put the data for each sample in a separate variable:
print "\nUsing multiple variables:"

dnaValues = NumericalVariable(Array[float]([ \
    -0.005, 0.116,-0.006, 0.095, 0.053, 0.003 ]))
proteinValues = NumericalVariable(Array[float]([ \
    -0.005, 0.016, 0.041, 0.016, 0.066, 0.163, 0.004, \
    0.049, 0.006, 0.058, -0.002, 0.015, 0.044, 0.024 ]))

# With this data, we can create the test:
mw = MannWhitneyTest(dnaValues, proteinValues)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(mw.Statistic)
print "P-value:        {0:.4f}".format(mw.PValue)

# The significance level is the default value of 0.05:
print "Significance level:     {0:.2f}".format(mw.SignificanceLevel)
# We can now print whether the null hypothesis is rejected:
print "Reject null hypothesis?", "yes" if mw.Reject() else "no"

#
# Kruskal-Wallis test
#

print "\nKruskal-Wallis Test\n"

# The Kruskal-Wallis test is a generalization of the Mann-Whitney test
# to more than 2 groups.

# The following example was taken from the NIST Engineering Statistics Handbook
# at http://www.itl.nist.gov/div898/handbook/prc/section4/prc41.htm
            
# The data represents percentage quarterly growth 
# in 4 investment funds:
aValues = NumericalVariable(Array[float]([ 4.2, 4.6, 3.9, 4.0 ]))
bValues = NumericalVariable(Array[float]([ 3.3, 2.4, 2.6, 3.8, 2.8 ]))
cValues = NumericalVariable(Array[float]([ 1.9, 2.4, 2.1, 2.7, 1.8 ]))
dValues = NumericalVariable(Array[float]([ 3.5, 3.1, 3.7, 4.1, 4.4 ]))

# We simply pass these variables to the constructor:
kw = KruskalWallisTest(aValues, bValues, cValues, dValues)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(kw.Statistic)
print "P-value:        {0:.4f}".format(kw.PValue)

# The significance level is the default value of 0.05:
print "Significance level:     {0:.2f}".format(kw.SignificanceLevel)
# We can now print whether the null hypothesis is rejected:
print "Reject null hypothesis?", "yes" if kw.Reject() else "no"

#
# Runs test
#

print "\nRuns Test\n"

# The runs test is a test of randomness.

# It compares the number of runs of identical values
# in a sample to the number expected for a random sequence.

# For numerical data, it uses the runs of successively
# increasing or decreasing values.

Male = 1
Female = 2

genders = CategoricalVariable([ \
    Male, Male, Male, Female, Female, Female, \
    Male, Male, Male, Male, Female, Female, \
    Male, Male, Male, Female, Female, Female, \
    Female, Female, Female, Female, Male, Male, \
    Female, Male, Male, Female, Female, Female, \
    Female ])

rt = RunsTest(genders)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(rt.Statistic)
print "P-value:        {0:.4f}".format(rt.PValue)

# The significance level is the default value of 0.05:
print "Significance level:     {0:.2f}".format(rt.SignificanceLevel)
# We can now print whether the null hypothesis is rejected:
print "Reject null hypothesis?", "yes" if rt.Reject() else "no"
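The runs test statistic for a two-category sequence can be sketched using the Wald-Wolfowitz normal approximation. This is an illustration under stated assumptions, not the Numerics.NET implementation. The helper `runs_test_z` is hypothetical. It counts the number of runs, applies no continuity correction, and computes a two-sided p-value from the standard normal distribution, so the library's reported statistic and p-value may differ. The gender sequence is the one from the sample code, written as letters.

```python
import math


def runs_test_z(sequence):
    """Return (runs, z, two-sided p) for a sequence containing exactly two distinct values."""
    categories = sorted(set(sequence))
    if len(categories) != 2:
        raise ValueError("the sequence must contain exactly two distinct values")
    n1 = sequence.count(categories[0])
    n2 = sequence.count(categories[1])
    n = n1 + n2
    # A new run starts at the first element and wherever the value changes.
    runs = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)
    # Mean and variance of the number of runs if the order were random.
    mean = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    z = (runs - mean) / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p_value


# The gender sequence from the sample: M = Male, F = Female.
genders = list("MMMFFF" "MMMMFF" "MMMFFF" "FFFFMM" "FMMFFF" "F")
runs, z, p = runs_test_z(genders)
print("Runs: %d  z: %.4f  p: %.4f" % (runs, z, p))
```

For this sequence, the sketch counts 10 runs among 14 males and 17 females. This is fewer than the roughly 16.35 runs expected under randomness and gives z ≈ -2.34.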