Non-Parametric Tests in IronPython QuickStart Sample
Illustrates how to perform non-parametric tests like the Wilcoxon-Mann-Whitney test and the Kruskal-Wallis test in IronPython.
This sample is also available in: C#, Visual Basic, F#.
Overview
This QuickStart sample demonstrates how to perform non-parametric statistical hypothesis tests using Numerics.NET. Non-parametric tests make fewer assumptions about the underlying data distributions compared to their parametric counterparts.
The sample shows three common non-parametric tests:
- The Mann-Whitney (Wilcoxon) rank sum test compares two independent samples to determine if they come from the same distribution. The example uses real data from a study of oyster DNA and protein variation.
- The Kruskal-Wallis test extends the Mann-Whitney test to three or more groups. The sample demonstrates this using investment fund growth data from the NIST Engineering Statistics Handbook.
- The runs test checks for randomness in a sequence by analyzing the pattern of runs (consecutive values) in the data. The example uses a sequence of binary outcomes to illustrate this test.
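To make the rank-based idea behind the first two tests concrete, here is a minimal pure-Python sketch that does not use the library. The helper names (`midranks`, `mann_whitney_u`, `kruskal_wallis_h`) are made up for illustration, and this is not how the library computes its `Statistic` values: it may report a different variant of the Mann-Whitney statistic, and it may apply a tie correction that this sketch omits.

```python
def midranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Find the end of the block of values tied with order[start].
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y), the Mann-Whitney U statistics of the two samples."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2.0
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


def kruskal_wallis_h(*groups):
    """Return the Kruskal-Wallis H statistic, without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = midranks(pooled)
    n_total = len(pooled)
    total = 0.0
    offset = 0
    for g in groups:
        # Sum of the pooled ranks that belong to this group.
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / float(len(g))
        offset += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)
```

Applied to the four fund samples used later in the listing, `kruskal_wallis_h` evaluates to about 13.68 (uncorrected for the single tie at 2.4).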
For each test, the sample shows how to:
- Create the test object with different data input formats
- Access the test statistic and p-value
- Make decisions using different significance levels
- Interpret the results
The code includes detailed comments explaining each step and the statistical concepts involved.
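The decision step in the list above amounts to comparing the p-value with a chosen significance level. The following sketch assumes the conventional rule of rejecting when the p-value is below the level. The `reject` and `decisions` functions are hypothetical stand-alone helpers, not the library's `Reject` method, whose exact boundary handling is not described in this sample.

```python
def reject(p_value, significance_level=0.05):
    """Conventional decision rule: reject the null hypothesis when p < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return p_value < significance_level


def decisions(p_value, levels=(0.10, 0.05, 0.01)):
    """Map each significance level to the decision it produces."""
    return dict((level, reject(p_value, level)) for level in levels)
```

An illustrative p-value of 0.03, for instance, leads to rejection at the 0.05 level but not at the 0.01 level, so the chosen level can change the conclusion.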
The code
import numerics
from System import Array
from Extreme.Mathematics import *
from Extreme.Statistics import *
from Extreme.Statistics.Tests import *
# Demonstrates how to use non-parametric hypothesis tests
# like the Mann-Whitney (Wilcoxon) rank sum test and the
# Kruskal-Wallis test.
#
# Mann-Whitney test
#
print "Mann-Whitney Test"
# The Mann-Whitney test compares two samples to see if they were
# drawn from the same distribution.
# We use an example from McDonald et al. (1996), who compared
# the geographic variation in oyster DNA to the variation in
# proteins. A significant difference in the samples would suggest
# that natural selection played a role in the oyster diversification.
# There are two ways to create a test with multiple samples.
# The first is to put all the data in one variable,
# and use a second variable to group the data in the first.
print "\nUsing grouping variable:"
values = NumericalVariable(Array[float]([ \
-0.005, 0.116,-0.006, 0.095, 0.053, 0.003, \
-0.005, 0.016, 0.041, 0.016, 0.066, 0.163, \
0.004, 0.049, 0.006, 0.058, -0.002, 0.015, \
0.044, 0.024 ]))
DNA = 1
Protein = 2
groups = CategoricalVariable([ \
DNA, DNA, DNA, DNA, DNA, DNA, \
Protein, Protein, Protein, Protein, Protein, Protein, \
Protein, Protein, Protein, Protein, Protein, Protein, \
Protein, Protein ])
# With this data, we can create the test:
mw = MannWhitneyTest(values, groups)
# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(mw.Statistic)
print "P-value: {0:.4f}".format(mw.PValue)
# The significance level is the default value of 0.05:
print "Significance level: {0:.2f}".format(mw.SignificanceLevel)
# We can now print the outcome of the test:
print "Reject null hypothesis?", "yes" if mw.Reject() else "no"
# We can make the same decision at the 0.01 significance level by explicitly
# passing the significance level as a parameter to the Reject method:
print "Significance level: {0:.2f}".format(0.01)
print "Reject null hypothesis?", "yes" if mw.Reject(0.01) else "no"
# The second way is to put each sample in its own variable:
print "\nUsing multiple variables:"
dnaValues = NumericalVariable(Array[float]([ \
-0.005, 0.116,-0.006, 0.095, 0.053, 0.003 ]))
proteinValues = NumericalVariable(Array[float]([ \
-0.005, 0.016, 0.041, 0.016, 0.066, 0.163, 0.004, \
0.049, 0.006, 0.058, -0.002, 0.015, 0.044, 0.024 ]))
# With this data, we can create the test:
mw = MannWhitneyTest(dnaValues, proteinValues)
# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(mw.Statistic)
print "P-value: {0:.4f}".format(mw.PValue)
# The significance level is the default value of 0.05:
print "Significance level: {0:.2f}".format(mw.SignificanceLevel)
# We can now print the outcome of the test:
print "Reject null hypothesis?", "yes" if mw.Reject() else "no"
#
# Kruskal-Wallis test
#
print "\nKruskal-Wallis Test\n"
# The Kruskal-Wallis test is a generalization of the Mann-Whitney test
# to more than 2 groups.
# The following example was taken from the NIST Engineering Statistics Handbook
# at http://www.itl.nist.gov/div898/handbook/prc/section4/prc41.htm
# The data represents percentage quarterly growth
# in 4 investment funds:
aValues = NumericalVariable(Array[float]([ 4.2, 4.6, 3.9, 4.0 ]))
bValues = NumericalVariable(Array[float]([ 3.3, 2.4, 2.6, 3.8, 2.8 ]))
cValues = NumericalVariable(Array[float]([ 1.9, 2.4, 2.1, 2.7, 1.8 ]))
dValues = NumericalVariable(Array[float]([ 3.5, 3.1, 3.7, 4.1, 4.4 ]))
# We simply pass these variables to the constructor:
kw = KruskalWallisTest(aValues, bValues, cValues, dValues)
# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(kw.Statistic)
print "P-value: {0:.4f}".format(kw.PValue)
# The significance level is the default value of 0.05:
print "Significance level: {0:.2f}".format(kw.SignificanceLevel)
# We can now print the outcome of the test:
print "Reject null hypothesis?", "yes" if kw.Reject() else "no"
#
# Runs test
#
print "\nRuns Test\n"
# The runs test is a test of randomness.
# It compares the lengths of runs of the same value
# in a sample to what would be expected.
# For numerical data, it uses runs of successively
# increasing or decreasing values.
Male = 1
Female = 2
genders = CategoricalVariable([ \
Male, Male, Male, Female, Female, Female, \
Male, Male, Male, Male, Female, Female, \
Male, Male, Male, Female, Female, Female, \
Female, Female, Female, Female, Male, Male, \
Female, Male, Male, Female, Female, Female, \
Female ])
rt = RunsTest(genders)
# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(rt.Statistic)
print "P-value: {0:.4f}".format(rt.PValue)
# The significance level is the default value of 0.05:
print "Significance level: {0:.2f}".format(rt.SignificanceLevel)
# We can now print the outcome of the test:
print "Reject null hypothesis?", "yes" if rt.Reject() else "no"
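For readers who want to see what the runs test measures, the following pure-Python sketch counts runs in a two-category sequence and applies the usual large-sample normal approximation (the Wald-Wolfowitz runs test). It is an illustration only: the function names are hypothetical, and the library's `RunsTest` may use an exact distribution or a continuity correction, so its `Statistic` and `PValue` need not match the numbers this sketch produces.

```python
import math


def count_runs(sequence):
    """Count maximal stretches of consecutive equal items."""
    items = list(sequence)
    if not items:
        return 0
    # A new run starts wherever an item differs from its predecessor.
    return 1 + sum(1 for a, b in zip(items, items[1:]) if a != b)


def runs_test(sequence):
    """Return (runs, z, two-sided p) for a sequence with two categories.

    Uses the normal approximation to the distribution of the number of runs.
    """
    items = list(sequence)
    categories = sorted(set(items))
    if len(categories) != 2:
        raise ValueError("expected exactly two distinct categories")
    n1 = items.count(categories[0])
    n2 = items.count(categories[1])
    n = n1 + n2
    runs = count_runs(items)
    # Mean and variance of the run count under the hypothesis of randomness.
    mean = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if variance <= 0.0:
        raise ValueError("sequence too short for the normal approximation")
    z = (runs - mean) / math.sqrt(variance)
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p_value
```

For the gender sequence in the listing (14 `Male` and 17 `Female` values), `count_runs` finds 10 runs, compared with about 16.4 expected under randomness, so the sequence has fewer runs than a random ordering would typically produce.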