Non-Parametric Tests in IronPython QuickStart Sample
Illustrates how to perform non-parametric tests like the Wilcoxon-Mann-Whitney test and the Kruskal-Wallis test in IronPython.
```Python
import numerics

from System import Array

from Extreme.Mathematics import *
from Extreme.Statistics import *
from Extreme.Statistics.Tests import *

# Demonstrates how to use non-parametric hypothesis tests
# like the Mann-Whitney (Wilcoxon) rank sum test and the
# Kruskal-Wallis test.

#
# Mann-Whitney test
#

print "Mann-Whitney Test"

# The Mann-Whitney test compares two samples to see if they were
# drawn from the same distribution.

# We use an example from McDonald et al. (1996), who compared
# the geographic variation in oyster DNA to the variation in
# proteins. A significant difference in the samples would suggest
# that natural selection played a role in the oyster diversification.

# There are two ways to create a test with multiple samples.
# The first is to put all the data in one variable,
# and use a second variable to group the data in the first.
print "\nUsing grouping variable:"
values = NumericalVariable(Array[float]([
    -0.005, 0.116, -0.006, 0.095, 0.053, 0.003,
    -0.005, 0.016, 0.041, 0.016, 0.066, 0.163,
    0.004, 0.049, 0.006, 0.058, -0.002, 0.015,
    0.044, 0.024 ]))
DNA = 1
Protein = 2
groups = CategoricalVariable([
    DNA, DNA, DNA, DNA, DNA, DNA,
    Protein, Protein, Protein, Protein, Protein, Protein,
    Protein, Protein, Protein, Protein, Protein, Protein,
    Protein, Protein ])

# With this data, we can create the test:
mw = MannWhitneyTest(values, groups)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(mw.Statistic)
print "P-value: {0:.4f}".format(mw.PValue)

# The significance level is the default value of 0.05:
print "Significance level: {0:.2f}".format(mw.SignificanceLevel)

# We can now print whether the null hypothesis is rejected:
print "Reject null hypothesis?", "yes" if mw.Reject() else "no"

# We can get the decision at the 0.01 significance level by explicitly
# passing the significance level as a parameter to the Reject method:
print "Significance level: {0:.2f}".format(0.01)
print "Reject null hypothesis?", "yes" if mw.Reject(0.01) else "no"

# The second method is to put the data in different variables:
print "\nUsing multiple variables:"
dnaValues = NumericalVariable(Array[float]([
    -0.005, 0.116, -0.006, 0.095, 0.053, 0.003 ]))
proteinValues = NumericalVariable(Array[float]([
    -0.005, 0.016, 0.041, 0.016, 0.066, 0.163, 0.004,
    0.049, 0.006, 0.058, -0.002, 0.015, 0.044, 0.024 ]))

# With this data, we can create the test:
mw = MannWhitneyTest(dnaValues, proteinValues)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(mw.Statistic)
print "P-value: {0:.4f}".format(mw.PValue)

# The significance level is the default value of 0.05:
print "Significance level: {0:.2f}".format(mw.SignificanceLevel)

# We can now print whether the null hypothesis is rejected:
print "Reject null hypothesis?", "yes" if mw.Reject() else "no"

#
# Kruskal-Wallis test
#

print "\nKruskal-Wallis Test\n"

# The Kruskal-Wallis test is a generalization of the Mann-Whitney test
# to more than two groups.

# The following example was taken from the NIST Engineering Statistics Handbook
# at http://www.itl.nist.gov/div898/handbook/prc/section4/prc41.htm

# The data represents percentage quarterly growth
# in 4 investment funds:
aValues = NumericalVariable(Array[float]([ 4.2, 4.6, 3.9, 4.0 ]))
bValues = NumericalVariable(Array[float]([ 3.3, 2.4, 2.6, 3.8, 2.8 ]))
cValues = NumericalVariable(Array[float]([ 1.9, 2.4, 2.1, 2.7, 1.8 ]))
dValues = NumericalVariable(Array[float]([ 3.5, 3.1, 3.7, 4.1, 4.4 ]))

# We simply pass these variables to the constructor:
kw = KruskalWallisTest(aValues, bValues, cValues, dValues)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(kw.Statistic)
print "P-value: {0:.4f}".format(kw.PValue)

# The significance level is the default value of 0.05:
print "Significance level: {0:.2f}".format(kw.SignificanceLevel)

# We can now print whether the null hypothesis is rejected:
print "Reject null hypothesis?", "yes" if kw.Reject() else "no"

#
# Runs test
#

print "\nRuns Test\n"

# The runs test is a test of randomness.
# It compares the number of runs (stretches of consecutive equal values)
# in a sample to the number expected for a random sequence.
# In numerical data, it uses the runs of successively
# increasing or decreasing values.
Male = 1
Female = 2
genders = CategoricalVariable([
    Male, Male, Male, Female, Female, Female,
    Male, Male, Male, Male, Female, Female,
    Male, Male, Male, Female, Female, Female,
    Female, Female, Female, Female, Male, Male,
    Female, Male, Male, Female, Female, Female,
    Female ])
rt = RunsTest(genders)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(rt.Statistic)
print "P-value: {0:.4f}".format(rt.PValue)

# The significance level is the default value of 0.05:
print "Significance level: {0:.2f}".format(rt.SignificanceLevel)

# We can now print whether the null hypothesis is rejected:
print "Reject null hypothesis?", "yes" if rt.Reject() else "no"
```
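The sample leaves all the arithmetic to `MannWhitneyTest`, `KruskalWallisTest` and `RunsTest`. As a rough cross-check, below is a minimal sketch in plain Python (standard library only, no Extreme.Statistics) of the textbook statistics these tests are built on:

- the Mann-Whitney U count;
- the Kruskal-Wallis H statistic with the usual tie correction;
- the Wald-Wolfowitz runs test with a normal approximation.

The helpers `average_ranks`, `mann_whitney_u`, `kruskal_wallis_h` and `runs_test` are hypothetical names, not part of the library. The library's reported `Statistic` may use a different but equivalent form, for example the other sample's U or a continuity correction. So these numbers are not guaranteed to match `mw.Statistic`, `kw.Statistic` or `rt.Statistic` exactly.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U for sample x: number of pairs (xi, yj) with xi > yj, each tie counting 1/2."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2.0


def kruskal_wallis_h(groups, correct_ties=True):
    """Kruskal-Wallis H for a list of samples, optionally corrected for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    if correct_ties:
        # Divide by 1 - sum(t^3 - t) / (n^3 - n), where t is the size of each tie group.
        counts = {}
        for v in pooled:
            counts[v] = counts.get(v, 0) + 1
        ties = sum(t ** 3 - t for t in counts.values())
        h /= 1.0 - ties / float(n ** 3 - n)
    return h


def runs_test(sequence):
    """Wald-Wolfowitz runs test for a sequence with exactly two distinct labels.

    Returns (runs, z, p): the observed number of runs, its z score, and a
    two-sided p-value from the normal approximation (no continuity correction).
    """
    seq = list(sequence)
    labels = sorted(set(seq))
    if len(labels) != 2:
        raise ValueError("the runs test needs exactly two distinct values")
    n1 = seq.count(labels[0])
    n2 = len(seq) - n1
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    n = float(n1 + n2)
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return runs, z, p


# The same data as in the sample above, as plain Python lists.
dna = [-0.005, 0.116, -0.006, 0.095, 0.053, 0.003]
protein = [-0.005, 0.016, 0.041, 0.016, 0.066, 0.163, 0.004,
           0.049, 0.006, 0.058, -0.002, 0.015, 0.044, 0.024]
funds = [[4.2, 4.6, 3.9, 4.0],
         [3.3, 2.4, 2.6, 3.8, 2.8],
         [1.9, 2.4, 2.1, 2.7, 1.8],
         [3.5, 3.1, 3.7, 4.1, 4.4]]
M, F = 1, 2
genders = [M, M, M, F, F, F,
           M, M, M, M, F, F,
           M, M, M, F, F, F,
           F, F, F, F, M, M,
           F, M, M, F, F, F,
           F]

print("U (DNA vs. protein): {0}".format(mann_whitney_u(dna, protein)))
print("U (protein vs. DNA): {0}".format(mann_whitney_u(protein, dna)))
print("Kruskal-Wallis H: {0:.4f}".format(kruskal_wallis_h(funds)))
print("Runs test (runs, z, p): {0}".format(runs_test(genders)))
```

The two U values for the oyster data are 39.5 and 44.5. They always add up to n1 × n2 (here 6 × 14 = 84), so a library that reports either one is consistent with this sketch.

Only the runs test gets a p-value here. Turning U and H into p-values needs their exact or chi-square reference distributions, which is what the library's `PValue` property supplies.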