Non-Parametric Tests in C# QuickStart Sample

Illustrates how to perform non-parametric tests like the Wilcoxon-Mann-Whitney test and the Kruskal-Wallis test in C#.

This sample is also available in: Visual Basic, F#, IronPython.

Overview

This QuickStart sample demonstrates how to perform non-parametric statistical hypothesis tests using Numerics.NET. Non-parametric tests make fewer assumptions about the underlying data distributions compared to their parametric counterparts.

The sample shows three common non-parametric tests:

  • The Mann-Whitney (Wilcoxon) rank sum test compares two independent samples to determine if they come from the same distribution. The example uses real data from a study of oyster DNA and protein variation.

  • The Kruskal-Wallis test extends the Mann-Whitney test to three or more groups. The sample demonstrates this using investment fund growth data from the NIST Engineering Statistics Handbook.

  • The runs test checks for randomness in a sequence by analyzing its runs (maximal stretches of consecutive identical values). The example uses a sequence of binary outcomes to illustrate this test.
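
The first two tests work on the ranks of the pooled observations rather than on the raw values. As a rough illustration of that idea, the sketch below computes midranks, a Mann-Whitney U statistic, and a Kruskal-Wallis H statistic in plain C#. The `RankSketch` class and its method names are hypothetical and are not part of Numerics.NET. The H formula shown omits the correction for ties, and the library may report a different variant of either statistic.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of Numerics.NET.
public static class RankSketch
{
    // Assigns 1-based ranks to the pooled values.
    // Tied values share the average of the ranks they occupy (midranks).
    public static double[] MidRanks(double[] pooled)
    {
        int n = pooled.Length;
        int[] order = Enumerable.Range(0, n).OrderBy(i => pooled[i]).ToArray();
        double[] ranks = new double[n];
        int start = 0;
        while (start < n)
        {
            // Extend 'end' over the block of values tied with the one at 'start'.
            int end = start;
            while (end + 1 < n && pooled[order[end + 1]] == pooled[order[start]]) end++;
            double averageRank = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) ranks[order[k]] = averageRank;
            start = end + 1;
        }
        return ranks;
    }

    // Sum of the pooled ranks that fall in each group.
    public static double[] RankSums(params double[][] groups)
    {
        double[] ranks = MidRanks(groups.SelectMany(g => g).ToArray());
        double[] sums = new double[groups.Length];
        int offset = 0;
        for (int g = 0; g < groups.Length; g++)
        {
            for (int i = 0; i < groups[g].Length; i++) sums[g] += ranks[offset + i];
            offset += groups[g].Length;
        }
        return sums;
    }

    // U statistic of the first sample: U1 = R1 - n1 (n1 + 1) / 2.
    public static double MannWhitneyU(double[] first, double[] second)
    {
        double r1 = RankSums(first, second)[0];
        return r1 - first.Length * (first.Length + 1) / 2.0;
    }

    // H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), without tie correction.
    public static double KruskalWallisH(params double[][] groups)
    {
        int total = groups.Sum(g => g.Length);
        double[] sums = RankSums(groups);
        double weighted = 0.0;
        for (int g = 0; g < groups.Length; g++)
            weighted += sums[g] * sums[g] / groups[g].Length;
        return 12.0 / (total * (double)(total + 1)) * weighted - 3.0 * (total + 1);
    }
}
```

For example, `RankSketch.MannWhitneyU(new double[] { 1, 3 }, new double[] { 2, 4 })` counts the pairs in which a value from the first sample exceeds a value from the second. Turning either statistic into a p-value requires its null distribution, which is the part the library's test classes handle.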

For each test, the sample shows how to:

  • Create the test object with different data input formats
  • Access the test statistic and p-value
  • Make decisions using different significance levels
  • Interpret the results
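
To make the statistic-then-decision flow concrete, here is a minimal sketch of a runs test for a two-category sequence, assuming the usual large-sample normal approximation. The `RunsSketch` class and its members are hypothetical and are not the Numerics.NET API. The decision compares the z statistic with a critical value instead of computing a p-value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Numerics.NET.
public static class RunsSketch
{
    // Counts runs: maximal blocks of identical consecutive values.
    public static int CountRuns<T>(T[] sequence)
    {
        if (sequence.Length == 0) return 0;
        var comparer = EqualityComparer<T>.Default;
        int runs = 1;
        for (int i = 1; i < sequence.Length; i++)
            if (!comparer.Equals(sequence[i], sequence[i - 1])) runs++;
        return runs;
    }

    // Normal-approximation z statistic for a sequence with n1 values of one
    // category and n2 of the other:
    //   mean     = 2 n1 n2 / n + 1
    //   variance = 2 n1 n2 (2 n1 n2 - n) / (n^2 (n - 1)),  where n = n1 + n2
    public static double ZStatistic(int runs, int n1, int n2)
    {
        double n = n1 + n2;
        double mean = 2.0 * n1 * n2 / n + 1.0;
        double variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0));
        return (runs - mean) / Math.Sqrt(variance);
    }

    // Two-sided decision: reject randomness when |z| exceeds the critical value.
    public static bool Reject(double z, double criticalValue) =>
        Math.Abs(z) > criticalValue;
}
```

The two-sided standard normal critical values are approximately 1.96 for a 0.05 significance level and 2.576 for 0.01, so a single z statistic can lead to rejection at one level but not at the other. The approximation is rough for short sequences.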

The code includes detailed comments explaining each step and the statistical concepts involved.

The code

using System;

using Numerics.NET;
using Numerics.NET.Statistics;
using Numerics.NET.Statistics.Tests;

// Demonstrates how to use non-parametric hypothesis tests
// like the Mann-Whitney (Wilcoxon) rank sum test, the
// Kruskal-Wallis test, and the runs test.

// The license is verified at runtime. We're using
// a 30-day trial key here. For more information, see
//     https://numerics.net/trial-key
Numerics.NET.License.Verify("your-trial-key-here");

//
// Mann-Whitney test
//

Console.WriteLine("Mann-Whitney Test");

// The Mann-Whitney test compares two samples to see if they were
// drawn from the same distribution.

// We use an example from McDonald et al. (1996), who compared
// the geographic variation in oyster DNA to the variation in
// proteins. A significant difference in the samples would suggest
// that natural selection played a role in the oyster diversification.

// There are two ways to create a test with multiple samples.

// The first is to put all the data in one variable,
// and use a second variable to group the data in the first.
Console.WriteLine("\nUsing grouping variable:");

var values = Vector.Create(new double[] {
    -0.005, 0.116,-0.006, 0.095, 0.053, 0.003,
    -0.005, 0.016, 0.041, 0.016, 0.066,
     0.163, 0.004, 0.049, 0.006, 0.058,
    -0.002, 0.015, 0.044, 0.024
});
var groups = Vector.Create(new Group[] {
    Group.DNA, Group.DNA, Group.DNA, Group.DNA, Group.DNA, Group.DNA,
    Group.Protein, Group.Protein, Group.Protein, Group.Protein, Group.Protein,
    Group.Protein, Group.Protein, Group.Protein, Group.Protein, Group.Protein,
    Group.Protein, Group.Protein, Group.Protein, Group.Protein
}).AsCategorical();

// With this data, we can create the test:
var mw = new MannWhitneyTest<double>(values, groups);

// We can obtain the value of the test statistic through the Statistic property,
// and the corresponding P-value through the PValue property:
Console.WriteLine($"Test statistic: {mw.Statistic:F4}");
Console.WriteLine($"P-value:        {mw.PValue:F4}");

// The significance level is the default value of 0.05:
Console.WriteLine($"Significance level:     {mw.SignificanceLevel:F2}");
// We can now print the result of the test:
Console.WriteLine($"Reject null hypothesis? {(mw.Reject() ? "yes" : "no")}");

// We can get the result for the 0.01 significance level by explicitly
// passing the significance level as a parameter to the Reject method:
Console.WriteLine($"Significance level:     {0.01:F2}");
Console.WriteLine($"Reject null hypothesis? {(mw.Reject(0.01) ? "yes" : "no")}");

// The second method is to put the data in different variables:
Console.WriteLine("\nUsing multiple variables:");

var dnaValues = Vector.Create(new double[] {
    -0.005, 0.116,-0.006, 0.095, 0.053, 0.003 });
var proteinValues = Vector.Create(new double[] {
    -0.005, 0.016, 0.041, 0.016, 0.066,
     0.163, 0.004, 0.049, 0.006, 0.058,
    -0.002, 0.015, 0.044, 0.024
});

// With this data, we can create the test:
mw = new MannWhitneyTest<double>(dnaValues, proteinValues);

// We can obtain the value of the test statistic through the Statistic property,
// and the corresponding P-value through the PValue property:
Console.WriteLine($"Test statistic: {mw.Statistic:F4}");
Console.WriteLine($"P-value:        {mw.PValue:F4}");

// The significance level is the default value of 0.05:
Console.WriteLine($"Significance level:     {mw.SignificanceLevel:F2}");
// We can now print the result of the test:
Console.WriteLine($"Reject null hypothesis? {(mw.Reject() ? "yes" : "no")}");

//
// Kruskal-Wallis test
//

Console.WriteLine("\nKruskal-Wallis Test\n");

// The Kruskal-Wallis test is a generalization of the Mann-Whitney test
// to more than 2 groups.

// The following example was taken from the NIST Engineering Statistics Handbook
// at http://www.itl.nist.gov/div898/handbook/prc/section4/prc41.htm

// The data represents percentage quarterly growth
// in 4 investment funds:
var aValues = Vector.Create(new double[] { 4.2, 4.6, 3.9, 4.0 });
var bValues = Vector.Create(new double[] { 3.3, 2.4, 2.6, 3.8, 2.8 });
var cValues = Vector.Create(new double[] { 1.9, 2.4, 2.1, 2.7, 1.8 });
var dValues = Vector.Create(new double[] { 3.5, 3.1, 3.7, 4.1, 4.4 });

// We simply pass these variables to the constructor:
var kw = new KruskalWallisTest(aValues, bValues, cValues, dValues);

// We can obtain the value of the test statistic through the Statistic property,
// and the corresponding P-value through the PValue property:
Console.WriteLine($"Test statistic: {kw.Statistic:F4}");
Console.WriteLine($"P-value:        {kw.PValue:F4}");

// The significance level is the default value of 0.05:
Console.WriteLine($"Significance level:     {kw.SignificanceLevel:F2}");
// We can now print the result of the test:
Console.WriteLine($"Reject null hypothesis? {(kw.Reject() ? "yes" : "no")}");

//
// Runs test
//

Console.WriteLine("\nRuns Test\n");

// The runs test is a test of randomness.

// It compares the number of runs of the same value
// in a sample to what would be expected for a random sequence.

var genders = Vector.Create(new Gender[] {
    Gender.Male, Gender.Male, Gender.Male, Gender.Female, Gender.Female,
    Gender.Female, Gender.Male, Gender.Male, Gender.Male, Gender.Male,
    Gender.Female, Gender.Female, Gender.Male, Gender.Male, Gender.Male,
    Gender.Female, Gender.Female, Gender.Female, Gender.Female, Gender.Female,
    Gender.Female, Gender.Female, Gender.Male, Gender.Male, Gender.Female,
    Gender.Male, Gender.Male, Gender.Female, Gender.Female, Gender.Female,
    Gender.Female}).AsCategorical();

var rt = new RunsTest<Gender>(genders);

// We can obtain the value of the test statistic through the Statistic property,
// and the corresponding P-value through the PValue property:
Console.WriteLine($"Test statistic: {rt.Statistic:F4}");
Console.WriteLine($"P-value:        {rt.PValue:F4}");

// The significance level is the default value of 0.05:
Console.WriteLine($"Significance level:     {rt.SignificanceLevel:F2}");
// We can now print the result of the test:
Console.WriteLine($"Reject null hypothesis? {(rt.Reject() ? "yes" : "no")}");

Console.Write("Press Enter to exit.");
Console.ReadLine();

enum Group {
    DNA,
    Protein
}

enum Gender {
    Male,
    Female
}