Testing Goodness-Of-Fit

It is often necessary to verify whether the distribution of a variable matches a proposed theoretical distribution. Goodness-of-fit tests are used to perform this verification. Note that goodness-of-fit tests require the individual sample values; they cannot be performed using only summary statistics.

The Chi-Square Test for Goodness-of-Fit

The chi-square (χ2) goodness-of-fit test compares observed cell frequencies from a sample with the cell frequencies expected from the proposed underlying distribution. It is used to determine if a sample comes from a population with a specific distribution.

Definition

The null hypothesis ($H_0$) is that the observed cell frequencies ($O_i$) are equal to the expected frequencies ($E_i$) for all cells. Formally, $H_0: O_i = E_i$ for all $i$.

The test statistic ($\chi^2$) is calculated as:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $k$ is the number of cells, $O_i$ is the observed frequency for cell $i$, and $E_i$ is the expected frequency for cell $i$.

The distribution of the test statistic $\chi^2$ is approximated by the chi-square distribution with $k - 1$ degrees of freedom, where $k$ is the number of cells. When $m$ parameters of the proposed distribution are estimated from the sample, the number of degrees of freedom is reduced to $k - 1 - m$.
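To make the formula concrete, here is a minimal sketch that computes the statistic and degrees of freedom by hand from hypothetical observed and expected counts. It uses only plain C# and does not rely on the library.

C#
// Minimal sketch: the chi-square statistic computed directly from the
// definition, for hypothetical observed and expected cell counts.
double[] observed = { 8, 9, 12, 11, 6, 14 };      // e.g. counts of each die face in 60 rolls
double[] expected = { 10, 10, 10, 10, 10, 10 };   // expected counts for a fair die

double chiSquare = 0.0;
for (int i = 0; i < observed.Length; i++)
{
    double diff = observed[i] - expected[i];
    chiSquare += diff * diff / expected[i];
}

// k - 1 degrees of freedom; subtract one more for every distribution
// parameter that was estimated from the sample.
int degreesOfFreedom = observed.Length - 1;
Console.WriteLine($"Chi-square: {chiSquare:F4} with {degreesOfFreedom} degrees of freedom");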

Assumptions

The chi-square test assumes that the variable is categorical. When the variable is continuous, the test cannot be used directly; the data must first be grouped into cells, and the categorized data can then be used in the test, as sketched below. In addition, the sample must be randomly selected from the population, and the expected frequency of each cell should be large enough, typically at least 5.
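Grouping continuous observations into cells amounts to counting how many values fall into each interval. The sketch below uses hypothetical data and equal-width bins; as Example 2 later in this section shows, the ChiSquareGoodnessOfFitTest class can take care of this binning for you when you supply a raw sample and a distribution.

C#
// Minimal sketch: group continuous observations into equal-width cells
// so they can be used in a chi-square test (hypothetical data and bins).
double[] data = { 1.2, 3.5, 0.8, 2.9, 4.4, 2.1, 3.3, 1.7, 2.6, 4.9 };
double min = 0.0, width = 1.0;   // bins [0,1), [1,2), ..., [4,5)
int[] counts = new int[5];
foreach (double x in data)
{
    int bin = Math.Min((int)((x - min) / width), counts.Length - 1);
    counts[bin]++;
}
for (int i = 0; i < counts.Length; i++)
    Console.WriteLine($"[{min + i * width}, {min + (i + 1) * width}): {counts[i]}");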

Applications

The chi-square goodness-of-fit test is used in various fields such as genetics, marketing, and quality control to determine if a sample comes from a population with a specific distribution.

The ChiSquareGoodnessOfFitTest class

The chi-square goodness-of-fit test is implemented by the ChiSquareGoodnessOfFitTest class. This class provides multiple constructors to accommodate different types of input data.

Samples can be provided as a vector, a categorical vector, or as histogram data. The expected distribution can be provided as a histogram or a distribution. In the case of a distribution, you can supply the number of estimated parameters, which affects the degrees of freedom of the test.

For instance, you can use the constructor that takes a categorical vector and an expected vector to perform the test on categorical data. Alternatively, you can use the constructor that takes a sample vector and a distribution to test if the sample comes from a specified continuous or discrete distribution. These techniques are illustrated below.

Example 1 - Fitting a Discrete Distribution

In a gambling game, the payout is directly proportional to the number of sixes that are thrown. One very successful customer obtained the following results in 100 games: she threw 3 sixes twice, 2 sixes twelve times, 1 six thirty-five times, and no sixes fifty-one times.

The casino management suspects that the customer may be using weighted dice. The significance level for this test is 0.01.

The number of sixes thrown in each game follows a binomial distribution with $n = 3$ and $p = 1/6$. The expected frequencies can be calculated easily using the GetExpectedHistogram method of the BinomialDistribution class. We then compare them to the observed frequencies:

C#
// The number of sixes in three throws follows a Binomial(3, 1/6) distribution.
var sixesDistribution = new BinomialDistribution(3, 1 / 6.0);
// Expected frequencies for 100 games:
var expected = sixesDistribution.GetExpectedHistogram(100);
// Observed frequencies: 51, 35, 12, and 2 games with 0, 1, 2, and 3 sixes.
var actual = Vector.Create<double>(51, 35, 12, 2);
var chiSquare = new ChiSquareGoodnessOfFitTest(actual, expected);
chiSquare.SignificanceLevel = 0.01;
Console.WriteLine("Test statistic: {0:F4}", chiSquare.Statistic);
Console.WriteLine("P-value:        {0:F4}", chiSquare.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    chiSquare.Reject() ? "yes" : "no");

The value of the chi-square statistic is 9.6013, giving a p-value of 0.0223. Because the p-value is greater than 0.01, the hypothesis that the dice are fair cannot be rejected at the 0.01 significance level.
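The expected frequencies returned by GetExpectedHistogram can be verified by hand from the binomial probability mass function. A minimal sketch in plain C# (the hard-coded binomial coefficients are those for $n = 3$):

C#
// Verify the expected cell counts for 100 games: the number of sixes
// per game follows a Binomial(3, 1/6) distribution.
double p = 1.0 / 6.0;
int n = 3, games = 100;
double[] binomialCoefficients = { 1, 3, 3, 1 };
for (int k = 0; k <= n; k++)
{
    double probability = binomialCoefficients[k]
        * Math.Pow(p, k) * Math.Pow(1 - p, n - k);
    Console.WriteLine($"{k} sixes: expected {games * probability:F2} games");
}
// Prints approximately 57.87, 34.72, 6.94, and 0.46.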

Example 2 - Fitting a Continuous Distribution

A store manager wants to check if the waiting times at the store follow a gamma distribution. The manager collected waiting times (in minutes) from a sample of 100 customers.

We put all the observations in an array and fit the Gamma distribution using Maximum Likelihood Estimation (MLE). We do this by calling the GammaDistribution constructor with our sample, and print the parameters of the fitted distribution.

We then compare the observed frequencies to the expected frequencies using the chi-square test. We don't have to worry about binning the data. All we need to do is supply the distribution and specify that we estimated 2 distribution parameters using the appropriate ChiSquareGoodnessOfFitTest constructor. We finally print the test statistic and p-value.

The complete code looks as follows:

C#
double[] waitingTimes = new double[]
{
    12.9, 5.3, 17.6, 19.0, 5.7, 20.5, 23.7, 20.0, 16.5, 27.9,
    4.1, 18.4, 8.0, 8.6, 9.4, 11.6, 19.8, 40.6, 0.3, 7.3, 13.2,
    14.0, 5.6, 24.7, 32.0, 3.2, 27.6, 6.0, 6.7, 19.2, 12.0,
    16.5, 3.5, 8.8, 9.8, 11.0, 14.5, 6.4, 21.9, 16.2, 8.7,
    14.3, 5.8, 19.9, 13.0, 3.8, 5.5, 13.4, 21.5, 21.3, 8.4,
    4.8, 9.4, 12.7, 14.0, 5.0, 18.5, 10.1, 5.8, 11.0, 4.7,
    17.7, 2.4, 12.8, 7.5, 18.3, 7.0, 16.9, 19.8, 10.3, 7.3,
    19.6, 0.1, 15.4, 9.6, 4.3, 9.6, 6.1, 17.2, 14.3, 3.7, 2.3,
    21.7, 22.4, 4.1, 7.5, 5.2, 17.3, 7.1, 8.3, 23.6, 19.0,
    20.4, 8.6, 15.3, 12.1, 6.3, 10.1, 14.2, 4.8
};
// The manager suspects that the waiting times follow a gamma distribution.
// We fit the distribution using Maximum Likelihood Estimation:
var gamma = new GammaDistribution(waitingTimes);
Console.WriteLine("Estimated gamma distribution:");
Console.WriteLine($"  Shape: {gamma.ShapeParameter:F3}");
Console.WriteLine($"  Scale: {gamma.ScaleParameter:F3}");

// Perform the chi-square goodness-of-fit test. We specify that
// we estimated two parameters of the distribution.
var x2Test = new ChiSquareGoodnessOfFitTest(waitingTimes, gamma, 2);

Console.WriteLine($"Chi-Square Statistic: {x2Test.Statistic:F3}");
Console.WriteLine($"P-Value: {x2Test.PValue:F4}");
Console.WriteLine($"Reject? {(x2Test.Reject(0.05) ? "yes" : "no")}");

The fitted gamma distribution has shape parameter 2.582 and scale parameter 3.705. Running the chi-square test gives a test statistic of 7.414 with a corresponding p-value of 0.1917. The manager concludes that the hypothesis that the waiting times follow a gamma distribution cannot be rejected at the 0.05 level.

The One Sample Kolmogorov-Smirnov Test

The one sample Kolmogorov-Smirnov test (KS test) is used to test the hypothesis that a given sample was taken from a proposed continuous distribution.

Definition

The null hypothesis ($H_0$) is that the sample comes from the proposed continuous distribution. Formally, $H_0: F(x) = F_0(x)$ for all $x$, where $F(x)$ is the cumulative distribution function (CDF) of the population from which the sample was drawn and $F_0(x)$ is the CDF of the proposed distribution.

The test statistic ($D$) is defined as the maximum absolute difference between the empirical distribution function (EDF) of the sample and the CDF of the proposed distribution:

$$D = \sup_x \left| F_n(x) - F_0(x) \right|$$

where $F_n(x)$ is the EDF of the sample, $F_0(x)$ is the CDF of the proposed distribution, and $\sup$ denotes the supremum.

The distribution of the test statistic D under the null hypothesis is known as the Kolmogorov distribution. The critical values for the test are obtained from this distribution.
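The statistic D can be computed directly from its definition by sorting the sample and comparing the empirical distribution function with the proposed CDF on both sides of each jump. The sketch below uses a hypothetical sample and the standard exponential CDF, $F_0(x) = 1 - e^{-x}$, as the proposed distribution; it is for illustration only and does not use the library.

C#
// Minimal sketch: the one sample Kolmogorov-Smirnov statistic computed
// from the definition. The sample is hypothetical and the proposed
// distribution is the standard exponential with CDF F0(x) = 1 - exp(-x).
double[] sample = { 0.4, 1.1, 0.2, 2.3, 0.9, 1.7, 0.6, 3.0 };
Func<double, double> cdf = x => 1.0 - Math.Exp(-x);

Array.Sort(sample);
int n = sample.Length;
double d = 0.0;
for (int i = 0; i < n; i++)
{
    // The EDF jumps from i/n to (i+1)/n at the i-th ordered value,
    // so the largest deviation occurs on one side of a jump.
    double f0 = cdf(sample[i]);
    d = Math.Max(d, Math.Max(Math.Abs((i + 1.0) / n - f0),
                             Math.Abs(i / (double)n - f0)));
}
Console.WriteLine($"D = {d:F4}");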

Assumptions

The KS test can be applied to any continuous distribution, but it cannot be applied to discrete distributions, and it is more sensitive near the center of the distribution than at the tails. The proposed distribution must be completely specified: if one or more of its parameters are estimated from the sample, the distribution of the test statistic is no longer the Kolmogorov distribution, and the standard critical values no longer apply.

Applications

The KS test is used in various fields such as finance, biology, and engineering to test if a sample comes from a specific continuous distribution.

The OneSampleKolmogorovSmirnovTest class

The one sample Kolmogorov-Smirnov test is implemented by the OneSampleKolmogorovSmirnovTest class. It has three constructors. The first constructor takes no arguments; all test parameters must be specified through properties of the test object. The second constructor takes two arguments: a Vector<T> object that specifies the sample, and a Func<T, TResult> delegate that specifies the cumulative distribution function of the distribution being tested. The third constructor also takes two arguments: the first is once again a vector that specifies the sample; the second must be of a type derived from ContinuousDistribution and specifies the distribution being tested.
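As a brief illustration of the delegate-based constructor described above, the following sketch tests a hypothetical sample against the standard exponential distribution by supplying its CDF, $F_0(x) = 1 - e^{-x}$, as a lambda. The sample values are made up for the purpose of the example.

C#
// Sketch only: test a hypothetical sample against the standard
// exponential distribution by supplying its CDF as a delegate.
var sample = Vector.Create(0.4, 1.1, 0.2, 2.3, 0.9, 1.7, 0.6, 3.0);
var expTest = new OneSampleKolmogorovSmirnovTest(sample, x => 1.0 - Math.Exp(-x));
Console.WriteLine("Test statistic: {0:F4}", expTest.Statistic);
Console.WriteLine("P-value:        {0:F4}", expTest.PValue);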

Example

In this example, we draw a sample from a lognormal distribution and test whether it could have come from a similar-looking Weibull distribution.

C#
var weibull = new WeibullDistribution(2, 1);
var logNormal = new LognormalDistribution(0, 1);
var logNormalSample = logNormal.Sample(25);
var ksTest = new OneSampleKolmogorovSmirnovTest(logNormalSample, weibull);
Console.WriteLine("Test statistic: {0:F4}", ksTest.Statistic);
Console.WriteLine("P-value:        {0:F4}", ksTest.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    ksTest.Reject() ? "yes" : "no");

First we create a Weibull and a lognormal distribution. We then get 25 random samples from the lognormal distribution using its Sample(Int32) method.

Because the sample is random, the results of the test differ on each run. Typically, the p-value falls anywhere between roughly 0.03 and 0.3. We can conclude that it is difficult to distinguish a lognormal distribution from a Weibull distribution using only 25 sample points.

The Two Sample Kolmogorov-Smirnov Test

The two sample Kolmogorov-Smirnov test is used to test the hypothesis that two samples come from a population with the same, unknown distribution.

Definition

The null hypothesis ($H_0$) is that the two samples come from the same underlying distribution. Formally, $H_0: F_1(x) = F_2(x)$ for all $x$, where $F_1(x)$ and $F_2(x)$ are the cumulative distribution functions (CDFs) of the populations underlying the two samples.

The test statistic ($D$) is defined as the maximum absolute difference between the empirical distribution functions (EDFs) of the two samples:

$$D = \sup_x \left| F_{n_1}(x) - F_{n_2}(x) \right|$$

where $F_{n_1}(x)$ and $F_{n_2}(x)$ are the EDFs of the two samples, and $\sup$ denotes the supremum.

The distribution of the test statistic D under the null hypothesis is known as the Kolmogorov distribution. The critical values for the test are obtained from this distribution.
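As with the one sample version, D can be computed directly from the definition: evaluate both empirical distribution functions at every observed value and take the largest absolute difference. A minimal sketch with hypothetical samples, in plain C#:

C#
// Minimal sketch: the two sample Kolmogorov-Smirnov statistic computed
// from the definition, using hypothetical samples.
double[] sample1 = { 0.5, 1.2, 1.9, 2.4, 3.1, 3.8 };
double[] sample2 = { 0.9, 1.4, 2.2, 2.8, 4.0, 4.6, 5.1 };
Array.Sort(sample1);
Array.Sort(sample2);

// Empirical distribution function of a sorted sample at x:
// the fraction of values less than or equal to x.
Func<double[], double, double> edf = (s, x) =>
{
    int count = 0;
    while (count < s.Length && s[count] <= x) count++;
    return (double)count / s.Length;
};

// The EDFs only change at observed values, so it suffices to
// evaluate the difference at the points of both samples.
double d = 0.0;
foreach (double x in sample1)
    d = Math.Max(d, Math.Abs(edf(sample1, x) - edf(sample2, x)));
foreach (double x in sample2)
    d = Math.Max(d, Math.Abs(edf(sample1, x) - edf(sample2, x)));
Console.WriteLine($"D = {d:F4}");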

Assumptions

The two sample KS test assumes that the samples are independent and that the distributions are continuous.

Applications

The two sample KS test is used in various fields such as medicine, economics, and environmental science to compare two samples and determine if they come from the same distribution.

The TwoSampleKolmogorovSmirnovTest class

The two sample Kolmogorov-Smirnov test is implemented by the TwoSampleKolmogorovSmirnovTest class. It has two constructors. The first constructor takes no arguments. The second constructor takes two arguments. Both are vectors that specify the two samples that are being compared.

Example

We investigate whether we can distinguish a sample taken from a lognormal distribution from a sample taken from a similar looking Weibull distribution. We use the lognormal samples we created in the previous section.

C#
var weibullSample = weibull.Sample(25);
var ksTest2 = new TwoSampleKolmogorovSmirnovTest(logNormalSample, weibullSample);
Console.WriteLine("Test statistic: {0:F4}", ksTest2.Statistic);
Console.WriteLine("P-value:        {0:F4}", ksTest2.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    ksTest2.Reject() ? "yes" : "no");

As before, the samples are random, so the outcome of this test varies from run to run.

The Anderson-Darling Test for Normality

The Anderson-Darling test is a one sample test of normality. It is a variation of the Kolmogorov-Smirnov test that assigns more weight to the tails of the distribution.

Definition

The null hypothesis ($H_0$) is that the sample comes from a normal distribution. Formally, $H_0: F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right)$ for all $x$, where $F(x)$ is the cumulative distribution function (CDF) of the population from which the sample was drawn, $\Phi(x)$ is the CDF of the standard normal distribution, $\mu$ is the mean, and $\sigma$ is the standard deviation.

The test statistic ($A^2$) is defined as:

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\left[\ln F(Y_i) + \ln\bigl(1 - F(Y_{n-i+1})\bigr)\right]$$

where $n$ is the sample size, $Y_i$ are the ordered sample values, and $F$ is the cumulative distribution function of the normal distribution with estimated parameters.

The distribution of $A^2$ depends on the sample size. Critical values are typically obtained from tables or approximated using modified statistics. Smaller values of $A^2$ support the null hypothesis of normality.
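The statistic can be computed directly from this definition. The sketch below uses a hypothetical sample, estimates the mean and standard deviation from it, and approximates the standard normal CDF with the Abramowitz and Stegun formula 26.2.17. It is for illustration only and is not the implementation used by the AndersonDarlingTest class described below.

C#
// Sketch only: the Anderson-Darling statistic computed from the
// definition for a hypothetical sample, with the mean and standard
// deviation estimated from the data.
double[] y = { 9.8, 10.4, 9.1, 11.2, 10.0, 9.6, 10.9, 10.3, 9.4, 10.7 };
Array.Sort(y);
int n = y.Length;

double mean = 0.0;
foreach (double v in y) mean += v;
mean /= n;
double ss = 0.0;
foreach (double v in y) ss += (v - mean) * (v - mean);
double sigma = Math.Sqrt(ss / (n - 1));

// Standard normal CDF (Abramowitz & Stegun approximation 26.2.17).
Func<double, double> normalCdf = z =>
{
    double t = 1.0 / (1.0 + 0.2316419 * Math.Abs(z));
    double poly = t * (0.319381530 + t * (-0.356563782
        + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
    double upperTail = Math.Exp(-z * z / 2.0) / Math.Sqrt(2.0 * Math.PI) * poly;
    return z >= 0.0 ? 1.0 - upperTail : upperTail;
};

double sum = 0.0;
for (int i = 1; i <= n; i++)
{
    double fi = normalCdf((y[i - 1] - mean) / sigma);   // F(Y_i)
    double fj = normalCdf((y[n - i] - mean) / sigma);   // F(Y_{n-i+1})
    sum += (2 * i - 1) * (Math.Log(fi) + Math.Log(1.0 - fj));
}
double aSquared = -n - sum / n;
Console.WriteLine($"A-squared = {aSquared:F4}");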

Assumptions

The Anderson-Darling test assumes that the sample is randomly selected from the population. Unlike the Kolmogorov-Smirnov test, the distribution of the test statistic depends on the distribution being tested, and on whether its parameters are estimated from the sample.

While the Anderson-Darling test can theoretically be modified to test for other distributions besides normal, this is generally discouraged for several reasons:

  • The critical values of the test statistic depend on the specific distribution being tested, and these values are not readily available for many distributions.

  • When parameters of the distribution must be estimated from the data, the distribution of the test statistic becomes even more complex and specific to each case.

For testing goodness-of-fit to non-normal distributions, it is recommended to use either:

  1. The Kolmogorov-Smirnov test, which works with any continuous distribution and has a distribution-free test statistic (though it is less powerful in the tails), or

  2. The chi-square test, which can be used with any distribution (continuous or discrete) by binning the data appropriately.

Applications

The Anderson-Darling test is used in various fields such as manufacturing, finance, and environmental science to test if a sample comes from a normal distribution.

The AndersonDarlingTest class

The Anderson-Darling test is implemented by the AndersonDarlingTest class. It has three constructors. The first constructor has no arguments. The second constructor has one argument: a vector that specifies the sample to be tested. The third constructor takes three arguments: a vector that specifies the sample, followed by the mean and standard deviation of the normal distribution being tested. If the mean and standard deviation are not provided, they are estimated from the sample.

Example

We investigate the strength of polished airplane windows. We want to verify that the measured strengths follow a normal distribution. We have a total of 31 samples.

C#
var strength = Vector.Create(
    18.830, 20.800, 21.657, 23.030, 23.230, 24.050,
    24.321, 25.500, 25.520, 25.800, 26.690, 26.770,
    26.780, 27.050, 27.670, 29.900, 31.110, 33.200,
    33.730, 33.760, 33.890, 34.760, 35.750, 35.910,
    36.980, 37.080, 37.090, 39.580, 44.045, 45.290,
    45.381);
var adTest = new AndersonDarlingTest(strength, 30.81, 7.38);
Console.WriteLine("Test statistic: {0:F4}", adTest.Statistic);
Console.WriteLine("P-value:        {0:F4}", adTest.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    adTest.Reject() ? "yes" : "no");

The value of the Anderson-Darling statistic is 0.5128, corresponding to a p-value of 0.1795. Since the p-value is well above common significance levels, we do not reject the hypothesis that the window strengths follow a normal distribution.
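If the mean of 30.81 and standard deviation of 7.38 were not known in advance, we could instead use the one-argument constructor described earlier, in which case the parameters are estimated from the sample. This generally changes the value of the statistic and the p-value:

C#
// Same test, but with the normal parameters estimated from the sample.
var adTest2 = new AndersonDarlingTest(strength);
Console.WriteLine("Test statistic: {0:F4}", adTest2.Statistic);
Console.WriteLine("P-value:        {0:F4}", adTest2.PValue);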

The Shapiro-Wilk Test for Normality

The Shapiro-Wilk test is a one sample test of normality, also known as the Shapiro-Wilk W test. It is generally considered more reliable than the Anderson-Darling or Kolmogorov-Smirnov test. It is valid for sample sizes between 3 and 5000.

Definition

The null hypothesis ($H_0$) is that the sample comes from a normally distributed population. Formally, $H_0: F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right)$ for all $x$, where $F(x)$ is the cumulative distribution function (CDF) of the population from which the sample was drawn, $\Phi(x)$ is the CDF of the standard normal distribution, $\mu$ is the mean, and $\sigma$ is the standard deviation.

The test statistic (W) is calculated based on the ordered sample values and their expected values under normality. Specifically, it is defined as:

$$W = \frac{\left(\sum_{i=1}^{n} a_i Y_{(i)}\right)^2}{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2}$$

where $n$ is the sample size, $Y_{(i)}$ are the ordered sample values, $\bar{Y}$ is the sample mean, and $a_i$ are constants derived from the expected values of the order statistics of a standard normal distribution.

The distribution of the test statistic W under the null hypothesis is determined by the sample size. Critical values for the test are obtained from tables or approximated using simulations. Smaller values of W indicate a departure from normality.
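The coefficients $a_i$ require tabulated expected normal order statistics and are awkward to compute by hand. As a rough illustration of the idea, the sketch below computes the closely related Shapiro-Francia statistic, which replaces $a_i$ with normalized Blom scores $\Phi^{-1}\left(\frac{i - 0.375}{n + 0.25}\right)$; the inverse normal CDF uses the Abramowitz and Stegun 26.2.23 approximation, and the sample is hypothetical. This is not the algorithm used by the ShapiroWilkTest class.

C#
// Rough sketch only: the Shapiro-Francia statistic W', a simplified
// relative of Shapiro-Wilk W that replaces the coefficients a_i with
// normalized Blom scores. Hypothetical sample.
double[] y = { 9.8, 10.4, 9.1, 11.2, 10.0, 9.6, 10.9, 10.3, 9.4, 10.7 };
Array.Sort(y);
int n = y.Length;

// Inverse standard normal CDF (Abramowitz & Stegun 26.2.23,
// absolute error below 4.5e-4).
Func<double, double> probit = p =>
{
    double q = p < 0.5 ? p : 1.0 - p;
    double t = Math.Sqrt(-2.0 * Math.Log(q));
    double x = t - (2.515517 + 0.802853 * t + 0.010328 * t * t)
        / (1.0 + 1.432788 * t + 0.189269 * t * t + 0.001308 * t * t * t);
    return p < 0.5 ? -x : x;
};

// Blom scores m_i approximate the expected normal order statistics.
double[] m = new double[n];
double mean = 0.0;
for (int i = 0; i < n; i++)
{
    m[i] = probit((i + 1 - 0.375) / (n + 0.25));
    mean += y[i];
}
mean /= n;

double num = 0.0, mm = 0.0, ss = 0.0;
for (int i = 0; i < n; i++)
{
    num += m[i] * y[i];
    mm += m[i] * m[i];
    ss += (y[i] - mean) * (y[i] - mean);
}
double wPrime = num * num / (mm * ss);
Console.WriteLine($"W' = {wPrime:F4}");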

Assumptions

The Shapiro-Wilk test assumes that the sample is randomly selected from the population. Unlike the Kolmogorov-Smirnov test, the distribution of the test statistic depends on the distribution being tested. The parameters of the normal distribution are estimated from the sample.

Applications

The Shapiro-Wilk test is used in various fields such as manufacturing, finance, and environmental science to test if a sample comes from a normal distribution. It is particularly useful when the sample size is small to moderate.

The ShapiroWilkTest class

The Shapiro-Wilk test is implemented by the ShapiroWilkTest class. It has two constructors. The first constructor has no arguments. The second constructor has one argument: a vector that specifies the sample to be tested.

Example

As above, we investigate the strength of polished airplane windows. We want to verify that the measured strengths follow a normal distribution. We have a total of 31 samples.

C#
var swTest = new ShapiroWilkTest(strength);
Console.WriteLine("Test statistic: {0:F4}", swTest.Statistic);
Console.WriteLine("P-value:        {0:F4}", swTest.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    swTest.Reject() ? "yes" : "no");

The value of the Shapiro-Wilk statistic is 0.9511, corresponding to a p-value of 0.1675. Here, too, we do not reject the hypothesis that the window strengths follow a normal distribution.
