Homogeneity Of Variances Tests in IronPython QuickStart Sample

Illustrates how to test a collection of variables for equal variances using classes in the Numerics.NET.Statistics.Tests namespace in IronPython.

This sample is also available in: C#, Visual Basic, F#.

Overview

This QuickStart sample demonstrates how to test whether multiple groups of data have equal variances using both Bartlett’s test and Levene’s test implemented in Numerics.NET.

The sample uses a practical example of quality control in manufacturing, analyzing measurements of gear diameters from 10 different production batches. It shows how to:

Set up data for homogeneity of variance tests
Perform Bartlett’s test for equal variances
Perform Levene’s test using different measures of location (mean, median, trimmed mean)
Interpret test results including test statistics, p-values and critical values
Compare the characteristics of both tests

The example illustrates an important application in quality control and ANOVA, where equality of variances across groups is a key assumption. It demonstrates both Bartlett’s test, which is faster but more restrictive, and Levene’s test, which is more robust but more computationally intensive. The sample includes detailed comments explaining the advantages and limitations of each approach.
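
To make the comparison more concrete, here is a minimal sketch of the textbook formula for Bartlett's statistic in plain Python. It is not the library's implementation, and the helper names `sample_variance` and `bartlett_statistic` are hypothetical, introduced only for illustration. Under the null hypothesis of equal variances (and normally distributed groups), the statistic approximately follows a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups.

```python
import math

def sample_variance(values):
    # Unbiased sample variance (divides by n - 1).
    n = len(values)
    mean = sum(values) / float(n)
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def bartlett_statistic(groups):
    # groups: a list of lists of numbers, one list per group.
    # Each group needs at least two values and a non-zero variance,
    # otherwise the logarithm or the division below fails.
    k = len(groups)
    sizes = [len(g) for g in groups]
    total = sum(sizes)
    variances = [sample_variance(g) for g in groups]

    # Pooled variance: the group variances weighted by their degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / float(total - k)

    numerator = (total - k) * math.log(pooled) \
        - sum((n - 1) * math.log(v) for n, v in zip(sizes, variances))

    # Correction factor that improves the chi-square approximation.
    correction = 1.0 + (sum(1.0 / (n - 1) for n in sizes)
                        - 1.0 / (total - k)) / (3.0 * (k - 1))
    return numerator / correction
```

When all group variances are identical, the numerator vanishes and the statistic is zero. The larger the spread between the group variances, the larger the statistic becomes.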

The code

import numerics

from Extreme.Mathematics import *

from Extreme.Statistics import *
from Extreme.Statistics.Tests import *

# Illustrates how to test for homogeneity of variances
# using the classes in the Extreme.Statistics.Tests
# namespace.

# One of the underlying assumptions of Analysis of Variance
# (ANOVA) is that the variances in the different groups are
# identical. This QuickStart Sample shows how to use
# the two tests that are available to verify this assumption.

# The data for this QuickStart Sample is measurements of
# the diameters of gears from 10 different batches.
# Two variables are provided:

# batchVariable contains the batch number of each measurement:
batchVariable = CategoricalVariable("batch", [ \
    1,1,1,1,1,1,1,1,1,1, \
    2,2,2,2,2,2,2,2,2,2, \
    3,3,3,3,3,3,3,3,3,3, \
    4,4,4,4,4,4,4,4,4,4, \
    5,5,5,5,5,5,5,5,5,5, \
    6,6,6,6,6,6,6,6,6,6, \
    7,7,7,7,7,7,7,7,7,7, \
    8,8,8,8,8,8,8,8,8,8, \
    9,9,9,9,9,9,9,9,9,9, \
    10,10,10,10,10,10,10,10,10,10 ])

# diameterVariable contains the actual measurements:
diameterVariable = NumericalVariable("diameter", Vector([ \
    1.006, 0.996, 0.998, 1.000, 0.992, 0.993, 1.002, \
    0.999, 0.994, 1.000, 0.998, 1.006, 1.000, 1.002, \
    0.997, 0.998, 0.996, 1.000, 1.006, 0.988, 0.991, \
    0.987, 0.997, 0.999, 0.995, 0.994, 1.000, 0.999, \
    0.996, 0.996, 1.005, 1.002, 0.994, 1.000, 0.995, \
    0.994, 0.998, 0.996, 1.002, 0.996, 0.998, 0.998, \
    0.982, 0.990, 1.002, 0.984, 0.996, 0.993, 0.980, \
    0.996, 1.009, 1.013, 1.009, 0.997, 0.988, 1.002, \
    0.995, 0.998, 0.981, 0.996, 0.990, 1.004, 0.996, \
    1.001, 0.998, 1.000, 1.018, 1.010, 0.996, 1.002, \
    0.998, 1.000, 1.006, 1.000, 1.002, 0.996, 0.998, \
    0.996, 1.002, 1.006, 1.002, 0.998, 0.996, 0.995, \
    0.996, 1.004, 1.004, 0.998, 0.999, 0.991, 0.991, \
    0.995, 0.984, 0.994, 0.997, 0.997, 0.991, 0.998, \
    1.004, 0.997 ]))

# To prepare the data, we first create a CellArray made up
# of the two variables:
cells = CellArray(diameterVariable, batchVariable)
# We then use the GetCellVariables method to obtain
# individual variables for each value of the categorical
# variable:
variables = cells.GetCellVariables()

#
# Bartlett's test
#

# Bartlett's test is relatively fast, but has the drawback that 
# it requires the data in the groups to be normally distributed,
# and it is not very robust against departures from normality.
# What this means in practice is that the test can't distinguish
# between rejection because of non-homogeneity of variances
# and violation of the normality assumption.

print "Bartlett's test."
			
# We pass the array of variables to the constructor:
bartlett = BartlettTest(variables)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding p-value through the PValue property:
print "Test statistic: {0:.4f}".format(bartlett.Statistic)
print "P-value:        {0:.4f}".format(bartlett.PValue)

print "Critical value: {0:.4f} at 90%".format(bartlett.GetUpperCriticalValue(0.10))
print "Critical value: {0:.4f} at 95%".format(bartlett.GetUpperCriticalValue(0.05))
print "Critical value: {0:.4f} at 99%".format(bartlett.GetUpperCriticalValue(0.01))

# We can now print the test results:
print "Reject null hypothesis?", "yes" if bartlett.Reject() else "no"

#
# Levene's Test
#

# Levene's test is slower than Bartlett's test, but is generally more reliable.
# It comes in three variants, depending on the measure of location used.
# The default is that the group median is used.

print "\nLevene's Test"

# Once again, we pass an array of Variable objects to the constructor.
# The LeveneTest constructor is overloaded: you can specify
# the measure of location (mean, median, or trimmed mean):
levene = LeveneTest(variables, LeveneTestLocationMeasure.Median)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding p-value through the PValue property:
print "Test statistic: {0:.4f}".format(levene.Statistic)
print "P-value:        {0:.4f}".format(levene.PValue)

# We can obtain critical values for various significance levels:
print "Critical value: {0:.4f} at 90%".format(levene.GetUpperCriticalValue(0.10))
print "Critical value: {0:.4f} at 95%".format(levene.GetUpperCriticalValue(0.05))
print "Critical value: {0:.4f} at 99%".format(levene.GetUpperCriticalValue(0.01))

# We can now print the test results:
print "Reject null hypothesis?", "yes" if levene.Reject() else "no"
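
For readers who want to see what Levene's test computes, the following is a minimal sketch of the textbook statistic in plain Python, using the group median as the measure of location. It is an illustration under stated assumptions, not the library's implementation; the names `median` and `levene_statistic` and the `center` parameter are hypothetical. Under the null hypothesis, the statistic is compared against an F distribution with k - 1 and N - k degrees of freedom, where k is the number of groups and N is the total number of observations.

```python
def median(values):
    # Median of a non-empty list of numbers.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0

def levene_statistic(groups, center=median):
    # groups: a list of lists of numbers, one list per group.
    # center: function that returns the location of a group
    #         (the median here; a mean function would give the mean-based variant).
    k = len(groups)
    total = sum(len(g) for g in groups)

    # Absolute deviations of each observation from its group's center.
    deviations = []
    for g in groups:
        c = center(g)
        deviations.append([abs(x - c) for x in g])

    group_means = [sum(z) / float(len(z)) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / float(total)

    # One-way ANOVA on the absolute deviations.
    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((x - m) ** 2
                 for z, m in zip(deviations, group_means) for x in z)

    # Raises ZeroDivisionError if all deviations within every group are equal.
    return (total - k) / float(k - 1) * between / within
```

Because the statistic is built from absolute deviations rather than squared ones, and the median is insensitive to outliers, this variant is less affected by non-normal data than Bartlett's test, at the cost of the extra sorting work needed to find each median.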