Testing Homogeneity of Variances

One of the assumptions underlying Analysis of Variance (ANOVA) is that the variances across groups are identical. This property is called homogeneity of variances or homoscedasticity. It is often desirable to verify this assumption using an appropriate hypothesis test. The three most common ones are Bartlett's test, Levene's test, and the Fligner-Killeen test.

Bartlett's test is highly sensitive to departures from normality, making it less robust in the presence of non-normal data. Levene's test is less sensitive to non-normality and is often preferred when the data may not be normally distributed. The Fligner-Killeen test is the most robust among the three, being the least influenced by departures from normality, making it a preferred choice in many cases.

Bartlett's Test

Bartlett's test, also known as Bartlett's homogeneity of variances test, is used to determine if multiple samples have equal variances. It is particularly sensitive to departures from normality.

Definition

The null hypothesis (H0) for Bartlett's test is that the variances of all groups are equal. The alternative hypothesis (H1) is that at least one of the variances is different. The test statistic used is a chi-square statistic calculated as follows:

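$$\chi^2 = \frac{(N-k)\ln(S_p^2) - \sum_{i=1}^{k}(n_i-1)\ln(S_i^2)}{1 + \frac{1}{3(k-1)}\left(\sum_{i=1}^{k}\frac{1}{n_i-1} - \frac{1}{N-k}\right)}$$

where $N$ is the total number of observations, $k$ is the number of groups, $n_i$ is the number of observations in group $i$, $S_p^2$ is the pooled variance, and $S_i^2$ is the sample variance of group $i$. Under the null hypothesis, the test statistic follows a chi-square distribution with $k-1$ degrees of freedom.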
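
To make the definition concrete, here is a minimal sketch that evaluates the statistic directly from raw samples using only the .NET base class library. The class and method names (BartlettSketch, Statistic) are hypothetical and chosen for illustration. This is not how the BartlettTest class is implemented, and it assumes the pooled variance is the usual weighted average of the group variances with weights n_i - 1.

```csharp
using System;
using System.Linq;

// Illustrative sketch only: hypothetical names, not part of the library.
public static class BartlettSketch
{
    // Unbiased sample variance (divides by n - 1).
    private static double SampleVariance(double[] x)
    {
        double mean = x.Average();
        return x.Sum(v => (v - mean) * (v - mean)) / (x.Length - 1);
    }

    // Bartlett's chi-square statistic for k groups, each with at least 2 values.
    public static double Statistic(double[][] groups)
    {
        int k = groups.Length;
        int totalCount = groups.Sum(g => g.Length);          // N
        double[] variances = groups.Select(g => SampleVariance(g)).ToArray();

        // Assumed pooled variance: sum of (n_i - 1) * S_i^2, divided by (N - k).
        double pooled = 0.0;
        for (int i = 0; i < k; i++)
            pooled += (groups[i].Length - 1) * variances[i];
        pooled /= totalCount - k;

        // Numerator: (N - k) ln(S_p^2) - sum of (n_i - 1) ln(S_i^2).
        double numerator = (totalCount - k) * Math.Log(pooled);
        double sumInverse = 0.0;
        for (int i = 0; i < k; i++)
        {
            numerator -= (groups[i].Length - 1) * Math.Log(variances[i]);
            sumInverse += 1.0 / (groups[i].Length - 1);
        }

        // Denominator: 1 + (sum of 1/(n_i - 1) - 1/(N - k)) / (3 (k - 1)).
        double correction = 1.0 + (sumInverse - 1.0 / (totalCount - k)) / (3.0 * (k - 1));
        return numerator / correction;
    }
}
```

When all group variances are identical, the numerator vanishes and the statistic is zero. The more the variances differ, the larger the statistic becomes.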

Assumptions

Bartlett's test assumes that the samples are normally distributed. It is sensitive to violations of this assumption, meaning it cannot adequately distinguish between a violation of homogeneity of variances and a violation of the normality assumption. In contrast, Levene's test is less sensitive to departures from normality and is often preferred in such cases.

Applications

Bartlett's test is used in various fields such as quality control, manufacturing, and research to verify the assumption of equal variances across different groups or batches. This is crucial for the validity of many statistical methods, including Analysis of Variance (ANOVA).

The BartlettTest Class

Bartlett's test is implemented by the BartlettTest class. It has three constructors. The first constructor takes no arguments. The data and conditions for the test must be specified by setting properties of the BartlettTest object. The second constructor takes an array of Vector<T> objects that contain the samples the test is to be applied to. The third constructor takes two arguments. The first is a vector containing the data for all the samples. The second argument is an IGrouping object (such as a CategoricalVector<T>) that specifies how the values in the first argument are to be grouped.

Example

We start with a collection of measurements of gear diameters from 10 batches. We want to verify that the variances of the diameters for the batches are equal. The data comes in two variables: one numerical vector with the measured diameters and one categorical that specifies the corresponding batch. If the batch vector is categorical, it can be used directly to group the diameters. Alternatively, we can split the diameter vector according to the batch:

C#
var batch = Vector.CreateFromFunction(100, i => 1 + i / 10).AsCategorical();
var diameter = Vector.Create(
    1.006, 0.996, 0.998, 1.000, 0.992, 0.993, 1.002, 0.999, 0.994, 1.000,
    0.998, 1.006, 1.000, 1.002, 0.997, 0.998, 0.996, 1.000, 1.006, 0.988,
    0.991, 0.987, 0.997, 0.999, 0.995, 0.994, 1.000, 0.999, 0.996, 0.996,
    1.005, 1.002, 0.994, 1.000, 0.995, 0.994, 0.998, 0.996, 1.002, 0.996,
    0.998, 0.998, 0.982, 0.990, 1.002, 0.984, 0.996, 0.993, 0.980, 0.996,
    1.009, 1.013, 1.009, 0.997, 0.988, 1.002, 0.995, 0.998, 0.981, 0.996,
    0.990, 1.004, 0.996, 1.001, 0.998, 1.000, 1.018, 1.010, 0.996, 1.002,
    0.998, 1.000, 1.006, 1.000, 1.002, 0.996, 0.998, 0.996, 1.002, 1.006,
    1.002, 0.998, 0.996, 0.995, 0.996, 1.004, 1.004, 0.998, 0.999, 0.991,
    0.991, 0.995, 0.984, 0.994, 0.997, 0.997, 0.991, 0.998, 1.004, 0.997);
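// Option 1: pass the data vector and the categorical batch vector directly.
BartlettTest bartlett = new BartlettTest(diameter, batch);
// Option 2: split the diameters into one vector per batch first.
var variables = diameter.SplitBy(batch).ToArray();
BartlettTest bartlett2 = new BartlettTest(variables);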
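
For readers who want to see what grouping by batch amounts to, the following sketch performs the same kind of split on plain arrays with LINQ. It only illustrates the idea. The names GroupingSketch, SplitByLabel and BatchLabels are hypothetical, and the library's SplitBy method may work differently internally.

```csharp
using System;
using System.Linq;

// Illustrative sketch only: hypothetical names, plain arrays instead of vectors.
public static class GroupingSketch
{
    // Batch labels 1, 1, ..., 2, 2, ...: integer division assigns ten
    // consecutive observations to each batch, as in the example above.
    public static int[] BatchLabels(int count) =>
        Enumerable.Range(0, count).Select(i => 1 + i / 10).ToArray();

    // Splits a flat array of values into one array per distinct label.
    // Groups appear in order of the first occurrence of each label.
    public static double[][] SplitByLabel(double[] values, int[] labels)
    {
        if (values.Length != labels.Length)
            throw new ArgumentException("values and labels must have the same length.");

        return values
            .Select((value, index) => (Label: labels[index], Value: value))
            .GroupBy(pair => pair.Label)
            .Select(group => group.Select(pair => pair.Value).ToArray())
            .ToArray();
    }
}
```

With 100 observations, BatchLabels(100) produces the labels 1 through 10, and SplitByLabel returns ten arrays of ten values each.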

We can then run the test:

C#
Console.WriteLine("Test statistic: {0:F4}", bartlett.Statistic);
Console.WriteLine("P-value:        {0:F4}", bartlett.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    bartlett.Reject() ? "yes" : "no");

The value of the chi-square statistic is 20.7859 giving a p-value of 0.0136. As a result, the hypothesis that the variances are equal is rejected at the 0.05 level.

Once a BartlettTest object has been created, you can access other properties and methods common to all hypothesis test classes. For instance, to obtain the critical values for significance levels of 0.05 and 0.01, the code would be:

C#
Console.WriteLine("Critical value: {0:F4} at 95%",
    bartlett.GetUpperCriticalValue(0.05));
Console.WriteLine("Critical value: {0:F4} at 99%",
    bartlett.GetUpperCriticalValue(0.01));

The critical values (16.9190 at 0.05 and 21.6660 at 0.01) show that the null hypothesis is rejected at the 0.05 level but not at the 0.01 level, because the test statistic (20.7859) lies between the two.
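
The decision rule behind this comparison can be written out in a few lines. The sketch below uses hypothetical helper names (DecisionSketch, RejectByCriticalValue, RejectByPValue) and simply restates the rule for an upper-tailed test. It does not describe how the library's Reject method is implemented.

```csharp
using System;

// Illustrative sketch only: hypothetical names, not part of the library.
public static class DecisionSketch
{
    // Upper-tailed test: reject the null hypothesis when the statistic
    // exceeds the critical value for the chosen significance level.
    public static bool RejectByCriticalValue(double statistic, double criticalValue)
        => statistic > criticalValue;

    // Equivalent rule: reject when the p-value is below the significance level.
    public static bool RejectByPValue(double pValue, double significanceLevel)
        => pValue < significanceLevel;
}
```

Applied to the numbers reported above, both rules agree: 20.7859 exceeds 16.9190 and the p-value 0.0136 is below 0.05, while 20.7859 does not exceed 21.6660 and 0.0136 is not below 0.01.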

Levene's Test

Levene's test, also known as Levene's homogeneity of variances test, is a robust statistical test used to assess the equality of variances for a variable calculated for two or more groups. It is less sensitive to departures from normality compared to Bartlett's test, making it a preferred choice in many cases.

Definition

The null hypothesis (H0) for Levene's test is that the variances of all groups are equal. The alternative hypothesis (H1) is that at least one of the variances is different. The test statistic used is an F statistic, which is calculated based on the absolute deviations from the group median, mean, or trimmed mean.
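
As an illustration of this calculation, the sketch below computes the statistic using absolute deviations from the group median. It relies only on the .NET base class library, and the names LeveneSketch and Statistic are hypothetical. It assumes the standard form of the statistic, a one-way ANOVA F ratio applied to the absolute deviations, which is compared against an F distribution with k - 1 and N - k degrees of freedom. It is not the LeveneTest class's own code.

```csharp
using System;
using System.Linq;

// Illustrative sketch only: hypothetical names, not part of the library.
public static class LeveneSketch
{
    private static double Median(double[] x)
    {
        double[] sorted = x.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : 0.5 * (sorted[mid - 1] + sorted[mid]);
    }

    // Levene-type statistic based on absolute deviations from the group median.
    public static double Statistic(double[][] groups)
    {
        int k = groups.Length;
        int totalCount = groups.Sum(g => g.Length);   // N

        // z[i][j] = |x_ij - median of group i|
        double[][] z = groups
            .Select(g =>
            {
                double center = Median(g);
                return g.Select(v => Math.Abs(v - center)).ToArray();
            })
            .ToArray();

        double grandMean = z.SelectMany(zi => zi).Average();

        double between = 0.0;  // spread of the group means of z around the grand mean
        double within = 0.0;   // spread of z inside each group
        foreach (double[] zi in z)
        {
            double groupMean = zi.Average();
            between += zi.Length * (groupMean - grandMean) * (groupMean - grandMean);
            within += zi.Sum(v => (v - groupMean) * (v - groupMean));
        }

        return ((double)(totalCount - k) / (k - 1)) * between / within;
    }
}
```

Replacing Median with the mean or a trimmed mean gives the other variants mentioned above. The median is generally regarded as the more robust choice for skewed data.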

Assumptions

Levene's test assumes that the samples are independent and that the data is continuous. It is less sensitive to departures from normality compared to Bartlett's test, making it more robust in the presence of non-normal data.

Applications

Levene's test is used in various fields such as quality control, manufacturing, and research to verify the assumption of equal variances across different groups or batches. This is crucial for the validity of many statistical methods, including Analysis of Variance (ANOVA).

The LeveneTest Class

Levene's test is implemented by the LeveneTest class. It has five constructors. The first constructor takes no arguments. The data and conditions for the test must be specified by setting properties of the LeveneTest object. The second constructor takes an array of Vector<T> objects that contain the samples the test is to be applied to. The third constructor takes one additional argument: a LocationMeasure value that specifies which measure of location to use in the calculation of the test statistic. This value can also be accessed and set through the LocationMeasure property.

The fourth constructor takes two arguments. The first is a vector containing the data for all the samples. The second argument is an IGrouping object (such as a CategoricalVector<T>) that specifies how the values in the first argument are to be grouped. The fifth constructor is like the fourth but takes an additional argument to specify the measure of location.

Example

We start from the same data as before: a collection of measurements of gear diameters from 10 batches. We want to verify that the variances of the diameters for the batches are equal. See the example with Bartlett's test for an illustration of how to prepare the data.

Here, we show how to create the LeveneTest object, and run the test:

C#
var levene = new LeveneTest(diameter, batch);
Console.WriteLine("Test statistic: {0:F4}", levene.Statistic);
Console.WriteLine("P-value:        {0:F4}", levene.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    levene.Reject() ? "yes" : "no");

The value of the F statistic is 1.7059 giving a p-value of 0.0991. As a result, the hypothesis that the variances are equal is not rejected at the 0.05 level.

The outcome of Levene's test is clearly different from that of Bartlett's test for the same data. The reason is most likely that the data are not distributed normally. Bartlett's test cannot distinguish non-homogeneity from departure from normality.

Once a LeveneTest object has been created, you can access other properties and methods common to all hypothesis test classes. For instance, to obtain the critical values for a significance level of 0.05 and 0.1, the code would be:

C#
Console.WriteLine("Critical value: {0:F4} at 95%",
    levene.GetUpperCriticalValue(0.05));
Console.WriteLine("Critical value: {0:F4} at 90%",
    levene.GetUpperCriticalValue(0.1));

The critical values (1.9856 at 0.05 and 1.7021 at 0.1) show that the null hypothesis is not rejected at the 0.05 level. The test statistic (1.7059) only just exceeds the critical value at the 0.1 level.

Fligner-Killeen Test

The Fligner-Killeen test, also known as the Fligner test, is a robust statistical test used to assess the homogeneity of variances across multiple groups. It is less influenced by departures from normality compared to Bartlett's test, making it a preferred choice in many cases.

Definition

The null hypothesis (H0) for the Fligner-Killeen test is that the variances of all groups are equal. The alternative hypothesis (H1) is that at least one of the variances is different. The test statistic is calculated from the ranks of the absolute deviations from the group median, mean, or trimmed mean, rather than from the deviations themselves as in Levene's test.

Assumptions

The Fligner-Killeen test assumes that the samples are independent and that the data is continuous. It is less sensitive to departures from normality compared to Bartlett's test, making it more robust in the presence of non-normal data.

Applications

The Fligner-Killeen test is used in various fields such as quality control, manufacturing, and research to verify the assumption of equal variances across different groups or batches. This is crucial for the validity of many statistical methods, including Analysis of Variance (ANOVA).

The FlignerKilleenTest Class

The Fligner-Killeen test is implemented by the FlignerKilleenTest class. It has five constructors. The first constructor takes no arguments. The data and conditions for the test must be specified by setting properties of the FlignerKilleenTest object. The second constructor takes an array of Vector<T> objects that contain the samples the test is to be applied to. The third constructor takes one additional argument: a LocationMeasure value that specifies which measure of location to use in the calculation of the test statistic. This value can also be accessed and set through the LocationMeasure property.

The fourth constructor takes two arguments. The first is a vector containing the data for all the samples. The second argument is an IGrouping object (such as a CategoricalVector<T>) that specifies how the values in the first argument are to be grouped. The fifth constructor is like the fourth but takes an additional argument to specify the measure of location.

Example

We start from the same data as before: a collection of measurements of gear diameters from 10 batches. We want to verify that the variances of the diameters for the batches are equal. See the example with Bartlett's test for an illustration of how to prepare the data.

Here, we show how to create the FlignerKilleenTest object and run the test:

C#
var fligner = new FlignerKilleenTest(diameter, batch);
Console.WriteLine("Test statistic: {0:F4}", fligner.Statistic);
Console.WriteLine("P-value:        {0:F4}", fligner.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    fligner.Reject() ? "yes" : "no");

The value of the F statistic is 1.7059 giving a p-value of 0.0991. As a result, the hypothesis that the variances are equal is not rejected at the 0.05 level.

The outcome of the Fligner-Killeen test is clearly different from that of Bartlett's test for the same data. The reason is most likely that the data are not distributed normally. Bartlett's test cannot distinguish non-homogeneity from departure from normality.

Once a FlignerKilleenTest object has been created, you can access other properties and methods common to all hypothesis test classes. For instance, to obtain the critical values for a significance level of 0.05 and 0.1, the code would be:

C#
Console.WriteLine("Critical value: {0:F4} at 95%",
    fligner.GetUpperCriticalValue(0.05));
Console.WriteLine("Critical value: {0:F4} at 90%",
    fligner.GetUpperCriticalValue(0.1));

The critical values (1.9856 at 0.05 and 1.7021 at 0.1) show that the null hypothesis is not rejected at the 0.05 level.

References

  • Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 160(901), 268-282.

  • Levene, H. (1960). Robust tests for equality of variances. In I. Olkin et al. (eds.), Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, Stanford University Press, pp. 278-292.

  • Fligner, M.A. and Killeen, T.J. (1976). Distribution-free two-sample tests for scale. Journal of the American Statistical Association. 71(353), 210-213.