Two-Way ANOVA

An ANOVA design with two factors is called a two-way analysis of variance. The two-way analysis of variance is implemented by the TwoWayAnovaModel class.

Constructing Two-Way ANOVA Models

The TwoWayAnovaModel class has two constructors. The first constructor takes three arguments: a Vector<T> that specifies the dependent variable, and two CategoricalVector<T> objects that specify the independent variables. All three variables must have the same number of observations.

As an example, we construct an ANOVA model to measure the effect of package color and shape on sales. Our data comes from 12 stores. The two categorical variables are the package color and the package shape. The dependent variable is the total sales of the product in the store.

C#
var colors = Vector.CreateCategorical(new[] {
    "Blue", "Blue", "Blue", "Blue",
    "Red", "Red", "Red", "Red",
    "Green", "Green", "Green", "Green" });
var shapes = Vector.CreateCategorical(new[] {
    "Square", "Square", "Rectangle", "Rectangle",
    "Square", "Square", "Rectangle", "Rectangle",
    "Square", "Square", "Rectangle", "Rectangle" });
var sales = Vector.Create(new[] {
    6.0, 14.0, 19.0, 17.0,
    18.0, 11.0, 20.0, 23.0,
    7.0, 11.0, 18.0, 10.0});
var anova1 = new TwoWayAnovaModel(sales, colors, shapes);

The second constructor takes four arguments. The first argument is a DataFrame<R, C> that contains the variables you wish to use in the analysis. The second argument is the name of the dependent variable in the data frame. The third and fourth arguments are the names of the two independent variables in the data frame. Using the variables we created in the previous example, we get:

C#
var dataFrame = DataFrame.FromColumns(
    new IVector[] { colors, shapes, sales },
    Index.Create(new[] { "color", "shape", "sales" }));
var anova2 = new TwoWayAnovaModel(dataFrame, "sales", "color", "shape");

Performing the analysis

The Fit() method performs the actual calculation. It must be called before any results are retrieved from the model.

The ANOVA table

The results of the analysis can be obtained through the model's AnovaTable property. The ANOVA table for a two-way design has five rows. Three rows describe the contribution of the model to the variation: one row for each of the factors and one row for the interaction between the two factors. These can be retrieved through the GetModelRow method. Index 0 gives the row for the first factor, index 1 gives the row for the second factor, and index 2 gives the row for the interaction. As always, there is also one row for the residuals and one for the total.

The row returned by the CompleteModelRow property is not part of the ANOVA table. It shows the contribution of the complete model (both factors and their interaction) to the variation.

The AnovaModelRow objects obtained in this way show the results of the test for significance of the variation due to the factor compared to the variation not explained by the model. The FStatistic property gives the value of the F statistic for this ratio, while the PValue gives the significance of the F statistic.
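The ratio described above can be written out explicitly. The block below is a sketch of the standard fixed-effects formulas for a balanced design, not a statement of how the library computes them internally. The symbols are assumptions introduced here for illustration: a and b are the number of levels of the first and second factor, n is the number of observations per cell, SS denotes a sum of squares, and MS a mean square.

```latex
F_A = \frac{MS_A}{MS_E} = \frac{SS_A / (a - 1)}{SS_E / \bigl(ab(n - 1)\bigr)}, \qquad
F_B = \frac{MS_B}{MS_E} = \frac{SS_B / (b - 1)}{SS_E / \bigl(ab(n - 1)\bigr)}, \qquad
F_{AB} = \frac{MS_{AB}}{MS_E} = \frac{SS_{AB} / \bigl((a - 1)(b - 1)\bigr)}{SS_E / \bigl(ab(n - 1)\bigr)}
```

In the packaging example, a = 3 colors, b = 2 shapes, and n = 2 stores per combination, so the degrees of freedom are 2 for color, 1 for shape, 2 for the interaction, and 6 for the residuals.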

The Within Groups row shows the variation of the data around the group means. It corresponds to the error or residual of the variation in the data after the model has been taken into account. The row is available through the ANOVA table's ErrorRow property.

The Total row contains the summary data for the entire data set. It can be retrieved through the TotalRow property of the ANOVA table.

The example below illustrates these properties:

C#
anova1.Fit();
var anovaTable = anova1.AnovaTable;
Console.WriteLine("F statistic: {0}", anovaTable.CompleteModelRow.FStatistic);
Console.WriteLine("P-value     : {0}", anovaTable.CompleteModelRow.PValue);
Console.WriteLine("Sum of sq. total: {0}",
    anovaTable.TotalRow.SumOfSquares);
Console.WriteLine("Sum of sq. error: {0}",
    anovaTable.ErrorRow.SumOfSquares);
Console.WriteLine("Sum of sq. color: {0}",
    anovaTable.GetModelRow(0).SumOfSquares);
Console.WriteLine("Sum of sq. shape: {0}",
    anovaTable.GetModelRow(1).SumOfSquares);
Console.WriteLine("Sum of sq. interaction: {0}",
    anovaTable.GetModelRow(2).SumOfSquares);
Console.WriteLine(anovaTable.ToString());

For the example using the packaging, we find that the F statistic for the color is 2.5049, corresponding to a p-value of 0.1619. For the shape, we find an F statistic of 7.7670 with a p-value of 0.0317, and for the interaction, we find an F statistic of 0.1359 with a p-value of 0.8755. The conclusion is that the color of the packaging does not contribute significantly to the sales of the product, but the impact of the shape is significant at the usual 0.05 level.
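To see where these F statistics come from, the sums of squares can be recomputed by hand. The sketch below uses only plain C# and LINQ and does not call the library. The class name TwoWayAnovaByHand and the method FStatistics are hypothetical names chosen for illustration, and the shortcut of obtaining the interaction by subtraction assumes a balanced design with the same number of observations in every cell, as in this example.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of the library.
public static class TwoWayAnovaByHand
{
    // Sales for each (color, shape) cell, two stores per cell.
    // Row order: Blue, Red, Green. Column order: Square, Rectangle.
    static readonly double[][][] Cells =
    {
        new[] { new[] { 6.0, 14.0 }, new[] { 19.0, 17.0 } },
        new[] { new[] { 18.0, 11.0 }, new[] { 20.0, 23.0 } },
        new[] { new[] { 7.0, 11.0 }, new[] { 18.0, 10.0 } },
    };

    // Returns { F(color), F(shape), F(interaction) }.
    // Assumes a balanced design (equal cell sizes).
    public static double[] FStatistics()
    {
        int a = Cells.Length;        // number of color levels
        int b = Cells[0].Length;     // number of shape levels
        int n = Cells[0][0].Length;  // observations per cell

        double grand = Cells.SelectMany(r => r).SelectMany(c => c).Average();
        double Sq(double x) => x * x;

        // Main effects: squared deviations of level means from the grand mean.
        double ssColor = Enumerable.Range(0, a)
            .Sum(i => b * n * Sq(Cells[i].SelectMany(c => c).Average() - grand));
        double ssShape = Enumerable.Range(0, b)
            .Sum(j => a * n * Sq(Cells.SelectMany(r => r[j]).Average() - grand));

        // Variation between cell means, and within cells (the error term).
        double ssCells = Cells.SelectMany(r => r)
            .Sum(c => n * Sq(c.Average() - grand));
        double ssError = Cells.SelectMany(r => r)
            .Sum(c => c.Sum(x => Sq(x - c.Average())));
        double ssInteraction = ssCells - ssColor - ssShape;

        // Each mean square is divided by the error mean square.
        double msError = ssError / (a * b * (n - 1));
        return new[]
        {
            ssColor / (a - 1) / msError,
            ssShape / (b - 1) / msError,
            ssInteraction / ((a - 1) * (b - 1)) / msError,
        };
    }

    public static void Main()
    {
        double[] f = FStatistics();
        Console.WriteLine("F color: {0:F4}", f[0]);
        Console.WriteLine("F shape: {0:F4}", f[1]);
        Console.WriteLine("F interaction: {0:F4}", f[2]);
    }
}
```

The intermediate sums of squares are 86 for color, about 133.33 for shape, about 4.67 for the interaction, and 103 for the error, which lead to the F statistics quoted above.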

Type I, Type II, and Type III Sums of Squares

By default, the ANOVA table contains values computed with Type I sums of squares (also called sequential sums of squares). ANOVA calculations using other types of sums of squares are also supported. The type can be selected by setting the model's SumsOfSquaresType property before the model is calculated. The property takes a SumsOfSquaresType enumeration value.

Even if the type was not set beforehand, all three types are available after the calculation through three properties: TypeISumsOfSquares, TypeIISumsOfSquares, and TypeIIISumsOfSquares. Each of these properties returns an AnovaTable object with three rows: one for the first factor, one for the second factor, and one for their interaction.

C#
var typeIII = anova1.TypeIIISumsOfSquares;
Console.WriteLine("Type III sums of squares:");
Console.WriteLine(typeIII);

Other properties

The group means can be accessed through the model's Cells property, which is a matrix of Cell objects. In the example below, we first obtain the CategoryIndex for the color variable. We then iterate through the levels of the index, and print the group means for the square boxes:

C#
var colorFactor = colors.CategoryIndex;
var squareColumn = anova1.Cells.GetColumn("Square");
foreach (var level in colorFactor)
    Console.WriteLine("Mean for square boxes group '{0}': {1:F4}",
        level, squareColumn.Get(level).Mean);

The RowTotals property returns a vector of cells with totals for each row (colors in our example). The ColumnTotals property returns a vector of cells with totals for entire columns (shapes in our example). Below, we print the total variance for all rectangular packages:

C#
Console.WriteLine("Variance of rectangular packages: {0:F4}",
    anova1.ColumnTotals.Get("Rectangle").Variance);

The TotalCell property returns the cell with totals for the complete data. The grand mean can be obtained from this cell:

C#
Console.WriteLine("Grand mean: {0:F4}", anova1.TotalCell.Mean);