One-Way ANOVA

The simplest ANOVA design considers one factor only. This is called one-way ANOVA. The one-way analysis of variance is implemented by the OneWayAnovaModel class.

Constructing One-Way ANOVA Models

The OneWayAnovaModel class has two constructors. The first constructor takes two arguments: a numerical Vector<T> that specifies the dependent variable, and a CategoricalVector<T> that specifies the independent variable. The two variables must have the same number of observations.

As an example, we construct an ANOVA model for measurements of the resistance of silicon wafers taken with five different instruments. The factor is the instrument, and the dependent variable is the resistance.

C#
var instrument = Vector.Create(25, i => 1 + i / 5).AsCategorical();
var resistance = Vector.Create(new double[] {
    196.3052, 196.1240, 196.1890, 196.2569, 196.3403,
    196.3042, 196.3825, 196.1669, 196.3257, 196.0422,
    196.1303, 196.2005, 196.2889, 196.0343, 196.1811,
    196.2795, 196.1748, 196.1494, 196.1485, 195.9885,
    196.2119, 196.1051, 196.1850, 196.0052, 196.2090});
var anova1 = new OneWayAnovaModel(resistance, instrument);
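The instrument labels above rely on integer division in the expression 1 + i / 5. The following minimal sketch reproduces that labeling rule with plain LINQ so the grouping can be inspected without the library. The InstrumentLabels class and its Create method are hypothetical names introduced here for illustration, and the sketch assumes the index passed to the lambda is an int.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the library.
public static class InstrumentLabels
{
    // Applies the rule 1 + i / groupSize to the indices 0 .. count - 1.
    // Because i and groupSize are ints, "/" truncates, so with groupSize = 5
    // indices 0..4 map to label 1, indices 5..9 map to label 2, and so on.
    public static int[] Create(int count, int groupSize) =>
        Enumerable.Range(0, count)
                  .Select(i => 1 + i / groupSize)
                  .ToArray();
}
```

With count = 25 and groupSize = 5 this yields five consecutive observations for each of the labels 1 through 5, which matches the row layout of the resistance data.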

The second constructor takes three arguments. The first argument is an IDataFrame (a DataFrame<R, C> or Matrix<T>) that contains the variables you wish to use in the analysis. The second argument is the name of the dependent variable in the data frame. The third argument is the name of the independent variable in the data frame. Using the variables we created in the previous example, we get:

C#
var dataFrame = DataFrame.FromColumns(new Dictionary<string, object>()
{ {"instrument", instrument }, {"resistance", resistance } });
var anova2 = new OneWayAnovaModel(dataFrame, "resistance", "instrument");

Performing the Analysis

The results of the analysis can be obtained through the model's AnovaTable property. The ANOVA table for a one-way design has three rows. These are commonly labeled Between Groups, Within Groups, and Total.

The crucial data is provided by the Between Groups row, which shows the contribution to the total variation of the factor under consideration. It is the first row in the ANOVA table, and therefore has index 0. It can be retrieved through the GetModelRow method. Since the entire model of a one-way analysis of variance consists of the contribution of one factor, this row is also available through the CompleteModelRow property.

The AnovaModelRow obtained in this way shows the results of the test for significance of the variation due to the factor compared to the variation not explained by the factor. The FStatistic property gives the value of the F statistic for this ratio, while the PValue property gives the corresponding p-value: the probability of obtaining an F statistic at least this large if the factor had no effect.

The Within Groups row shows the variation of the data around the group means. It corresponds to the error or residual of the variation in the data after the model has been taken into account. The row is available through the ANOVA table's ErrorRow property.

The Total row contains the summary data for the entire data set. It can be retrieved through the TotalRow property of the ANOVA table.
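For reference, the quantities in the three rows can be written out using the standard textbook definitions. The notation is introduced here and is not taken from the library: k is the number of groups, n_i the number of observations in group i, N the total number of observations, x_ij the j-th observation in group i, and the barred symbols are the group means and the grand mean.

```latex
\begin{aligned}
SS_{\text{between}} &= \sum_{i=1}^{k} n_i \,(\bar{x}_i - \bar{x})^2,
  & df_{\text{between}} &= k - 1,\\
SS_{\text{within}}  &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2,
  & df_{\text{within}} &= N - k,\\
SS_{\text{total}}   &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2
                     = SS_{\text{between}} + SS_{\text{within}},
  & df_{\text{total}} &= N - 1,\\
F &= \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}.
\end{aligned}
```

In the silicon wafer example there are k = 5 instruments and N = 25 measurements, so the three rows have 4, 20, and 24 degrees of freedom respectively.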

The example below illustrates these properties:

C#
anova1.Fit();
var anovaTable = anova1.AnovaTable;
Console.WriteLine("F statistic: {0}", anovaTable.CompleteModelRow.FStatistic);
Console.WriteLine("P-value     : {0}", anovaTable.CompleteModelRow.PValue);
Console.WriteLine("Sum of sq. total: {0}",
    anovaTable.TotalRow.SumOfSquares);
Console.WriteLine("Sum of sq. error: {0}",
    anovaTable.ErrorRow.SumOfSquares);
Console.WriteLine("Sum of sq. model: {0}",
    anovaTable.CompleteModelRow.SumOfSquares);
Console.WriteLine(anovaTable.ToString());

For the example using the silicon wafers, we find that the F statistic is 1.1804, corresponding to a p-value of ???.
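As a cross-check that does not depend on the library, the sums of squares and the F statistic can be recomputed directly from their textbook definitions. The sketch below does this for the resistance data from the example. OneWayAnovaByHand, Compute, and WaferData are hypothetical names introduced here, not part of the library's API, and the code computes only the F statistic, not the p-value.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the library: recomputes the entries of a
// one-way ANOVA table from their definitions.
public static class OneWayAnovaByHand
{
    // Returns the between-groups, within-groups, and total sums of squares,
    // followed by the F statistic. Each inner array holds one group.
    public static (double Between, double Within, double Total, double F)
        Compute(double[][] groups)
    {
        int k = groups.Length;                   // number of groups
        int n = groups.Sum(g => g.Length);       // total number of observations
        double grandMean = groups.SelectMany(g => g).Average();

        // Variation of the group means around the grand mean.
        double between = groups.Sum(g =>
        {
            double d = g.Average() - grandMean;
            return g.Length * d * d;
        });

        // Variation of the observations around their own group mean.
        double within = groups.Sum(g =>
        {
            double mean = g.Average();
            return g.Sum(x => (x - mean) * (x - mean));
        });

        // Variation of all observations around the grand mean.
        double total = groups.SelectMany(g => g)
                             .Sum(x => (x - grandMean) * (x - grandMean));

        // Ratio of the between-groups and within-groups mean squares.
        double f = (between / (k - 1)) / (within / (n - k));
        return (between, within, total, f);
    }

    // Resistance measurements from the example, one row per instrument.
    public static readonly double[][] WaferData =
    {
        new[] { 196.3052, 196.1240, 196.1890, 196.2569, 196.3403 },
        new[] { 196.3042, 196.3825, 196.1669, 196.3257, 196.0422 },
        new[] { 196.1303, 196.2005, 196.2889, 196.0343, 196.1811 },
        new[] { 196.2795, 196.1748, 196.1494, 196.1485, 195.9885 },
        new[] { 196.2119, 196.1051, 196.1850, 196.0052, 196.2090 },
    };
}
```

Calling OneWayAnovaByHand.Compute(OneWayAnovaByHand.WaferData) should give an F statistic close to the value of 1.1804 quoted above, and the between-groups and within-groups sums of squares should add up to the total, apart from floating-point rounding.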

The group means can be accessed through the model's Cells property. In the example below, we first obtain the category index for the factor. We then iterate through the levels and print the group means:

C#
var factor = anova1.GetFactor(0);
for (int i = 0; i < factor.Length; i++)
    Console.WriteLine("Mean for group '{0}': {1:F4}",
        factor[i], anova1.Cells[i].Mean);

The TotalCell property returns the cell with totals for the complete data. The grand mean can be obtained from this cell:

C#
Console.WriteLine("Grand mean: {0:F4}", anova1.TotalCell.Mean);
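To check the cell means independently of the library, the group means and the grand mean can also be computed with plain LINQ from a label array and a value array. This is a minimal sketch: GroupMeansByHand and its methods are hypothetical names introduced here, and the labels are assumed to be integers as in the instrument example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the library.
public static class GroupMeansByHand
{
    // Returns the mean of the values for each label, ordered by label.
    public static SortedDictionary<int, double> GroupMeans(
        int[] labels, double[] values)
    {
        if (labels.Length != values.Length)
            throw new ArgumentException(
                "Both variables must have the same number of observations.");

        // Pair each value with its label, then group the values by label.
        var groups = labels
            .Zip(values, (label, value) => (label, value))
            .GroupBy(p => p.label, p => p.value);

        var means = new SortedDictionary<int, double>();
        foreach (var group in groups)
            means[group.Key] = group.Average();
        return means;
    }

    // The grand mean is the mean over all observations, ignoring the labels.
    public static double GrandMean(double[] values) => values.Average();
}
```

The length check mirrors the requirement stated earlier that the dependent and independent variables have the same number of observations.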