Contingency Tables

Contingency tables, also known as cross-tabulation tables, display the joint frequency distribution of two categorical variables: each cell holds the number of observations with a particular combination of categories. They help in understanding the relationship between the two variables, with applications in statistical analysis, hypothesis testing, and data visualization in fields such as epidemiology, market research, and the social sciences.

In Numerics.NET, contingency tables are implemented by the ContingencyTable class, which supports both 2x2 and general RxC tables. It provides properties that expose computed statistics and methods that perform various hypothesis tests.

Constructing Contingency Tables

A ContingencyTable can be constructed in several ways: from the row and column variables, optionally together with a count variable, or from a matrix of counts.

The first constructor creates a contingency table from the specified row and column variables. Corresponding elements of the two variables form one observation, so this constructor is the one to use when you have raw, unaggregated categorical data and want to analyze the relationship between the two variables.

C#
using Numerics.NET;
using Numerics.NET.Statistics;

// Create categorical vectors for rows and columns
var rowVariable1 = Vector.CreateCategorical(new[] { "A", "B", "A", "B" });
var columnVariable1 = Vector.CreateCategorical(new[] { "X", "X", "Y", "Y" });

// Create a contingency table
var table1 = new ContingencyTable(rowVariable1, columnVariable1);
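What the first constructor computes can be illustrated with a plain C# sketch that does not use Numerics.NET: it counts how often each (row, column) pair occurs. The data here is hypothetical (one observation more than above, so that one cell has a count of 2), and the sorted output order is a choice of this sketch, not a statement about the library.

```csharp
using System;
using System.Linq;

// Hypothetical raw observations: element k of each array belongs to observation k.
string[] rows = { "A", "B", "A", "B", "A" };
string[] columns = { "X", "X", "Y", "Y", "X" };

// Pair up the observations and count how often each (row, column) combination occurs.
var cellCounts = rows
    .Zip(columns, (row, column) => (Row: row, Column: column))
    .GroupBy(pair => pair)
    .ToDictionary(group => group.Key, group => group.Count());

// Print the cells in a stable (sorted) order.
foreach (var cell in cellCounts.OrderBy(e => e.Key.Row).ThenBy(e => e.Key.Column))
    Console.WriteLine($"({cell.Key.Row}, {cell.Key.Column}): {cell.Value}");
```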

The second constructor also takes a count variable. Use it when the data has already been aggregated: each element of the count variable gives the number of observations for the corresponding combination of row and column categories.

C#
// Create categorical vectors for rows and columns  
var rowVariable2 = Vector.CreateCategorical(new[] { "A", "B", "A", "B" });
var columnVariable2 = Vector.CreateCategorical(new[] { "X", "X", "Y", "Y" });

// Create a count vector  
var countVariable = Vector.Create(new double[] { 1, 2, 3, 4 });

// Create a contingency table  
var table2 = new ContingencyTable(rowVariable2, columnVariable2, countVariable);
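With a count variable, each record contributes its count to a cell instead of 1. The following plain C# sketch (not library code) accumulates the counts from the example above and prints the resulting layout. It assumes the categories are ordered A, B and X, Y, in which case row A holds 1 and 3 and row B holds 2 and 4.

```csharp
using System;
using System.Collections.Generic;

// Same data as the example above: row, column, and count for each record.
string[] rows = { "A", "B", "A", "B" };
string[] columns = { "X", "X", "Y", "Y" };
double[] counts = { 1, 2, 3, 4 };

// Add each record's count to the cell for its (row, column) combination.
var cellCounts = new Dictionary<(string Row, string Column), double>();
for (int k = 0; k < rows.Length; k++)
{
    (string Row, string Column) key = (rows[k], columns[k]);
    cellCounts.TryGetValue(key, out double current); // current is 0 for a new cell
    cellCounts[key] = current + counts[k];
}

// Print the table with rows A, B and columns X, Y.
Console.WriteLine("      X    Y");
foreach (string row in new[] { "A", "B" })
    Console.WriteLine($"{row} {cellCounts[(row, "X")],5} {cellCounts[(row, "Y")],4}");
```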

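The third and fourth constructors create a contingency table from counts stored in a matrix, where each matrix row corresponds to a row category and each matrix column to a column category. You can optionally specify the row and column indexes that label the categories.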

C#
// Create a matrix for counts  
Matrix<double> counts1 = Matrix.Create(new double[,] { { 1, 2 }, { 3, 4 } });

// Create row and column indexes
var rowIndex = Numerics.NET.DataAnalysis.Index.Create(new[] { "A", "B" });
var columnIndex = Numerics.NET.DataAnalysis.Index.Create(new[] { "X", "Y" });

// Create a contingency table  
ContingencyTable table3 = new ContingencyTable(counts1, rowIndex, columnIndex);

Properties

The ContingencyTable class provides several properties to access computed statistics and other information about the contingency table structure and its measures of association.

The structural properties are RowCount and ColumnCount, which return the number of categories of the row and column variables respectively, and TotalCount, which returns the total sample size, that is, the sum of all cell frequencies.

For measuring the association between the variables, the class provides several statistical properties: ChiSquare returns Pearson's chi-square statistic, on which the other measures are based; Phi returns the phi coefficient, which is intended for 2x2 tables; CoefficientOfContingency returns Pearson's contingency coefficient; and CramerV returns Cramér's V, which can be used for tables of any size.

The following code examples demonstrate how to access these properties and interpret their values in practice. Note that some measures of association are only meaningful for specific table dimensions.

C#
// Access properties
double chiSquare = table2.ChiSquare;
double phi = table2.Phi;
double coefficientOfContingency = table2.CoefficientOfContingency;
double cramerV = table2.CramerV;

// Display results
Console.WriteLine($"Chi-Square: {chiSquare}");
Console.WriteLine($"Phi: {phi}");
Console.WriteLine($"Coefficient of Contingency: {coefficientOfContingency}");
Console.WriteLine($"Cramer's V: {cramerV}");
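For reference, these measures have standard textbook definitions in terms of the Pearson chi-square statistic χ², the total count n, and the numbers of rows r and columns c: φ = √(χ²/n), C = √(χ²/(χ² + n)), and V = √(χ²/(n(min(r, c) - 1))). The plain C# sketch below evaluates these formulas for the counts of table2, assuming the layout A: 1, 3 and B: 2, 4. It illustrates the formulas only and is not the Numerics.NET implementation; for example, whether the library's Phi is signed for 2x2 tables is not covered here.

```csharp
using System;

// Pearson's chi-square statistic: sum over cells of (observed - expected)^2 / expected,
// where expected = rowTotal * columnTotal / total.
static double PearsonChiSquare(double[,] observed)
{
    int rowCount = observed.GetLength(0), columnCount = observed.GetLength(1);
    var rowTotals = new double[rowCount];
    var columnTotals = new double[columnCount];
    double total = 0;
    for (int i = 0; i < rowCount; i++)
        for (int j = 0; j < columnCount; j++)
        {
            rowTotals[i] += observed[i, j];
            columnTotals[j] += observed[i, j];
            total += observed[i, j];
        }

    double sum = 0;
    for (int i = 0; i < rowCount; i++)
        for (int j = 0; j < columnCount; j++)
        {
            double expected = rowTotals[i] * columnTotals[j] / total;
            double deviation = observed[i, j] - expected;
            sum += deviation * deviation / expected;
        }
    return sum;
}

// Counts of table2 (assumed layout): rows A, B; columns X, Y.
double[,] counts = { { 1, 3 }, { 2, 4 } };
int r = counts.GetLength(0), c = counts.GetLength(1);
double n = 0;
foreach (double x in counts) n += x;

double chi = PearsonChiSquare(counts);
double phi = Math.Sqrt(chi / n);
double contingency = Math.Sqrt(chi / (chi + n));
double cramerV = Math.Sqrt(chi / (n * (Math.Min(r, c) - 1)));

Console.WriteLine($"Chi-square: {chi:F6}, Phi: {phi:F6}, C: {contingency:F6}, V: {cramerV:F6}");
```

For a 2x2 table, min(r, c) - 1 = 1, so Cramér's V reduces to φ.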

Accessing Cells and Totals

The ContingencyTable class defines indexers that give access to the cell frequencies and totals in the table. There are two overloads of the indexer: one takes the zero-based numeric row and column indexes, and the other takes the row and column category values. The row and column totals are stored as an extra row and column of the table, one past the last regular index: use RowCount as the row index to get a column total, ColumnCount as the column index to get a row total, and both together to get the total for the whole table.
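The layout with totals in an extra row and column can be pictured as a bordered array. The following plain C# sketch builds one from the counts of table2, assuming the layout A: 1, 3 and B: 2, 4. It only mirrors the indexing convention described above; how the class stores its data internally is not implied.

```csharp
using System;

// Counts of table2 (assumed layout): rows A, B; columns X, Y.
double[,] counts = { { 1, 3 }, { 2, 4 } };
int rowCount = counts.GetLength(0), columnCount = counts.GetLength(1);

// One extra row and column: index rowCount holds column totals,
// index columnCount holds row totals, and the corner holds the grand total.
var bordered = new double[rowCount + 1, columnCount + 1];
for (int i = 0; i < rowCount; i++)
    for (int j = 0; j < columnCount; j++)
    {
        double x = counts[i, j];
        bordered[i, j] = x;
        bordered[i, columnCount] += x;
        bordered[rowCount, j] += x;
        bordered[rowCount, columnCount] += x;
    }

Console.WriteLine($"Total for row 0:     {bordered[0, columnCount]}");
Console.WriteLine($"Total for column 1:  {bordered[rowCount, 1]}");
Console.WriteLine($"Total for the table: {bordered[rowCount, columnCount]}");
```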

The indexers return a structure of type ContingencyTableCell, which has the following properties:

  • RowLevel: The level (category) of the row containing the cell.

  • ColumnLevel: The level (category) of the column containing the cell.

  • Count: The number of observations in the cell.

  • RelativeFrequency: The count of the cell relative to the total count of the contingency table.

  • RelativeFrequencyInRow: The count of the cell relative to the total of its row.

  • RelativeFrequencyInColumn: The count of the cell relative to the total of its column.

  • RelativePercentage: The percentage of the cell relative to the total count of the contingency table.

  • RelativePercentageInRow: The percentage of the cell relative to the total of its row.

  • RelativePercentageInColumn: The percentage of the cell relative to the total of its column.

  • ExpectedCount: The expected count of the cell under independence, based on the row and column totals of the contingency table.

  • ChiSquare: The contribution of the cell to the chi-square statistic of the contingency table.

  • Residual: The difference between the actual and the expected count of the cell.

  • StandardizedResidual: The standardized difference between the actual and the expected count of the cell.

  • AdjustedStandardizedResidual: The standardized difference between the actual and the expected count of the cell, adjusted for the row and column totals.
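Several of these properties follow standard formulas. The plain C# sketch below computes them for cell (0, 1) of table2 (row A, column Y, assuming the layout A: 1, 3 and B: 2, 4), using the conventional definitions: expected count E = r_i c_j / n, residual O - E, standardized residual (O - E)/√E, and adjusted standardized residual (O - E)/√(E(1 - r_i/n)(1 - c_j/n)). That Numerics.NET uses exactly these definitions, including the sign of the residual, is an assumption.

```csharp
using System;

// Counts of table2 (assumed layout): rows A, B; columns X, Y.
double[,] observed = { { 1, 3 }, { 2, 4 } };
int rowCount = observed.GetLength(0), columnCount = observed.GetLength(1);

// Row totals, column totals, and the grand total n.
var rowTotals = new double[rowCount];
var columnTotals = new double[columnCount];
double n = 0;
for (int i = 0; i < rowCount; i++)
    for (int j = 0; j < columnCount; j++)
    {
        rowTotals[i] += observed[i, j];
        columnTotals[j] += observed[i, j];
        n += observed[i, j];
    }

// Statistics for cell (0, 1), that is, row A and column Y.
int row = 0, column = 1;
double count = observed[row, column];
double relativeFrequency = count / n;
double relativeFrequencyInRow = count / rowTotals[row];
double relativeFrequencyInColumn = count / columnTotals[column];
double expected = rowTotals[row] * columnTotals[column] / n;
double residual = count - expected;
double chiSquareContribution = residual * residual / expected;
double standardizedResidual = residual / Math.Sqrt(expected);
double adjustedResidual = residual /
    Math.Sqrt(expected * (1 - rowTotals[row] / n) * (1 - columnTotals[column] / n));

Console.WriteLine($"Expected: {expected}, Residual: {residual:F4}, Chi-square part: {chiSquareContribution:F4}");
Console.WriteLine($"Standardized: {standardizedResidual:F4}, Adjusted: {adjustedResidual:F4}");
```

For a 2x2 table, the square of the adjusted standardized residual of any cell equals the chi-square statistic of the whole table, which makes a convenient consistency check.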

The code below demonstrates how to access the cell frequencies and totals using these indexers:

C#
// Using numerical indexes (zero-based):
var cell01 = table2[0, 1];
// Using category values (row "B", column "X"):
var cell10 = table2["B", "X"];
// Total for row 0:
var cell0_ = table2[0, table2.ColumnCount];
// Total for column 1:
var cell_1 = table2[table2.RowCount, 1];
// Total for the table:
var cell__ = table2[table2.RowCount, table2.ColumnCount];

// Display results
Console.WriteLine($"Cell (0, 1):");
Console.WriteLine($"  Expected: {cell01.ExpectedCount}");
Console.WriteLine($"  Actual:   {cell01.Count}");
Console.WriteLine($"  Relative frequency:");
Console.WriteLine($"    In row:   {cell01.RelativeFrequencyInRow}");
Console.WriteLine($"    In column:{cell01.RelativeFrequencyInColumn}");
Console.WriteLine($"    In table: {cell01.RelativeFrequency}");
Console.WriteLine($"Cell (B, X) count: {cell10.Count}");
Console.WriteLine($"Row 0 total:       {cell0_.Count}");
Console.WriteLine($"Column 1 total:    {cell_1.Count}");
Console.WriteLine($"Table total:       {cell__.Count}");

Hypothesis Tests

The ContingencyTable class provides several methods to compute various statistics and perform hypothesis tests. These tests help in determining the statistical significance of the observed relationships in the contingency table.

The Chi-Square test is used to determine if there is a significant association between the row and column variables. It compares the observed frequencies in the contingency table to the expected frequencies if the variables were independent. This test is performed by the GetChiSquareTest method.

The Yates-corrected Chi-Square test is a modification of the Chi-Square test that applies a continuity correction to the test statistic. It is most often used for 2x2 tables with small expected counts, where the uncorrected statistic tends to overstate the significance of the association. This test is performed by the GetYatesCorrectedChiSquareTest method.
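The continuity correction itself is simple: each |O - E| is reduced by 0.5 before it is squared. This plain C# sketch compares the corrected and uncorrected statistics on a hypothetical 2x2 table. Clamping a reduced deviation at zero is a convention some implementations use and is an assumption here; it has no effect for these counts, where every |O - E| equals 2.

```csharp
using System;

// Hypothetical 2x2 table of observed counts.
double[,] observed = { { 10, 20 }, { 30, 40 } };
double[] rowTotals = { observed[0, 0] + observed[0, 1], observed[1, 0] + observed[1, 1] };
double[] columnTotals = { observed[0, 0] + observed[1, 0], observed[0, 1] + observed[1, 1] };
double n = rowTotals[0] + rowTotals[1];

double pearson = 0, yates = 0;
for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
    {
        double expected = rowTotals[i] * columnTotals[j] / n;
        double deviation = Math.Abs(observed[i, j] - expected);
        pearson += deviation * deviation / expected;
        // Continuity correction: reduce |O - E| by 0.5, clamped at zero (assumed convention).
        double corrected = Math.Max(0.0, deviation - 0.5);
        yates += corrected * corrected / expected;
    }

Console.WriteLine($"Pearson chi-square: {pearson:F4}");
Console.WriteLine($"Yates-corrected:    {yates:F4}");
```

The corrected statistic is always smaller, so the corresponding p-value is larger and the test more conservative.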

The likelihood ratio test is another method to test the independence of the row and column variables. It compares the likelihood of the observed data under the null hypothesis (independence) to the likelihood under the alternative hypothesis (dependence). This test is performed by the GetLikelihoodRatioTest method.
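The likelihood ratio statistic is usually written G = 2 Σ O ln(O / E), summed over all cells, where cells with O = 0 contribute nothing. The sketch below computes it in plain C# for the counts of table2, assuming the layout A: 1, 3 and B: 2, 4. For this table G is close to the Pearson chi-square statistic of about 0.0794, as is typical when observed and expected counts are similar. How the library handles details such as empty cells is not shown here.

```csharp
using System;

// Counts of table2 (assumed layout) for a 2x2 table: rows A, B; columns X, Y.
double[,] observed = { { 1, 3 }, { 2, 4 } };
double[] rowTotals = { observed[0, 0] + observed[0, 1], observed[1, 0] + observed[1, 1] };
double[] columnTotals = { observed[0, 0] + observed[1, 0], observed[0, 1] + observed[1, 1] };
double n = rowTotals[0] + rowTotals[1];

double g = 0;
for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
    {
        double o = observed[i, j];
        double expected = rowTotals[i] * columnTotals[j] / n;
        // O ln(O/E) tends to 0 as O tends to 0, so empty cells are skipped.
        if (o > 0) g += o * Math.Log(o / expected);
    }
g *= 2;

Console.WriteLine($"G = {g:F6}");
```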

The Mantel-Haenszel Chi-Square test is used to assess the association between two categorical variables while controlling for one or more other variables. It is commonly used in stratified analysis. This test is performed by the GetMantelHaenszelTest method.

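Fisher's exact test is used to determine if there are nonrandom associations between two categorical variables. Because it computes exact probabilities instead of relying on a large-sample approximation, it is particularly useful for 2x2 tables with small sample sizes. This test is performed by the GetFisherExactProbability method, which, unlike the other test methods, returns the probability directly as a double rather than a test object.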
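For a 2x2 table with fixed row and column totals, Fisher's test is based on the hypergeometric distribution: the probability of the table whose top-left count is x equals C(r1, x) C(r2, c1 - x) / C(n, c1). This plain C# sketch computes that probability for the counts of table2 (assuming the layout A: 1, 3 and B: 2, 4), together with the common two-sided p-value that sums the probabilities of all tables no more likely than the observed one. This section does not state whether GetFisherExactProbability returns the point probability or a p-value, or which two-sided rule it uses, so the sketch computes both quantities and should not be read as the library's algorithm.

```csharp
using System;

// Binomial coefficient C(total, k), built up incrementally; exact for small inputs.
static double Choose(int total, int k)
{
    double result = 1;
    for (int i = 1; i <= k; i++)
        result = result * (total - k + i) / i;
    return result;
}

// Counts of table2 (assumed layout): a = (A, X), b = (A, Y), c = (B, X), d = (B, Y).
int a = 1, b = 3, c = 2, d = 4;
int row1 = a + b, row2 = c + d, col1 = a + c, n = a + b + c + d;

// Hypergeometric probability of the table whose top-left cell is x (all margins fixed).
double TableProbability(int x) => Choose(row1, x) * Choose(row2, col1 - x) / Choose(n, col1);

double observedProbability = TableProbability(a);

// Two-sided p-value: add up every table that is no more likely than the observed one.
// The small relative tolerance guards against rounding when probabilities tie.
double pValue = 0;
for (int top = Math.Max(0, col1 - row2); top <= Math.Min(row1, col1); top++)
{
    double p = TableProbability(top);
    if (p <= observedProbability * (1 + 1e-7))
        pValue += p;
}

Console.WriteLine($"Probability of the observed table: {observedProbability}");
Console.WriteLine($"Two-sided p-value:                 {pValue}");
```

Here the observed table is the most likely one given its margins, so every possible table is included and the two-sided p-value is 1.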

C#
var chiSquareTest = table2.GetChiSquareTest();
Console.WriteLine($"Chi-Square Test: {chiSquareTest.Summarize()}");

var yatesCorrectedChiSquareTest = table2.GetYatesCorrectedChiSquareTest();
Console.WriteLine($"Yates Corrected Chi-Square Test: {yatesCorrectedChiSquareTest.Summarize()}");

var likelihoodRatioTest = table2.GetLikelihoodRatioTest();
Console.WriteLine($"Likelihood Ratio Test: {likelihoodRatioTest.Summarize()}");

var mantelHaenszelTest = table2.GetMantelHaenszelTest();
Console.WriteLine($"Mantel-Haenszel Test: {mantelHaenszelTest.Summarize()}");

double fisherExactProbability = table2.GetFisherExactProbability();
Console.WriteLine($"Fisher Exact Probability: {fisherExactProbability}");

Related Topics

  • Chi-Square Distribution

  • Simple Hypothesis Test

  • Fisher's Exact Test