Testing Means

There are two common tests of the hypothesis that a sample comes from a distribution with a specified mean. One test, the one sample z test, is used when the standard deviation or the variance of the population is known. The other, the one sample t test, is used when the variance of the population is not known. The t test also has a two sample version, which tests whether the difference between the means of the populations underlying two samples is equal to a given value.

The One Sample z Test

The one sample z test, also known as the z-test for means, is used to test the hypothesis that a sample comes from a population with a specified mean when the variance or standard deviation is known.

Definition

The null hypothesis ( $H_{0}$ ) is that the population underlying the sample has a mean ( $μ$ ) equal to the proposed mean ( $μ_{0}$ ). The test statistic is the z-score, calculated as:

$z = \frac{\bar{x} - μ_{0}}{σ / \sqrt{n}}$

where $\bar{x}$ is the sample mean, $σ$ is the population standard deviation, and $n$ is the sample size. Under the null hypothesis, the z-score follows a standard normal distribution ( $N (0, 1)$ ).

Assumptions

The one sample z test assumes that the sample is randomly selected from the population and that the population itself follows a normal distribution. If either of these assumptions is violated, the reliability of the z test may be compromised.

Applications

The one sample z test is used in various fields to test hypotheses about population means when the population variance is known. It is commonly used in quality control, medical research, and social sciences.

The OneSampleZTest class

The one sample z test is implemented by the OneSampleZTest class. It has four constructors in all, which can be grouped in two pairs.

The first two constructors take 4 or 5 arguments. The first two arguments are the sample mean and the sample size. The next two arguments are the population mean and the population standard deviation. If present, the fifth argument is a HypothesisType value that specifies whether the test is one-tailed or two-tailed. The default value is HypothesisType.TwoTailed.

The second pair of constructors takes 3 or 4 arguments. The first argument is a Vector<T> that contains the sample data. The next two arguments are once again the population mean and standard deviation. The fourth argument, if present, is a HypothesisType value that specifies whether the test is one-tailed or two-tailed. The default value is HypothesisType.TwoTailed.

Example

The test scores of a class on a national test are as follows:

62, 77, 61, 94, 75, 82, 86, 83, 64, 84, 68, 82, 72, 71, 85, 66, 61, 79, 81, 73.

We want to investigate whether the mean of this class is significantly different from the national average, 79.3. The standard deviation of the national scores is 7.3. The following code performs the test:

var results = Vector.Create<double>(62, 77, 61, 94, 75, 82, 86,
    83, 64, 84, 68, 82, 72, 71, 85, 66, 61, 79, 81, 73);
var zTest = HypothesisTests.ZTest(results, 79.3, 7.3);
Console.WriteLine("Test statistic: {0:F4}", zTest.Statistic);
Console.WriteLine("P-value:         {0:F4}", zTest.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    zTest.Reject() ? "yes" : "no");

Visual Basic

Dim results = Vector.Create(Of Double)(62, 77, 61, 94, 75, 82, 86,
    83, 64, 84, 68, 82, 72, 71, 85, 66, 61, 79, 81, 73)
Dim zTest = HypothesisTests.ZTest(results, 79.3, 7.3)
Console.WriteLine("Test statistic: {0:F4}", zTest.Statistic)
Console.WriteLine("P-value:         {0:F4}", zTest.PValue)
Console.WriteLine("Reject null hypothesis? {0}",
    IIf(zTest.Reject(), "yes", "no"))

F#

let results = Vector.Create(
                62., 77., 61., 94., 75., 82., 86.,
                83., 64., 84., 68., 82., 72., 71., 
                85., 66., 61., 79., 81., 73.)
let zTest = HypothesisTests.ZTest(results, 79.3, 7.3)
printfn "Test statistic: %.4f" zTest.Statistic
printfn "P-value:         %.4f" zTest.PValue
printfn "Reject null hypothesis? %s" 
    (if zTest.Reject() then "yes" else "no")

The value of the z-statistic turns out to be -2.4505 giving a p-value of 0.0143. As a result, the hypothesis that on average, the students in this class score no different than the national average is rejected at the 0.05 level.
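As a cross-check that does not depend on the library, the statistic and p-value can be reproduced with a short Python sketch that uses only the standard library. The function name `one_sample_z_test` is hypothetical and is not part of the API described here. The data are the scores from the code above, and the printed values should agree with the figures just quoted.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma):
    """Return (z, two-tailed p-value) for H0: population mean == mu0.

    Hypothetical helper for illustration only; sigma is the known
    population standard deviation.
    """
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    # Two-tailed p-value: probability of a value at least this extreme
    # in either tail of the standard normal distribution.
    p = 2.0 * NormalDist().cdf(-abs(z))
    return z, p


scores = [62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
          68, 82, 72, 71, 85, 66, 61, 79, 81, 73]
z, p = one_sample_z_test(scores, 79.3, 7.3)
print(f"Test statistic: {z:.4f}")
print(f"P-value:        {p:.4f}")
print("Reject null hypothesis at 0.05?", "yes" if p < 0.05 else "no")
```

The two-tailed p-value doubles the lower-tail probability at `-abs(z)`, which matches the default two-tailed hypothesis type described earlier.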

Using pre-calculated values for the mean and sample size, the above example would look like this:

double mean = results.Mean();
int sampleSize = results.Length;
zTest = new OneSampleZTest(mean, sampleSize, 79.3, 7.3);

Visual Basic

Dim mean = results.Mean()
Dim sampleSize = results.Length
zTest = New OneSampleZTest(mean, sampleSize, 79.3, 7.3)

F#

let mean = results.Mean()
let sampleSize = results.Length
let zTest2 = OneSampleZTest(mean, sampleSize, 79.3, 7.3)

Once a OneSampleZTest object has been created, you can access other properties and methods common to all hypothesis test classes. For instance, to obtain a 95% confidence interval around the mean, the code would be:

var meanInterval = zTest.GetConfidenceInterval();
Console.WriteLine("95% Confidence interval for the mean: {0:F1} - {1:F1}",
    meanInterval.LowerBound, meanInterval.UpperBound);

Visual Basic

Dim meanInterval = zTest.GetConfidenceInterval()
Console.WriteLine("95% Confidence interval for the mean: {0:F1} - {1:F1}",
    meanInterval.LowerBound, meanInterval.UpperBound)

F#

let meanInterval = zTest.GetConfidenceInterval()
printfn "95%% Confidence interval for the mean: %.1f - %.1f"
    meanInterval.LowerBound meanInterval.UpperBound

The 95% confidence interval for the mean runs from 72.1 to 78.5.
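These bounds can be checked by hand. The sample mean of the 20 scores in the code above is 75.3, and the two-sided 95% critical value of the standard normal distribution is approximately 1.96. The symbol $z_{0.975}$ for that critical value is notation introduced here for illustration:

$\bar{x} \pm z_{0.975} \frac{σ}{\sqrt{n}} = 75.3 \pm 1.96 \times \frac{7.3}{\sqrt{20}} \approx 75.3 \pm 3.2$

which gives a lower bound of about 72.1 and an upper bound of about 78.5.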

The One Sample t Test

The one sample t test, also known as the t-test for means, is used to test the hypothesis that a sample comes from a population with a specified mean when the variance or standard deviation is not known.

Definition

The null hypothesis ( $H_{0}$ ) is that the population underlying the sample has a mean ( $μ$ ) equal to the proposed mean ( $μ_{0}$ ). The test statistic is the t-score, calculated as:

$t = \frac{\bar{x} - μ_{0}}{s / \sqrt{n}}$

where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the sample size. Under the null hypothesis, the t-score follows a t-distribution with $n - 1$ degrees of freedom.

Assumptions

The one sample t test assumes that the sample is randomly selected from the population and that the population itself follows a normal distribution. If either of these assumptions is violated, the reliability of the t test may be compromised.

Applications

The one sample t test is used in various fields to test hypotheses about population means when the population variance is not known. It is commonly used in quality control, medical research, and social sciences.

The OneSampleTTest class

The one sample t test is implemented by the OneSampleTTest class. It has five constructors in all. The first constructor takes no arguments. The source data must be specified by setting properties of the object.

The remaining four can be grouped in two pairs. The first two constructors take 3 or 4 arguments. The first two arguments are the sample mean and the sample size. The next argument is the population mean. If present, the fourth argument is a HypothesisType value that specifies whether the test is one-tailed or two-tailed. The default value is HypothesisType.TwoTailed.

The second pair of constructors takes 2 or 3 arguments. The first argument is a Vector<T> that contains the sample data. The next argument is once again the population mean. The third argument, if present, is a HypothesisType value that specifies whether the test is one-tailed or two-tailed. The default value is HypothesisType.TwoTailed.

Example

We use the same data as in the earlier example for the one sample z test, but this time we assume the standard deviation of the population is not known.

var results = Vector.Create<double>(62, 77, 61, 94, 75, 82, 86,
    83, 64, 84, 68, 82, 72, 71, 85, 66, 61, 79, 81, 73);
var tTest = HypothesisTests.TTest(results, 79.3);
Console.WriteLine("Test statistic: {0:F4}", tTest.Statistic);
Console.WriteLine("P-value:         {0:F4}", tTest.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    tTest.Reject() ? "yes" : "no");

Visual Basic

Dim results = Vector.Create(Of Double)(62, 77, 61, 94, 75, 82, 86,
    83, 64, 84, 68, 82, 72, 71, 85, 66, 61, 79, 81, 73)
Dim tTest = HypothesisTests.TTest(results, 79.3)
Console.WriteLine("Test statistic: {0:F4}", tTest.Statistic)
Console.WriteLine("P-value:         {0:F4}", tTest.PValue)
Console.WriteLine("Reject null hypothesis? {0}",
    IIf(tTest.Reject(), "yes", "no"))

F#

let results = Vector.Create(
                62., 77., 61., 94., 75., 82., 86.,
                83., 64., 84., 68., 82., 72., 71.,
                85., 66., 61., 79., 81., 73.)
let tTest = HypothesisTests.TTest(results, 79.3)
printfn "Test statistic: %.4f" tTest.Statistic
printfn "P-value:        %.4f" tTest.PValue
printfn "Reject null hypothesis? %s"
    (if tTest.Reject() then "yes" else "no")

The value of the t-statistic is -1.8800 giving a p-value of 0.0755. As a result, the hypothesis that on average, the students in this class score no different than the national average is not rejected at the 0.05 level.
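The same figures can be reproduced without the library. The sketch below is a minimal Python illustration using only the standard library. Because the standard library has no t-distribution, the p-value is obtained by integrating the t density numerically with Simpson's rule. That is a choice made for this sketch and says nothing about how the library computes it. The names `t_pdf` and `one_sample_t_test` are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def one_sample_t_test(sample, mu0, steps=2000):
    """Return (t, two-tailed p-value) for H0: population mean == mu0.

    Hypothetical helper for illustration only; steps must be even.
    """
    n = len(sample)
    df = n - 1
    # stdev() uses the n - 1 denominator, i.e. the sample standard deviation.
    t = (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))
    # Simpson's rule on [0, |t|]; the two tails together hold the
    # probability mass that lies outside [-|t|, |t|].
    h = abs(t) / steps
    area = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return t, 1.0 - 2.0 * area


scores = [62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
          68, 82, 72, 71, 85, 66, 61, 79, 81, 73]
t, p = one_sample_t_test(scores, 79.3)
print(f"Test statistic: {t:.4f}")
print(f"P-value:        {p:.4f}")
print("Reject null hypothesis at 0.05?", "yes" if p < 0.05 else "no")
```

Compared with the z test, the only changes are that the standard deviation is estimated from the sample and that the reference distribution has 19 degrees of freedom instead of being standard normal.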

The one-sample t test can also be performed using only the mean, the standard deviation, and the size of the sample. The corresponding code for the above example would look like this:

double mean = results.Mean();
double standardDeviation = results.StandardDeviation();
int sampleSize = results.Length;
tTest = HypothesisTests.TTest(mean, standardDeviation, sampleSize, 79.3);

Visual Basic

Dim mean = results.Mean()
Dim standardDeviation = results.StandardDeviation()
Dim sampleSize = results.Length
tTest = HypothesisTests.TTest(mean, standardDeviation, sampleSize, 79.3)

F#

let mean = results.Mean()
let standardDeviation = results.StandardDeviation()
let sampleSize = results.Length
let tTest_ = HypothesisTests.TTest(mean, standardDeviation, sampleSize, 79.3)

Once a OneSampleTTest object has been created, you can access other properties and methods common to all hypothesis test classes. For instance, to obtain a 95% confidence interval around the mean, the code would be:

var meanInterval = tTest.GetConfidenceInterval();
Console.WriteLine("95% Confidence interval for the mean: {0:F1} - {1:F1}",
    meanInterval.LowerBound, meanInterval.UpperBound);

Visual Basic

Dim meanInterval = tTest.GetConfidenceInterval()
Console.WriteLine("95% Confidence interval for the mean: {0:F1} - {1:F1}",
    meanInterval.LowerBound, meanInterval.UpperBound)

F#

let meanInterval = tTest.GetConfidenceInterval()
printfn "95%% Confidence interval for the mean: %.1f - %.1f"
    meanInterval.LowerBound meanInterval.UpperBound

Note that this interval (70.8-79.8) is wider than for the one-sample z test. The reason is that the uncertainty in the standard deviation of the population causes an increase in the uncertainty in the mean.
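The width can be checked by hand. For the 20 scores used in this example the sample standard deviation is approximately 9.515, and the two-sided 95% critical value of the t-distribution with 19 degrees of freedom is approximately 2.093. The symbol $t_{0.975, 19}$ for that critical value is notation introduced here for illustration:

$\bar{x} \pm t_{0.975, 19} \frac{s}{\sqrt{n}} = 75.3 \pm 2.093 \times \frac{9.515}{\sqrt{20}} \approx 75.3 \pm 4.453$

The half-width of about 4.5 compares with about 3.2 for the z test. Both the larger critical value (2.093 against 1.96) and the larger standard deviation (the estimate 9.515 against the known 7.3) contribute to the difference.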

The Two Sample t Test

The two sample t test, also known as the independent t-test, is used to test the hypothesis that two samples are drawn from populations with the same mean or, more generally, from populations whose means differ by a specified amount.

Definition

The null hypothesis ( $H_{0}$ ) is that the difference between the means of the populations from which the samples were taken is equal to a specific value, which may be zero. The test statistic is the t-score, calculated as:

$t = \frac{{\bar{x}}_{1} - {\bar{x}}_{2} - Δ}{s_{p} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}}$

where ${\bar{x}}_{1}$ and ${\bar{x}}_{2}$ are the sample means, $Δ$ is the hypothesized difference between the population means, $s_{p}$ is the pooled standard deviation, and $n_{1}$ and $n_{2}$ are the sample sizes. Under the null hypothesis, the t-score follows a t-distribution with $n_{1} + n_{2} - 2$ degrees of freedom.
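The pooled standard deviation is not spelled out above. Its usual textbook definition weights the two sample variances $s_{1}^{2}$ and $s_{2}^{2}$ by their degrees of freedom: $s_{p}^{2} = \frac{(n_{1} - 1) s_{1}^{2} + (n_{2} - 1) s_{2}^{2}}{n_{1} + n_{2} - 2}$. The Python sketch below computes the statistic on that basis. The function name `pooled_t_statistic` and the sample values are made up for illustration and are not taken from the library or from the examples in this document.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t_statistic(sample1, sample2, delta=0.0):
    """Return (t, degrees of freedom) for the equal-variance two sample t test.

    Hypothetical helper for illustration only; delta is the hypothesized
    difference (first population mean minus second population mean).
    """
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    standard_error = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    t = (fmean(sample1) - fmean(sample2) - delta) / standard_error
    return t, df


# Made-up data: means 5 and 3, sample variances 20/3 and 4.
group_a = [2.0, 4.0, 6.0, 8.0]
group_b = [1.0, 3.0, 5.0]
t, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

Passing `delta=2.0` for these data gives a statistic of zero, because the observed difference between the sample means is exactly 2.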

Assumptions

The two sample t test assumes that the samples are randomly selected from the populations, and that the populations themselves follow a normal distribution. A third assumption states that the variances of the populations underlying each of the samples are equal. If any of these three assumptions is violated, the reliability of the t test may be compromised.

Applications

The two sample t test is used in various fields to compare the means of two independent samples. It is commonly used in medical research, social sciences, and quality control.

The TwoSampleTTest class

The two sample t test is implemented by the TwoSampleTTest class. There are five constructors in all, reflecting the different variations of the test.

The first constructor takes no arguments. All test parameters must be provided by setting the properties of the TwoSampleTTest object.

The first two arguments of each of the remaining four constructors are Vector<T> objects that represent the samples the test is to be applied to. The second constructor has only these two arguments. It creates an unpaired test for equality of the means, with the variances estimated from the sample data. The third constructor takes a third parameter that specifies the proposed difference between the two means. This value is positive if the mean of the first population is proposed to be greater than the mean of the second. If omitted, the difference is taken to be zero.

The fourth and fifth constructors are similar to the second and third, but take two additional parameters. The first additional parameter is a SamplePairing value that specifies whether the test is paired or unpaired: SamplePairing.Paired produces a paired test, and SamplePairing.Unpaired produces an unpaired test.

The second additional argument is only meaningful for unpaired tests. It is a Boolean value that specifies whether the variances of the two populations should be assumed to be equal.

Once the test has been performed, several properties become available. If the test was based on samples, the Mean1 and Variance1 properties give the estimated mean and variance of the first sample, and Mean2 and Variance2 give the estimated mean and variance of the second sample.

The GetDifferenceEstimate() method returns an estimate of the difference between the means of the two samples. The value is a Parameter<T> whose Value property gives the estimated difference between the means. To get a confidence interval for the difference, you can call the GetConfidenceInterval(Double) method of this parameter. Note that the t test object also has a GetConfidenceInterval() method, but that returns the interval in terms of the test statistic.

Example of an unpaired test

Once again, we use the same data as before. However, this time we compare the results of one group of students to the results of a second group of students, with these test scores:

61, 80, 98, 90, 94, 65, 79, 75, 74, 86, 76, 85, 78, 72, 76, 79, 65, 92, 76, 80

The code below performs the unpaired two-sample t-test:

var results2 = Vector.Create<double>(61, 80, 98, 90, 94, 65, 79, 75, 74, 86,
    76, 85, 78, 72, 76, 79, 65, 92, 76, 80);
TwoSampleTTest tTest2 = new TwoSampleTTest(results, results2,
    SamplePairing.Unpaired, assumeEqualVariances: false);

Console.WriteLine("Test statistic: {0:F4}", tTest2.Statistic);
Console.WriteLine("P-value:        {0:F4}", tTest2.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    tTest2.Reject() ? "yes" : "no");

Visual Basic

Dim results2 = Vector.Create(Of Double)(61, 80, 98, 90, 94, 65, 79,
    75, 74, 86, 76, 85, 78, 72, 76, 79, 65, 92, 76, 80)
Dim tTest2 = New TwoSampleTTest(results, results2,
    SamplePairing.Unpaired, False)

Console.WriteLine("Test statistic: {0:F4}", tTest2.Statistic)
Console.WriteLine("P-value:        {0:F4}", tTest2.PValue)
Console.WriteLine("Reject null hypothesis? {0}",
    IIf(tTest2.Reject(), "yes", "no"))

F#

let results2 = Vector.Create(
                61., 80., 98., 90., 94., 65., 79., 
                75., 74., 86., 76., 85., 78., 72., 
                76., 79., 65., 92., 76., 80.)
let tTest2 = TwoSampleTTest(results, results2,
                SamplePairing.Unpaired, assumeEqualVariances= false)

printfn "Test statistic: %.4f" tTest2.Statistic
printfn "P-value:        %.4f" tTest2.PValue
printfn "Reject null hypothesis? %s"
    (if tTest2.Reject() then "yes" else "no")

The value of the t-statistic is -1.4337 giving a p-value of 0.1598. As a result, the hypothesis that on average, the students in the first group score no different than the students in the second group is not rejected at the 0.05 level.
