Continuous Distributions

All classes that implement continuous probability distributions inherit from the ContinuousDistribution class. This class defines methods and properties common to all continuous probability distributions. All distributions are immutable: once they are created, the parameters cannot be changed.
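The design described above, an abstract base class with immutable concrete subclasses, can be modeled in Python for illustration. This is a sketch, not the Numerics.NET API. The class and method names below only echo the prose, and the Python-style `distribution_function` is hypothetical. A frozen dataclass stands in for .NET's read-only properties.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass, FrozenInstanceError


class ContinuousDistribution(ABC):
    """Hypothetical base class holding members shared by all continuous distributions."""

    @abstractmethod
    def distribution_function(self, x: float) -> float:
        """Return P(X <= x)."""


@dataclass(frozen=True)
class ExponentialDistribution(ContinuousDistribution):
    # frozen=True makes instances immutable: assigning to a field raises an error.
    scale_parameter: float

    def distribution_function(self, x: float) -> float:
        return 0.0 if x < 0 else 1.0 - math.exp(-x / self.scale_parameter)


d = ExponentialDistribution(2.0)
try:
    d.scale_parameter = 5.0  # attempt to change a parameter after construction
except FrozenInstanceError:
    print("parameters are read-only")
```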

Parameters of a distribution

Most probability distributions are defined by one or more parameters. They usually come in three types: location, scale, and shape parameters.

  • The location parameter determines where the distribution is centered along the horizontal axis. In some cases the location parameter equals the mean of the distribution, but this need not be the case. When available, the location parameter can take on any finite value.

  • The scale parameter determines the horizontal scale of the distribution. A large scale parameter means that the graph of the distribution is stretched out. For some distributions, the standard deviation serves as the scale parameter, but this need not be the case. When available, the scale parameter can take on any finite, strictly positive value.

  • One or more shape parameters may further determine the exact form of the distribution. Shape parameters usually affect the skewness and kurtosis of a distribution.
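Location and scale parameters act through a standard change of variables. If f is the density of the "standard" member of a family, the member with location μ and scale σ has density (1/σ) f((x − μ)/σ). The Python sketch below uses the standard normal density as f. The function names are illustrative only.

```python
import math


def standard_normal_pdf(z: float) -> float:
    """Density of the normal distribution with location 0 and scale 1."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def location_scale_pdf(x: float, location: float, scale: float) -> float:
    """Shift by the location and stretch by the scale. Dividing by the scale
    keeps the total area under the curve equal to 1."""
    if not (math.isfinite(location) and math.isfinite(scale) and scale > 0):
        raise ValueError("location must be finite; scale must be finite and > 0")
    return standard_normal_pdf((x - location) / scale) / scale


# Doubling the scale halves the peak height and widens the curve.
print(location_scale_pdf(0.0, 0.0, 1.0))  # about 0.3989
print(location_scale_pdf(0.0, 0.0, 2.0))  # about 0.1995
```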

A type of distribution may have one or more of these parameters, depending on the nature and common usage of the distribution. For example, the exponential distribution only has a scale parameter. The normal distribution has a location and a scale parameter. The beta distribution has two shape parameters.

Many of the distributions available in Numerics.NET are defined in several ways in the literature, each with its own meaning for the distribution parameters. To minimize confusion, Numerics.NET defines the parameters according to the definitions above.

For example, the exponential distribution can be specified using either the mean time between events or the hazard rate. One is the inverse of the other, but only the mean time between events fits the definition of a scale parameter. Therefore, the ExponentialDistribution class, which implements the exponential distribution, is defined in terms of the mean time between events, and that is the value of its parameter.

Each distribution object has one or more properties corresponding to the distribution parameters. Unless a parameter has a single, unambiguous name, these properties are named according to the type of parameter. Returning to the ExponentialDistribution class, its scale parameter goes by many names: mean time between events, mean time to failure, mean time between failures, or simply the mean. To avoid referring to a parameter by a name that is only meaningful in a specific context, the neutral name ScaleParameter was chosen for this parameter.
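To make the two parameterizations concrete: with scale parameter θ (the mean time between events), the hazard rate is λ = 1/θ, and the CDF is F(x) = 1 − e^(−x/θ) for x ≥ 0. In the hypothetical Python sketch below, the class stores only the scale parameter and exposes the rate as a derived value. The `from_rate` helper is illustrative and not part of Numerics.NET.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExponentialDistribution:
    scale_parameter: float  # mean time between events (theta)

    @classmethod
    def from_rate(cls, rate: float) -> "ExponentialDistribution":
        # The hazard rate is the inverse of the scale parameter.
        return cls(1.0 / rate)

    @property
    def rate(self) -> float:
        return 1.0 / self.scale_parameter

    def distribution_function(self, x: float) -> float:
        return 0.0 if x < 0 else 1.0 - math.exp(-x / self.scale_parameter)


a = ExponentialDistribution(4.0)             # mean time between events = 4
b = ExponentialDistribution.from_rate(0.25)  # 0.25 events per unit time
print(a == b)  # True: both describe the same distribution
```

Storing the scale rather than the rate means that multiplying every observation by a constant multiplies the parameter by that same constant. That behavior is what qualifies it as a scale parameter.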

Properties of distributions

Each distribution object has one or more properties that return the parameters of the distribution.

In addition, each distribution defines properties that describe the main traits of the distribution. The Mean property returns the mean of the distribution.

The StandardDeviation property returns the standard deviation of the distribution. The Variance property returns the variance. The InterQuartileRange property returns the difference between the third and first quartiles. The Skewness property returns the skewness of the distribution, and the Kurtosis property returns the kurtosis supplement (also called the excess kurtosis) of the distribution.
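For the exponential distribution with scale parameter θ, all of these properties have closed forms. The quartiles follow from F⁻¹(p) = −θ ln(1 − p), so the interquartile range is θ ln 3. The Python sketch below uses hypothetical names. It assumes that "kurtosis supplement" means the excess kurtosis, that is, the kurtosis minus 3, which is 0 for the normal distribution.

```python
import math


def exponential_properties(scale: float) -> dict:
    """Closed-form summary statistics of the exponential distribution
    with scale parameter `scale` (the mean time between events)."""
    q1 = -scale * math.log(0.75)  # first quartile:  F^-1(0.25)
    q3 = -scale * math.log(0.25)  # third quartile:  F^-1(0.75)
    return {
        "mean": scale,
        "variance": scale ** 2,
        "standard_deviation": scale,
        "interquartile_range": q3 - q1,  # equals scale * ln 3
        "skewness": 2.0,
        "kurtosis_excess": 6.0,  # kurtosis minus 3 (assumed meaning of "supplement")
    }


props = exponential_properties(2.0)
print(props["interquartile_range"])  # about 2.197, i.e. 2 * ln 3
```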

In addition, the IsSymmetrical property indicates whether the distribution is symmetrical about the mean. A symmetrical distribution has zero skewness, so symmetry could in principle be inferred from the Skewness property, although zero skewness alone does not guarantee symmetry. For most distributions, however, symmetry can be determined much more efficiently directly from the distribution parameters.
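The beta distribution with shape parameters α and β illustrates the point: it is symmetric exactly when α = β. Comparing the two parameters settles the question without evaluating the skewness formula. The Python sketch below contrasts the two checks. The function names are hypothetical.

```python
import math


def beta_skewness(a: float, b: float) -> float:
    """Skewness of the beta distribution with shape parameters a and b."""
    return 2.0 * (b - a) * math.sqrt(a + b + 1.0) / ((a + b + 2.0) * math.sqrt(a * b))


def beta_is_symmetrical(a: float, b: float) -> bool:
    """Symmetry follows directly from the parameters: the density, proportional to
    x**(a-1) * (1-x)**(b-1), mirrors itself about 1/2 exactly when a == b."""
    return a == b


for a, b in [(2.0, 2.0), (2.0, 5.0)]:
    print(a, b, beta_is_symmetrical(a, b), beta_skewness(a, b))
```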

Distribution Functions

Associated with each distribution are a number of functions that are commonly used in statistical calculations. The distribution function, often called the cumulative distribution function (CDF), gives the probability that a random sample from the distribution has a value less than or equal to its argument. It is implemented by the DistributionFunction method.
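For example, the normal distribution with mean μ and standard deviation σ has the CDF F(x) = ½[1 + erf((x − μ)/(σ√2))]. The minimal Python sketch below is independent of Numerics.NET and relies only on the standard `math.erf` function.

```python
import math


def normal_cdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Probability that a normal sample with the given mean and standard
    deviation is less than or equal to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


print(normal_cdf(0.0))   # 0.5: half the probability lies below the mean
print(normal_cdf(1.96))  # about 0.975
```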

The inverse of the CDF is given by the InverseDistributionFunction method. This method is only defined for values of the probability between 0 and 1. Passing an argument outside this range results in an exception.
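When no closed form is available, the inverse CDF can be computed numerically. The sketch below inverts the standard normal CDF by bisection, chosen here for clarity. It is not necessarily how Numerics.NET computes quantiles. Like the method described above, it rejects invalid probabilities. The endpoints 0 and 1 are also rejected here, because the corresponding quantiles are infinite.

```python
import math


def standard_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inverse_standard_normal_cdf(p: float, tol: float = 1e-12) -> float:
    """Return x with standard_normal_cdf(x) == p, found by bisection."""
    if not 0.0 < p < 1.0:
        # Outside [0, 1] there is no answer; at 0 and 1 the quantile is infinite.
        raise ValueError("probability must lie strictly between 0 and 1")
    lo, hi = -40.0, 40.0  # in double precision the CDF is exactly 0 and 1 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if standard_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(inverse_standard_normal_cdf(0.975))  # about 1.96
```

Because 1 + erf(x) loses relative precision far in the lower tail, this sketch is accurate for moderate probabilities but not for extremely small ones.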

The survivor distribution function (SDF) is the complement of the distribution function. It gives the probability that a random sample from the distribution has a value greater than its argument. It is implemented by the SurvivorDistributionFunction method.
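In principle the SDF equals 1 − CDF, but computing it by that subtraction loses precision in the upper tail, where the CDF is close to 1. The Python sketch below, for the standard normal, evaluates the complement directly with `math.erfc` and compares it with the naive version.

```python
import math


def normal_sdf(x: float) -> float:
    """P(X > x) for a standard normal X, computed without subtraction."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def normal_sdf_naive(x: float) -> float:
    """Same quantity as 1 - CDF; cancels catastrophically for large x."""
    return 1.0 - 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


print(normal_sdf(10.0))        # about 7.6e-24
print(normal_sdf_naive(10.0))  # 0.0: all significant digits are lost
```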

The probability density function (PDF) is the derivative of the cumulative distribution function. It is implemented by the ProbabilityDensityFunction method.
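Because the PDF is the derivative of the CDF, a central finite difference of the CDF should closely reproduce it. The Python sketch below runs this check for the exponential distribution with scale θ, where F(x) = 1 − e^(−x/θ) and f(x) = e^(−x/θ)/θ. The value of θ is an arbitrary illustrative choice.

```python
import math

SCALE = 2.0  # illustrative scale parameter (mean time between events)


def cdf(x: float) -> float:
    return 0.0 if x < 0 else 1.0 - math.exp(-x / SCALE)


def pdf(x: float) -> float:
    return 0.0 if x < 0 else math.exp(-x / SCALE) / SCALE


h = 1e-5
for x in (0.5, 1.0, 3.0):
    # Central difference approximation of dF/dx at x.
    derivative = (cdf(x + h) - cdf(x - h)) / (2 * h)
    print(x, pdf(x), derivative)
```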

Generating Random Samples

One of the principal applications of probability distributions is the generation of random numbers that follow a certain distribution. The continuous distribution classes provide a series of methods to make this happen.

The Sample method returns a single random sample from the distribution. It has overloads as both static (Shared in Visual Basic) and instance methods. The instance method has only one parameter: the random number generator that will be used to generate the uniform random number(s) used in the calculation of the sample. It is of type System.Random. Any of the random number generators from the Numerics.NET.Random namespace can be used for this purpose.
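One common way to turn uniform random numbers into samples from a continuous distribution is inverse transform sampling: if U is uniform on [0, 1), then F⁻¹(U) follows the distribution. The sketch below applies this technique to the exponential distribution in Python. Here `random.Random` plays the role that System.Random plays in .NET. The text does not state whether Numerics.NET uses this particular technique for any given distribution. The code only illustrates a sampler that receives its generator as an argument.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExponentialDistribution:
    scale_parameter: float  # mean time between events

    def sample(self, rng: random.Random) -> float:
        """Draw one sample using inverse transform sampling."""
        u = rng.random()  # uniform on [0, 1)
        # F^-1(u) = -scale * ln(1 - u); 1 - u lies in (0, 1], so the log is finite.
        return -self.scale_parameter * math.log(1.0 - u)


rng = random.Random(42)  # seeded generator for reproducible results
d = ExponentialDistribution(3.0)
draws = [d.sample(rng) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to the mean, 3.0
```

Passing the generator in, rather than creating one inside the method, lets the caller control seeding and share one generator across many distributions.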

Distribution classes may also define one or more static (Shared in Visual Basic) overloads of the Sample method, one for each constructor. The first argument is always the random number generator, as above. The remaining parameters correspond to the parameters of the matching constructor. This makes it possible to generate random samples for any distribution without first constructing a distribution object.
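The static form can be pictured as a function that takes the generator followed by the constructor's arguments, so no distribution object is needed. Python has no method overloading, so in the sketch below the static variant gets its own name. That name, `sample_static`, is purely illustrative.

```python
import math
import random


class ExponentialDistribution:
    def __init__(self, scale_parameter: float):
        if not scale_parameter > 0:
            raise ValueError("scale parameter must be positive")
        self._scale = scale_parameter

    @property
    def scale_parameter(self) -> float:
        return self._scale

    @staticmethod
    def sample_static(rng: random.Random, scale_parameter: float) -> float:
        """Static counterpart of sample(): generator first, then the same
        arguments as the constructor. No instance is created."""
        return -scale_parameter * math.log(1.0 - rng.random())

    def sample(self, rng: random.Random) -> float:
        # The instance method delegates to the static one with its own parameter.
        return ExponentialDistribution.sample_static(rng, self._scale)


# Two generators with the same seed produce the same uniform stream,
# so both calls below return the same value.
x1 = ExponentialDistribution.sample_static(random.Random(7), 2.0)
x2 = ExponentialDistribution(2.0).sample(random.Random(7))
print(x1 == x2)  # True
```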

Other overloads of the Sample method generate a large number of random samples at once. The first argument is once again the uniform random number generator. The samples are written to an array or Vector<T>, which must be supplied as the second parameter. Two additional parameters can be provided, which supply the start index and the length of a segment in the vector where the samples are to be copied.
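The Python sketch below shows one way to fill a caller-supplied buffer, optionally restricted to a segment given by a start index and a length. The method name `sample_into` and its argument order are assumptions modeled on the description above. A plain list stands in for an array or Vector<T>.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExponentialDistribution:
    scale_parameter: float  # mean time between events

    def sample_into(self, rng: random.Random, out: List[float],
                    start: int = 0, length: Optional[int] = None) -> None:
        """Write samples into out[start:start + length]; elements outside
        that segment are left untouched. By default the whole buffer is filled."""
        if length is None:
            length = len(out) - start
        if start < 0 or length < 0 or start + length > len(out):
            raise IndexError("segment does not fit inside the output buffer")
        for i in range(start, start + length):
            # Inverse transform sampling, as for a single sample.
            out[i] = -self.scale_parameter * math.log(1.0 - rng.random())


buffer = [float("nan")] * 10
ExponentialDistribution(2.0).sample_into(random.Random(1), buffer, start=2, length=5)
print(buffer)  # positions 2 to 6 hold samples; the others are still nan
```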