Statistical Variables
A variable is a collection of observations of a characteristic of an object that can take on two or more values. This chapter provides an overview of how variables are implemented in Numerics.NET.
Variables are used in two ways: on their own or as part of a collection. On its own, a variable can be used to calculate descriptive statistics, such as the mean and the standard deviation, or to perform statistical tests, such as the one-sample z test or the Kolmogorov-Smirnov goodness-of-fit test. Most often, however, variables come in groups and represent different properties or measurements in a data set.
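To make the single-variable case concrete, the following sketch computes the mean, the sample standard deviation, and a two-sided one-sample z test for one variable. It is written in plain Python with the standard library only and does not use the Numerics.NET API. The observations, the hypothesized mean `mu0`, the known standard deviation `sigma`, and the helper `one_sample_z_test` are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical data: one variable with eight observations.
observations = [4.9, 5.1, 5.3, 4.8, 5.0, 5.2, 5.4, 4.7]

# Descriptive statistics for the variable.
sample_mean = mean(observations)
sample_sd = stdev(observations)  # sample standard deviation (divides by n - 1)


def one_sample_z_test(data, mu0, sigma):
    """Two-sided one-sample z test with a known population standard deviation.

    Returns the z statistic and the two-sided p-value.
    """
    z = (mean(data) - mu0) / (sigma / math.sqrt(len(data)))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Test whether the population mean differs from 5.0, assuming sigma = 0.25.
z, p_value = one_sample_z_test(observations, mu0=5.0, sigma=0.25)
print(sample_mean, sample_sd, z, p_value)
```

In Numerics.NET the same quantities would be obtained from a Vector<T> and the library's own test classes; the sketch only shows what is being computed.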
In Numerics.NET, variables are implemented as Vector<T> objects.
Variables can be either continuous or categorical.
Continuous Variables
Any variable that can take on a value from a continuous range is called a continuous variable. Any numerical or date type can be used as the element type. Nothing needs to be done to mark a vector as a continuous variable.
Categorical Variables
Variables whose observations can take on only one of a finite set of values are called categorical variables or discrete variables. They are implemented by the CategoricalVector<T> class.
Fundamental to the implementation of categorical variables is the category index. The category index represents the possible values that a variable can have. Every categorical variable has an associated category index. The index is used to map an object to its category, or to the index of its category in a list of categories.
The category index is an Index<T> object.
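The role of a category index can be illustrated with a short, language-neutral sketch. The Python code below is not the Numerics.NET implementation: the helper `build_category_index` is hypothetical, and ordering categories by first appearance is an assumption made for this example. It shows the two mappings described above: from a position to its category, and from a value to the position of its category in the list of categories.

```python
def build_category_index(values):
    """Collect the distinct values in order of first appearance.

    Returns (categories, positions), where categories[i] is the i-th
    category and positions[value] is the position of that category.
    """
    categories = []
    positions = {}
    for value in values:
        if value not in positions:
            positions[value] = len(categories)
            categories.append(value)
    return categories, positions


# Hypothetical observations of a categorical variable.
observations = ["red", "green", "red", "blue", "green", "red"]

categories, positions = build_category_index(observations)

# Each observation is stored as the position of its category.
codes = [positions[value] for value in observations]

# The original values can be recovered from the codes and the index.
decoded = [categories[code] for code in codes]
print(categories, codes)
```

Storing small integer codes alongside a shared list of categories keeps the set of possible values explicit, which is the purpose the category index serves.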
Data Frames
Most statistical data sets are made up of several variables. Such a collection of variables is represented by a DataFrame object. Data frames can be created by adding individual variables or by importing data from files or databases.
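As a rough mental model, a data frame can be thought of as a set of named variables that all have the same number of observations. The Python sketch below illustrates that idea only. It does not reflect the DataFrame API in Numerics.NET, and the helper `make_frame` and the column names are hypothetical.

```python
def make_frame(**columns):
    """Build a minimal column-oriented table from named variables.

    Every variable must have the same number of observations.
    """
    lengths = {len(column) for column in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all variables must have the same number of observations")
    return dict(columns)


# Hypothetical data set: one continuous and one categorical variable.
frame = make_frame(
    height=[1.62, 1.75, 1.80],
    group=["a", "b", "a"],
)

# One observation (row) takes the same position from every variable.
second_row = {name: column[1] for name, column in frame.items()}
print(second_row)
```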