Binning and Discretization

It is often necessary to group numerical data into categories. The range of the data is divided into a number of intervals, and each interval becomes a category in a numerical scale. This type of scale is implemented by the IntervalIndex<T> class, which inherits from Index<T> and adds functionality for working with intervals.

Interval Scales

Constructing Interval Scales

The IntervalIndex<T> class takes one generic type argument: the type of the bounds of the intervals. This must be a type that implements the IComparable<T> interface. It has four constructors. They come in two pairs, each pair offering one way of defining the intervals that make up the scale. Each constructor also corresponds to an overload of the static CreateBins method of the Index class.

The first constructor takes one argument: a Double array that contains the boundaries of the intervals. The values in this array must be in ascending order, or an ArgumentException will be thrown.

C#
double[] bounds = new double[] { 50, 62, 74, 88, 100 };
var scale1 = new IntervalIndex<double>(bounds);
var scale1a = Index.CreateBins(bounds);

The second constructor takes one additional argument: a SpecialBins value that specifies which special intervals to include in the scale, if any. The possible values are as follows:

Values of the SpecialBins enumeration.

Name            Description
None            No special intervals are included.
BelowMinimum    There is a special interval for values below the minimum value.
AboveMaximum    There is a special interval for values above the maximum value.
OutOfRange      There is a special interval for values that are outside the scale's range.
Missing         There is a special interval for missing values.

If BelowMinimum is included, an interval whose lower bound is the smallest possible value for the element type is inserted before all other intervals. If AboveMaximum is included, an interval whose upper bound is the largest possible value is added at the end. The following creates an interval index with the same boundaries as above, but with an extra interval to hold values less than 50:

C#
var scale2 = new IntervalIndex<double>(bounds, SpecialBins.BelowMinimum);
var scale2a = Index.CreateBins(bounds, SpecialBins.BelowMinimum);
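The effect of a special interval on interval positions can be illustrated without the library. The sketch below is not the library's implementation: it assumes the open-ended intervals can be modelled with infinite sentinel bounds (the library may use a different value, such as double.MinValue), it assumes intervals that are closed on the left, and Find is a hypothetical helper written only for this illustration.

```csharp
using System;
using System.Linq;

double[] bounds = { 50, 62, 74, 88, 100 };

// Assumption: open-ended intervals are modelled with infinite sentinels.
double[] withBelow = new[] { double.NegativeInfinity }.Concat(bounds).ToArray();
double[] withBoth = withBelow.Append(double.PositiveInfinity).ToArray();

// Hypothetical helper: linear scan for the interval [edges[i], edges[i + 1])
// that contains value. Returns -1 when no interval contains it.
static int Find(double[] edges, double value)
{
    for (int i = 0; i + 1 < edges.Length; i++)
        if (value >= edges[i] && value < edges[i + 1]) return i;
    return -1;
}

Console.WriteLine(bounds.Length - 1);     // 4 regular intervals
Console.WriteLine(withBelow.Length - 1);  // 5: one extra interval in front
Console.WriteLine(withBoth.Length - 1);   // 6: extra intervals on both ends

Console.WriteLine(Find(bounds, 42.0));    // -1: below the first boundary
Console.WriteLine(Find(withBelow, 42.0)); // 0: falls in the extra interval
Console.WriteLine(Find(bounds, 55.0));    // 0
Console.WriteLine(Find(withBelow, 55.0)); // 1: regular intervals shift by one
```

Because the extra interval is inserted before all others, the position of every regular interval increases by one in this model.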

The third constructor takes three arguments. The first two are the lower bound of the first interval and the upper bound of the last interval. The third argument is the total number of intervals. This creates a scale with the specified number of intervals, all equal in width. The fourth constructor has one additional argument: a SpecialBins value that specifies which special intervals to include in addition to the intervals within the specified range. The code below creates a scale with five intervals for values between 50 and 100:

C#
var scale3 = new IntervalIndex<double>(50.0, 100.0, 5);
var scale3a = Index.CreateBins(50.0, 100.0, 5);
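For reference, the boundaries of equal-width intervals follow directly from the three arguments: boundary i lies at lower + i * (upper - lower) / count. The sketch below computes them in plain C#. EqualWidthBounds is a hypothetical helper written for this illustration and is not part of the library.

```csharp
using System;

// Hypothetical helper (not part of the library): compute the count + 1
// boundaries of `count` equal-width intervals that span lower to upper.
static double[] EqualWidthBounds(double lower, double upper, int count)
{
    if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
    if (!(lower < upper)) throw new ArgumentException("lower must be less than upper.");

    double width = (upper - lower) / count;
    var edges = new double[count + 1];
    for (int i = 0; i < count; i++)
        edges[i] = lower + i * width;
    edges[count] = upper; // use the exact upper bound to avoid rounding drift
    return edges;
}

double[] bounds = EqualWidthBounds(50.0, 100.0, 5);
Console.WriteLine(string.Join(", ", bounds)); // 50, 60, 70, 80, 90, 100
```

Five intervals require six boundaries, so the resulting array could also be passed to the constructor that takes an array of boundaries.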

Mapping Values to Intervals

In addition to the overloads defined for standard Index<T> objects, the Lookup method has two overloads that map a value to the index of the interval that contains it. The first takes a single value and returns the integer index, or -1 if no interval contains the value. The second takes a list of values and returns an array of indexes:

C#
Console.WriteLine(scale3.Lookup(63.5)); // 1
double[] values = { 71.3, 39.5, 66.7, 90.4, 62.1 };
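// Join the indexes into one string; writing the array directly prints only its type name.
Console.WriteLine(string.Join(", ", scale3.Lookup(values))); // 2, -1, 1, 4, 1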
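Conceptually, mapping a value to its interval is a search in the sorted array of boundaries. The sketch below shows one way to do this with Array.BinarySearch. It is not the library's implementation: FindInterval is a hypothetical helper, and the convention that every interval is closed on the left and open on the right is an assumption made for this illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not the library's implementation): return the index of
// the interval [edges[i], edges[i + 1]) that contains value, or -1 if none.
// Assumes edges is sorted in ascending order.
static int FindInterval(double[] edges, double value)
{
    if (double.IsNaN(value) || value < edges[0] || value >= edges[edges.Length - 1])
        return -1;

    int pos = Array.BinarySearch(edges, value);
    // An exact match on a boundary starts interval `pos`. Otherwise
    // BinarySearch returns the bitwise complement of the index of the first
    // larger element, so the containing interval is the one before it.
    return pos >= 0 ? pos : ~pos - 1;
}

double[] bounds = { 50, 60, 70, 80, 90, 100 };
Console.WriteLine(FindInterval(bounds, 63.5)); // 1

double[] values = { 71.3, 39.5, 66.7, 90.4, 62.1 };
int[] indexes = values.Select(x => FindInterval(bounds, x)).ToArray();
Console.WriteLine(string.Join(", ", indexes)); // 2, -1, 1, 4, 1
```

A binary search keeps each lookup logarithmic in the number of intervals, which matters when many values are mapped at once.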

Binning Vectors

Once an interval index has been defined, it can be used to map a vector of values to a vector of categories. The Vector.Bin method performs this operation. It is defined as an extension method, so it can be called on the vector directly. It has multiple overloads that work on both typed and untyped vectors. In its simplest form, it takes two arguments: a vector and an interval index. This creates a categorical vector of intervals.

C#
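// A vector of 100 random values.
var v = Vector.CreateRandom(100);
// Ten equal-width intervals between 0 and 1.
var bins = Index.CreateBins(0.0, 1.0, 10);
// Bin using the predefined interval index.
var vBinned1 = v.Bin(bins);
// Bin using explicit boundaries 0.1, 0.2, ..., 0.9, plus special intervals at both ends.
var bounds = Vector.CreateFromFunction(9, i => (i+1) / 10.0);
var vBinned2 = v.Bin(bounds, SpecialBins.BelowMinimum | SpecialBins.AboveMaximum);
// Bin by specifying only the number of intervals.
var vBinned3 = v.Bin(10);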
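To make the result of binning concrete, the sketch below performs the same kind of mapping in plain C#, without the library. Each value is replaced by the zero-based position of its interval, and the positions are then counted. The sample data, the integer codes (the library produces a categorical vector of intervals instead) and the left-closed interval convention are all assumptions made for this illustration.

```csharp
using System;
using System.Linq;

// Illustrative data: values in the range [0, 1).
double[] data = { 0.05, 0.12, 0.18, 0.47, 0.51, 0.58, 0.93 };
int binCount = 10;

// With equal-width intervals on [0, 1), interval i covers
// [i / 10, (i + 1) / 10), so the interval position is floor(x * 10).
int[] codes = data.Select(x => (int)Math.Floor(x * binCount)).ToArray();
Console.WriteLine(string.Join(", ", codes)); // 0, 1, 1, 4, 5, 5, 9

// Count how many values fall into each non-empty interval.
foreach (var g in codes.GroupBy(c => c).OrderBy(g => g.Key))
    Console.WriteLine($"bin {g.Key}: {g.Count()}");
// bin 0: 1
// bin 1: 2
// bin 4: 1
// bin 5: 2
// bin 9: 1
```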