Transforming Numerical Variables

In many situations, it is useful to apply some kind of transformation to a numerical variable. To avoid cluttering the members of the Vector<T> class with these methods, they are made available as extension methods of the static VectorExtensions class.

Transformations can be subdivided into the following categories:

  • Simple transformations.
  • Indicators of change.
  • Extrapolated indicators of change.
  • Moving averages.
  • Other moving summary statistics.
  • Period-to-date values and differences.
  • Miscellaneous transformations.

Arithmetic operations have been discussed in the previous section. They are available as overloaded operators or static (Shared in Visual Basic) operator methods on the Vector<T> class. The remaining transformations will now be described in greater detail.

Simple transformations

This category includes transformations that involve arithmetic operations and translations.

The GetLag method is overloaded. Without parameters, it returns a variable whose observations are moved ahead by one interval. Each new observation is the observation before the current observation. The first observation is set to NaN.

The second overload takes one argument: the lag, or number of observations to shift the series by. A positive value indicates that the observations are shifted forward. If the lag is equal to 1, then each new observation is the observation before the current observation. If the lag is equal to -1, then each new observation is the observation after the current observation. Any observations that do not exist in the original variable are set to NaN.

The third overload takes two arguments. The first argument is once again the lag. The second argument specifies the value assigned to observations that do not exist in the original variable.
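The shifting rule can be illustrated on a plain array. The following C# sketch is not the library's implementation: it uses top-level statements, works on double[] rather than Vector<T>, and the Lag function and its fillValue parameter are hypothetical stand-ins for the three GetLag overloads.

```csharp
using System;

// Hypothetical lag transform on a plain double[] (not the library's Vector<T> API).
// A positive lag shifts observations forward: result[i] = values[i - lag].
// Positions with no source observation receive fillValue (NaN by default).
static double[] Lag(double[] values, int lag, double fillValue = double.NaN)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        int source = i - lag;
        result[i] = (source >= 0 && source < values.Length) ? values[source] : fillValue;
    }
    return result;
}

double[] prices = { 10, 11, 12, 13 };
double[] previous = Lag(prices, 1);     // NaN, 10, 11, 12
double[] next = Lag(prices, -1);        // 11, 12, 13, NaN
double[] padded = Lag(prices, 2, 0.0);  // 0, 0, 10, 11
```

A positive lag reads from earlier positions and a negative lag from later ones, matching the description of the overloads.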

The CumulativeSum method returns a variable whose observations are the cumulative sum of all observations up to the current observation. The CumulativeProduct method returns a variable whose observations are the cumulative product of all observations up to the current observation.
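What the two cumulative methods compute can be sketched over a plain double array. CumulativeSum and CumulativeProduct below are hypothetical local functions written with C# top-level statements, not the library's extension methods.

```csharp
using System;

// Running sum: element i is values[0] + ... + values[i].
static double[] CumulativeSum(double[] values)
{
    var result = new double[values.Length];
    double total = 0.0;
    for (int i = 0; i < values.Length; i++)
    {
        total += values[i];
        result[i] = total;
    }
    return result;
}

// Running product: element i is values[0] * ... * values[i].
static double[] CumulativeProduct(double[] values)
{
    var result = new double[values.Length];
    double product = 1.0;
    for (int i = 0; i < values.Length; i++)
    {
        product *= values[i];
        result[i] = product;
    }
    return result;
}

double[] data = { 1, 2, 3, 4 };
double[] sums = CumulativeSum(data);          // 1, 3, 6, 10
double[] products = CumulativeProduct(data);  // 1, 2, 6, 24
```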

The following example creates a variable that contains the observations of the previous period. It then creates a variable that contains the cumulative sum of the variable.

C#
NumericalVariable previous = current.Transforms.GetLag(1);
NumericalVariable cumsum = current.Transforms.GetCumulativeSum();

Indicators of change.

This set of transformations compares each current observation to a past observation, called the reference observation. The distance between the current observation and its reference observation is called the lag. The lag is the only parameter of each method, and its value must be greater than zero.

The Change method returns a variable where each observation is the difference between the current observation and the reference observation.

The PercentChange method is similar. Each observation is the percentage change of the current observation relative to the reference observation. If the reference observation is zero, or if the current observation and the reference observation have opposite signs, the new observation is NaN.

The GrowthRate method returns a variable containing the exponential growth rate. Each observation is the percentage change of the current observation relative to the reference observation, assuming the growth compounds continuously over time. The same rule applies: if the reference observation is zero, or if the two observations have opposite signs, the new observation is NaN.

For all three methods, the first lag observations of the new variable (indexes 0 through lag-1) have no reference observation and are set to NaN.
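The three indicators can be sketched on a plain double array as follows. This is an illustration under stated assumptions rather than the library's code: the function names are hypothetical, the percentage change is assumed to be returned as a fraction (0.1 for 10%) rather than multiplied by 100, and the growth rate is taken as the natural logarithm of the ratio over the whole lag interval.

```csharp
using System;

// Illustrative indicators of change for lag > 0 on a plain double[].
// Positions 0 through lag - 1 have no reference observation and are NaN.

// Difference between each observation and the one `lag` positions earlier.
static double[] Change(double[] values, int lag)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
        result[i] = i < lag ? double.NaN : values[i] - values[i - lag];
    return result;
}

// The ratio is unusable when the reference is zero or the signs are opposite.
static bool Invalid(double current, double reference) =>
    reference == 0.0 || (current < 0 && reference > 0) || (current > 0 && reference < 0);

// Assumption: the percentage change is returned as a fraction (0.1 means 10%).
static double[] PercentChange(double[] values, int lag)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
        result[i] = i < lag || Invalid(values[i], values[i - lag])
            ? double.NaN
            : values[i] / values[i - lag] - 1.0;
    return result;
}

// Continuously compounded growth over the lag interval: ln(current / reference).
static double[] GrowthRate(double[] values, int lag)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
        result[i] = i < lag || Invalid(values[i], values[i - lag])
            ? double.NaN
            : Math.Log(values[i] / values[i - lag]);
    return result;
}

double[] levels = { 100, 110, 121, -5 };
double[] change = Change(levels, 1);      // NaN, 10, 11, -126
double[] pct = PercentChange(levels, 1);  // NaN, ~0.1, ~0.1, NaN (sign flip)
double[] growth = GrowthRate(levels, 1);  // NaN, ~ln 1.1, ~ln 1.1, NaN
```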

The example below calculates the different indicators of change for a 10-period lag:

C#
NumericalVariable change = current.Transforms.GetChange(10);
NumericalVariable pctChange = current.Transforms.GetPercentChange(10);
NumericalVariable growthRate = current.Transforms.GetGrowthRate(10);

Extrapolated indicators of change.

This set of transformations is similar to the previous one. However, the observed change is extrapolated to a larger interval. Once again, the lag is passed as the first parameter. A second parameter, numberOfPeriods, indicates the relative size of the extrapolation interval.

For example, if the current variable represents the price of a certain commodity at the end of each month, then a lag of 1 with a value of 12 for numberOfPeriods produces a variable that represents the annualized change in price over each month.

The ExtrapolatedChange method returns a variable where each observation is the extrapolated difference between the current observation and the reference observation.

The ExtrapolatedPercentChange method is similar. Each observation is the extrapolated percentage change of the current observation relative to the reference observation. If the reference observation is zero, or if the current observation and the reference observation have opposite signs, the new observation is NaN.

The ExtrapolatedGrowthRate method returns a variable containing the extrapolated exponential growth rate. Each observation is the extrapolated percentage change of the current observation relative to the reference observation, assuming the growth compounds continuously over time. Again, if the reference observation is zero, or if the two observations have opposite signs, the new observation is NaN.

Once again, the first lag observations of the new variable (indexes 0 through lag-1) are set to NaN.
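The sketch below shows one plausible reading of the extrapolation. It assumes the change observed over lag periods is scaled up to numberOfPeriods periods with a factor of numberOfPeriods / lag, and that the percentage change compounds rather than scaling linearly. Both are assumptions; the library's exact formulas may differ, and the function names are hypothetical.

```csharp
using System;

// Illustrative extrapolated indicators of change on a plain double[].
// Assumption: scale factor = numberOfPeriods / lag.
// Positions 0 through lag - 1 have no reference observation and are NaN.

static double[] ExtrapolatedChange(double[] values, int lag, int numberOfPeriods)
{
    double scale = (double)numberOfPeriods / lag;
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
        result[i] = i < lag ? double.NaN : (values[i] - values[i - lag]) * scale;
    return result;
}

// The ratio is unusable when the reference is zero or the signs are opposite.
static bool Invalid(double current, double reference) =>
    reference == 0.0 || (current < 0 && reference > 0) || (current > 0 && reference < 0);

// Assumption: the percentage change compounds: (current / reference)^scale - 1.
static double[] ExtrapolatedPercentChange(double[] values, int lag, int numberOfPeriods)
{
    double scale = (double)numberOfPeriods / lag;
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
        result[i] = i < lag || Invalid(values[i], values[i - lag])
            ? double.NaN
            : Math.Pow(values[i] / values[i - lag], scale) - 1.0;
    return result;
}

// Continuous growth rates scale linearly: ln(current / reference) * scale.
static double[] ExtrapolatedGrowthRate(double[] values, int lag, int numberOfPeriods)
{
    double scale = (double)numberOfPeriods / lag;
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
        result[i] = i < lag || Invalid(values[i], values[i - lag])
            ? double.NaN
            : Math.Log(values[i] / values[i - lag]) * scale;
    return result;
}

// Month-end prices, annualized month over month (lag 1, 12 periods).
double[] monthly = { 100, 101, 102.01 };
double[] annualChange = ExtrapolatedChange(monthly, 1, 12);      // NaN, 12, ~12.12
double[] annualPct = ExtrapolatedPercentChange(monthly, 1, 12);  // NaN, ~0.1268, ~0.1268
double[] annualGrowth = ExtrapolatedGrowthRate(monthly, 1, 12);  // NaN, ~0.1194, ~0.1194
```

Under these assumptions the extrapolated growth rate is simply the growth rate multiplied by the scale factor, while the extrapolated percentage change is not.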

The example below calculates the same indicators of change for a 10-period lag and extrapolates them to 360 periods:

C#
NumericalVariable change360 = current.Transforms.GetExtrapolatedChange(10, 360);
NumericalVariable pctChange360 = current.Transforms.GetExtrapolatedPercentChange(10, 360);
NumericalVariable growthRate360 = current.Transforms.GetExtrapolatedExponentialGrowthRate(10, 360);

Moving averages.

Moving averages are commonly used to smooth data, and to find trends in time series.

The MovingAverage method returns the simple moving average. It takes one argument: the number of observations to average, n. Each new observation is the average of the n observations up to and including the current observation.
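A minimal sketch of the simple moving average over a plain double array follows. It keeps a running window sum so each step costs constant time. How the library fills the first n - 1 positions is not stated here; the sketch assumes NaN, and the MovingAverage local function is a hypothetical stand-in.

```csharp
using System;

// Simple moving average over the `window` observations ending at position i.
// Assumption: the first window - 1 results (incomplete windows) are NaN.
static double[] MovingAverage(double[] values, int window)
{
    var result = new double[values.Length];
    double windowSum = 0.0;
    for (int i = 0; i < values.Length; i++)
    {
        windowSum += values[i];                     // add the newest observation
        if (i >= window) windowSum -= values[i - window];  // drop the one leaving the window
        result[i] = i >= window - 1 ? windowSum / window : double.NaN;
    }
    return result;
}

double[] data = { 1, 2, 3, 4, 5 };
double[] ma3 = MovingAverage(data, 3);  // NaN, NaN, 2, 3, 4
```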

The ExponentialMovingAverage method returns the exponential moving average. Each new observation is a weighted combination of the current observation and the previous average.

The exponential moving average can be specified in two ways. You can specify the period as an integer. Alternatively, you can specify the smoothing constant. This is a real number between 0 and 1 that specifies the contribution of the current observation to the current moving average. A period of n corresponds to a smoothing constant of 2/(n+1).
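The recursion behind the exponential moving average can be sketched as follows. The sketch assumes the average is seeded with the first observation and that a period n maps to the smoothing constant 2 / (n + 1), which is consistent with the 2.0 / (1 + 3) expression used for the 3-day average in this section. The function names are hypothetical.

```csharp
using System;

// Exponential moving average with smoothing constant alpha in (0, 1]:
//   ema[i] = alpha * values[i] + (1 - alpha) * ema[i - 1]
// Assumption: the average is seeded with the first observation.
static double[] ExponentialMovingAverage(double[] values, double alpha)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
        result[i] = i == 0 ? values[0] : alpha * values[i] + (1.0 - alpha) * result[i - 1];
    return result;
}

// Convert an integer period into the equivalent smoothing constant.
static double SmoothingConstant(int period) => 2.0 / (period + 1);

double[] data = { 10, 20, 30 };
double[] ema3 = ExponentialMovingAverage(data, SmoothingConstant(3));  // alpha = 0.5: 10, 15, 22.5
```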

The code below calculates three moving averages: a simple 20-day moving average, a 20-day exponential moving average, and a 3-day exponential moving average specified using the smoothing constant:

C#
NumericalVariable MA20 = current.Transforms.GetMovingAverage(20);
NumericalVariable EMA20 = current.Transforms.GetExponentialMovingAverage(20);
NumericalVariable EMA3 = current.Transforms.GetExponentialMovingAverage(2.0 / (1 + 3));

The WeightedMovingAverage method returns a weighted moving average. Each new observation is a weighted sum of the observations in a window around the current observation.

The weights for the weighted moving average can be supplied as a Double array, or as a Vector<T>. The weights are used in reverse order. The weight with index zero is the weight for the current observation. The weight with index one is the weight for the previous observation.

An optional integer parameter specifies the index in the weight vector that corresponds to the current observation. This allows you to create centrally weighted averages. The default is zero.
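The indexing rule for the weights can be made concrete with a small sketch. The weight at position center applies to the current observation, higher positions to earlier observations, and lower positions to later ones. The sketch assumes the weighted sum is divided by the sum of the weights so the result is an average, and that positions whose window runs past either end of the data are NaN; neither detail is specified above, and WeightedMovingAverage is a hypothetical name.

```csharp
using System;

// Weighted moving average. weights[center] applies to the current observation,
// weights[center + 1] to the previous one, weights[center - 1] to the next one, etc.
// Assumptions: the weighted sum is divided by the sum of the weights, and
// positions whose window runs past either end of the data are NaN.
static double[] WeightedMovingAverage(double[] values, double[] weights, int center = 0)
{
    double weightSum = 0.0;
    foreach (double w in weights) weightSum += w;

    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        double sum = 0.0;
        bool complete = true;
        for (int k = 0; k < weights.Length; k++)
        {
            int j = i - (k - center);  // observation that weight k applies to
            if (j < 0 || j >= values.Length) { complete = false; break; }
            sum += weights[k] * values[j];
        }
        result[i] = complete ? sum / weightSum : double.NaN;
    }
    return result;
}

double[] data = { 1, 2, 3, 4, 5, 6 };
double[] wma5 = WeightedMovingAverage(data, new[] { 1.0, 2.0, 3.0, 2.0, 1.0 }, 2);
// NaN, NaN, 3, 4, NaN, NaN
```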

The following example creates a weighted moving average of five observations centered around the current observation:

C#
double[] weights = {1.0, 2.0, 3.0, 2.0, 1.0};
NumericalVariable WMA5 = current.Transforms.GetWeightedMovingAverage(weights, 2);

Other moving summary statistics.

The methods in this group calculate some statistic of a moving window of observations.

The MovingMaximum method returns a variable whose observations are the largest of the n observations up to and including the current observation. The MovingMinimum method returns a variable whose observations are the smallest of the n observations up to and including the current observation.
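A straightforward sketch of the moving maximum and minimum on a plain double array follows. It rescans each window, which costs O(n) per observation; a monotonic deque would bring that down to amortized constant time. The NaN fill for incomplete windows is an assumption, and MovingExtreme is a hypothetical helper.

```csharp
using System;

// Moving maximum (or minimum) over the `window` observations ending at position i.
// Assumption: the first window - 1 results (incomplete windows) are NaN.
static double[] MovingExtreme(double[] values, int window, bool maximum)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        if (i < window - 1) { result[i] = double.NaN; continue; }
        double best = values[i];
        for (int j = i - window + 1; j < i; j++)
            best = maximum ? Math.Max(best, values[j]) : Math.Min(best, values[j]);
        result[i] = best;
    }
    return result;
}

double[] data = { 3, 1, 4, 1, 5, 9, 2 };
double[] max3 = MovingExtreme(data, 3, maximum: true);   // NaN, NaN, 4, 4, 5, 9, 9
double[] min3 = MovingExtreme(data, 3, maximum: false);  // NaN, NaN, 1, 1, 1, 1, 2
```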

The MovingStandardDeviation method calculates a moving standard deviation. Each new observation is the standard deviation of the n observations up to and including the current observation. It takes two arguments. The first is an integer that specifies the number of observations. The second is a Vector<T> that contains the simple moving average of the variable over the same number of observations.

The MovingSum method calculates a moving sum of the n observations up to and including the current observation.
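The moving sum and moving standard deviation can be sketched as below. The standard deviation takes the precomputed moving average as its second argument, mirroring the parameter description above. The sketch assumes the population form (dividing by n); the library may divide by n - 1 instead. The function names are hypothetical.

```csharp
using System;

// Moving sum over the `window` observations ending at position i (NaN until the window is full).
static double[] MovingSum(double[] values, int window)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        if (i < window - 1) { result[i] = double.NaN; continue; }
        double sum = 0.0;
        for (int j = i - window + 1; j <= i; j++) sum += values[j];
        result[i] = sum;
    }
    return result;
}

// Moving standard deviation; movingAverage[i] is the window mean ending at i.
// Assumption: population form (divide by window); a sample version would divide by window - 1.
static double[] MovingStandardDeviation(double[] values, int window, double[] movingAverage)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        if (i < window - 1) { result[i] = double.NaN; continue; }
        double sumSq = 0.0;
        for (int j = i - window + 1; j <= i; j++)
        {
            double d = values[j] - movingAverage[i];
            sumSq += d * d;
        }
        result[i] = Math.Sqrt(sumSq / window);
    }
    return result;
}

double[] data = { 1, 3, 5, 7 };
double[] sum2 = MovingSum(data, 2);                            // NaN, 4, 8, 12
double[] mean2 = new double[data.Length];
for (int k = 0; k < data.Length; k++) mean2[k] = sum2[k] / 2;  // NaN, 2, 4, 6
double[] sd2 = MovingStandardDeviation(data, 2, mean2);        // NaN, 1, 1, 1
```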

The MovingAverageAbsoluteDeviation method calculates, for each observation, the average absolute deviation of the n observations up to and including the current observation from the corresponding observation of another variable. The first argument is the number of observations. The second argument is a Vector<T> that contains the means from which the deviations are to be calculated.
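A sketch of the moving average absolute deviation follows, with the means supplied by the caller as described above. The NaN fill for incomplete windows is an assumption, and the function name is hypothetical.

```csharp
using System;

// Average absolute deviation of the `window` observations ending at position i
// from means[i], the corresponding observation of a second variable
// (for example, a moving average of the same length).
// Assumption: positions without a full window are NaN.
static double[] MovingAverageAbsoluteDeviation(double[] values, int window, double[] means)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        if (i < window - 1) { result[i] = double.NaN; continue; }
        double total = 0.0;
        for (int j = i - window + 1; j <= i; j++)
            total += Math.Abs(values[j] - means[i]);
        result[i] = total / window;
    }
    return result;
}

double[] data = { 1, 2, 6, 7 };
double[] movingMeans = { double.NaN, double.NaN, 3, 5 };             // 3-observation moving averages
double[] mad = MovingAverageAbsoluteDeviation(data, 3, movingMeans); // NaN, NaN, 2, 2
```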

Period-to-date values and differences.

There are two transformations in this group. The first calculates cumulative sums of the original observations within a series of intervals. The second is the inverse transformation of the first. It calculates the difference between each observation and the previous one, except for the first observation in each interval.

The PeriodToDateValues method calculates period-to-date sums. Each observation is the cumulative sum of the observations in the current interval.

A common use for this method is to create period-to-date sums of a time series variable relative to a longer time frame. For example, if the variable contains monthly earnings, you can use this method to calculate quarter-to-date earnings.

This method has two overloads. The first takes an integer array, indexes, whose elements specify the boundaries of the intervals. The remaining two parameters are BoundaryIntervalBehavior values that indicate how the first and last interval should be handled. If startBehavior has a value of Exclude, then new observations with an index smaller than the first element of indexes are set to NaN.
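The effect of the integer-boundary overload can be sketched on a monthly series grouped into quarters. This is a simplified illustration: the starts array plays the role of indexes, a bool stands in for a startBehavior of Exclude, and the handling of the last interval is not modeled. All names are hypothetical.

```csharp
using System;

// Period-to-date sums: the running sum restarts at each interval start.
// `starts` holds the ascending indexes at which intervals begin.
// Hypothetical simplification: excludeLeading stands in for a startBehavior of
// Exclude; the behavior for the last interval is not modeled.
static double[] PeriodToDateValues(double[] values, int[] starts, bool excludeLeading)
{
    var isStart = new bool[values.Length];
    foreach (int s in starts)
        if (s >= 0 && s < values.Length) isStart[s] = true;

    var result = new double[values.Length];
    double running = 0.0;
    for (int i = 0; i < values.Length; i++)
    {
        if (excludeLeading && starts.Length > 0 && i < starts[0])
        {
            result[i] = double.NaN;  // before the first boundary
            continue;
        }
        if (isStart[i]) running = 0.0;  // a new interval begins here
        running += values[i];
        result[i] = running;
    }
    return result;
}

// Monthly earnings; quarters begin at indexes 1 and 4, so month 0 is a leftover.
double[] earnings = { 5, 1, 2, 3, 4, 5, 6 };
double[] qtdExcluded = PeriodToDateValues(earnings, new[] { 1, 4 }, true);   // NaN, 1, 3, 6, 4, 9, 15
double[] qtdIncluded = PeriodToDateValues(earnings, new[] { 1, 4 }, false);  // 5, 1, 3, 6, 4, 9, 15
```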

The second overload is useful for time series variables, where each observation corresponds to a point in time. The first argument is a Vector<T> of DateTime that specifies the time corresponding to each observation. It must have the same length as the numerical variable. The second argument is another DateTime vector that indicates the start time of each interval. The remaining two parameters are BoundaryIntervalBehavior values, as before.

The PeriodToDateDifferences method performs the reverse operation. Each observation is the difference between the current and the previous observation in the current interval, except when it is the first observation in the current interval. In that case, the new observation is the same as the original observation.
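Because the differences undo the period-to-date sums, a sketch only needs to reset at each interval start. The starts array and the function name are hypothetical, and index 0 is always treated as the start of an interval.

```csharp
using System;

// Inverse of period-to-date sums: within an interval, result[i] = values[i] - values[i - 1];
// the first observation of each interval is copied unchanged.
// `starts` lists the indexes where intervals begin; index 0 is always treated as a start.
static double[] PeriodToDateDifferences(double[] values, int[] starts)
{
    var isStart = new bool[values.Length];
    if (values.Length > 0) isStart[0] = true;
    foreach (int s in starts)
        if (s >= 0 && s < values.Length) isStart[s] = true;

    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
        result[i] = isStart[i] ? values[i] : values[i] - values[i - 1];
    return result;
}

double[] quarterToDate = { 1, 3, 6, 4, 9, 15 };
double[] monthlyValues = PeriodToDateDifferences(quarterToDate, new[] { 0, 3 });  // 1, 2, 3, 4, 5, 6
```

Applied to quarter-to-date sums built from 1, 2, 3 and 4, 5, 6, it recovers those original values.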

Miscellaneous transformations.

The ReferenceIndex method scales the observations to make them comparable to a standard index value. The method has two overloads. The first overload takes two parameters. The first is the index of the observation that serves as a reference. The second is the base value of the index. The observations are scaled so that the index value of the reference observation equals the base value.

The second overload calculates the reference index based on the sum of a range of observations. It takes three arguments. The first is the index of the first observation in the reference interval. The second is the index of the last observation in the reference interval. The third is the base value of the index. The observations are scaled so that the sum of the index values in the reference interval equals the base value.
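Both overloads reduce to multiplying every observation by a single scale factor, as the following sketch shows. Because C# local functions cannot be overloaded, the range version uses a separate name; both names are hypothetical.

```csharp
using System;

// Scale every observation so that values[referenceIndex] maps to baseValue.
static double[] ReferenceIndex(double[] values, int referenceIndex, double baseValue)
{
    double scale = baseValue / values[referenceIndex];
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++) result[i] = values[i] * scale;
    return result;
}

// Scale so that the scaled observations first..last (inclusive) sum to baseValue.
static double[] ReferenceIndexOverRange(double[] values, int first, int last, double baseValue)
{
    double referenceSum = 0.0;
    for (int i = first; i <= last; i++) referenceSum += values[i];
    double scale = baseValue / referenceSum;
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++) result[i] = values[i] * scale;
    return result;
}

double[] prices = { 80, 100, 120 };
double[] indexed = ReferenceIndex(prices, 0, 100.0);                   // 100, 125, 150
double[] rangeIndexed = ReferenceIndexOverRange(prices, 0, 1, 100.0);  // first two sum to 100
```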

The PositiveToNegativeRatio method calculates the ratio of positive to negative values over an interval. The first argument is the length of the interval. The second argument is a Vector<T> that serves as the reference variable. Within each interval, the method sums the observations whose corresponding reference observation is positive, sums the observations whose corresponding reference observation is negative, and returns the ratio of the first sum to the second. Observations whose corresponding reference observation is zero are not included.

The PositiveToNegativeIndex method performs a similar calculation. However, the result is not returned as a ratio, but as an index value between 0 and 100. It has the same parameters as the PositiveToNegativeRatio method.
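The two calculations can be sketched as follows. The ratio follows the description above directly. For the index, the sketch assumes the form 100 * positive / (positive + negative), which is equivalent to 100 - 100 / (1 + ratio) and is the form used by the Money Flow Index; the library's exact definition is not given here. The NaN fill for incomplete windows is also an assumption, and the function names are hypothetical.

```csharp
using System;

// Sums of values[j] over the `length` observations ending at `end`, split by the
// sign of reference[j]. Observations whose reference value is zero are skipped.
static (double positive, double negative) SignedSums(double[] values, double[] reference, int end, int length)
{
    double positive = 0.0, negative = 0.0;
    for (int j = end - length + 1; j <= end; j++)
    {
        if (reference[j] > 0) positive += values[j];
        else if (reference[j] < 0) negative += values[j];
    }
    return (positive, negative);
}

static double[] PositiveToNegativeRatio(double[] values, int length, double[] reference)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        if (i < length - 1) { result[i] = double.NaN; continue; }
        var (positive, negative) = SignedSums(values, reference, i, length);
        result[i] = positive / negative;
    }
    return result;
}

// Assumption: index = 100 * positive / (positive + negative) = 100 - 100 / (1 + ratio).
static double[] PositiveToNegativeIndex(double[] values, int length, double[] reference)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        if (i < length - 1) { result[i] = double.NaN; continue; }
        var (positive, negative) = SignedSums(values, reference, i, length);
        result[i] = 100.0 * positive / (positive + negative);
    }
    return result;
}

double[] volume = { 10, 20, 30, 40 };
double[] priceChange = { 1, -1, 2, 0 };
double[] ratio3 = PositiveToNegativeRatio(volume, 3, priceChange);  // NaN, NaN, 2, 1.5
double[] index3 = PositiveToNegativeIndex(volume, 3, priceChange);  // NaN, NaN, ~66.67, 60
```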

Finally, the BoxCoxTransform method returns the Box-Cox transform of the variable for the specified parameter lambda, which must be between 0 and 1. This transformation is often used to reduce the effects of non-normality.
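The Box-Cox transform has a standard closed form: (x^lambda - 1) / lambda for nonzero lambda and ln x for lambda = 0, defined for strictly positive x. A minimal sketch on a plain double array follows; returning NaN for non-positive observations is a choice made here, not documented library behavior, and the BoxCox name is hypothetical.

```csharp
using System;

// Box-Cox transform of strictly positive data:
//   (x^lambda - 1) / lambda   for lambda != 0
//   ln(x)                     for lambda == 0
// Non-positive inputs yield NaN in this sketch.
static double[] BoxCox(double[] values, double lambda)
{
    var result = new double[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        double v = values[i];
        if (v <= 0) result[i] = double.NaN;
        else if (lambda == 0.0) result[i] = Math.Log(v);
        else result[i] = (Math.Pow(v, lambda) - 1.0) / lambda;
    }
    return result;
}

double[] data = { 1, 4, 9 };
double[] half = BoxCox(data, 0.5);    // 0, 2, 4
double[] logged = BoxCox(data, 0.0);  // 0, ln 4, ln 9
```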