Sorting and Filtering

In the previous section, the DataFrame<R, C> class was introduced as the representation of a data set. This representation favors the column-oriented view: the members of the collection are the variables, usually the columns in a table. In this section, we will look at the row-oriented view. We'll also discuss sorting and filtering.

Sorting

The data in a DataFrame<R, C> can be sorted by the values of its row index, one or more levels in a hierarchical row index, or one or more columns. For each column, it can be specified whether the result should be in ascending or descending order. All methods that perform sorting return a new data frame.

The simplest way to sort a data frame is on its index. This operation is implemented by the SortByIndex method. An optional argument of type SortOrder indicates whether the rows should be sorted in ascending or descending order. The default is ascending. In the following example, a data frame with a row index of dates is sorted so the dates are in descending order:

C#
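// Row index: 15 consecutive days starting on November 11, 2015.
var dates = Index.CreateDateRange(new DateTime(2015, 11, 11), 15, Recurrence.Daily);
// Data frame with three columns of random values, one row per date.
var df1 = DataFrame.FromColumns(new Dictionary<string, object>() {
    { "values1", Vector.CreateRandom(dates.Length) },
    { "values2", Vector.CreateRandom(dates.Length) },
    { "values3", Vector.CreateRandom(dates.Length) },
}, dates);
// New data frame with the most recent date first; df1 is left unchanged.
var df2 = df1.SortByIndex(SortOrder.Descending);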

The SortBy method sorts a data frame by the values in a column. This method has multiple overloads. The first overload takes as its first argument the name of the column by which the data is to be sorted. An optional second argument of type SortOrder specifies whether to sort the data frame in ascending or descending order. The default is ascending.

C#
// SortOrder.Ascending is the default, so the second argument could be omitted.
var df3 = df1.SortBy("values1", SortOrder.Ascending);
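
To make clear what sorting by a column does to the rest of the table, the following sketch reproduces the idea with plain arrays and LINQ. It does not use the DataFrame<R, C> class and is not how the library implements SortBy; the names keys, values1, values2 and order are hypothetical stand-ins. The point is that a single row order is computed from one column and then applied to the row keys and to every other column, and that the inputs are left unchanged.

```csharp
using System;
using System.Linq;

public static class SortByColumnSketch
{
    public static void Main()
    {
        // Hypothetical stand-in for a data frame: row keys plus two columns.
        string[] keys = { "r1", "r2", "r3", "r4" };
        double[] values1 = { 0.8, 0.1, 0.5, 0.3 };
        double[] values2 = { 10.0, 20.0, 30.0, 40.0 };

        // Row positions arranged so that values1 is in ascending order.
        // OrderBy is a stable sort, so tied rows keep their original order.
        int[] order = Enumerable.Range(0, keys.Length)
            .OrderBy(i => values1[i])
            .ToArray();

        // Apply the same row order to the keys and to every column.
        // New arrays are created; the original arrays are not modified.
        string[] sortedKeys = order.Select(i => keys[i]).ToArray();
        double[] sortedValues1 = order.Select(i => values1[i]).ToArray();
        double[] sortedValues2 = order.Select(i => values2[i]).ToArray();

        for (int r = 0; r < order.Length; r++)
        {
            Console.WriteLine($"{sortedKeys[r]}: {sortedValues1[r]}, {sortedValues2[r]}");
        }
    }
}
```

Replacing OrderBy with OrderByDescending in this sketch gives the descending counterpart.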

Advanced sorting

Sorting on multiple columns will be supported in a future version.

Filtering

It is often necessary to perform calculations on a subset of data, based on certain criteria. Filtering is done by indexing the rows of the data frame directly. Filtering always creates a new data frame.

The rows can be specified as a list of integer row indexes, a list of row keys, or a boolean vector where true values indicate the row should be retained.
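
The three ways of specifying rows can be illustrated without the library. The sketch below uses plain arrays and LINQ rather than the DataFrame<R, C> class, so none of the names in it (keys, values, positions, wanted, mask) belong to the library's API; it only shows what each kind of row specification selects. In each case the selection produces a new collection and leaves the source data untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowSelectionSketch
{
    public static void Main()
    {
        // Hypothetical stand-in for a data frame: row keys plus one column.
        string[] keys = { "a", "b", "c", "d" };
        double[] values = { 4.0, 1.5, 3.2, 0.7 };

        // 1. A list of integer row indexes (zero-based positions).
        int[] positions = { 0, 2 };
        List<(string Key, double Value)> byPosition =
            positions.Select(i => (keys[i], values[i])).ToList();

        // 2. A list of row keys, resolved through a key-to-position map.
        Dictionary<string, int> positionOf = keys
            .Select((key, i) => (key, i))
            .ToDictionary(p => p.key, p => p.i);
        string[] wanted = { "d", "b" };
        List<(string Key, double Value)> byKey =
            wanted.Select(k => (k, values[positionOf[k]])).ToList();

        // 3. A boolean mask with one entry per row; true means "keep this row".
        bool[] mask = values.Select(v => v > 2.0).ToArray();
        List<(string Key, double Value)> byMask = keys
            .Zip(values, (k, v) => (k, v))
            .Where((row, i) => mask[i])
            .ToList();

        Console.WriteLine("By position: " + string.Join(", ", byPosition));
        Console.WriteLine("By key:      " + string.Join(", ", byKey));
        Console.WriteLine("By mask:     " + string.Join(", ", byMask));
    }
}
```

Selecting by position or by key returns rows in the order requested, while a boolean mask preserves the original row order. The mask form is the usual choice when the criterion is computed from the data itself, as with the comparison against 2.0 here.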