Vectors and matrices as data frames

As mentioned in the introduction, data frames on the one hand and vectors and matrices on the other are complementary. With data frames, the focus is on identifying rows and columns through their keys, while with vectors and matrices, the focus is on the values. Still, since matrices may also have row and column indexes, it makes sense to use them as data frames at times. In particular, vectors and matrices can serve as the data source for a statistical or machine learning model.

The essence of data frame functionality is captured in the IDataFrame interface. This interface provides access to the (untyped) row and column indexes, and to individual columns. Both the Vector<T> and Matrix<T> types implement this interface.

The RowIndex and ColumnIndex properties return the row index and the column index of a matrix, if they have been defined. These properties can also be set. The only requirement is that the new index has the correct number of elements: as many as the matrix has rows for RowIndex, and as many as it has columns for ColumnIndex.
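The following is a minimal sketch of attaching indexes to a matrix. Only the RowIndex and ColumnIndex properties come from the text above. The Extreme.Mathematics and Extreme.DataAnalysis namespaces and the Matrix.Create and Index.Create factory methods are assumptions and may differ in your version of the library.

```csharp
using System;
using Extreme.Mathematics;

class MatrixIndexSketch
{
    static void Main()
    {
        // Assumption: Matrix.Create accepts a two-dimensional array.
        // This gives a matrix with 2 rows and 3 columns.
        var m = Matrix.Create(new double[,] { { 1, 2, 3 }, { 4, 5, 6 } });

        // Assumption: Index.Create builds an index from a list of keys.
        // The name is fully qualified to avoid a clash with System.Index.
        // The row index needs 2 elements, the column index needs 3.
        m.RowIndex = Extreme.DataAnalysis.Index.Create(new[] { "a", "b" });
        m.ColumnIndex = Extreme.DataAnalysis.Index.Create(new[] { "x", "y", "z" });

        // Reading the properties back returns the indexes just assigned.
        Console.WriteLine(m.RowIndex);
        Console.WriteLine(m.ColumnIndex);
    }
}
```

Assigning an index whose length does not match the corresponding dimension violates the requirement stated above and should be expected to fail.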

A vector acts as a data frame with a single column. The Index property corresponds to the row index. The Name property returns the column key converted to a string. The column index can still be accessed through the IDataFrame.ColumnIndex property.
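A short sketch of viewing a vector as a single-column data frame follows. The Index and Name properties and IDataFrame.ColumnIndex are taken from the text. The namespaces, the Vector.Create and Index.Create factory methods, and the assumption that the Index property can be set are not confirmed by the text and are used here for illustration only.

```csharp
using System;
using Extreme.Mathematics;

class VectorAsDataFrameSketch
{
    static void Main()
    {
        // Assumption: Vector.Create builds a vector from a list of values.
        var v = Vector.Create(1.0, 2.0, 3.0);

        // Assumption: the Index property is writable, like RowIndex on a matrix.
        v.Index = Extreme.DataAnalysis.Index.Create(new[] { "a", "b", "c" });

        // The column key as a string; this may be empty if no key was assigned.
        Console.WriteLine(v.Name);

        // The column index is reached through the interface, not the vector type.
        Extreme.DataAnalysis.IDataFrame frame = v;
        Console.WriteLine(frame.ColumnIndex);
    }
}
```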

Conversions exist between data frames and matrices. The DataFrame<R, C> class has a ToMatrix<T>(Boolean, Boolean) method that converts the columns of a data frame to a matrix with element type specified by the generic type argument. This method takes two arguments, both Boolean values. The first specifies whether columns whose element type is incompatible with the element type of the matrix are to be skipped. The default is true. The second value specifies whether the element types should match exactly. The default is false.

Likewise, the Matrix<T> type has a ToDataFrame method with two overloads. The first overload converts a matrix that already has row and column indexes. It takes no arguments, but does take two generic type arguments that specify the element types of the row and column index. These types must match the element types of the indexes of the matrix. The second overload takes two arguments: the row and column index of the new data frame.
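The conversions in both directions can be sketched as a round trip. ToDataFrame and ToMatrix<T>(Boolean, Boolean) are used as described in the text. The namespaces and the Matrix.Create and Index.Create factory methods are assumptions, so treat this as an illustration rather than a definitive usage.

```csharp
using System;
using Extreme.Mathematics;

class ConversionSketch
{
    static void Main()
    {
        // Assumption: Matrix.Create accepts a two-dimensional array.
        var m = Matrix.Create(new double[,] { { 1, 2 }, { 3, 4 } });

        // Assumption: Index.Create builds an index from a list of keys.
        var rows = Extreme.DataAnalysis.Index.Create(new[] { "r1", "r2" });
        var cols = Extreme.DataAnalysis.Index.Create(new[] { "c1", "c2" });

        // Second overload: the matrix has no indexes, so they are passed in.
        var df1 = m.ToDataFrame(rows, cols);

        // First overload: the matrix carries its own indexes, and the generic
        // arguments must match the element types of those indexes (string here).
        m.RowIndex = rows;
        m.ColumnIndex = cols;
        var df2 = m.ToDataFrame<string, string>();

        // Back to a matrix of doubles: skip incompatible columns (true),
        // and do not require an exact element type match (false).
        var m2 = df2.ToMatrix<double>(true, false);

        Console.WriteLine(df1);
        Console.WriteLine(m2);
    }
}
```

The two Boolean arguments are passed explicitly here and equal the defaults stated in the text.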