R files
R is an open-source environment for statistical computing. R has its own data format, a variant of XDR (“eXternal Data Representation”), and two types of files. The save function in R saves one or more objects in the current environment to a file, which usually has the extension .rda or .rdata.
Rdata files contain a pairlist of object names and their values. Values can have attributes, which are used to add functionality. For example, the column names and row keys of a data frame are stored in attributes. Rdata streams are composite data streams.
A second file type, usually with the extension .rds, is somewhat less common. An .rds file contains exactly one object, which is unnamed.
Both kinds of R data files can be stored in binary or ASCII, and may or may not use data compression. All variations are supported for both reading and writing, and are detected automatically when reading.
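To make the automatic detection more concrete, the sketch below classifies a file from its first few bytes. It is only an illustration and not the library's own detection logic: the RFileSniffer class and its Describe method are hypothetical names. It relies on the well-known gzip, bzip2 and xz magic numbers and on the headers R writes (RDX or RDA followed by a version digit for save files, and X or A followed by a newline for serialized single objects).

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the library described here.
static class RFileSniffer
{
    // Classifies a file from its leading bytes. A compressed file has to be
    // decompressed before its inner R header can be inspected the same way.
    public static string Describe(byte[] head)
    {
        if (StartsWith(head, 0x1F, 0x8B)) return "gzip-compressed";
        if (StartsWith(head, (byte)'B', (byte)'Z', (byte)'h')) return "bzip2-compressed";
        if (StartsWith(head, 0xFD, (byte)'7', (byte)'z', (byte)'X', (byte)'Z', 0x00)) return "xz-compressed";

        string text = Encoding.ASCII.GetString(head, 0, Math.Min(head.Length, 5));
        if (text.StartsWith("RDX", StringComparison.Ordinal)) return "rdata, binary (XDR)";
        if (text.StartsWith("RDA", StringComparison.Ordinal)) return "rdata, ASCII";
        if (text.StartsWith("X\n", StringComparison.Ordinal)) return "rds, binary (XDR)";
        if (text.StartsWith("A\n", StringComparison.Ordinal)) return "rds, ASCII";
        return "unknown";
    }

    // True when data begins with the given prefix bytes.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller would read the first handful of bytes of the file and pass them to Describe. None of this is needed when using the RdataFile methods, which perform the detection themselves.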
Reading R files
The RdataFile class contains static methods for reading one or multiple objects from a file in .rdata format.
Reading single objects
The ReadDataFrame method reads a data frame from a file. The method takes a single argument: either a string containing the path to the file, or a Stream that has been opened for reading. A string may be the path to a local file or the URI of a resource on the Internet. This method returns the first data frame found in the file.
Optionally, a second string argument may be passed that specifies the name of the R variable. This method reads objects from the file until an object with the specified name is found, and returns it as a data frame, if possible.
Data frames read in this way always have a column index of strings (the column names) and a row index of row numbers (64-bit signed integers). The row index stored in the R file is essentially lost. To keep the stored index information, the types of the row and the column keys can be passed as generic type arguments to the ReadDataFrame method. This will convert the stored indexes to the requested types as needed.
The example below reads a data frame from a data file. Its row index is of type DateTime. It then reads a second data frame named frame from a fictitious URL:
var df1 = RdataFile.ReadDataFrame<DateTime, string>(@"c:\data.rda");
var df2 = RdataFile.ReadDataFrame("http://www.example.com/sample.rda", "frame");
Similar methods exist for reading single vectors and matrices. The ReadVector method reads a vector from the file. It takes one type argument that is required: the element type of the vector to read. The first actual argument is once again the path to the file or Internet resource, or a stream. The name of the object may be passed as the second argument. The optional last argument specifies whether the element type of the stored vector should match the specified element type exactly. The default is false, which means that the read operation will succeed as long as the stored element type can be cast to the requested element type.
The ReadMatrix method reads a matrix from the file. It has the same arguments and overloads as ReadVector. The element type must be supplied as a generic type argument. The actual arguments are the path to the file or resource or the stream to read from, optionally the name of the variable, and optionally whether the element type should match exactly.
var vector1 = RdataFile.ReadVector<double>(@"c:\vector.rda");
var matrix1 = RdataFile.ReadMatrix<double>("http://www.example.com/matrix.rda", "mat");
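The exact-match argument can be illustrated with a short sketch. The argument order (source, name, exact-match flag) is inferred from the description above, and the variable name vec is hypothetical. The library's namespace import is omitted because this page does not name it.

```csharp
// Assumed argument order: source, variable name, exact-match flag.
// "vec" is a hypothetical variable name inside the file.

// Requires the stored element type to be exactly double.
var exact = RdataFile.ReadVector<double>(@"c:\vector.rda", "vec", true);

// Flag omitted (default false): the read succeeds as long as the stored
// element type can be cast to double.
var lenient = RdataFile.ReadVector<double>(@"c:\vector.rda", "vec");
```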
Reading multiple objects
The ReadDataFrames method reads multiple data frames from a file. The method takes two arguments. The first is a string containing the path to the file, or a Stream that has been opened for reading. A string may be the path to a local file or the URI of a resource on the Internet. The second argument is a sequence of strings containing the names of the variables that hold the data frames. This method returns a dictionary that maps the names of the variables to data frames.
The ReadAllDataFrames method reads all data frames from a file into a dictionary that maps the names of the variables to data frames.
Data frames read in this way always have a column index of strings (the column names) and a row index of row numbers (64-bit signed integers). The row index stored in the R file is essentially lost. To keep the stored index information, the types of the row and the column keys can be passed as generic type arguments to the ReadDataFrames or ReadAllDataFrames method. This will convert the stored indexes to the requested types as needed.
The example below reads two data frames named frame1 and frame2 from a data file, and then assigns them to two local variables:
string[] dataFrameVars = new[] { "frame1", "frame2" };
var dataFrames = RdataFile.ReadDataFrames(@"c:\data.rda", dataFrameVars);
var frame1 = dataFrames["frame1"];
var frame2 = dataFrames["frame2"];
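When the variable names are not known in advance, ReadAllDataFrames can be used instead. The sketch below assumes that the returned dictionary can be enumerated as key/value pairs, as standard .NET dictionaries can. The library's namespace import is omitted because this page does not name it.

```csharp
using System;

// Read every data frame in the file; keys are the R variable names.
var allFrames = RdataFile.ReadAllDataFrames(@"c:\data.rda");

// Assumption: the result enumerates as key/value pairs.
foreach (var pair in allFrames)
{
    Console.WriteLine(pair.Key);
}
```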
Similar methods exist for reading multiple vectors and matrices. The ReadVectors method reads multiple vectors from the file. It takes one required type argument: the element type of the vectors to read. The first actual argument is once again the path to the file or Internet resource, or a stream. The second argument is a sequence of names of the variables in the file that contain the vectors. The optional last argument specifies whether the element type of the stored vectors should match the specified element type exactly. The default is false, which means that the read operation will succeed as long as the stored element type can be cast to the requested element type. The ReadAllVectors method returns a dictionary with all the vectors in the file.
The ReadMatrices method reads matrices from the file. It has the same arguments and overloads as ReadVectors. The element type must be supplied as a generic type argument. The actual arguments are the path to the file or resource or the stream to read from, a sequence of variable names containing the matrices, and optionally whether the element type should match exactly. The ReadAllMatrices method returns a dictionary with all the matrices in the file.
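A minimal sketch of these methods follows. The variable names x and y are hypothetical, and the call shapes are inferred from the descriptions above rather than taken from a reference listing. The library's namespace import is omitted because this page does not name it.

```csharp
// Read two named vectors ("x" and "y" are hypothetical names).
var vectors = RdataFile.ReadVectors<double>(@"c:\data.rda", new[] { "x", "y" });
var x = vectors["x"];
var y = vectors["y"];

// Read every matrix in the file, keyed by variable name.
var allMatrices = RdataFile.ReadAllMatrices<double>(@"c:\data.rda");
```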
Writing R files
The Write method is used to write one or more data frames, vectors, or matrices to a file. The method has many overloads.
The first argument always specifies the destination in one of two ways. It can be a string that contains the path to the file. If the file exists, it is overwritten. If it doesn't exist, then it is created. Alternatively, the destination can be specified as a Stream.
The second argument always specifies the object(s) to be written. This can be a single data frame, matrix, or vector. It can also be a sequence of data frames, matrices, or vectors, or a dictionary that maps names to objects.
The third argument is optional: it specifies the name(s) to use for the objects in the data file. A default value is supplied when this argument is omitted.
The last two arguments are also optional. They are boolean values that specify whether the output should be compressed, and whether the output should be in ASCII format instead of binary. The default is to use compression in binary format. This is also the default in R.
In the example code below, we write a data frame to a file, and then a matrix to a stream.
RdataFile.Write(@"c:\data.rda", df1);
// File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes.
using (var stream = File.Create(@"c:\output.rda"))
{
    RdataFile.Write(stream, matrix1);
}
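The optional arguments described above can be sketched as follows. The argument order (name, then compression flag, then ASCII flag) is inferred from the description, and the output file name is hypothetical. Here df1 is the data frame read earlier.

```csharp
// Assumed argument order: destination, object, name, compress, ascii.
// Writes df1 under the R variable name "df1", uncompressed, in ASCII format.
RdataFile.Write(@"c:\data_ascii.rda", df1, "df1", false, true);
```

An uncompressed ASCII file is larger than the default output, but it can be inspected in a text editor.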
Using R Data Streams
R data streams are implemented by the RdataStream class. This class has no constructors. Instead, use one of the methods of the RdataFile class. Streams can be opened for reading or for writing, but not both.
Opening streams for reading
The Open(String) method opens a file or stream for reading. The only argument is a string or a stream. If it is a string, it is the path to the file that should be opened, or the URI of a network or Internet resource. If it is a stream, then it specifies the data stream that the objects should be read from.
The methods for reading objects from streams are similar to those of the RdataFile class, but without the argument that specifies the source.
Reading single objects
The ReadDataFrame method reads a data frame from the stream. This method returns the next data frame found in the stream.
Optionally, a string argument may be passed that specifies the name of the R variable. This method reads objects from the file until an object with the specified name is found, and returns it as a data frame, if possible. If an object with the name was read previously, it is returned as a data frame without reading more from the stream.
Data frames read in this way always have a column index of strings (the column names) and a row index of row numbers (64-bit signed integers). The row index stored in the R file is essentially lost. To keep the stored index information, the types of the row and the column keys can be passed as generic type arguments to the ReadDataFrame method. This will convert the stored indexes to the requested types as needed.
The example below opens a stream on a fictitious URL and reads the next data frame, whose row index is of type DateTime. It then reads a second data frame named frame from the same stream:
using (var s1 = RdataFile.Open("http://www.example.com/sample.rda"))
{
    var df1 = s1.ReadDataFrame<DateTime, string>();
    var df2 = s1.ReadDataFrame("frame");
}
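Because objects read while searching for a name are kept, named objects can be requested in a different order than they are stored. The sketch below assumes a hypothetical file in which frame1 is stored before frame2. The library's namespace import is omitted because this page does not name it.

```csharp
// Hypothetical file containing frame1 followed by frame2.
using (var s = RdataFile.Open(@"c:\data.rda"))
{
    // Reads past frame1 to reach frame2.
    var second = s.ReadDataFrame("frame2");

    // frame1 was already read during the search above, so it is
    // returned without reading more from the stream.
    var first = s.ReadDataFrame("frame1");
}
```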
Similar methods exist for reading single vectors and matrices. The ReadVector method reads a vector from the file. It takes one type argument that is required: the element type of the vector to read. The name of the object may be passed as the first argument. The optional last argument specifies whether the element type of the stored vector should match the specified element type exactly. The default is false, which means that the read operation will succeed as long as the stored element type can be cast to the requested element type.
The ReadMatrix method reads a matrix from the stream. It has the same arguments and overloads as ReadVector. The element type must be supplied as a generic type argument. The actual arguments are both optional: the name of the variable, and whether the element type should match exactly.
using (var s2 = RdataFile.Open(@"c:\vector.rda"))
{
    var vector1 = s2.ReadVector<double>();
    var matrix1 = s2.ReadMatrix<double>("mat");
}
Reading multiple objects
The ReadDataFrames method reads multiple data frames from a file. The method takes one argument: a sequence of strings containing the names of the variables containing the data frames. This method returns a dictionary that maps the names of the variables to data frames.
The ReadAllDataFrames method reads all data frames from a file into a dictionary that maps the names of the variables to data frames.
Data frames read in this way always have a column index of strings (the column names) and a row index of row numbers (64-bit signed integers). The row index stored in the R file is essentially lost. To keep the stored index information, the types of the row and the column keys can be passed as generic type arguments to the ReadDataFrames or ReadAllDataFrames method. This will convert the stored indexes to the requested types as needed.
The example below reads two data frames named frame1 and frame2 from a data file, and then assigns them to two local variables:
string[] dataFrameVars = new[] { "frame1", "frame2" };
using (var s3 = RdataFile.Open(@"c:\data.rda"))
{
    var dataFrames = s3.ReadDataFrames(dataFrameVars);
    var frame1 = dataFrames["frame1"];
    var frame2 = dataFrames["frame2"];
}
Similar methods exist for reading multiple vectors and matrices. The ReadVectors<T> method reads multiple vectors from the stream. It takes one required type argument: the element type of the vectors to read. The first argument is a sequence of names of the variables that contain the vectors. The optional second argument specifies whether the element type of the stored vectors should match the specified element type exactly. The default is false, which means that the read operation will succeed as long as the stored element type can be cast to the requested element type. The ReadAllVectors<T> method returns a dictionary with all the vectors in the stream.
The ReadMatrices<T> method reads matrices from the stream. It has the same arguments and overloads as ReadVectors<T>. The element type must be supplied as a generic type argument. The actual arguments are a sequence of variable names containing the matrices, and optionally whether the element type should match exactly. The ReadAllMatrices<T> method returns a dictionary with all the matrices in the stream.
Opening streams for writing
There are two methods that can be used to create an R data stream for writing. The Create(String, Boolean, Boolean) method opens a file for writing. The first argument is a string that is the path to the file that should be opened. If the file exists, its contents are destroyed. If the file does not exist, it is created. The optional second argument is a boolean value that specifies whether the data should be compressed. The default is true. The optional third argument is also a boolean value that specifies whether the data should be written out in human-readable ASCII format. The default is false.
The Append(Stream, Boolean, Boolean) method creates an R data stream on top of an existing writable stream. The first argument is the stream to write the objects to. The second and third arguments are optional. They are boolean values that specify whether the data should be compressed, and whether the data should be written in ASCII format.
Writing objects
The Write method is used to write one or more data frames, vectors, or matrices to a file. The method has many overloads.
The first argument always specifies the object(s) to be written. This can be a single data frame, matrix, or vector. It can also be a sequence of data frames, matrices, or vectors, or a dictionary that maps names to objects.
The second argument is optional: it specifies the name(s) to use for the objects in the data file. A default value is supplied when this argument is omitted.
The following code creates a new .rda file, and writes two data frames, a matrix, and a vector to it:
using (var stream = RdataFile.Create(@"c:\data.rda"))
{
    stream.Write(new[] { frame1, frame2 }, new string[] { "df1", "df2" });
    stream.Write(matrix1, "matrix1");
    stream.Write(vector1, "vector1");
}
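The Append method can be combined with Write in the same way when the destination is an existing stream. The sketch below assumes that Append is a static method of RdataFile, like Open and Create, and that the argument order is stream, compression flag, ASCII flag. The output path is hypothetical, and the library's namespace import is omitted because this page does not name it.

```csharp
using System.IO;

// The caller owns the underlying file stream and disposes it last.
using (var file = File.Create(@"c:\output.rda"))
// Assumed call shape: stream, compress = false, ascii = true.
using (var rstream = RdataFile.Append(file, false, true))
{
    rstream.Write(matrix1, "matrix1");
    rstream.Write(vector1, "vector1");
}
```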