Data Files and Data Streams
The API for the data access classes is modeled after the File and Stream classes in the System.IO namespace of the .NET Base Class Libraries. The File class contains static methods for reading from and writing to files in a single call. It also has static methods that open files for reading or writing; these methods return streams. The Stream classes contain methods for reading and writing individual values, which gives you more fine-grained control.
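The same two-level split exists outside .NET. The sketch below uses Python's standard library as an analogy (it is not the data access API): `Path.write_text` and `Path.read_text` handle a whole file in one call, much like the static File methods, while `Path.open` returns a stream that is read one value at a time.

```python
import tempfile
from pathlib import Path

# Use a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "values.txt"

    # "File"-style: write and read the entire file in a single call.
    path.write_text("1.5\n2.5\n3.0\n")
    whole = path.read_text()

    # "Stream"-style: open the file and process one value at a time,
    # which allows incremental processing or stopping early.
    total = 0.0
    with path.open("r") as stream:
        for line in stream:
            total += float(line)
```

The one-call form is convenient when the data fits in memory. The stream form is useful when values should be consumed as they arrive.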
Likewise, for each file format there is a File-like class (DelimitedTextFile, RdataFile, MatlabFile, and so on) and a corresponding Stream-like class (DelimitedTextStream, RdataStream, MatlabStream, and so on). The table below lists the classes for each format:
| File format | "File" class | "Stream" class |
|---|---|---|
| Delimited text files | DelimitedTextFile | DelimitedTextStream |
| Fixed width text files | | |
| Matrix Market files | | |
| JSON files | | |
| Matlab® files | MatlabFile | MatlabStream |
| R files (.rdata) | RdataFile | RdataStream |
| R files (.rds) | | |
| Stata® files | | |
Data file classes
Each file format has a corresponding class that contains static methods, each of which performs a complete operation in a single call. We will use R files (with extension .rdata or .rda) as an example.
The methods defined by these classes fall into three general categories: reading objects, writing objects, and opening files or streams. For example, the RdataFile.ReadDataFrame method reads the item stored in a .rdata file into a data frame.
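To make the three categories concrete, here is a minimal sketch. R files are binary, so it uses delimited text with Python's `csv` module. The class and method names (`DelimitedTextFileSketch`, `read_rows`, `write_rows`, `open_read`, `DelimitedTextStreamSketch`, `read_row`) are hypothetical and are not the library's actual API. The `delimiter` parameter is a stand-in for the kind of format-specific option a real Options class might carry.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Optional


class DelimitedTextStreamSketch:
    """Hypothetical stream: reads one row at a time from an open file."""

    def __init__(self, path: Path, delimiter: str = ","):
        self._file = open(path, newline="")
        self._reader = csv.reader(self._file, delimiter=delimiter)

    def read_row(self) -> Optional[List[str]]:
        # Return the next row, or None at end of file.
        return next(self._reader, None)

    def close(self) -> None:
        self._file.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc) -> None:
        self.close()


class DelimitedTextFileSketch:
    """Hypothetical File-like class: static methods, one operation per call."""

    @staticmethod
    def write_rows(path: Path, rows: List[List[str]], delimiter: str = ",") -> None:
        # Category 1, writing: store every row in a single call.
        with open(path, "w", newline="") as f:
            csv.writer(f, delimiter=delimiter).writerows(rows)

    @staticmethod
    def read_rows(path: Path, delimiter: str = ",") -> List[List[str]]:
        # Category 2, reading: load the whole file in a single call.
        with open(path, newline="") as f:
            return list(csv.reader(f, delimiter=delimiter))

    @staticmethod
    def open_read(path: Path, delimiter: str = ",") -> DelimitedTextStreamSketch:
        # Category 3, opening: return a stream for finer-grained access.
        return DelimitedTextStreamSketch(path, delimiter)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv"
    DelimitedTextFileSketch.write_rows(path, [["x", "y"], ["1", "2"], ["3", "4"]])
    all_rows = DelimitedTextFileSketch.read_rows(path)
    with DelimitedTextFileSketch.open_read(path) as stream:
        header = stream.read_row()  # only the first row is read
```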
Data streams
All stream classes inherit from a common base class, DataStream. Some formats also have an Options class that lets you specify format-specific details. Streams are created by calling one of the methods of the corresponding File class.
Some file formats support only one object per file, while others may contain multiple named objects. Examples of the latter are R files and Matlab files. For these file formats, the stream class inherits from a specialized class, CompositeDataStream<TObject>. This class takes one generic type argument: the type of the objects stored in the file.
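The class hierarchy described above can be pictured with a small in-memory sketch that uses Python's `typing.Generic` in place of a .NET generic type parameter. `DataStreamSketch` and `CompositeDataStreamSketch` are hypothetical stand-ins for DataStream and CompositeDataStream<TObject>. The `write`, `read`, and `names` methods are illustrative assumptions, and no file I/O takes place.

```python
from typing import Dict, Generic, List, TypeVar

# Plays the role of the TObject type argument.
TObject = TypeVar("TObject")


class DataStreamSketch:
    """Hypothetical common base class shared by all streams."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class CompositeDataStreamSketch(DataStreamSketch, Generic[TObject]):
    """Hypothetical stream over a container of named objects of type TObject."""

    def __init__(self) -> None:
        super().__init__()
        # Maps each object's name to the object, in insertion order.
        self._objects: Dict[str, TObject] = {}

    def write(self, name: str, obj: TObject) -> None:
        self._objects[name] = obj

    def read(self, name: str) -> TObject:
        return self._objects[name]

    def names(self) -> List[str]:
        return list(self._objects)


# A composite stream whose objects are lists of floats.
stream: CompositeDataStreamSketch[list] = CompositeDataStreamSketch()
stream.write("x", [1.0, 2.0])
stream.write("y", [3.0])
```

Making the element type a parameter of the base class lets each format fix the kind of object its files hold, while sharing the name-based lookup logic.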