Model Persistence
Model persistence allows you to save trained statistical models for later use. This enables you to deploy models in applications and services, store models for reproducibility, and share models between systems.
Purpose of persistence
Model persistence serves several important purposes:
Saving trained models for later use: Train a model once and reuse it without refitting.
Embedding models in applications and services: Deploy models in production systems, web services, or desktop applications.
Long-term storage and reproducibility: Archive models for regulatory compliance or scientific reproducibility.
What persistence saves
The persistence mechanism saves only the minimal predictive state required for the model to function in the Deployed state. This includes:
Model coefficients and parameters.
Transformation matrices and loadings (for PCA, Factor Analysis, etc.).
Cluster centers (for K-Means and related methods).
Category labels (for classification models such as logistic regression).
Row and column dimensions for matrices.
Schema information required to process new input data.
What persistence does not save
To minimize storage size and protect data privacy, the following are not included in the persisted model:
Raw training data.
Residuals and fitted values.
Diagnostic statistics and test results.
Optimization history and convergence details.
Variables eliminated during model fitting.
Any data not required for prediction or transformation.
If you need to preserve diagnostic information, consider storing it separately or maintaining the original fitted model.
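One way to store diagnostics separately is sketched below. It is self-contained and does not use the library: PredictiveState and Diagnostics are hypothetical stand-in types, and the least-squares fit is written by hand only to produce some residuals. The predictive state and the diagnostics go into two independent JSON documents, so the deployable file stays small and carries no per-observation data.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical stand-ins, not library types: the predictive state is kept
// apart from the diagnostics that a persisted model does not carry.
double[] x = { 1.0, 2.0, 3.0, 4.0 };
double[] y = { 2.1, 3.9, 6.2, 7.8 };

// Ordinary least squares for a single predictor, computed by hand.
double meanX = x.Average();
double meanY = y.Average();
double slope = x.Zip(y, (xi, yi) => (xi - meanX) * (yi - meanY)).Sum()
             / x.Sum(xi => (xi - meanX) * (xi - meanX));
double intercept = meanY - slope * meanX;

var state = new PredictiveState(intercept, slope);
double[] residuals = x.Zip(y, (xi, yi) => yi - state.Predict(xi)).ToArray();
var diagnostics = new Diagnostics(residuals, residuals.Sum(r => r * r));

// Two separate documents: a small one to deploy, a larger one to archive.
string modelJson = JsonSerializer.Serialize(state);
string diagnosticsJson = JsonSerializer.Serialize(diagnostics);

Console.WriteLine(modelJson);
Console.WriteLine(diagnosticsJson);

record PredictiveState(double Intercept, double Slope)
{
    public double Predict(double input) => Intercept + Slope * input;
}

record Diagnostics(double[] Residuals, double ResidualSumOfSquares);
```

With the library itself, the same split would pair the string returned by ToJson() with whatever diagnostic values you choose to archive alongside it.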
JSON format
Models are serialized using System.Text.Json with a source-generated serialization context for optimal performance and AOT (Ahead-of-Time) compilation compatibility.
The JSON format includes:
Type identifier: Identifies the model type (e.g., "LinearRegression", "PCA", "KMeans").
Format version: Enables backward compatibility as the format evolves.
Model adapter: Contains variable mapping information.
Model-specific payload: Contains the numerical state required for prediction.
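The four parts listed above can be pictured as an envelope around the payload. The sketch below is illustrative only: the record types, property names, and EnvelopeContext are assumptions made for this example and are not the library's actual schema. It also shows the general shape of a source-generated System.Text.Json context, and it assumes a project built with a recent .NET SDK (for example .NET 8) so that the source generator runs.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;
using PersistenceSketch;

// Build an envelope with the four parts: type, version, adapter, payload.
var envelope = new ModelEnvelope(
    ModelType: "LinearRegression",
    FormatVersion: 1,
    Adapter: new AdapterInfo(new[] { "x" }, "y"),
    Payload: new LinearPayload(0.5, new[] { 2.0 }));

// Serialize through the generated metadata instead of runtime reflection.
string json = JsonSerializer.Serialize(envelope, EnvelopeContext.Default.ModelEnvelope);
Console.WriteLine(json);

// Deserialize with the same generated metadata.
var restored = JsonSerializer.Deserialize(json, EnvelopeContext.Default.ModelEnvelope);
Console.WriteLine(restored?.Payload.Coefficients[0]);

namespace PersistenceSketch
{
    // Hypothetical layout; the library's real property names may differ.
    public record AdapterInfo(string[] InputVariables, string OutputVariable);
    public record LinearPayload(double Intercept, double[] Coefficients);
    public record ModelEnvelope(
        string ModelType, int FormatVersion, AdapterInfo Adapter, LinearPayload Payload);

    // The source generator fills in this partial class at compile time.
    [JsonSerializable(typeof(ModelEnvelope))]
    public partial class EnvelopeContext : JsonSerializerContext { }
}
```

Because the serialization metadata is generated at compile time, this style avoids the runtime reflection that trimming and AOT compilation restrict.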
Saving and loading models
To save a model, call the ToJson() method on a fitted model. This returns a JSON string that can be stored in a file, database, or any other storage mechanism:
// Save model to JSON string
string json = model.ToJson();
// Save to file
File.WriteAllText("model.json", json);
To load a model, use the static FromJson() method on the appropriate model class. The method parses the JSON and reconstructs the model:
// Load from file
string json = File.ReadAllText("model.json");
var model = SimpleRegressionModel.FromJson(json);
Models loaded from JSON are always in the Deployed state. They can be used immediately for prediction but do not have access to training data or diagnostics.
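When the model class is not known in advance, one option is to read the type identifier first and then choose which class's FromJson() to call. The sketch below uses only System.Text.Json. The property names "type" and "formatVersion" and the sample document are assumptions for illustration; inspect a file produced by ToJson() to find the actual names.

```csharp
using System;
using System.Text.Json;

// Hypothetical document; "type" and "formatVersion" are assumed property names.
string json = "{\"type\":\"KMeans\",\"formatVersion\":1,\"payload\":{}}";

string typeId = ReadTypeId(json);

// Map the identifier to the class whose FromJson() should be called.
string target = typeId switch
{
    "LinearRegression" => "regression model class",
    "PCA" => "PCA model class",
    "KMeans" => "K-Means model class",
    _ => throw new NotSupportedException($"Unknown model type '{typeId}'.")
};
Console.WriteLine($"Load with the {target}.");

// Reads the assumed "type" property without deserializing the whole payload.
static string ReadTypeId(string text)
{
    using JsonDocument doc = JsonDocument.Parse(text);
    if (doc.RootElement.TryGetProperty("type", out JsonElement id)
        && id.ValueKind == JsonValueKind.String)
    {
        return id.GetString() ?? string.Empty;
    }
    throw new FormatException("The document has no string-valued type identifier.");
}
```

In real code, each branch of the switch would call the corresponding model class's FromJson(json) rather than return a description.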
Guarantees
The persistence mechanism provides the following guarantees:
Prediction consistency: Predictions made by a Fitted model, a Deployed model, and a model loaded from JSON will match within floating-point tolerance.
Format stability: The persistence format is versioned and designed to remain stable across library versions. Older JSON files will continue to load correctly.
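The prediction-consistency guarantee can be checked with a round-trip test. The sketch below shows the shape of such a test using a hypothetical LineState record in place of a library model. The tolerance and the input values are arbitrary choices for illustration, not values documented by the library.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical stand-in for a fitted model's predictive state.
var original = new LineState(Intercept: 0.1, Slope: 1.0 / 3.0);

// Round-trip the state through JSON, as saving and loading would.
string json = JsonSerializer.Serialize(original);
LineState restored = JsonSerializer.Deserialize<LineState>(json)
    ?? throw new InvalidOperationException("Deserialization returned null.");

double[] inputs = { -2.5, 0.0, 1.0, 6.0, 1e6 };
const double relativeTolerance = 1e-12; // arbitrary choice for this sketch

// Compare predictions with a tolerance scaled to their magnitude.
bool allMatch = inputs.All(v =>
{
    double a = original.Predict(v);
    double b = restored.Predict(v);
    double scale = Math.Max(1.0, Math.Max(Math.Abs(a), Math.Abs(b)));
    return Math.Abs(a - b) <= relativeTolerance * scale;
});

Console.WriteLine(allMatch ? "Predictions match within tolerance." : "Predictions differ.");

record LineState(double Intercept, double Slope)
{
    public double Predict(double input) => Intercept + Slope * input;
}
```

The same pattern applies to a real model: predict on a fixed set of inputs before saving, predict again after FromJson(), and compare with a relative tolerance rather than exact equality.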
Complete example
The following example demonstrates the complete workflow of training a model, saving it to JSON, and loading it for use:
using System.IO; // File.WriteAllText and File.ReadAllText
// Train a model
var y = Vector.Create(1.0, 2.0, 3.0, 4.0, 5.0);
var x = Vector.Create(1.0, 2.0, 3.0, 4.0, 5.0);
var model = new SimpleRegressionModel(y, x);
model.Fit();
// Save to JSON
string json = model.ToJson();
File.WriteAllText("regression_model.json", json);
// Later, load and use
string loadedJson = File.ReadAllText("regression_model.json");
var loadedModel = SimpleRegressionModel.FromJson(loadedJson);
// Make a prediction (behaves the same as on the fitted model)
double prediction = loadedModel.Predict(6.0);