Factor Analysis
Factor analysis is a method of grouping a set of variables into related subsets. Different methods exist for extracting the factors. After extraction, the factors can be rotated in order to further bring out the relationship between variables.
Factor analysis is implemented by the FactorAnalysis class and related types in the Extreme.Statistics.Multivariate namespace.
Factor methods
A factor analysis can operate on either the correlation matrix or the covariance matrix of a set of variables. The first step in a factor analysis is the computation of this matrix. The options are enumerated in the FactorMethod enumeration type. It can have the following values:
Value | Description |
---|---|
Correlation | Use the correlation matrix of the variables to perform the calculations. This is the default. |
Covariance | Use the covariance matrix of the variables to perform the calculations. |
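The difference between the two options can be seen in a small sketch that is independent of the library. It computes both matrices from raw data held in a plain array, with rows as observations and columns as variables. The class and method names are hypothetical and are not part of the Extreme API.

```csharp
using System;

public static class MatrixChoiceSketch
{
    // Sample covariance matrix of the columns of data (rows are observations).
    public static double[,] Covariance(double[,] data)
    {
        int n = data.GetLength(0), p = data.GetLength(1);
        var mean = new double[p];
        for (int j = 0; j < p; j++)
        {
            for (int i = 0; i < n; i++) mean[j] += data[i, j];
            mean[j] /= n;
        }
        var cov = new double[p, p];
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++)
            {
                double sum = 0.0;
                for (int i = 0; i < n; i++)
                    sum += (data[i, a] - mean[a]) * (data[i, b] - mean[b]);
                cov[a, b] = sum / (n - 1);
            }
        return cov;
    }

    // Correlation matrix: the covariance matrix rescaled to a unit diagonal.
    public static double[,] Correlation(double[,] data)
    {
        var cov = Covariance(data);
        int p = cov.GetLength(0);
        var corr = new double[p, p];
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++)
                corr[a, b] = cov[a, b] / Math.Sqrt(cov[a, a] * cov[b, b]);
        return corr;
    }
}
```

The correlation matrix removes the effect of each variable's scale, which is one reason it is a reasonable default when variables are measured in different units.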
Extraction methods
The next step in the factor analysis is to extract the factors from the correlation or covariance matrix. There are several ways to perform this step. A major difference between Principal Component Analysis (PCA) and Factor Analysis is that Factor Analysis tries to analyze only the variance that is shared between variables, and tries to exclude variance that is unique to each variable. This is reflected in the fact that, whereas the correlation matrix used in PCA has all 1's on the diagonal, the matrix used in Factor Analysis typically does not. The elements on the diagonal are called the communalities. Much of the difference between factor extraction methods lies in the way the communalities are estimated. The complement of the communality of a variable is its uniqueness.
The objective is always to obtain factors that reproduce the observed correlations or covariances between the variables as closely as possible. The matrix of correlations reproduced by the factors is called the reconstructed correlation matrix.
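As an illustration of these terms, the sketch below computes communalities, uniquenesses and a reconstructed correlation matrix from a loadings matrix stored in a plain array. It assumes the analysis is based on a correlation matrix and that the factors are uncorrelated. The names are hypothetical and not part of the Extreme API.

```csharp
using System;

public static class CommunalitySketch
{
    // Communality of each variable: the sum of its squared loadings over all factors.
    public static double[] Communalities(double[,] loadings)
    {
        int p = loadings.GetLength(0), m = loadings.GetLength(1);
        var h2 = new double[p];
        for (int i = 0; i < p; i++)
            for (int j = 0; j < m; j++)
                h2[i] += loadings[i, j] * loadings[i, j];
        return h2;
    }

    // Uniqueness: the complement of the communality (correlation matrix assumed).
    public static double[] Uniquenesses(double[,] loadings)
    {
        var h2 = Communalities(loadings);
        var u = new double[h2.Length];
        for (int i = 0; i < h2.Length; i++) u[i] = 1.0 - h2[i];
        return u;
    }

    // Reconstructed matrix L * L^T; its diagonal holds the communalities.
    public static double[,] Reconstruct(double[,] loadings)
    {
        int p = loadings.GetLength(0), m = loadings.GetLength(1);
        var r = new double[p, p];
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++)
                for (int j = 0; j < m; j++)
                    r[a, b] += loadings[a, j] * loadings[b, j];
        return r;
    }
}
```

Comparing the off-diagonal elements of the reconstructed matrix with the observed correlations gives a simple measure of how well a given set of factors fits.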
The extraction options are enumerated by the FactorExtractionMethod enumeration type:
Value | Description |
---|---|
PrincipalComponents | Compute the principal components. This is equivalent to Principal Component Analysis. This method can be used with a covariance matrix. |
IterativePrincipalAxis | An iterative process that estimates the communalities using the Squared Multiple Correlation (SMC) as the starting point. This method can be used with a covariance matrix. |
UnweightedLeastSquares | A method that tries to minimize the sum of the squared differences between the correlation matrix and the reconstructed correlation matrix. |
GeneralizedLeastSquares | A method that tries to minimize the sum of the squared differences between the correlation matrix and the reconstructed correlation matrix, weighted by the inverse of each variable's uniqueness. |
MaximumLikelihood | Maximum likelihood estimation. |
AlphaFactoring | A method that maximizes the alpha-reliability of the factors. |
ImageFactoring | A method where the common part of a variable is defined as its linear regression with respect to the other variables. This method can be used with a covariance matrix. |
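The Squared Multiple Correlation used as the starting point for iterative principal axis extraction can be computed from the inverse of the correlation matrix: for variable i it equals 1 - 1 / (R^-1)[i, i]. The sketch below is a library-independent illustration using plain arrays and a basic Gauss-Jordan inversion. The names are hypothetical, and it is not meant to describe how the library computes these values internally.

```csharp
using System;

public static class SmcSketch
{
    // Inverts a square matrix by Gauss-Jordan elimination with partial pivoting.
    public static double[,] Invert(double[,] matrix)
    {
        int n = matrix.GetLength(0);
        var a = (double[,])matrix.Clone();
        var inv = new double[n, n];
        for (int i = 0; i < n; i++) inv[i, i] = 1.0;

        for (int col = 0; col < n; col++)
        {
            // Pick the row with the largest entry in this column as the pivot.
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;
            if (Math.Abs(a[pivot, col]) < 1e-12)
                throw new InvalidOperationException("Matrix is singular.");
            for (int c = 0; c < n && pivot != col; c++)
            {
                double t = a[col, c]; a[col, c] = a[pivot, c]; a[pivot, c] = t;
                t = inv[col, c]; inv[col, c] = inv[pivot, c]; inv[pivot, c] = t;
            }
            // Scale the pivot row, then eliminate the column from all other rows.
            double d = a[col, col];
            for (int c = 0; c < n; c++) { a[col, c] /= d; inv[col, c] /= d; }
            for (int r = 0; r < n; r++)
            {
                if (r == col) continue;
                double f = a[r, col];
                for (int c = 0; c < n; c++)
                {
                    a[r, c] -= f * a[col, c];
                    inv[r, c] -= f * inv[col, c];
                }
            }
        }
        return inv;
    }

    // SMC of each variable with all the others: 1 - 1 / (R^-1)[i, i].
    public static double[] SquaredMultipleCorrelations(double[,] correlation)
    {
        var inv = Invert(correlation);
        int n = correlation.GetLength(0);
        var smc = new double[n];
        for (int i = 0; i < n; i++) smc[i] = 1.0 - 1.0 / inv[i, i];
        return smc;
    }
}
```

With only two variables, the SMC of each variable reduces to the squared correlation between them, which makes a convenient sanity check.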
Factor rotation
After extraction, rotation is commonly used to maximize high correlations and minimize low correlations. This can be done in two ways. An orthogonal rotation maintains the property that factors are uncorrelated. An oblique rotation does not have this limitation, and can achieve more extreme correlations, but at a cost of increased complexity.
There are many rotation methods, and only the most common ones have been implemented. The available rotation methods are enumerated by the FactorRotationMethod enumeration type. Its members are listed below:
Value | Description |
---|---|
None | Don't rotate the factors. This is considered orthogonal. |
Varimax | The most common orthogonal rotation method. It maximizes the variance of the loadings within each factor, making high loadings higher and low loadings lower. |
Equamax | An orthogonal rotation method that maximizes the variance of the loadings of both factors and variables. It is a compromise between Varimax and Quartimax. |
Quartimax | An orthogonal rotation method that maximizes the variance of the loadings of each variable. |
Promax | A popular oblique rotation method that first performs an orthogonal rotation and then uses powers of loadings to emphasize high and low values. |
The Promax rotation method requires one argument, the power that the loadings are to be raised to in the oblique phase of the method. This value is set through the PromaxPower property.
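To make the idea of an orthogonal rotation concrete, the sketch below rotates a two-factor loadings matrix by an angle and evaluates the raw Varimax criterion (the variance of the squared loadings within each factor, summed over factors, without Kaiser normalization). It uses plain arrays and hypothetical names. It only illustrates the criterion and does not search for the optimal rotation.

```csharp
using System;

public static class RotationSketch
{
    // Raw varimax criterion: sum over factors of the variance of squared loadings.
    public static double VarimaxCriterion(double[,] loadings)
    {
        int p = loadings.GetLength(0), m = loadings.GetLength(1);
        double total = 0.0;
        for (int j = 0; j < m; j++)
        {
            double sum2 = 0.0, sum4 = 0.0;
            for (int i = 0; i < p; i++)
            {
                double sq = loadings[i, j] * loadings[i, j];
                sum2 += sq;
                sum4 += sq * sq;
            }
            total += sum4 / p - (sum2 / p) * (sum2 / p);
        }
        return total;
    }

    // Orthogonal rotation of a two-factor loadings matrix by the given angle (radians).
    public static double[,] Rotate2(double[,] loadings, double angle)
    {
        if (loadings.GetLength(1) != 2)
            throw new ArgumentException("Exactly two factors are expected.");
        int p = loadings.GetLength(0);
        double c = Math.Cos(angle), s = Math.Sin(angle);
        var rotated = new double[p, 2];
        for (int i = 0; i < p; i++)
        {
            rotated[i, 0] = loadings[i, 0] * c + loadings[i, 1] * s;
            rotated[i, 1] = -loadings[i, 0] * s + loadings[i, 1] * c;
        }
        return rotated;
    }
}
```

An orthogonal rotation leaves the communality of every variable unchanged. Only the way the shared variance is distributed over the factors changes, and that distribution is what the criterion measures.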
Running a factor analysis
Factor analysis is implemented by the FactorAnalysis class. This class has four constructors. The first constructor takes one argument: a Matrix<T> whose columns contain the data to be analyzed. The second constructor also takes one argument: an array of Vector<T> objects.
// Random data: 100 observations of 10 variables.
var matrix = Matrix.CreateRandom(100, 10);
var fa1 = new FactorAnalysis(matrix);
// The same data as an array of column vectors.
var vectors = matrix.Columns.ToArray();
var fa2 = new FactorAnalysis(vectors);
The third constructor takes two arguments. The first is an IDataFrame (a DataFrame<R, C> or Matrix<T>) that contains the variables that may be used in the analysis. The second argument is an array of strings that contains the names of the variables from the collection that should be included in the analysis. The following example loads some data from a Stata file, filters out missing values, and creates a factor analysis object:
var dataFrame = StataFile.ReadDataFrame(@"..\..\..\..\Data\m255.dta");
string[] names = { "item13", "item14", "item15", "item16",
"item17", "item18", "item19", "item20", "item21",
"item22", "item23", "item24" };
dataFrame = dataFrame.RemoveRowsWithMissingValues(names);
FactorAnalysis fa = new FactorAnalysis(dataFrame, names);
The fourth constructor is used to perform the factor analysis directly on a correlation or covariance matrix. It takes two arguments. The first is a SymmetricMatrix<T>. The second argument is a FactorMethod value that specifies whether the first argument is a correlation matrix or a covariance matrix.
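A minimal sketch of this constructor is shown below. It assumes that a correlation matrix has already been obtained elsewhere as a SymmetricMatrix<double>, and that SymmetricMatrix<T> lives in the Extreme.Mathematics namespace. The wrapper class and method are hypothetical and exist only to keep the fragment self-contained.

```csharp
using Extreme.Mathematics;
using Extreme.Statistics.Multivariate;

public static class CorrelationInputSketch
{
    // 'correlations' is assumed to be a correlation matrix computed elsewhere.
    public static FactorAnalysis FromCorrelations(SymmetricMatrix<double> correlations)
    {
        // The second argument states how the matrix should be interpreted.
        return new FactorAnalysis(correlations, FactorMethod.Correlation);
    }
}
```

Passing FactorMethod.Covariance instead would tell the analysis that the matrix contains covariances.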
Once the factor analysis object is created, the factor extraction and rotation methods should be specified. This is done through two properties. The ExtractionMethod property is a FactorExtractionMethod. The default is to use iterated principal axis extraction. The RotationMethod property is a FactorRotationMethod. The default is to use Varimax rotation.
The number of factors can be specified in several ways. The NumberOfFactors property can be set to the desired number. Alternatively, the FactorCountMethod property can be set, usually in combination with the FactorThreshold property. The FactorCountMethod property is of type FactorCountMethod and can have the following values:
Value | Description |
---|---|
Fixed | The number of factors is determined by the value of the NumberOfFactors property. |
Automatic | The number of factors equals the number of eigenvalues greater than the value of the FactorThreshold property. |
AutomaticRelativeToMean | The number of factors equals the number of eigenvalues greater than the value of the FactorThreshold property times the mean of the eigenvalues. |
All | All factors are retained. |
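The counting rules can be illustrated with a small library-independent sketch that works on an array of eigenvalues. The enum and method below are hypothetical stand-ins, not the library's types, and the last case assumes that All simply keeps every factor.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the library's factor count options.
public enum CountRule { Fixed, Automatic, AutomaticRelativeToMean, All }

public static class FactorCountSketch
{
    public static int CountFactors(double[] eigenvalues, CountRule rule,
                                   double threshold, int fixedCount)
    {
        switch (rule)
        {
            case CountRule.Fixed:
                // Use the requested number, but never more than is available.
                return Math.Min(fixedCount, eigenvalues.Length);
            case CountRule.Automatic:
                // Eigenvalues above an absolute threshold.
                return eigenvalues.Count(e => e > threshold);
            case CountRule.AutomaticRelativeToMean:
                // Eigenvalues above a multiple of the mean eigenvalue.
                double mean = eigenvalues.Average();
                return eigenvalues.Count(e => e > threshold * mean);
            default:
                // Assumed meaning of All: keep every factor.
                return eigenvalues.Length;
        }
    }
}
```

For a correlation matrix the mean eigenvalue is 1, so the two automatic rules give the same count for the same threshold. They differ when a covariance matrix is analyzed.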
The Fit method performs the actual calculations. The example below starts from the factor analysis object created earlier, selects the iterated principal axis method to extract 3 factors and Varimax rotation, and runs the analysis:
fa.NumberOfFactors = 3;
fa.ExtractionMethod = FactorExtractionMethod.IterativePrincipalAxis;
fa.RotationMethod = FactorRotationMethod.Varimax;
fa.Fit();
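A different configuration might let the number of factors be determined from the eigenvalues and use an oblique rotation. The sketch below uses only the properties described above. The wrapper class is hypothetical, the threshold of 1.0 and the power of 4 are illustrative choices, and it assumes that FactorThreshold and PromaxPower accept numeric values.

```csharp
using Extreme.Statistics.Multivariate;

public static class FactorAnalysisConfigSketch
{
    public static void FitWithAutomaticFactorCount(FactorAnalysis fa)
    {
        // Keep the factors whose eigenvalue exceeds the threshold.
        fa.FactorCountMethod = FactorCountMethod.Automatic;
        fa.FactorThreshold = 1.0;   // illustrative value; property type is assumed numeric
        fa.ExtractionMethod = FactorExtractionMethod.MaximumLikelihood;
        // Oblique rotation: the factors are allowed to correlate.
        fa.RotationMethod = FactorRotationMethod.Promax;
        fa.PromaxPower = 4;         // illustrative value
        fa.Fit();
    }
}
```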
Results of the analysis
Once the computations are complete, a number of properties and methods give access to the results in detail. Which properties are available depends to some degree on whether the factor rotation is orthogonal or oblique.
The GetUnrotatedFactors and GetRotatedFactors methods return read-only collections of Factor objects that represent the factors before and after rotation, respectively. The properties of these factors can be inspected to get the details of each factor.
In addition, the FactorAnalysis object itself has a large number of vector and matrix properties. They are listed in the table below.
Property | Description |
---|---|
| | A vector containing the initial values of the communalities of the variables. |
| | A vector containing the communalities of the extracted factors. |
| | A matrix whose columns contain the factor loadings before rotation. |
RotatedLoadingsMatrix | A matrix whose columns contain the factor loadings after rotation. For oblique rotations, this is the same as the |
| | For oblique rotations, a matrix whose columns contain the factor pattern (the contribution of each factor to the variance of each variable). For orthogonal rotations this is the same as the RotatedLoadingsMatrix. |
| | For oblique rotations, a matrix whose columns contain the factor structure (the correlation between each factor and each variable). For orthogonal rotations this is the same as the RotatedLoadingsMatrix. |
| | The matrix that transforms the initial factors to the rotated factors. |
| | A symmetric matrix of correlations between the factors. This is only meaningful for oblique rotations. For orthogonal rotations, this equals the identity matrix. |
| | A matrix containing the coefficients of the factor scores. |
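The relationship between the pattern, structure and factor correlation matrices can be sketched independently of the library: the structure matrix equals the pattern matrix multiplied on the right by the matrix of correlations between the factors. When the factors are uncorrelated, that matrix is the identity and the two coincide. The example uses plain arrays and hypothetical names that are not part of the Extreme API.

```csharp
using System;

public static class StructureSketch
{
    // Structure = Pattern * Phi, where Phi holds the correlations between factors.
    public static double[,] Structure(double[,] pattern, double[,] factorCorrelations)
    {
        int p = pattern.GetLength(0), m = pattern.GetLength(1);
        if (factorCorrelations.GetLength(0) != m || factorCorrelations.GetLength(1) != m)
            throw new ArgumentException("Factor correlation matrix must be m x m.");
        var structure = new double[p, m];
        for (int i = 0; i < p; i++)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < m; k++)
                    structure[i, j] += pattern[i, k] * factorCorrelations[k, j];
        return structure;
    }
}
```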