Minimizing Sums of Squares
Many optimization problems take the form of minimizing a sum of squares of a set of functions. Specialized algorithms have been developed to take advantage of the structure of such problems.
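More precisely, given functions f₁, ..., fₙ of a vector of variables x, the goal is to find the x for which f₁(x)² + f₂(x)² + ... + fₙ(x)² is smallest, or equivalently, for which the norm of the vector (f₁(x), ..., fₙ(x)) is smallest.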
Available Algorithms
The Levenberg-Marquardt algorithm
The classic algorithm for solving nonlinear least squares was first developed by Kenneth Levenberg and Donald Marquardt, and put into its current form by Jorge Moré. It is a trust region method. A quadratic approximation to the objective function is used in a region around the current point called the trust region. If the approximation is good, then the trust region is expanded; otherwise it is contracted.
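In its classical form (a sketch of the general technique; the implementation may differ in its details), each iteration linearizes the vector function F around the current point x and solves the damped normal equations (JᵀJ + λI)δ = -JᵀF(x) for the step δ, where J is the Jacobian of F and λ ≥ 0 is a damping parameter tied to the size of the trust region: a large λ yields a short step close to the steepest descent direction, while λ = 0 gives the Gauss-Newton step.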
Trust Region Reflective Method
The trust region reflective algorithm is an enhancement of the Levenberg-Marquardt algorithm. The basic idea is to adjust the shape of the trust region so it fits within the bounds. Furthermore, when the proposed step is out of bounds, a search direction reflected from the bound is considered. Because of these enhancements, this algorithm tends to work somewhat better than the Levenberg-Marquardt algorithm for constrained problems.
Defining the objective function
The function for which an extremum is to be found is called the objective function. In this case, we are minimizing the sum of the squares of a set of functions. Or, equivalently, we are minimizing the norm of a vector function. This vector function is the objective function for least squares optimizers.
The function is accessed through the ObjectiveFunction property, which is a delegate of type Func<Vector<T>, Vector<T>, Vector<T>>. The first argument is a Vector<T> that contains the unknowns. The second argument is a vector that is to hold the result, and is also the return value; if the second argument is null, a new vector should be created and returned instead.
The following code demonstrates how to write methods to implement objective functions. The Rosenbrock function,
f(x₁, x₂) = (1 - x₁)² + 105(x₂ - x₁²)²,
is a famous test function for optimization. It is the sum of the squares of two functions, and so can be minimized using a least squares optimizer. The objective function would be written as:
Vector<double> Rosenbrock(Vector<double> x, Vector<double> f)
{
    if (f == null)
        f = Vector.Create<double>(2);
    f[0] = 1 - x[0];
    f[1] = Math.Sqrt(105) * (x[1] - x[0] * x[0]);
    return f;
}
The least squares optimizers also require the Jacobian of the objective function. This is the matrix whose rows contain the gradient (the vector of partial derivatives) of the corresponding component of the objective function. The JacobianFunction property is a Func<Vector<T>, Matrix<T>, Matrix<T>> delegate, defined similarly to the objective function. The first argument is a Vector<T> that contains the unknowns. The second argument is a Matrix<T> that is to hold the result; the return value is a reference to this matrix.
The following code demonstrates how to write a method that calculates the Jacobian of an objective function. Once again, we use the Rosenbrock function as our example:
Matrix<double> RosenbrockJacobian(Vector<double> x, Matrix<double> J)
{
    if (J == null) J = Matrix.Create<double>(2, 2);
    // Row 0: gradient of f[0] = 1 - x[0]
    J[0, 0] = -1;
    J[0, 1] = 0;
    // Row 1: gradient of f[1] = Sqrt(105) * (x[1] - x[0] * x[0])
    J[1, 0] = -2 * Math.Sqrt(105) * x[0];
    J[1, 1] = Math.Sqrt(105);
    return J;
}
If the Jacobian is not available, or if it is very expensive to calculate, it may be preferable to either approximate it numerically, or use a method that does not require the gradient function.
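If a numerical approximation is acceptable, a forward-difference Jacobian can be supplied through the same JacobianFunction delegate. The sketch below is illustrative only: the helper name and the step size h are not part of the library, and a production implementation would scale the step to the magnitude of each variable.
Matrix<double> ApproximateJacobian(Vector<double> x, Matrix<double> J)
{
    // Illustrative helper for the two-variable Rosenbrock example; not part of the library.
    if (J == null) J = Matrix.Create<double>(2, 2);
    double h = 1e-7;                          // forward-difference step (illustrative choice)
    Vector<double> f0 = Rosenbrock(x, null);  // function value at the current point
    for (int j = 0; j < 2; j++)
    {
        Vector<double> xh = Vector.Create(x[0], x[1]);
        xh[j] += h;                           // perturb the j-th variable
        Vector<double> fh = Rosenbrock(xh, null);
        for (int i = 0; i < 2; i++)
            J[i, j] = (fh[i] - f0[i]) / h;    // column j: approximate d f[i] / d x[j]
    }
    return J;
}
This method can then be assigned to the JacobianFunction property in place of the analytical version.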
When using .NET 4.0, it is possible to have the Jacobian computed using Automatic Differentiation. To do so, specify the objective function by calling the SetSymbolicObjectiveFunction(IList<Expression<Func<Vector<Double>, Double>>>) method. The argument can be a list of lambda expressions, each mapping the Vector<T> of variables to the value of one component of the objective function, or a parameter array of such expressions.
The next example puts this all together. It creates a Levenberg-Marquardt optimizer and sets it up to minimize the Rosenbrock function:
LeastSquaresOptimizer lm = new LevenbergMarquardtOptimizer();
lm.ObjectiveFunction = Rosenbrock;
lm.JacobianFunction = RosenbrockJacobian;
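Alternatively, the objective function can be specified symbolically as described above, in which case the Jacobian is obtained through automatic differentiation and the JacobianFunction property does not need to be set. For the Rosenbrock function, a sketch of that call, assuming the parameter array overload, could look like:
lm.SetSymbolicObjectiveFunction(
    x => 1 - x[0],
    x => Math.Sqrt(105) * (x[1] - x[0] * x[0]));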
Running the optimization
Once the algorithm is set up, the minimum can be calculated. Two more issues must be dealt with: the initial guess for the algorithm and, optionally, bounds on the variables.
Defining the initial guess
In all cases, a starting point for the algorithm is required. The InitialGuess property must be set to a suitable value. Once this is done, the FindMinimum() method can be called to perform the minimization:
lm.InitialGuess = Vector.Create(-1.2, 1.0);
lm.FindMinimum();
Defining bounds on the variables
The LowerBounds and UpperBounds properties let you set bounds on the variables. These are vectors that contain the bounds for each variable. If a variable is unbounded, a value of NegativeInfinity should be used for the lower bound, and PositiveInfinity for the upper bound.
The example below sets up a Trust Region Reflective optimizer for the Rosenbrock function, and sets bounds on the variables:
LeastSquaresOptimizer trr = new TrustRegionReflectiveOptimizer();
trr.ObjectiveFunction = Rosenbrock;
trr.JacobianFunction = RosenbrockJacobian;
trr.InitialGuess = Vector.Create(-1.2, 1.0);
trr.LowerBounds = Vector.Create(-2.0, -2.0);
trr.UpperBounds = Vector.Create(0.5, 2.0);
trr.FindMinimum();
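If a variable should be left unbounded in one direction, the corresponding infinity is used instead. For example, to keep only the upper bound on the first variable and the lower bound on the second (the values are chosen purely for illustration):
trr.LowerBounds = Vector.Create(double.NegativeInfinity, -2.0);
trr.UpperBounds = Vector.Create(0.5, double.PositiveInfinity);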
Termination Criteria
Optimization algorithms in multiple dimensions commonly have two or three criteria that may each signal that an extremum has been found.
The ValueTest property returns a SimpleConvergenceTest object that represents the convergence test based on the value of the objective function. The test is successful if the change in the value of the objective function is less than the tolerance. The test returns a value of Divergent if the value of the objective function is infinite, and BadFunction when the value is NaN. There is some danger in using this test, since some algorithms may not return a new best estimate for the extremum on each iteration. For this reason, the test is not active by default. To make it active, its Enabled property must be set to true.
The SolutionTest property returns a VectorConvergenceTest<T> object that represents the convergence test based on the estimated extremum. The test is successful if the change in the approximate extremum is less than the tolerance. The test returns a value of Divergent if one of the elements of the solution is infinite, and BadFunction when one of the values is NaN.
The VectorConvergenceTest<T> class has many options to specify the details of the convergence test. By default, the error is calculated as the largest change in any element of the estimated extremum since the previous iteration, relative to the corresponding element of the extremum.
The GradientTest property returns a VectorConvergenceTest<T> based on the gradient of the objective function. The test succeeds if the norm of the gradient vector is less than the tolerance. This test is inactive for algorithms that do not use gradients.
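The sketch below shows how these tests might be configured before the optimization is run. The Enabled property is mentioned above; the Tolerance property name is an assumption based on the description of the tests, so consult the reference documentation for the exact member names.
// Activate the test on the objective function value and set its tolerance (property name assumed).
lm.ValueTest.Enabled = true;
lm.ValueTest.Tolerance = 1e-10;
// Tighten the test on the change in the solution vector (property name assumed).
lm.SolutionTest.Tolerance = 1e-8;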
Interpreting Results
After the FindMinimum method returns, a number of properties give more details about the algorithm. The Status property returns an AlgorithmStatus value that indicates how the algorithm terminated. The possible values are listed below:
| Value | Description |
|---|---|
| NoResult | The algorithm has not been executed. |
| Busy | The algorithm is still executing. |
| Converged | The algorithm ended normally. The desired accuracy has been achieved. |
| IterationLimitExceeded | The number of iterations needed to achieve the desired accuracy is greater than MaxIterations. |
| EvaluationLimitExceeded | The number of function evaluations needed to achieve the desired accuracy is greater than MaxEvaluations. |
| RoundOffError | Round-off prevented the algorithm from achieving the desired accuracy. |
| BadFunction | Bad behavior of the target function prevented the algorithm from achieving the desired accuracy. |
| Divergent | The algorithm diverges. The objective function appears to be unbounded. |
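For example, the outcome can be checked after the optimization has finished:
if (lm.Status == AlgorithmStatus.Converged)
    Console.WriteLine("Converged to {0}", lm.Minimum);
else
    Console.WriteLine("Algorithm terminated: {0}", lm.Status);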
Each of the convergence test objects discussed earlier exposes an Error property that returns the estimated error used in the test. The IterationsNeeded and EvaluationsNeeded properties return the number of iterations and the number of evaluations of the objective function, respectively. The JacobianEvaluationsNeeded property counts the number of times the full Jacobian was evaluated.
The final example prints a summary of the results of running the Levenberg-Marquardt algorithm on the Rosenbrock function:
Console.WriteLine("Levenberg-Marquardt Method:");
Console.WriteLine(" Solution: {0}", lm.Minimum);
Console.WriteLine(" Estimated error: {0}", lm.SolutionTest.Error);
Console.WriteLine(" Gradient residual: {0}", lm.GradientTest.Error);
Console.WriteLine(" # iterations: {0}", lm.IterationsNeeded);
Console.WriteLine(" # function evaluations: {0}",
lm.EvaluationsNeeded);
Console.WriteLine(" # Jacobian evaluations: {0}",
lm.JacobianEvaluationsNeeded);
References
Branch, M.A., T.F. Coleman, and Y. Li, A Subspace, Interior, and Conjugate Gradient Method for Large-Scale Bound-Constrained Minimization Problems, SIAM Journal on Scientific Computing, Vol. 21, Number 1, pp 1-23, 1999.
Levenberg, K. A method for the solution of certain nonlinear problems in least squares. Quart. Appl. Math. 2, 164-168, 1944.
Marquardt, D. W. An algorithm for least squares estimation of nonlinear parameters, SIAM J. Appl. Math. 11, 431-441, 1963.
Moré, J. J., The Levenberg-Marquardt Algorithm: Implementation and Theory, Numerical Analysis 630, pp 105-116, 1978.