Multidimensional Optimization

Finding the minimum or maximum of a function in many variables is one of the most common problems in numerical computing. It is not surprising that a wide range of techniques exist, each with its advantages and disadvantages. Numerics.NET implements a large number of these techniques, to offer solutions for the broadest possible range of applications.

Available Algorithms

The Downhill Simplex Method of Nelder and Mead

The Nelder-Mead algorithm, often also called the downhill simplex method, is a simple algorithm that produces reasonable results when no derivatives are available. A simplex is the generalization of a triangle (2 dimensions) or a tetrahedron (3 dimensions) to arbitrary dimensions. A simplex in n dimensions has n+1 vertices, the smallest number of points needed to define a region in n-dimensional space. The algorithm works by letting this simplex 'travel' downhill on the surface defined by the objective function through a series of reflections. If no lower point can be found, the simplex is contracted towards the lowest point.

The algorithm defines its own versions of the ValueTest and SolutionTest properties. The value test succeeds if the difference of the objective function at the best and at the worst point in the simplex is less than the tolerance. The solution test succeeds if the distance between the best and the worst point in the simplex is less than the tolerance. The gradient test does not apply.

Convergence of the Nelder-Mead method is linear at best. It can fail in certain pathological situations. If derivatives are available, and the derivatives are continuous, then one of the other methods should be used.
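The following sketch shows how the downhill simplex method might be set up and run. The class name NelderMeadOptimizer is an assumption based on the library's naming pattern, and the Rosenbrock objective function is defined later in this section.

C#
// Sketch: the NelderMeadOptimizer class name is assumed.
// No derivatives are needed; only the objective function is set.
var nm = new NelderMeadOptimizer();
nm.ObjectiveFunction = Rosenbrock;
nm.InitialGuess = Vector.Create(-1.2, 1.0);
nm.FindExtremum();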

Conjugate Gradient Optimizers

Two algorithms are available that search successively along directions taken from a well-defined set. For each direction, a one-dimensional search for an extremum is performed along that direction.

The conjugate gradient method, implemented by the ConjugateGradientOptimizer class, uses a series of search directions that are optimal for a quadratic objective function. The method requires that the gradient of the objective function be available. There are three common variants of the formula that generates a new search direction: the Fletcher-Reeves method, the Polak-Ribière method, and a positive variant of the latter. The variant is selected by passing the appropriate ConjugateGradientMethod value to the constructor, or by setting the Method property. In general, the Polak-Ribière method tends to converge somewhat faster than the Fletcher-Reeves method. In addition, its positive variant has proven convergence properties. It is the default method.
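For example, assuming the ConjugateGradientMethod members mirror the method names (an assumption; only the enumeration type is named above), the variant could be selected as follows:

C#
// Sketch: the enumeration member names are assumed to mirror
// the method names described above.
var cg = new ConjugateGradientOptimizer(
    ConjugateGradientMethod.FletcherReeves);
// The variant can also be changed through the Method property:
cg.Method = ConjugateGradientMethod.PositivePolakRibiere;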

Powell's method, implemented by the PowellOptimizer class, is a variation of the conjugate gradient method that does not require knowledge of the gradient of the objective function. Instead, the directions are generated from the geometrical properties of the successive approximations. A drawback of Powell's method is that it usually requires more function evaluations, as the line searches need to be exact for the method to converge.
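Because no gradient is required, setting up Powell's method only involves the objective function. A minimal sketch, using the Rosenbrock function defined later in this section:

C#
// Sketch: Powell's method builds its search directions from
// successive approximations, so no gradient is supplied.
var powell = new PowellOptimizer();
powell.ObjectiveFunction = Rosenbrock;
powell.InitialGuess = Vector.Create(-1.2, 1.0);
powell.FindExtremum();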

Quasi-Newton Methods

The full Newton method requires the calculation of second derivatives and the solution of a system of equations on every iteration. This can be very expensive, and may yield little benefit, especially when the current point is far from the actual extremum.

Quasi-Newton methods maintain a numerical approximation to the inverse of the Hessian matrix that is updated on each iteration. Different variations of the method use different ways to keep the approximation up to date. There are two main variations: the Davidon-Fletcher-Powell method (commonly abbreviated to DFP) and the Broyden-Fletcher-Goldfarb-Shanno method (BFGS). The method is selected by passing the appropriate QuasiNewtonMethod value to the constructor, or by setting the Method property. The BFGS method is usually somewhat superior to the DFP method. It is the default.

The BFGS and DFP methods use a dense approximation to the inverse Hessian. As the number of variables grows, both the memory requirements and the computation time per iteration grow considerably. To address this, a limited-memory variant of the BFGS method was developed. It is implemented by the LimitedMemoryBfgsOptimizer class. It uses only a small, fixed number of search directions to build the Hessian approximation; between 3 and 7 is usually recommended. This number can be passed to the constructor. If it is not provided, a default value of 3 is used.
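For example, a sketch of both constructor forms described above:

C#
// Keep 5 search directions for the inverse Hessian approximation.
var lbfgs = new LimitedMemoryBfgsOptimizer(5);
// The parameterless constructor uses the default of 3 directions.
var lbfgsDefault = new LimitedMemoryBfgsOptimizer();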

Defining the objective function

The function for which an extremum is to be found is called the objective function. It is accessed through the ObjectiveFunction property, which is a delegate of type Func<T, TResult>. The delegate takes a Vector<T> argument and returns a real number.

The following code demonstrates how to write methods to implement objective functions. The objective function implemented here is the Rosenbrock function,

f(x₁, x₂) = (1 − x₁)² + 105(x₂ − x₁²)²

which is a famous test function for optimization.

C#
double Rosenbrock(Vector<double> x)
{
    double a = 1 - x[0];
    double b = (x[1] - x[0] * x[0]);
    return a * a + 105 * b * b;
}

Some algorithms require the gradient (the vector of partial derivatives) of the objective function. There are two ways to supply it. The first is through the GradientFunction property. This property is of type Func<T, TResult>: a delegate that takes a Vector<T> argument and returns a Vector<T>. The drawback of using this property is that a new Vector<T> instance is created on every call.

The following code demonstrates how to write a method that calculates the gradient of an objective function. Once again, we use the Rosenbrock function as our example:

C#
Vector<double> RosenbrockGradient(Vector<double> x)
{
    Vector<double> f = Vector.Create<double>(2);
    double a = 1 - x[0];
    double b = (x[1] - x[0] * x[0]);
    f[0] = -2 * a - 420 * x[0] * b;
    f[1] = 210 * b;
    return f;
}

Alternatively, the gradient can be specified using the FastGradientFunction property. This property is of type Func<T1, T2, TResult>: a delegate that takes two Vector<T> arguments. The first is the input vector. The second is the Vector<T> that is to hold the result. The return value is a reference to this result vector.

The following code demonstrates how to write a method that calculates the gradient of an objective function as a Func<T1, T2, TResult> delegate.

C#
Vector<double> RosenbrockFastGradient(Vector<double> x, Vector<double> f)
{
    if (f == null)
        f = Vector.Create<double>(2);
    double a = 1 - x[0];
    double b = (x[1] - x[0] * x[0]);
    f[0] = -2 * a - 420 * x[0] * b;
    f[1] = 210 * b;
    return f;
}

If the gradient function is not available, or if it is very expensive to calculate, it may be preferable to either approximate it numerically, or use a method that does not require the gradient function.
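As an illustration, the sketch below approximates the gradient of the Rosenbrock function by central differences, in the form expected by the FastGradientFunction property. This helper is hypothetical and not part of the library; the step size is not tuned.

C#
// Hypothetical helper: central-difference approximation of the
// gradient of the Rosenbrock function. Not a library method.
Vector<double> ApproximateRosenbrockGradient(
    Vector<double> x, Vector<double> f)
{
    if (f == null)
        f = Vector.Create<double>(2);
    double h = 1e-6; // step size (untuned)
    for (int i = 0; i < 2; i++)
    {
        double xi = x[i];
        x[i] = xi + h;
        double fPlus = Rosenbrock(x);
        x[i] = xi - h;
        double fMinus = Rosenbrock(x);
        x[i] = xi; // restore the original value
        f[i] = (fPlus - fMinus) / (2 * h);
    }
    return f;
}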

The next example puts this all together. It creates a BFGS optimizer and sets it up to minimize the Rosenbrock function:

C#
QuasiNewtonOptimizer bfgs =
    new QuasiNewtonOptimizer(QuasiNewtonMethod.Bfgs);
bfgs.ObjectiveFunction = Rosenbrock;
bfgs.FastGradientFunction = RosenbrockFastGradient;

When using .NET 4.0, it is possible to have the gradient computed using Automatic Differentiation. To do so, specify the objective function by setting the SymbolicObjectiveFunction property to a lambda expression that represents the objective function. The lambda expression should take a Vector<T> as its input.
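For example, a sketch using the Rosenbrock function; the gradient is then derived by automatic differentiation, so no gradient function needs to be supplied:

C#
// Sketch: the gradient is derived automatically from this
// lambda expression.
bfgs.SymbolicObjectiveFunction = x =>
    Math.Pow(1 - x[0], 2) + 105 * Math.Pow(x[1] - x[0] * x[0], 2);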

Termination Criteria

Optimization algorithms in multiple dimensions commonly have two or three criteria that may each signal that an extremum has been found.

The ValueTest property returns a SimpleConvergenceTest object that represents the convergence test based on the value of the objective function. The test is successful if the change in the value of the objective function is less than the tolerance. The test returns a value of Divergent if the value of the objective function is infinite, and BadFunction when the value is NaN.

There is some danger in using this test, since some algorithms may not return a new best estimate for the extremum on each iteration. For this reason, the test is not active by default. To make it active, its Enabled property must be set to true.
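For example:

C#
// Opt in to the value test and require successive values of the
// objective function to agree to within 1e-10.
bfgs.ValueTest.Enabled = true;
bfgs.ValueTest.Tolerance = 1e-10;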

The SolutionTest property returns a VectorConvergenceTest<T> object that represents the convergence test based on the estimated extremum. The test is successful if the change in the approximate extremum is less than the tolerance. The test returns a value of Divergent if one of the elements of the solution is infinite, and BadFunction when one of the values is NaN.

The VectorConvergenceTest<T> class has many options to specify the details of the convergence test. By default, the error is calculated as the maximum, over all elements, of the change in each element of the extremum since the previous iteration, relative to the corresponding element of the current extremum.
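For example, the tolerance of the solution test can be set in the same way as for the other tests:

C#
// Require successive approximations of the extremum to agree
// to within 1e-8 under the default criterion described above.
bfgs.SolutionTest.Tolerance = 1e-8;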

Methods that use gradients have a third test available. The GradientTest property returns a VectorConvergenceTest<T> based on the gradient of the objective function. The test succeeds if the norm of the gradient vector is less than the tolerance. This test is inactive for algorithms that do not use gradients.

Specific algorithms may supply their own convergence tests, or may modify the exact meaning of the tests discussed here. For example, the Nelder-Mead method uses slight variations of the value and solution tests that are more convenient for that particular algorithm.

The example below illustrates some of the options mentioned here. It configures the gradient test to use the sum of the absolute values of the gradient's components, and sets its tolerance to 100 times the machine precision. It also explicitly disables the value test.

C#
bfgs.ValueTest.Enabled = false;
bfgs.GradientTest.Norm = VectorConvergenceNorm.SumOfAbsoluteValues;
bfgs.GradientTest.Tolerance = 100 * MachineConstants.Epsilon;

Line searches

Many algorithms for multidimensional optimization work by successively solving a one-dimensional optimization problem along a search direction, choosing a new search direction on each iteration. These algorithms all inherit from DirectionalOptimizer. Although general one-dimensional optimization algorithms may be used, they usually don't work well: they are either wasteful because they are too precise, or the results fail to meet certain criteria that are required to ensure convergence of the overall algorithm.

Specialized line search algorithms are therefore in order. Numerics.NET provides four of them:

BacktrackingLineSearch: A line search that tries a full step first and backtracks as needed.

MoreThuenteLineSearch: The classic algorithm of Moré and Thuente.

ParabolicLineSearch: A line search that uses quadratic interpolation.

UnitLineSearch: A line "search" that always returns a unit step.

All optimization algorithms that use line searches automatically select the appropriate line search algorithm, and set the search parameters to appropriate values. The unit line search is useful for experimental purposes, or when the objective function is known to be well behaved so that a line search is not necessary. You can use the LineSearch property to access the line search object.
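For example, to substitute a unit step on a problem that is known to be well behaved (a sketch; a parameterless constructor is assumed for UnitLineSearch):

C#
// Sketch: replace the automatically selected line search with a
// fixed unit step. A parameterless constructor is assumed.
bfgs.LineSearch = new UnitLineSearch();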

Computing the Extremum

You must specify an initial guess for the solution by setting the InitialGuess property to a suitable Vector<T> value. Any vector type may be used.

The FindExtremum method performs the search. It returns a DenseVector<T> that contains the best approximation to the extremum. This value is also available through the Extremum property.

C#
bfgs.InitialGuess = Vector.Create(-1.2, 1.0);
bfgs.ExtremumType = ExtremumType.Minimum;
bfgs.FindExtremum();

Interpreting Results

After the FindExtremum method returns, a number of properties give more details about the algorithm. The Status property returns an AlgorithmStatus value that indicates how the algorithm terminated. The possible values are listed below:

AlgorithmStatus values:

NoResult: The algorithm has not been executed.

Busy: The algorithm is still executing.

Converged: The algorithm ended normally. The desired accuracy has been achieved.

IterationLimitExceeded: The number of iterations needed to achieve the desired accuracy is greater than MaxIterations.

EvaluationLimitExceeded: The number of function evaluations needed to achieve the desired accuracy is greater than MaxEvaluations.

RoundOffError: Round-off prevented the algorithm from achieving the desired accuracy.

BadFunction: Bad behavior of the objective function prevented the algorithm from achieving the desired accuracy.

Divergent: The algorithm diverged. The objective function appears to be unbounded.
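A typical pattern is to verify the status before using the result:

C#
// Check how the algorithm terminated before using the result.
if (bfgs.Status == AlgorithmStatus.Converged)
    Console.WriteLine("Minimum found: {0}", bfgs.Extremum);
else
    Console.WriteLine("No convergence: {0}", bfgs.Status);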

Each of the ConvergenceTest objects discussed earlier exposes an Error property that returns the estimated error used in the test. The IterationsNeeded and EvaluationsNeeded properties return the number of iterations and the number of evaluations of the objective function, respectively. Algorithms that use gradients also have a GradientEvaluationsNeeded property.

The final example prints a summary of the results of running the BFGS algorithm on the Rosenbrock function:

C#
Console.WriteLine("BFGS Method:");
Console.WriteLine("  Solution: {0}", bfgs.Extremum);
Console.WriteLine("  Estimated error: {0}", bfgs.SolutionTest.Error);
Console.WriteLine("  Gradient residual: {0}", bfgs.GradientTest.Error);
Console.WriteLine("  # iterations: {0}", bfgs.IterationsNeeded);
Console.WriteLine("  # function evaluations: {0}",
    bfgs.EvaluationsNeeded);
Console.WriteLine("  # gradient evaluations: {0}",
    bfgs.GradientEvaluationsNeeded);

References

Fletcher, R., Practical Methods of Optimization, John Wiley & Sons, New York, 1987.

Fletcher, R. and Reeves, C.M., Function Minimization by Conjugate Gradients, Comp. J. 7, 149-154, 1964.

Moré, J.J. and Thuente, D.J., Line Search Algorithms with Guaranteed Sufficient Decrease, ACM Transactions on Mathematical Software, Vol. 20, No. 3, 286-307, 1994.

Nelder, J.A. and Mead, R., A Simplex Method for Function Minimization, Comp. J. 7, 308-313, 1965.

Polak, E. and Ribière, G., Note sur la Convergence de Méthodes de Directions Conjuguées, Revue Française d'Informatique et de Recherche Opérationnelle, 3, 35-43, 1969.

Powell, M.J.D., An Efficient Method for Finding the Minimum of a Function of Several Variables Without Calculating Derivatives, Comp. J. 7, 155-162, 1964.