This chapter describes routines for finding minima of arbitrary multidimensional functions. The library provides low level components for a variety of iterative minimizers and convergence tests. These can be combined by the user to achieve the desired solution, while providing full access to the intermediate steps of the algorithms. Each class of methods uses the same framework, so that you can switch between minimizers at runtime without needing to recompile your program. Each instance of a minimizer keeps track of its own state, allowing the minimizers to be used in multi-threaded programs. The minimization algorithms can be used to maximize a function by inverting its sign.
The header file ‘gsl_multimin.h’ contains prototypes for the minimization functions and related declarations.
The problem of multidimensional minimization requires finding a point x such that the scalar function,

f(x_1, ..., x_n)

takes a value which is lower than at any neighboring point. For smooth functions the gradient g = grad f vanishes at the minimum. In general there are no bracketing methods available for the minimization of n-dimensional functions. The algorithms proceed from an initial guess using a search algorithm which attempts to move in a downhill direction.
Algorithms making use of the gradient of the function perform a one-dimensional line minimisation along this direction until the lowest point is found to a suitable tolerance. The search direction is then updated with local information from the function and its derivatives, and the whole process repeated until the true n-dimensional minimum is found.
The Nelder-Mead Simplex algorithm applies a different strategy. It maintains n+1 trial parameter vectors as the vertices of an n-dimensional simplex. In each iteration step it tries to improve the worst vertex by a simple geometrical transformation until the size of the simplex falls below a given tolerance.
Both types of algorithms use a standard framework. The user provides a high-level driver for the algorithms, and the library provides the individual functions necessary for each of the steps. There are three main phases of the iteration. The steps are,

- initialize minimizer state, s, for algorithm T
- update s using the iteration T
- test s for convergence, and repeat iteration if necessary

Each iteration step consists either of an improvement to the line-minimisation in the current direction or an update to the search direction itself. The state for the minimizers is held in a gsl_multimin_fdfminimizer struct or a gsl_multimin_fminimizer struct.
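As a sketch of this framework, the three phases correspond to library calls like the following (assuming a function my_func of type gsl_multimin_function_fdf and a starting vector x, as defined in the examples later in this chapter):

/* initialize minimizer state, s, for algorithm T */
const gsl_multimin_fdfminimizer_type *T
  = gsl_multimin_fdfminimizer_conjugate_fr;
gsl_multimin_fdfminimizer *s
  = gsl_multimin_fdfminimizer_alloc (T, my_func.n);
gsl_multimin_fdfminimizer_set (s, &my_func, x, 0.01, 1e-4);

do
  {
    /* update s using the iteration T */
    status = gsl_multimin_fdfminimizer_iterate (s);
    if (status)
      break;

    /* test s for convergence, and repeat iteration if necessary */
    status = gsl_multimin_test_gradient (s->gradient, 1e-3);
  }
while (status == GSL_CONTINUE);

gsl_multimin_fdfminimizer_free (s);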
Note that the minimization algorithms can only search for one local minimum at a time. When there are several local minima in the search area, the first minimum to be found will be returned; however it is difficult to predict which of the minima this will be. In most cases, no error will be reported if you try to find a local minimum in an area where there is more than one.
It is also important to note that the minimization algorithms find local minima; there is no way to determine whether a minimum is a global minimum of the function in question.
The following function initializes a multidimensional minimizer. The minimizer itself depends only on the dimension of the problem and the algorithm and can be reused for different problems.
This function returns a pointer to a newly allocated instance of a minimizer of type T for an n-dimensional function. If there is insufficient memory to create the minimizer then the function returns a null pointer and the error handler is invoked with an error code of GSL_ENOMEM.
This function initializes the minimizer s to minimize the function fdf starting from the initial point x. The size of the first trial step is given by step_size. The accuracy of the line minimization is specified by tol. The precise meaning of this parameter depends on the method used. Typically the line minimization is considered successful if the gradient of the function g is orthogonal to the current search direction p to a relative accuracy of tol, where p . g < tol |p| |g|. A tol value of 0.1 is suitable for most purposes, since line minimization only needs to be carried out approximately. Note that setting tol to zero will force the use of “exact” line-searches, which are extremely expensive.
This function initializes the minimizer s to minimize the function f, starting from the initial point x. The size of the initial trial steps is given in the vector step_size. The precise meaning of this parameter depends on the method used.
This function frees all the memory associated with the minimizer s.
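For example, the complete lifecycle of a simplex minimizer for a two-dimensional function might look like the following sketch (my_func is assumed to be a gsl_multimin_function, as described in the next section; the starting point and step sizes are illustrative):

/* starting point (5,7) and initial trial step sizes of 1 */
gsl_vector *x = gsl_vector_alloc (2);
gsl_vector *step = gsl_vector_alloc (2);
gsl_vector_set (x, 0, 5.0);
gsl_vector_set (x, 1, 7.0);
gsl_vector_set_all (step, 1.0);

/* allocate a simplex minimizer and initialize it */
gsl_multimin_fminimizer *s
  = gsl_multimin_fminimizer_alloc (gsl_multimin_fminimizer_nmsimplex, 2);
gsl_multimin_fminimizer_set (s, &my_func, x, step);

/* ... iterate ... */

gsl_multimin_fminimizer_free (s);
gsl_vector_free (x);
gsl_vector_free (step);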
This function returns a pointer to the name of the minimizer. For example,
printf ("s is a '%s' minimizer\n",
gsl_multimin_fdfminimizer_name (s));
would print something like s is a 'conjugate_pr' minimizer.
You must provide a parametric function of n variables for the minimizers to operate on. You may also need to provide a routine which calculates the gradient of the function and a third routine which calculates both the function value and the gradient together. In order to allow for general parameters the functions are defined by the following data types:
This data type defines a general function of n variables with parameters and the corresponding gradient vector of derivatives,

double (* f) (const gsl_vector * x, void * params)
    this function should return the result f(x,params) for argument x and parameters params.

void (* df) (const gsl_vector * x, void * params, gsl_vector * g)
    this function should store the n-dimensional gradient g_i = d f(x,params) / d x_i in the vector g for argument x and parameters params.

void (* fdf) (const gsl_vector * x, void * params, double * f, gsl_vector * g)
    this function should set the values of f and g as above, for arguments x and parameters params. It provides an optimization over calling the separate functions for f(x) and its gradient, since it is always faster to compute the function and its derivative at the same time.

size_t n
    the dimension of the system, i.e. the number of components of the vectors x.

void * params
    a pointer to the parameters of the function.
This data type defines a general function of n variables with parameters,

double (* f) (const gsl_vector * x, void * params)
    this function should return the result f(x,params) for argument x and parameters params.

size_t n
    the dimension of the system, i.e. the number of components of the vectors x.

void * params
    a pointer to the parameters of the function.
The following example function defines a simple paraboloid with two parameters,
/* Paraboloid centered on (dp[0],dp[1]) */
double
my_f (const gsl_vector *v, void *params)
{
double x, y;
double *dp = (double *)params;
x = gsl_vector_get(v, 0);
y = gsl_vector_get(v, 1);
return 10.0 * (x - dp[0]) * (x - dp[0]) +
20.0 * (y - dp[1]) * (y - dp[1]) + 30.0;
}
/* The gradient of f, df = (df/dx, df/dy). */
void
my_df (const gsl_vector *v, void *params,
gsl_vector *df)
{
double x, y;
double *dp = (double *)params;
x = gsl_vector_get(v, 0);
y = gsl_vector_get(v, 1);
gsl_vector_set(df, 0, 20.0 * (x - dp[0]));
gsl_vector_set(df, 1, 40.0 * (y - dp[1]));
}
/* Compute both f and df together. */
void
my_fdf (const gsl_vector *x, void *params,
double *f, gsl_vector *df)
{
*f = my_f(x, params);
my_df(x, params, df);
}
The function can be initialized using the following code,
gsl_multimin_function_fdf my_func;
double p[2] = { 1.0, 2.0 }; /* center at (1,2) */
my_func.f = &my_f;
my_func.df = &my_df;
my_func.fdf = &my_fdf;
my_func.n = 2;
my_func.params = (void *)p;
The following function drives the iteration of each algorithm. The function performs one iteration to update the state of the minimizer. The same function works for all minimizers so that different methods can be substituted at runtime without modifications to the code.
These functions perform a single iteration of the minimizer s. If the iteration encounters an unexpected problem then an error code will be returned.
The minimizer maintains a current best estimate of the minimum at all times. This information can be accessed with the following auxiliary functions,
These functions return the current best estimate of the location of the minimum, the value of the function at that point, its gradient, and the minimizer-specific characteristic size for the minimizer s.
This function resets the minimizer s to use the current point as a new starting point.
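One possible use of the restart function, sketched below, is to retry from the current best point when an iteration reports that it cannot make further progress (the response shown is illustrative, not required):

status = gsl_multimin_fdfminimizer_iterate (s);

if (status == GSL_ENOPROG)
  {
    /* the minimizer could not improve on the current point;
       restart it from that point, discarding the accumulated
       search direction information, and try again */
    gsl_multimin_fdfminimizer_restart (s);
  }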
A minimization procedure should stop when one of the following conditions is true:

- A minimum has been found to within the user-specified precision.
- A user-specified maximum number of iterations has been reached.
- An error has occurred.

The handling of these conditions is under user control. The functions below allow the user to test the precision of the current result.
This function tests the norm of the gradient g against the absolute tolerance epsabs. The gradient of a multidimensional function goes to zero at a minimum. The test returns GSL_SUCCESS if the following condition is achieved,

|g| < epsabs

and returns GSL_CONTINUE otherwise. A suitable choice of epsabs can be made from the desired accuracy in the function for small variations in x. The relationship between these quantities is given by δf = g δx. For example, if the function value is required to an accuracy of about δf = 1e-6 and the characteristic scale of variations near the minimum is δx = 1e-3, then a gradient tolerance of epsabs = 1e-3 would be appropriate.
This function tests the minimizer-specific characteristic size (if applicable to the minimizer being used) against the absolute tolerance epsabs. The test returns GSL_SUCCESS if the size is smaller than the tolerance, otherwise GSL_CONTINUE is returned.
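In practice each test is paired with the corresponding minimizer type, for example (the tolerances shown are illustrative):

/* for a gsl_multimin_fdfminimizer s: test the gradient norm */
status = gsl_multimin_test_gradient (s->gradient, 1e-3);

/* for a gsl_multimin_fminimizer s: test the characteristic
   size reported by the minimizer */
size = gsl_multimin_fminimizer_size (s);
status = gsl_multimin_test_size (size, 1e-2);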
There are several minimization methods available. The best choice of algorithm depends on the problem. All of the algorithms use the value of the function and its gradient at each evaluation point, except for the Simplex algorithm which uses function values only.
This is the Fletcher-Reeves conjugate gradient algorithm. The conjugate gradient algorithm proceeds as a succession of line minimizations. The sequence of search directions is used to build up an approximation to the curvature of the function in the neighborhood of the minimum.
An initial search direction p is chosen using the gradient, and line minimization is carried out in that direction. The accuracy of the line minimization is specified by the parameter tol. The minimum along this line occurs when the function gradient g and the search direction p are orthogonal. The line minimization terminates when p . g < tol |p| |g|. The search direction is updated using the Fletcher-Reeves formula p' = g' - beta p where beta = -|g'|^2 / |g|^2, and the line minimization is then repeated for the new search direction.
This is the Polak-Ribiere conjugate gradient algorithm. It is similar to the Fletcher-Reeves method, differing only in the choice of the coefficient beta. Both methods work well when the evaluation point is close enough to the minimum of the objective function that it is well approximated by a quadratic hypersurface.
These methods use the vector Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. This is a quasi-Newton method which builds up an approximation to the second derivatives of the function f using the difference between successive gradient vectors. By combining the first and second derivatives the algorithm is able to take Newton-type steps towards the function minimum, assuming quadratic behavior in that region.
The bfgs2 version of this minimizer is the most efficient version available, and is a faithful implementation of the line minimization scheme described in Fletcher's Practical Methods of Optimization, Algorithms 2.6.2 and 2.6.4. It supersedes the original bfgs routine and requires substantially fewer function and gradient evaluations. The user-supplied tolerance tol corresponds to the parameter sigma used by Fletcher. A value of 0.1 is recommended for typical use (larger values correspond to less accurate line searches).
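Since the gradient-based algorithms share one interface, selecting bfgs2 only requires changing the type constant. A sketch using the recommended tolerance of 0.1, with my_func, x and n as in the examples below:

const gsl_multimin_fdfminimizer_type *T
  = gsl_multimin_fdfminimizer_vector_bfgs2;
gsl_multimin_fdfminimizer *s
  = gsl_multimin_fdfminimizer_alloc (T, n);

/* initial step size 0.01, line-search tolerance 0.1 */
gsl_multimin_fdfminimizer_set (s, &my_func, x, 0.01, 0.1);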
The steepest descent algorithm follows the downhill gradient of the function at each step. When a downhill step is successful the step-size is increased by a factor of two. If the downhill step leads to a higher function value then the algorithm backtracks and the step size is decreased using the parameter tol. A suitable value of tol for most applications is 0.1. The steepest descent method is inefficient and is included only for demonstration purposes.
This is the Simplex algorithm of Nelder and Mead. It constructs n+1 vectors p_i from the starting vector x and the vector step_size as follows:

p_0 = (x_0, x_1, ..., x_n)
p_1 = (x_0 + step_size_0, x_1, ..., x_n)
p_2 = (x_0, x_1 + step_size_1, ..., x_n)
... = ...
p_n = (x_0, x_1, ..., x_n + step_size_n)

These vectors form the n+1 vertices of a simplex in n dimensions. On each iteration the algorithm tries to improve the parameter vector p_i corresponding to the highest function value by simple geometrical transformations. These are reflection, reflection followed by expansion, contraction and multiple contraction. Using these transformations the simplex moves through the parameter space towards the minimum, where it contracts itself.
After each iteration, the best vertex is returned. Note that, due to the nature of the algorithm, not every step improves the current best parameter vector; usually several iterations are required.
The routine calculates the minimizer-specific characteristic size as the average distance from the geometrical center of the simplex to all its vertices. This size can be used as a stopping criterion, as the simplex contracts itself near the minimum. The size is returned by the function gsl_multimin_fminimizer_size.
This example program finds the minimum of the paraboloid function defined earlier. The location of the minimum is offset from the origin in x and y, and the function value at the minimum is non-zero. The main program is given below; it requires the example function given earlier in this chapter.
int
main (void)
{
size_t iter = 0;
int status;
const gsl_multimin_fdfminimizer_type *T;
gsl_multimin_fdfminimizer *s;
/* Position of the minimum (1,2). */
double par[2] = { 1.0, 2.0 };
gsl_vector *x;
gsl_multimin_function_fdf my_func;
my_func.f = &my_f;
my_func.df = &my_df;
my_func.fdf = &my_fdf;
my_func.n = 2;
my_func.params = &par;
/* Starting point, x = (5,7) */
x = gsl_vector_alloc (2);
gsl_vector_set (x, 0, 5.0);
gsl_vector_set (x, 1, 7.0);
T = gsl_multimin_fdfminimizer_conjugate_fr;
s = gsl_multimin_fdfminimizer_alloc (T, 2);
gsl_multimin_fdfminimizer_set (s, &my_func, x, 0.01, 1e-4);
do
{
iter++;
status = gsl_multimin_fdfminimizer_iterate (s);
if (status)
break;
status = gsl_multimin_test_gradient (s->gradient, 1e-3);
if (status == GSL_SUCCESS)
printf ("Minimum found at:\n");
printf ("%5d %.5f %.5f %10.5f\n", iter,
gsl_vector_get (s->x, 0),
gsl_vector_get (s->x, 1),
s->f);
}
while (status == GSL_CONTINUE && iter < 100);
gsl_multimin_fdfminimizer_free (s);
gsl_vector_free (x);
return 0;
}
The initial step-size is chosen as 0.01, a conservative estimate in this case, and the line minimization parameter is set at 0.0001. The program terminates when the norm of the gradient has been reduced below 0.001. The output of the program is shown below,
x y f
1 4.99629 6.99072 687.84780
2 4.98886 6.97215 683.55456
3 4.97400 6.93501 675.01278
4 4.94429 6.86073 658.10798
5 4.88487 6.71217 625.01340
6 4.76602 6.41506 561.68440
7 4.52833 5.82083 446.46694
8 4.05295 4.63238 261.79422
9 3.10219 2.25548 75.49762
10 2.85185 1.62963 67.03704
11 2.19088 1.76182 45.31640
12 0.86892 2.02622 30.18555
Minimum found at:
13 1.00000 2.00000 30.00000
Note that the algorithm gradually increases the step size as it successfully moves downhill, as can be seen by plotting the successive points.
The conjugate gradient algorithm finds the minimum on its second direction because the function is purely quadratic. Additional iterations would be needed for a more complicated function.
Here is another example, using the Nelder-Mead Simplex algorithm to minimize the same example objective function as above.
int
main(void)
{
size_t np = 2;
double par[2] = {1.0, 2.0};
const gsl_multimin_fminimizer_type *T =
gsl_multimin_fminimizer_nmsimplex;
gsl_multimin_fminimizer *s = NULL;
gsl_vector *ss, *x;
gsl_multimin_function minex_func;
size_t iter = 0, i;
int status;
double size;
/* Initial vertex size vector */
ss = gsl_vector_alloc (np);
/* Set all step sizes to 1 */
gsl_vector_set_all (ss, 1.0);
/* Starting point */
x = gsl_vector_alloc (np);
gsl_vector_set (x, 0, 5.0);
gsl_vector_set (x, 1, 7.0);
/* Initialize method and iterate */
minex_func.f = &my_f;
minex_func.n = np;
minex_func.params = (void *)&par;
s = gsl_multimin_fminimizer_alloc (T, np);
gsl_multimin_fminimizer_set (s, &minex_func, x, ss);
do
{
iter++;
status = gsl_multimin_fminimizer_iterate(s);
if (status)
break;
size = gsl_multimin_fminimizer_size (s);
status = gsl_multimin_test_size (size, 1e-2);
if (status == GSL_SUCCESS)
{
printf ("converged to minimum at\n");
}
printf ("%5d ", iter);
for (i = 0; i < np; i++)
{
printf ("%10.3e ", gsl_vector_get (s->x, i));
}
printf ("f() = %7.3f size = %.3f\n", s->fval, size);
}
while (status == GSL_CONTINUE && iter < 100);
gsl_vector_free(x);
gsl_vector_free(ss);
gsl_multimin_fminimizer_free (s);
return status;
}
The minimum search stops when the Simplex size drops to 0.01. The output is shown below.
1 6.500e+00 5.000e+00 f() = 512.500 size = 1.082
2 5.250e+00 4.000e+00 f() = 290.625 size = 1.372
3 5.250e+00 4.000e+00 f() = 290.625 size = 1.372
4 5.500e+00 1.000e+00 f() = 252.500 size = 1.372
5 2.625e+00 3.500e+00 f() = 101.406 size = 1.823
6 3.469e+00 1.375e+00 f() = 98.760 size = 1.526
7 1.820e+00 3.156e+00 f() = 63.467 size = 1.105
8 1.820e+00 3.156e+00 f() = 63.467 size = 1.105
9 1.016e+00 2.812e+00 f() = 43.206 size = 1.105
10 2.041e+00 2.008e+00 f() = 40.838 size = 0.645
11 1.236e+00 1.664e+00 f() = 32.816 size = 0.645
12 1.236e+00 1.664e+00 f() = 32.816 size = 0.447
13 5.225e-01 1.980e+00 f() = 32.288 size = 0.447
14 1.103e+00 2.073e+00 f() = 30.214 size = 0.345
15 1.103e+00 2.073e+00 f() = 30.214 size = 0.264
16 1.103e+00 2.073e+00 f() = 30.214 size = 0.160
17 9.864e-01 1.934e+00 f() = 30.090 size = 0.132
18 9.190e-01 1.987e+00 f() = 30.069 size = 0.092
19 1.028e+00 2.017e+00 f() = 30.013 size = 0.056
20 1.028e+00 2.017e+00 f() = 30.013 size = 0.046
21 1.028e+00 2.017e+00 f() = 30.013 size = 0.033
22 9.874e-01 1.985e+00 f() = 30.006 size = 0.028
23 9.846e-01 1.995e+00 f() = 30.003 size = 0.023
24 1.007e+00 2.003e+00 f() = 30.001 size = 0.012
converged to minimum at
25 1.007e+00 2.003e+00 f() = 30.001 size = 0.010
The simplex size first increases, while the simplex moves towards the minimum. After a while the size begins to decrease as the simplex contracts around the minimum.
The conjugate gradient and BFGS methods are described in detail in the following book,

R. Fletcher, Practical Methods of Optimization (Second Edition), Wiley (1987), ISBN 0471915475.
A brief description of multidimensional minimization algorithms and more recent references can be found in,
The simplex algorithm is described in the following paper,

J.A. Nelder and R. Mead, A simplex method for function minimization, Computer Journal, vol. 7 (1965), 308–313.