An overview of gradient methods in mathematical optimization. Gradient techniques for unconstrained optimization

In the unconstrained optimization problem there are no constraints on the variables.

Recall that the gradient of a function of several variables is the vector whose components are the partial derivatives of the function:

grad F(X) = (∂F/∂x_1, ∂F/∂x_2, …, ∂F/∂x_n).

The gradient of a scalar function F(X) at a point X^k points in the direction of the steepest increase of the function and is orthogonal to the level line (the surface of constant value of F(X) passing through the point X^k). The vector opposite to the gradient, the antigradient, points in the direction of the steepest decrease of F(X). At an extremum point, grad F(X) = 0.

In gradient methods, the motion of the point during the search for the minimum of the objective function is described by the iterative formula

X^(k+1) = X^k − λ_k grad F(X^k),

where λ_k is the step parameter at the k-th iteration along the antigradient. For ascent methods (finding the maximum), the motion is along the gradient instead.

Different variants of gradient methods differ from one another in the way the step parameter is chosen and in whether the direction of motion at the previous step is taken into account. We consider the following variants of gradient methods: with a constant step, with a variable step parameter (step splitting), the steepest descent method, and the conjugate gradient method.

Method with a constant step parameter. In this method the step parameter is the same at every iteration. The question arises: how should the size of the step parameter be chosen in practice? A step parameter that is too small leads to an unacceptably large number of iterations needed to reach the minimum point. On the other hand, a step parameter that is too large leads to overshooting the minimum point and to an oscillating computational process around it. These circumstances are the disadvantages of the method. Since the acceptable value of the step parameter λ_k cannot be guessed in advance, it becomes necessary to use the gradient method with a variable step parameter.
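The sketch below illustrates both failure modes on an assumed test function F = x_1² + 10x_2² with illustrative step values: a tiny step converges too slowly, an overly large step makes the process oscillate and diverge.

```python
import numpy as np

def steps_to_converge(lam, grad, x0, eps=1e-6, max_iter=5000):
    """Count iterations of X(k+1) = X(k) - lam * grad F(X(k)) until ||grad|| < eps."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if not np.isfinite(g).all() or np.linalg.norm(g) > 1e12:
            return "diverged (step too large, oscillation grows)"
        if np.linalg.norm(g) < eps:
            return k                      # converged in k iterations
        x = x - lam * g                   # step along the antigradient
    return "not converged (step too small)"

# Illustrative function F = x1^2 + 10*x2^2 with gradient (2*x1, 20*x2):
grad_F = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
for lam in (0.001, 0.05, 0.11):
    print(lam, steps_to_converge(lam, grad_F, [1.0, 1.0]))
```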

As the search approaches the optimum, the gradient vector decreases in magnitude, tending to zero; therefore, with λ_k = const the step length gradually decreases. The length, or norm, of a vector in n-dimensional Euclidean space is determined by the formula

||grad F(X)|| = sqrt((∂F/∂x_1)² + … + (∂F/∂x_n)²),

where n is the number of variables.

Options for stopping the search for the optimum:

1) |F(X^(k+1)) − F(X^k)| ≤ ε (the objective function practically stops changing);

2) ||grad F(X^(k+1))|| ≤ ε (the gradient is close to zero);

3) ||X^(k+1) − X^k|| ≤ ε (the design variables practically stop changing).

From a practical point of view it is more convenient to use the 3rd stopping criterion (since the values of the design parameters are of interest); however, to determine the proximity of the extremum point one should rely on the 2nd criterion. Several criteria can be used together to stop the computational process.

Let us look at an example. Find the minimum of the objective function F(X) = (x_1 − 2)² + (x_2 − 4)². The exact solution of the problem is X* = (2.0; 4.0). The partial derivatives are

∂F/∂x_1 = 2(x_1 − 2),
∂F/∂x_2 = 2(x_2 − 4).

Choose the step λ_k = 0.1 and start the search from an initial point X^1. The solution is presented in the form of a table.
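A short script of this kind can reproduce such a table; the starting point X^1 = (0; 0) is an assumption here, since the original value is not preserved:

```python
import numpy as np

F = lambda x: (x[0] - 2) ** 2 + (x[1] - 4) ** 2
grad_F = lambda x: np.array([2 * (x[0] - 2), 2 * (x[1] - 4)])

x = np.array([0.0, 0.0])               # assumed starting point X^1
lam = 0.1                              # constant step parameter
print(f"{'k':>3} {'x1':>8} {'x2':>8} {'F(X)':>10} {'||grad||':>10}")
for k in range(1, 31):
    g = grad_F(x)
    print(f"{k:>3} {x[0]:>8.4f} {x[1]:>8.4f} {F(x):>10.5f} {np.linalg.norm(g):>10.5f}")
    if np.linalg.norm(g) < 1e-3:       # stop when the gradient is near zero
        break
    x = x - lam * g                    # X(k+1) = X(k) - lam * grad F(X(k))
```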

Gradient method with subdivision of the step parameter. In this case, during optimization the step parameter λ_k is decreased if after the next step the objective function increases (when searching for the minimum). The step length is then usually split (divided) in half, and the step is repeated from the previous point. This provides a more accurate approach to the extremum point.
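A minimal sketch of the step-splitting rule (the test function is the same illustrative quadratic as above):

```python
import numpy as np

def gradient_descent_halving(F, grad, x0, lam=1.0, eps=1e-6, max_iter=1000):
    """Gradient descent that splits the step in half whenever F fails to decrease."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        x_new = x - lam * g
        while F(x_new) >= F(x):        # overshoot: halve the step and
            lam /= 2.0                 # repeat from the previous point
            if lam < 1e-15:
                return x
            x_new = x - lam * g
        x = x_new
    return x

F = lambda x: (x[0] - 2) ** 2 + (x[1] - 4) ** 2
grad_F = lambda x: np.array([2 * (x[0] - 2), 2 * (x[1] - 4)])
print(gradient_descent_halving(F, grad_F, [0.0, 0.0]))   # -> approx [2. 4.]
```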

The steepest descent method. Variable-step methods are more economical in terms of the number of iterations. If the optimal step length λ_k along the antigradient direction is found by solving a one-dimensional minimization problem, the method is called the steepest descent method. This method solves a one-dimensional minimization problem at each iteration:

F(X^(k+1)) = F(X^k − λ_k S^k) = min_{λ > 0} F(X^k − λ S^k), S^k = grad F(X^k).

In this method, movement in the direction of the antigradient continues until the minimum of the objective function along that direction is reached (i.e., while the value of the objective function decreases). Using an example, let us consider how the objective function at each step can be written analytically as a function of the unknown step parameter.

Example. min F(x_1, x_2) = 2x_1² + 4x_2³ − 3. Then grad F(X) = [4x_1; 12x_2²]. Let the point be X^k = (2; 1); hence grad F(X^k) = [8; 12], and

F(X^k − λ S^k) = 2(2 − 8λ)² + 4(1 − 12λ)³ − 3.

It is necessary to find the λ that delivers the minimum of this function of one variable.

Steepest Descent Algorithm (for Finding the Minimum)

Initial step. Let ε > 0 be the stopping constant. Select a starting point X^1, set k = 1, and go to the main step.

The main step. If ||grad F(X^k)|| < ε, then end the search; otherwise compute S^k = grad F(X^k) and find λ_k, the optimal solution of the minimization problem F(X^k − λ S^k) for λ ≥ 0. Set X^(k+1) = X^k − λ_k S^k, assign k = k + 1, and repeat the main step.
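A sketch of this algorithm is given below; the one-dimensional minimization is delegated to scipy.optimize.minimize_scalar, and the test function is an illustrative convex quadratic (the cubic example above is unbounded below, so it is not suitable for a full run):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(F, grad, x0, eps=1e-6, max_iter=500, lam_max=10.0):
    """Steepest descent: lam_k solves the one-dimensional problem at each iteration."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = grad(x)                                # S^k = grad F(X^k)
        if np.linalg.norm(s) < eps:                # stopping criterion
            break
        res = minimize_scalar(lambda lam: F(x - lam * s),
                              bounds=(0.0, lam_max), method="bounded")
        x = x - res.x * s                          # X^(k+1) = X^k - lam_k * S^k
    return x

F = lambda x: 2 * x[0] ** 2 + 4 * x[1] ** 2 - 3    # illustrative convex function
grad_F = lambda x: np.array([4 * x[0], 8 * x[1]])
print(steepest_descent(F, grad_F, [2.0, 1.0]))     # -> approx [0. 0.]
```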

To find the minimum of a function of one variable within the steepest descent method, one can use methods of unimodal optimization. From this large group of methods we consider the dichotomy (bisection) method and the golden section method. The essence of unimodal optimization methods is the progressive narrowing of the uncertainty interval that contains the extremum.

Dichotomy (bisection) method. Initial step. Choose the distinguishability constant ε and the admissible final length l of the uncertainty interval. The quantity ε should be as small as possible while still allowing one to distinguish the function values F(λ_k) and F(μ_k). Let [a_1, b_1] be the initial uncertainty interval. Set k = 1.

The main stage consists of a finite number of iterations of the same type.

kth iteration.

Step 1. If b_k − a_k ≤ l, then the computation ends, and the solution is x* = (a_k + b_k)/2. Otherwise set

λ_k = (a_k + b_k)/2 − ε,
μ_k = (a_k + b_k)/2 + ε.

Step 2. If F(λ_k) < F(μ_k), set a_(k+1) = a_k and b_(k+1) = μ_k. Otherwise set a_(k+1) = λ_k and b_(k+1) = b_k. Assign k = k + 1 and go to step 1.
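A direct transcription of this algorithm; the illustrative one-dimensional function is the line-search function of the quadratic steepest-descent example above:

```python
def dichotomy_min(F, a, b, l=1e-4, eps=1e-5):
    """Dichotomy (bisection) search for the minimum of a unimodal F on [a, b]."""
    while b - a > l:
        mid = (a + b) / 2.0
        lam, mu = mid - eps, mid + eps     # two probe points around the middle
        if F(lam) < F(mu):
            b = mu                         # the minimum lies in [a, mu]
        else:
            a = lam                        # the minimum lies in [lam, b]
    return (a + b) / 2.0

# Illustrative one-dimensional function from a steepest-descent step:
phi = lambda lam: 2 * (2 - 8 * lam) ** 2 + 4 * (1 - 8 * lam) ** 2 - 3
print(dichotomy_min(phi, 0.0, 1.0))        # -> approx 0.1667
```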

Golden section method. A more efficient method than the dichotomy method: it reaches the prescribed length of the uncertainty interval in fewer iterations and requires fewer evaluations of the objective function. In this method only one new division point of the uncertainty interval is computed at each iteration. The new point is placed at the distance

γ = 0.618034 of the current interval length from one of its ends.

Algorithm of the golden section method

Initial step. Select the admissible final length l > 0 of the uncertainty interval. Let [a_1, b_1] be the initial uncertainty interval. Set λ_1 = a_1 + (1 − γ)(b_1 − a_1) and μ_1 = a_1 + γ(b_1 − a_1), where γ = 0.618. Compute F(λ_1) and F(μ_1), set k = 1, and go to the main stage.

Step 1. If b_k − a_k ≤ l, the calculations end with x* = (a_k + b_k)/2. Otherwise, if F(λ_k) > F(μ_k), go to step 2; if F(λ_k) ≤ F(μ_k), go to step 3.

Step 2. Set a_(k+1) = λ_k, b_(k+1) = b_k, λ_(k+1) = μ_k, μ_(k+1) = a_(k+1) + γ(b_(k+1) − a_(k+1)). Compute F(μ_(k+1)) and go to step 4.

Step 3. Set a_(k+1) = a_k, b_(k+1) = μ_k, μ_(k+1) = λ_k, λ_(k+1) = a_(k+1) + (1 − γ)(b_(k+1) − a_(k+1)). Compute F(λ_(k+1)).

Step 4. Assign k = k + 1, go to step 1.

At the first iteration two evaluations of the function are required; at each subsequent iteration, only one.
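A sketch of the algorithm; note that inside the loop only one new function value is computed per iteration:

```python
def golden_section_min(F, a, b, l=1e-5):
    """Golden-section search for the minimum of a unimodal F on [a, b]."""
    gamma = 0.618034                        # (sqrt(5) - 1) / 2
    lam = a + (1 - gamma) * (b - a)
    mu = a + gamma * (b - a)
    f_lam, f_mu = F(lam), F(mu)             # two evaluations at the first iteration
    while b - a > l:
        if f_lam > f_mu:                    # the minimum lies in [lam, b]
            a, lam, f_lam = lam, mu, f_mu
            mu = a + gamma * (b - a)
            f_mu = F(mu)                    # one new evaluation per iteration
        else:                               # the minimum lies in [a, mu]
            b, mu, f_mu = mu, lam, f_lam
            lam = a + (1 - gamma) * (b - a)
            f_lam = F(lam)
    return (a + b) / 2.0

phi = lambda lam: 2 * (2 - 8 * lam) ** 2 + 4 * (1 - 8 * lam) ** 2 - 3
print(golden_section_min(phi, 0.0, 1.0))    # -> approx 0.1667
```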

Conjugate gradient method (Fletcher-Reeves). In this method the choice of the direction of motion at step k + 1 takes into account the change of direction at step k. The descent direction vector is a linear combination of the antigradient direction and the previous search direction. When minimizing ravine functions (functions with narrow, long valleys), the search then proceeds not across the ravine but along it, which makes it possible to reach the minimum quickly. The coordinates of the point when searching for the extremum by the conjugate gradient method are computed by the expression X^(k+1) = X^k + λ_(k+1) V^(k+1), where V^(k+1) is a vector calculated by the expression

V^(k+1) = −grad F(X^k) + β_k V^k, β_k = ||grad F(X^k)||² / ||grad F(X^(k−1))||².

At the first iteration one usually takes V = 0, and the search along the antigradient is carried out as in the steepest descent method. Then the direction of motion deviates from the direction of the antigradient the more strongly, the more significantly the length of the gradient vector changed at the last iteration. After n steps, to correct the operation of the algorithm, an ordinary step along the antigradient is taken.

Conjugate Gradient Algorithm

Step 1. Enter the starting point X^0, the accuracy ε, and the dimension n.

Step 2. Put k = 1.

Step 3. Put the vector V^k = 0.

Step 4. Calculate grad F(X^k).

Step 5. Calculate the vector V^(k+1) by the expression above.

Step 6. Perform a one-dimensional search along the vector V^(k+1).

Step 7. If k < n, put k = k + 1 and go to step 4, otherwise go to step 8.

Step 8. If the length of the vector V^(k+1) is less than ε, end the search; otherwise go to step 2.

The conjugate direction method is one of the most effective for solving minimization problems. Combined with a one-dimensional search, it is widely used in practice, in particular in CAD. It should be noted, however, that the method is sensitive to round-off errors accumulated during computation.
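A sketch of the Fletcher-Reeves scheme with a restart every n steps; the test function, the bounds of the one-dimensional search, and the iteration limits are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fletcher_reeves(F, grad, x0, eps=1e-8, max_outer=50, lam_max=10.0):
    """Fletcher-Reeves conjugate gradients with a restart every n steps."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(max_outer):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            return x
        v = -g                                     # restart: plain antigradient
        for _ in range(n):
            res = minimize_scalar(lambda lam: F(x + lam * v),
                                  bounds=(0.0, lam_max), method="bounded")
            x = x + res.x * v
            g_new = grad(x)
            if np.linalg.norm(g_new) < eps:
                return x
            beta = (g_new @ g_new) / (g @ g)       # ||grad_k||^2 / ||grad_{k-1}||^2
            v = -g_new + beta * v                  # new conjugate direction
            g = g_new
    return x

F = lambda x: (x[0] - 2) ** 2 + 10 * (x[1] - 4) ** 2    # illustrative quadratic
grad_F = lambda x: np.array([2 * (x[0] - 2), 20 * (x[1] - 4)])
print(fletcher_reeves(F, grad_F, [0.0, 0.0]))           # -> approx [2. 4.]
```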

Disadvantages of gradient methods

In problems with a large number of variables it is difficult or impossible to obtain the derivatives in the form of analytical functions.

When the derivatives are computed by difference schemes, the resulting error, especially in the neighborhood of an extremum, limits the possibilities of such an approximation.

Lecture 8.

Gradient methods for solving nonlinear programming problems. Penalty function methods. Applications of nonlinear programming to operations research problems.

Unconstrained problems. Generally speaking, any nonlinear problem can be solved by the gradient method. However, in this case only a local extremum is found. Therefore it is more expedient to apply this method to convex programming problems, in which any local extremum is simultaneously global (see Theorem 7.6).

We will consider the problem of maximizing a nonlinear differentiable function f(x). The essence of the gradient search for the maximum point x* is very simple: take an arbitrary point x^0 and, using the gradient computed at this point, determine the direction in which f(x) increases with the greatest speed (Fig. 7.4); then, taking a small step in the found direction, move to a new point x^1. Then determine the best direction again and move to the next point x^2, and so on. In Fig. 7.4 the search path is the polyline x^0, x^1, x^2, ... Thus, it is necessary to construct a sequence of points x^0, x^1, x^2, ..., x^k, ... that converges to the maximum point x*, i.e., such that f(x^0) < f(x^1) < f(x^2) < … and x^k → x* as k → ∞.

Gradient methods, as a rule, yield an exact solution only in an infinite number of steps, and only in some cases in a finite number. For this reason, gradient methods are classified as approximate solution methods.

Movement from a point x^k to a new point x^(k+1) is carried out along the straight line passing through the point x^k with the equation

x^(k+1) = x^k + λ_k grad f(x^k), (7.29)

where λ_k is a numerical parameter on which the step size depends. Once the value of the parameter in equation (7.29) is chosen, λ_k = λ_k^0, the next point on the search polyline is determined.

Gradient methods differ from one another in the way the step size, the value λ_k^0 of the parameter λ_k, is chosen. One may, for example, move from point to point with a constant step λ_k = λ, the same for every k.

If it then turns out that f(x^(k+1)) ≤ f(x^k), one should return to the point x^k and decrease the value of the parameter, for example to λ/2.

Sometimes the step size is taken proportional to the modulus of the gradient.

If an approximate solution is sought, the search can be terminated on the following grounds. After each series of a certain number of steps, the achieved values of the objective function f(x) are compared. If after the next series the change of f(x) does not exceed some predetermined small number, the search is stopped; the attained value of f(x) is considered the desired approximate maximum, and the corresponding x is taken as x*.



If the objective function f(x) is concave (convex), then a necessary and sufficient condition for the optimality of a point x* is that the gradient of the function equal zero at this point.

A common variant of gradient search is the method of steepest ascent. Its essence is as follows. After the gradient is determined at the point x^k, the movement along the straight line is continued to the point x^(k+1), at which the maximum value of f(x) in the direction of the gradient is attained. The gradient is then determined at this point, and the movement proceeds along the straight line in the direction of the new gradient to the point x^(k+2), at which the maximum value of f(x) in this direction is reached. The movement continues until the point x* corresponding to the largest value of the objective function f(x) is reached. Fig. 7.5 shows the scheme of movement to the optimal point x* by the method of steepest ascent. In this case the direction of the gradient at the point x^k is tangent to the level line of the surface f(x) at the point x^(k+1); therefore, the gradient at the point x^(k+1) is orthogonal to the gradient at x^k (compare with Fig. 7.4).

Moving from the point x^k to the point x^(k+1) = x^k + λ grad f(x^k) is accompanied by an increase of the function f(x) by the amount

Δf(λ) = f(x^k + λ grad f(x^k)) − f(x^k). (7.30)

From expression (7.30) it is seen that the increment is a function of the variable λ. When seeking the maximum of f(x) in the direction of the gradient, one must choose the displacement step (the multiplier λ) that provides the greatest increment of the function, i.e., maximizes Δf(λ). The value λ = λ_k^0 at which the maximum is reached can be determined from the necessary condition for the extremum of the function:

dΔf(λ)/dλ = 0. (7.31)

Let us find an expression for the derivative by differentiating equality (7.30) with respect to λ as a composite function:

dΔf(λ)/dλ = Σ_j (∂f(x^(k+1))/∂x_j)(∂f(x^k)/∂x_j) = grad f(x^(k+1)) · grad f(x^k).

Substituting this result into equality (7.31), we obtain

grad f(x^(k+1)) · grad f(x^k) = 0. (7.32)

This equality has a simple geometric interpretation: the gradient at the next point x^(k+1) is orthogonal to the gradient at the previous point x^k.


Example. Find the maximum of the function f(x) = 10 − 2(x_1 − 1)² − 2(x_2 − 2)² by the method of steepest ascent. To visualize the search, the level lines of this surface are constructed: the equation of a level line is reduced to the form (x_1 − 1)² + (x_2 − 2)² = 5 − 0.5f, from which it is clear that the lines of intersection of the paraboloid with planes parallel to the plane x_1Ox_2 (the level lines) are circles of radius sqrt(5 − 0.5f). At f = −150, −100, −50 their radii equal, respectively, sqrt(80), sqrt(55), sqrt(30), and their common center is the point (1; 2). Find the gradient of the given function:

grad f(x) = (−4x_1 + 4; −4x_2 + 8).

Step I. We calculate grad f(x^0) = (−4·5 + 4; −4·10 + 8) = (−16; −32).

In Fig. 7.6 the vector (1/16) grad f(x^0) is constructed from the starting point x^0 = (5; 10), indicating the direction of the steepest increase of the function at x^0. The next point x^1 = x^0 + λ grad f(x^0) = (5 − 16λ; 10 − 32λ) lies in this direction.

Using condition (7.32), grad f(x^1) · grad f(x^0) = 0, we obtain

1 − 4λ = 0, whence λ = 1/4. Since the second derivative of the increment is negative, the found value delivers the maximum along the direction. We find x^1 = (5 − 16/4; 10 − 32/4) = (1; 2).

Step II. The starting point for the second step is x^1 = (1; 2). We calculate grad f(x^1) = (−4·1 + 4; −4·2 + 8) = (0; 0). Hence x^1 = (1; 2) is a stationary point. Since the function is concave, the global maximum is attained at the found point (1; 2).
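The whole example can be checked with a few lines of code; the function f below is the one reconstructed from the level-line equation above, and for this particular quadratic the exact line search of condition (7.32) always gives λ = 1/4:

```python
import numpy as np

# Response function reconstructed from the level-line equation
# (x1 - 1)^2 + (x2 - 2)^2 = 5 - 0.5 f:
f = lambda x: 10 - 2 * (x[0] - 1) ** 2 - 2 * (x[1] - 2) ** 2
grad_f = lambda x: np.array([-4 * x[0] + 4, -4 * x[1] + 8])

x = np.array([5.0, 10.0])                    # starting point x^0
for step in range(1, 10):
    g = grad_f(x)
    if np.allclose(g, 0.0):                  # stationary point reached
        break
    # For f = c - 2*||x - a||^2 the exact line search along the gradient
    # (condition (7.32)) always yields lam = 1/4:
    x = x + 0.25 * g                         # ascent along the gradient
    print(step, x, f(x))                     # step I: [1. 2.] 10.0
```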

Problem with linear constraints. Note right away that if the objective function f(x) in a constrained problem has a single extremum that lies inside the admissible region, then the methodology described above is applied to the search for x* without any changes.

Consider a convex programming problem with linear constraints:

maximize f(x) (7.33)

subject to

Σ_j a_ij x_j ≤ b_i, i = 1, …, m, (7.34)

x_j ≥ 0, j = 1, …, n. (7.35)

It is assumed that f(x) is a concave function having continuous partial derivatives at every point of the feasible region.

Let us start with a geometric illustration of the solution process (Fig. 7.7). Let the starting point x^0 lie inside the feasible region. From x^0 one can move in the direction of the gradient as long as f(x) has not reached its maximum. In our case f(x) increases all the way, so we must stop at the point x^1 on the boundary line. As seen from the figure, it is impossible to move further in the direction of the gradient, since we would leave the feasible region. Therefore it is necessary to find another direction of motion which, on the one hand, does not lead out of the feasible region and, on the other hand, provides the greatest increase of f(x). This direction is given by the vector r^1 that makes the smallest acute angle with the gradient grad f(x^1), compared with any other vector emanating from the point x^1 and lying in the feasible region. Analytically such a vector is found from the condition of maximizing the scalar product (grad f(x^1), r). In this case the vector indicating the most advantageous direction coincides with the boundary line.


Thus, at the next step one must move along the boundary line as long as f(x) increases; in our case, to the point x^2. From the figure it is clear that further one should move in the direction of the vector r^2, found from the condition of maximizing the scalar product (grad f(x^2), r), i.e., along the other boundary line. The movement ends at the point x^3, since at this point the optimization search terminates: the function f(x) has a local maximum there. Due to concavity, f(x) also attains its global maximum over the feasible region at this point. The gradient at the maximum point x^3 = x* makes an obtuse angle with any vector r^k from the feasible region passing through x^3, so the scalar product (grad f(x*), r^k) is negative for any feasible r^k except r^3, which is directed along the boundary line. For r^3 the scalar product (grad f(x*), r^3) = 0, since the two vectors are mutually perpendicular (the boundary line touches the level line of the surface f(x) passing through the maximum point x*). This equality serves as an analytical sign that at the point x^3 the function f(x) has reached its maximum.

Let us now consider an analytical solution of problem (7.33)-(7.35). If the optimization search starts from a point lying inside the feasible region (all constraints of the problem are satisfied as strict inequalities), then one should move in the direction of the gradient as described above. However, the choice of λ_k in equation (7.29) is now complicated by the requirement that the next point remain in the feasible region. This means that its coordinates must satisfy constraints (7.34), (7.35), i.e., the following inequalities must hold:

Σ_j a_ij (x_j^k + λ_k ∂f(x^k)/∂x_j) ≤ b_i, i = 1, …, m,

x_j^k + λ_k ∂f(x^k)/∂x_j ≥ 0, j = 1, …, n. (7.36)

Solving the system of linear inequalities (7.36), we find the segment [0, λ_k^max] of admissible values of the parameter λ_k for which the point x^(k+1) belongs to the feasible region.

The value λ_k* is determined by solving equation (7.32):

grad f(x^(k+1)) · grad f(x^k) = 0,

provided that the root λ_k of this equation, at which f(x) has a local maximum along the direction, belongs to the segment [0, λ_k^max]. If the found value of λ_k falls outside this segment, then λ_k^max is taken as λ_k*. In this case the next point of the search trajectory lies on the boundary hyperplane corresponding to that inequality of system (7.36) which produced the right endpoint of the range of admissible values of the parameter λ_k.

If the optimization search starts from a point lying on a boundary hyperplane, or the next point of the search trajectory turns out to be on a boundary hyperplane, then in order to continue the movement to the maximum point it is first of all necessary to find the best direction of motion. To this end an auxiliary mathematical programming problem should be solved, namely: maximize the function

T = Σ_j (∂f(x^k)/∂x_j) r_j (7.37)

subject to

Σ_j a_ij r_j ≤ 0 (7.38)

for those i for which

Σ_j a_ij x_j^k = b_i, (7.39)

and the normalization condition

Σ_j r_j² = 1, (7.40)

where r = (r_1, …, r_n) is the sought direction vector.

As a result of solving problem (7.37)-(7.40), a vector r^k will be found that makes the smallest acute angle with the gradient grad f(x^k).

Condition (7.39) says that the point belongs to the boundary of the feasible region, and condition (7.38) means that movement from x^k along the vector r will be directed into the feasible region or along its boundary. The normalization condition (7.40) is necessary to bound the magnitude of r, since otherwise the value of the objective function (7.37) could be made arbitrarily large. Various forms of normalization conditions are known, and depending on the choice, problem (7.37)-(7.40) can be linear or nonlinear.

After the direction is determined, the value λ_k* for the next point of the search trajectory is found. Here the necessary condition for an extremum is used in a form analogous to equation (7.32), but with grad f(x^k) replaced by the vector r^k, i.e.,

grad f(x^(k+1)) · r^k = 0. (7.41)

The optimization search stops when a point x^(k*) is reached for which the maximum of the auxiliary problem (7.37)-(7.40) equals zero.

Example 7.5. Maximize the function of Example 7.4 under the constraints given there.

Solution. For a clear presentation of the optimization process we accompany it with a graphical illustration. Figure 7.8 shows several level lines of this surface and the feasible region OABC in which the point x* maximizing the function is to be found (see Example 7.4).

Let us start the optimization search, for example, from the point x^0 = (4; 2.5) lying on the boundary line AB: x_1 + 4x_2 = 14. At this point f(x^0) = 4.55.

We find the value of the gradient grad f(x) at the point x^0. It can also be seen from the figure that level lines with marks higher than f(x^0) = 4.55 pass through the feasible region. In short, we must look for a direction r^0 = (r_01, r_02) of movement to a next point x^1 that is closer to the optimum. For this purpose we solve problem (7.37)-(7.40) of maximizing the function T_0 under the constraints.


Since the point x^0 lies on only one (the first) boundary line (i = 1), x_1 + 4x_2 = 14, condition (7.38) is written in the form of the equality r_01 + 4r_02 = 0, which together with the normalization condition r_01² + r_02² = 1 gives the constraints of the auxiliary problem.

The system of constraint equations of this problem has only two solutions, (−0.9700; 0.2425) and (0.9700; −0.2425). By directly substituting them into the function T_0 we establish that the maximum of T_0 is nonzero and is attained at the solution (−0.9700; 0.2425). Thus the move from x^0 must be made in the direction of the vector r^0 = (−0.9700; 0.2425), that is, along the boundary line BA.

To determine the coordinates of the next point x^1 = (x_11; x_12), where

x_11 = 4 − 0.9700 λ, x_12 = 2.5 + 0.2425 λ, (7.42)

it is necessary to find the value of the parameter λ at which the function f(x) attains its maximum at the point x^1 along the direction r^0. Applying condition (7.41) yields an equation in λ, whence λ = 2.0618. Moreover, the second derivative equals −0.3999 < 0. Hence λ_0* = 2.0618. By formula (7.42) we find the coordinates of the new point, x^1 = (2; 3).

If we continue the optimization search, then on solving the next auxiliary problem (7.37)-(7.40) it is established that T_1 = 0, which indicates that the point x^1 is the maximum point x* of the objective function in the feasible region. The same is seen from the figure: at the point x^1 one of the level lines touches the boundary of the feasible region. Hence x^1 is the maximum point x*, with f_max = f(x*) = 5.4.


Problem with nonlinear constraints. If in problems with linear constraints motion along the boundary straight lines turns out to be possible and even expedient, then under nonlinear constraints defining a convex region any arbitrarily small displacement from a boundary point may immediately lead out of the region of admissible solutions, and it becomes necessary to return to the admissible region (Fig. 7.9). A similar situation is typical of problems in which the extremum of the function f(x) is attained on the boundary of the region. For this reason, various methods of motion are applied that either construct a sequence of points located near the boundary and inside the admissible region, or move in a zigzag along the boundary, crossing it. As seen from the figure, the return from the point x^1 to the admissible region should be carried out along the gradient of the boundary (constraint) function that was violated. This ensures the deviation of the next point x^2 toward the extremum point x*. The sign of an extremum in such a case is the collinearity of the gradient of the objective function and the gradient of the violated constraint function.

Lecture 6.

Gradient methods for solving nonlinear programming problems.

Questions: 1. General characteristics of the methods.

2. Gradient method.

3. Method of the steepest descent.

4. Frank-Wolfe method.

5. Method of penalty functions.

1. General characteristics of the methods.

Gradient methods are approximate (iterative) methods for solving nonlinear programming problems and allow one to solve almost any such problem. However, in this case a local extremum is determined. Therefore it is advisable to apply these methods to convex programming problems, in which every local extremum is also global. The process of solving the problem consists in the following: starting from some initial point x^0, a sequential transition is made in the direction grad F(x) if the maximum point is sought, or −grad F(x) (the antigradient) if the minimum point is sought, up to the point x* that is the solution of the problem. This point can lie either inside the region of admissible values or on its boundary.

Gradient methods can be divided into two classes (groups). The first group includes methods in which all investigated points belong to the admissible region. These include the gradient method, the steepest descent method, the Frank-Wolfe method, etc. The second group includes methods in which the investigated points may not belong to the admissible region. The most common of these is the penalty function method. Penalty function methods differ from one another in the way the "penalty" is defined.

The basic concept used in all gradient methods is the gradient of a function, understood as the direction of the steepest increase of the function.

When the solution is determined by gradient methods, the iterative process continues until either

grad F(x*) = 0 (an exact solution), or

||x^(k+1) − x^k|| ≤ ε,

where x^(k+1) and x^k are two consecutive points, and ε is a small number characterizing the accuracy of the solution.

2. Gradient method.

Imagine a person standing on the slope of a ravine who needs to go down (to the bottom). The most natural choice, it would seem, is the direction of steepest descent, i.e., the direction −grad F(x). The resulting strategy, called the gradient method, is a sequence of steps, each of which contains two operations:

a) determining the direction of the steepest descent (ascent);

b) moving in the chosen direction by some step.

The choice of the step is essential. The smaller the step, the more accurate the result, but the more calculations are required. Various modifications of the gradient method consist in using different methods of determining the step. If at some step the value of F(x) has not decreased, this means that the minimum point has been "passed"; in this case it is necessary to return to the previous point and decrease the step, for example, by half.

Solution scheme.

1. Determine x^0 = (x_1, x_2, …, x_n) belonging to the admissible region, and F(x^0); set k = 0.

2. Determine grad F(x^(k)) or −grad F(x^(k)).

3. Select the step h.

4. Determine the next point by the formula

x^(k+1) = x^(k) ± h grad F(x^(k)), "+" if max, "−" if min.

5. Determine F(x^(k+1)) and check:

if |F(x^(k+1)) − F(x^(k))| ≤ ε, a solution has been found;

if not, go to item 2.

Comment. If grad F(x^(k)) = 0, then the solution will be exact.

Example. F(x) = −6x_1 + 2x_1² − 2x_1x_2 + 2x_2² → min,

x_1 + x_2 ≤ 2, x_1 ≥ 0, x_2 ≥ 0, ε = 0.1.
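A sketch of items 1-5 of the scheme in code. The feasibility check of item 1 is left to the caller, and the constraints of the example are not enforced, so the fragment only illustrates the unconstrained iteration; the step value is an assumption, the accuracy ε = 0.1 is taken from the example:

```python
import numpy as np

def gradient_method(F, grad, x0, h=0.05, eps=0.1, maximize=False, max_iter=10_000):
    """Items 1-5 of the scheme: fixed-step movement along +/- grad F."""
    x = np.asarray(x0, dtype=float)            # item 1: x^0 must be feasible
    sign = 1.0 if maximize else -1.0
    for _ in range(max_iter):
        g = grad(x)                             # item 2
        x_new = x + sign * h * g                # item 4: "+" for max, "-" for min
        if abs(F(x_new) - F(x)) <= eps:         # item 5: stopping test
            return x_new
        x = x_new
    return x

# Example from the text: F(x) = -6 x1 + 2 x1^2 - 2 x1 x2 + 2 x2^2 -> min
F = lambda x: -6 * x[0] + 2 * x[0] ** 2 - 2 * x[0] * x[1] + 2 * x[1] ** 2
grad_F = lambda x: np.array([-6 + 4 * x[0] - 2 * x[1], -2 * x[0] + 4 * x[1]])
print(gradient_method(F, grad_F, [0.0, 0.0], eps=0.1))
```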

3. Method of the steepest descent.

Unlike the gradient method, in which the gradient is determined at every step, in the steepest descent method the gradient is found at the starting point, and the movement in the found direction is continued in equal steps as long as the value of the function decreases (increases). If at some step F(x) has increased (decreased), the movement in this direction stops, the last step is removed completely or by half, and a new value of the gradient and a new direction are computed.

Solution scheme.

1. Determine x^0 = (x_1, x_2, …, x_n) belonging to the admissible region, and F(x^0); set k = 0.

2. Determine grad F(x^(k)) or −grad F(x^(k)).

3. Select the step h.

4. Determine the next point by the formula

x^(k+1) = x^(k) ± h grad F(x^(k)), "+" if max, "−" if min.

5. Determine F(x^(k+1)) and check:

if |F(x^(k+1)) − F(x^(k))| ≤ ε, a solution has been found;

If not:

a) when searching for min: if F(x^(k+1)) < F(x^(k)), go to item 4;

if F(x^(k+1)) > F(x^(k)), go to item 2;

b) when searching for max: if F(x^(k+1)) > F(x^(k)), go to item 4;

if F(x^(k+1)) < F(x^(k)), go to item 2.

Notes: 1. If grad F(x^(k)) = 0, then the solution will be exact.

2. The advantage of the steepest descent method is its simplicity and the reduced amount of computation: grad F(x) is not calculated at every point, which is important for large-scale problems.

3. The disadvantage is that the steps must be small so as not to skip the optimum point.

Example. F(x) = 3x_1 − 0.2x_1² + x_2 − 0.2x_2² → max,

x_1 + x_2 ≤ 7, x_1 ≥ 0,
x_1 + 2x_2 ≤ 10, x_2 ≥ 0.
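A sketch of this variant of steepest descent (the constraints of the example are again not enforced; the step value and the step-halving rule follow the scheme above):

```python
import numpy as np

def steepest_descent_fixed(F, grad, x0, h=0.1, eps=1e-3, maximize=True):
    """Gradient computed at turn points only; equal steps while F keeps improving."""
    x = np.asarray(x0, dtype=float)
    sign = 1.0 if maximize else -1.0
    while True:
        g = grad(x)                              # recompute only after a bad step
        if np.linalg.norm(g) < eps:
            return x
        improved = False
        while True:
            x_new = x + sign * h * g             # keep stepping in the old direction
            better = F(x_new) > F(x) if maximize else F(x_new) < F(x)
            if not better:
                break                            # the failed step is discarded
            x, improved = x_new, True
        if not improved:
            h /= 2.0                             # shrink the step and re-aim

# Example from the text (constraints not enforced in this sketch):
F = lambda x: 3 * x[0] - 0.2 * x[0] ** 2 + x[1] - 0.2 * x[1] ** 2
grad_F = lambda x: np.array([3 - 0.4 * x[0], 1 - 0.4 * x[1]])
print(steepest_descent_fixed(F, grad_F, [0.0, 0.0]))   # -> approx [7.5 2.5]
```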

4. Frank-Wolfe method.

The method is used to optimize a nonlinear objective function under linear constraints. In the neighborhood of the point under study, the nonlinear objective function is replaced by a linear one, and the problem is reduced to the sequential solution of linear programming problems.

Solution scheme.

1. Determine x^0 = (x_1, x_2, …, x_n) belonging to the admissible region, and F(x^0); set k = 0.

2. Determine grad F(x^(k)).

3. Build the linearized function

f(x) = Σ_j (∂F(x^(k))/∂x_j) x_j (taken with "−" for min, "+" for max).

4. Determine max (min) of f(x) under the original constraints. Let the solution be the point z^(k).

5. Determine the computation step: x^(k+1) = x^(k) + λ^(k)(z^(k) − x^(k)), where λ^(k) is the step coefficient, 0 ≤ λ^(k) ≤ 1. λ^(k) is chosen so that the value of the function F(x) at the point x^(k+1) is maximal (minimal). To do this, solve the equation dF(x^(k+1))/dλ = 0 and choose the smallest (largest) of its roots, subject to 0 ≤ λ ≤ 1.

6. Determine F(x^(k+1)) and check the need for further calculations:

if |F(x^(k+1)) − F(x^(k))| ≤ ε or grad F(x^(k+1)) = 0, then the solution has been found;

if not, go to item 2.

Example. F(x) = 4x_1 + 10x_2 − x_1² − x_2² → max,

x_1 + x_2 ≤ 4, x_1 ≥ 0,
x_2 ≤ 2, x_2 ≥ 0.
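A sketch of the Frank-Wolfe scheme for this example; the linear subproblem of item 4 is solved with scipy.optimize.linprog, and the one-dimensional search of item 5 is done on a simple grid over [0, 1]:

```python
import numpy as np
from scipy.optimize import linprog

# Example from the text: F(x) = 4 x1 + 10 x2 - x1^2 - x2^2 -> max
F = lambda x: 4 * x[0] + 10 * x[1] - x[0] ** 2 - x[1] ** 2
grad_F = lambda x: np.array([4 - 2 * x[0], 10 - 2 * x[1]])
A_ub, b_ub = [[1.0, 1.0], [0.0, 1.0]], [4.0, 2.0]   # x1 + x2 <= 4, x2 <= 2

x = np.array([0.0, 0.0])                # feasible starting point
for k in range(50):
    g = grad_F(x)                       # item 2
    # Item 4: linearized subproblem max g.x  <=>  linprog minimizes -g.x
    z = linprog(-g, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)]).x
    # Item 5: one-dimensional search over lam in [0, 1] on the segment [x, z]
    d = z - x
    grid = np.linspace(0.0, 1.0, 1001)
    lam = grid[np.argmax([F(x + t * d) for t in grid])]
    x_new = x + lam * d
    if abs(F(x_new) - F(x)) < 1e-9:     # item 6: stop when F no longer changes
        x = x_new
        break
    x = x_new
print(x, F(x))                          # -> approx [2. 2.] and 20.0
```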

5. Method of penalty functions.

Let it be required to find F(x_1, x_2, …, x_n) → max (min)

subject to g_i(x_1, x_2, …, x_n) ≤ b_i, i = 1, …, m; x_j ≥ 0, j = 1, …, n.

The functions F and g_i are convex or concave.

The idea of the penalty function method is to find the optimal value of a new objective function Q(x) = F(x) + H(x), which is the sum of the original objective function and a certain function H(x) determined by the system of constraints and called the penalty function. Penalty functions are constructed so as to ensure either a quick return to the admissible region or the impossibility of leaving it. The penalty function method reduces a constrained extremum problem to the solution of a sequence of unconstrained extremum problems, which is simpler. There are many ways to construct a penalty function; most often it has the form

H(x) = Σ_{i=1..m} a_i · min{0; b_i − g_i(x)},

where a_i are some positive constants (penalty coefficients): H(x) = 0 while all constraints are satisfied, and each violated constraint contributes a term proportional to the violation.

Notes:

1. The smaller the a_i, the faster a solution is found, but the accuracy decreases.

2. The solution process is started with small values of a_i, which are then increased at subsequent steps.

Using the penalty function, one moves sequentially from point to point until an acceptable solution is obtained.

Solution scheme.

1. Determine the starting point x^0 = (x_1, x_2, …, x_n), F(x^0), and set k = 0.

2. Select the calculation step h.

3. Determine the partial derivatives ∂F(x^(k))/∂x_j and ∂H(x^(k))/∂x_j.

4. Determine the coordinates of the next point by the formula

x_j^(k+1) = x_j^(k) ± h (∂F(x^(k))/∂x_j + ∂H(x^(k))/∂x_j).

5. If x^(k+1) belongs to the admissible region, check:

a) if |F(x^(k+1)) − F(x^(k))| ≤ ε, the solution has been found; if not, go to item 2;

b) if grad F(x^(k+1)) = 0, then an exact solution has been found.

If x^(k+1) does not belong to the admissible region, set new (larger) values of a_i and go to item 4.

Example. F(x) = −x_1² − x_2² → max,

(x_1 − 5)² + (x_2 − 5)² ≤ 8, x_1 ≥ 0, x_2 ≥ 0.
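A sketch of the penalty method for this example, with the penalty H(x) = a · min(0, 8 − g(x)) as above; the step h, the schedule of penalty coefficients a, and the iteration limits are illustrative assumptions:

```python
import numpy as np

# Example from the text: F(x) = -x1^2 - x2^2 -> max,
# subject to (x1 - 5)^2 + (x2 - 5)^2 <= 8 (plus x1, x2 >= 0).
F = lambda x: -x[0] ** 2 - x[1] ** 2
g = lambda x: (x[0] - 5) ** 2 + (x[1] - 5) ** 2

def grad_Q(x, a):
    """Gradient of Q = F + H with H = a * min(0, 8 - g(x))."""
    gr = np.array([-2 * x[0], -2 * x[1]])                       # grad F
    if g(x) > 8.0:                                              # constraint violated:
        gr += a * np.array([-2 * (x[0] - 5), -2 * (x[1] - 5)])  # add grad H
    return gr

x = np.array([0.0, 0.0])
for a in (1.0, 5.0, 25.0):            # increase the penalty coefficient
    h = 0.002 / a                     # smaller steps for harsher penalties
    for _ in range(20_000):
        x = x + h * grad_Q(x, a)      # gradient ascent on Q
print(x, F(x))                        # -> approx [3. 3.] and -18
```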

Let us consider the problem of unconstrained minimization of a differentiable function of several variables. Let x^k be the current approximation to the minimum point. It was already noted above that in a small neighborhood of a point the direction of the steepest decrease of the function is given by the antigradient. This property is essentially used in a number of minimization methods. In the gradient method considered below, the direction of descent from the point x^k is chosen directly as the antigradient, so that the method has the form

x^(k+1) = x^k − α_k f′(x^k), α_k > 0. (10.22)

There are various ways of selecting the step α_k, each of which specifies a particular variant of the gradient method.

1. Method of the steepest descent.

Consider the function of one scalar variable φ(α) = f(x^k − α f′(x^k)) and choose as α_k the value for which the equality

φ(α_k) = min_{α ≥ 0} φ(α) (10.23)

holds. This method, proposed in 1845 by O. Cauchy, is now called the steepest descent method.

Fig. 10.5 shows a geometric illustration of this method for minimizing a function of two variables. From the starting point, perpendicular to the level line, in the direction of the antigradient, the descent is continued until the minimum value of the function along the ray is reached. At the point found, this ray touches a level line. Then, from that point, the descent is carried out in the direction perpendicular to the level line until the corresponding ray touches the level line passing through the new point, and so on.

Note that at every iteration step the choice of the step presupposes the solution of the one-dimensional minimization problem (10.23). Sometimes this operation can be performed analytically, for example, for a quadratic function.

We apply the steepest descent method to the minimization of the quadratic function

f(x) = (1/2)(Ax, x) − (b, x) (10.24)

with a symmetric positive definite matrix A.

According to formula (10.8), in this case f′(x) = Ax − b. Therefore formula (10.22) here takes the form

x^(k+1) = x^k − α_k (A x^k − b). (10.25)

Note that

φ(α) = f(x^k − α q^k), where q^k = A x^k − b.

This is a quadratic function of the parameter α, and it attains its minimum at the value α = α_k for which φ′(α_k) = 0. Thus, as applied to the minimization of the quadratic function (10.24), the steepest descent method is equivalent to the calculation by formula (10.25), where

α_k = (q^k, q^k) / (A q^k, q^k). (10.26)
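A sketch of formulas (10.25), (10.26) in code; the matrix A and vector b are illustrative, and the result is compared against the direct solution of Ax = b:

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, eps=1e-12, max_iter=10_000):
    """Formulas (10.25), (10.26): x_{k+1} = x_k - alpha_k q_k, q_k = A x_k - b."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        q = A @ x - b                       # gradient of f(x) = (Ax,x)/2 - (b,x)
        if np.linalg.norm(q) < eps:
            break
        alpha = (q @ q) / (q @ (A @ q))     # exact one-dimensional minimization
        x = x - alpha * q
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])      # illustrative SPD matrix
b = np.array([1.0, 1.0])
x = steepest_descent_quadratic(A, b, [0.0, 0.0])
print(x, np.linalg.solve(A, b))             # the two results coincide
```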

Remark 1. Since the minimum point of function (10.24) coincides with the solution of the system Ax = b, the steepest descent method (10.25), (10.26) can also be used as an iterative method for solving systems of linear algebraic equations with symmetric positive definite matrices.

Remark 2. Note that α_k = 1/R(q^k), where R(q) = (Aq, q)/(q, q) is the Rayleigh quotient (see § 8.1).

Example 10.1. We apply the steepest descent method to the minimization of a quadratic function written in the form (10.24) with a symmetric positive definite matrix A and a vector b. Since the minimum point coincides with the solution of the system Ax = b, its exact value x* is known in advance. We take an initial approximation and carry out the calculations by formulas (10.25), (10.26).

Iterations I, II, … are carried out by these formulas. It can be shown that all subsequent iterations proceed in the same way, and the sequence of approximations obtained by the steepest descent method converges at the rate of a geometric progression whose denominator q < 1 is determined by the eigenvalues of the matrix A (see the estimate (10.27) below).

Fig. 10.5 shows exactly the descent trajectory that was obtained in this example.

For the case of minimizing a quadratic function, the following general result is valid.

Theorem 10.1. Let A be a symmetric positive definite matrix, and let the quadratic function (10.24) be minimized. Then, for any choice of the initial approximation x^0, the steepest descent method (10.25), (10.26) converges, and the error estimate

||x^k − x*||_A ≤ q^k ||x^0 − x*||_A, q = (λ_max − λ_min)/(λ_max + λ_min), (10.27)

is valid, where ||·||_A is the norm generated by the matrix A. Here λ_min and λ_max are the minimum and maximum eigenvalues of the matrix A.

Note that the method converges at the rate of a geometric progression with denominator q. If λ_min and λ_max are close, then q is small and the method converges quickly enough. If λ_max ≫ λ_min, then q is close to 1, and one should expect slow convergence of the steepest descent method.

Example 10.2. Application of the steepest descent method to the minimization of a quadratic function whose matrix has a large spread of eigenvalues gives a sequence of approximations whose descent trajectory is shown in Fig. 10.6. The sequence here converges at the rate of a geometric progression whose denominator is much closer to one, i.e., much more slowly than in the previous example. The result obtained is in full agreement with the estimate (10.27).

Remark 1. We have formulated the theorem on the convergence of the steepest descent method for the case when the objective function is quadratic. In the general case, if the function to be minimized is strictly convex and has a minimum point x*, then, regardless of the choice of the initial approximation, the sequence obtained by this method also converges to x*. After the iterates fall into a sufficiently small neighborhood of the minimum point, the convergence becomes linear, and the denominator of the corresponding geometric progression is estimated from above by the quantity q = (Λ − λ)/(Λ + λ), where λ and Λ are the minimum and maximum eigenvalues of the Hessian matrix f″(x*).

Remark 2. For the quadratic objective function (10.24), the solution of the one-dimensional minimization problem (10.23) is given by the simple explicit formula (10.26). However, for most other nonlinear functions this cannot be done, and in calculations by the steepest descent method one has to apply numerical methods of one-dimensional minimization of the kind considered in the previous chapter.

2. The problem of "ravines".

It follows from the above discussion that the gradient method converges quickly enough if the level surfaces of the function to be minimized are close to spheres (in two dimensions, if the level lines are close to circles); for such functions the denominator q is close to zero. Theorem 10.1, Remark 1, and the result of Example 10.2 indicate that the rate of convergence drops sharply as the ratio λ_max/λ_min grows. Indeed, it is known that the gradient method converges very slowly if the level surfaces of the function to be minimized are strongly elongated in some directions. In the two-dimensional case the topography of the corresponding surface resembles a terrain with a ravine (Fig. 10.7). Therefore such functions are usually called ravine functions. Along the directions characterizing the "ravine bottom" the ravine function changes insignificantly, while along the directions characterizing the "ravine slope" a sharp change of the function occurs.

If the starting point falls on a "ravine slope", the direction of the gradient descent turns out to be almost perpendicular to the "ravine bottom", and the next approximation lands on the opposite "ravine slope". The next step toward the "ravine bottom" returns the approximation to the original "ravine slope". As a result, instead of moving along the "ravine bottom" toward the minimum point, the descent trajectory makes zigzag jumps across the "ravine", hardly approaching the target (Fig. 10.7).

To accelerate the convergence of the gradient method when minimizing ravine functions, a number of special "ravine" methods have been developed. Let us give an idea of one of the simplest techniques. From two nearby starting points, a gradient descent is made to the "ravine bottom". A straight line is drawn through the points found, and one large "ravine" step is taken along it (Fig. 10.8). From the point found in this way, one step of gradient descent to the "ravine bottom" is made again. Then a second "ravine" step is taken along the straight line passing through the two latest points. As a result, the movement along the "ravine bottom" toward the minimum point is substantially accelerated.

More information about the problem of "ravines" and "ravine" methods can be found, for example, in the literature.

3. Other approaches to determining the descent step.

As is easy to understand, at each iteration it would be desirable to choose a direction of descent close to the direction along which movement leads from the point x^k to the minimum point x*. Unfortunately, the antigradient is, as a rule, an unsuccessful direction of descent. This is especially pronounced for ravine functions. Therefore, doubt arises about the expediency of a thorough search for a solution of the one-dimensional minimization problem (10.23), and there is a desire to take in that direction only a step that would provide a "significant decrease" of the function. Moreover, in practice one is sometimes content with determining a value of the step that simply ensures a decrease in the value of the objective function.

As we have already noted, the optimization problem is the problem of finding the values of the factors x_1 = x_1*, x_2 = x_2*, …, x_k = x_k* at which the response function (y) reaches its extreme value y = ext (the optimum).

There are various methods for solving the optimization problem. One of the most widely used is the gradient method, also known as the Box-Wilson method and the method of steep ascent.

Let us consider the essence of the gradient method using the example of a two-factor response function y = f(x_1, x_2). In Fig. 4.3, curves of equal values of the response function (level curves) are shown in the factor space. The point with coordinates x_1*, x_2* corresponds to the extreme value of the response function y_ext.

If we choose any point of the factor space as the initial one (x_1^0, x_2^0), then the shortest path to the peak of the response function from this point is the path along the curve whose tangent at every point coincides with the normal to the level curve, i.e., the path in the direction of the gradient of the response function.

The gradient of a continuous single-valued function y = f(x_1, x_2) is the vector

grad y = (∂y/∂x_1) i + (∂y/∂x_2) j,

where i and j are unit vectors in the direction of the coordinate axes x_1 and x_2. The partial derivatives determine the direction of the vector.

Since we do not know the form of the dependence y = f(x_1, x_2), we cannot find the partial derivatives and determine the true direction of the gradient.

According to the gradient method, a starting point (initial levels) x_1^0, x_2^0 is selected in some part of the factor space. A symmetric two-level design of the experiment is constructed around these initial levels, and the variation interval is chosen so small that the linear model is adequate. It is known that any curve can be approximated by a linear model on a sufficiently small section.

After the symmetric two-level plan is constructed, the interpolation problem is solved, i.e., a linear model

ŷ = b_0 + b_1 x_1 + b_2 x_2

is built, and its adequacy is checked.

If the linear model turned out to be adequate for the selected variation interval, the direction of the gradient can be determined:

grad y ≈ b_1 i + b_2 j.

Thus, the direction of the gradient of the response function is determined by the values of the regression coefficients. This means that we move in the direction of the gradient if from the point with coordinates (x_1^0, x_2^0) we pass to the point with coordinates

(x_1^0 + m b_1; x_2^0 + m b_2),

where m is a positive number that specifies the step size in the direction of the gradient.

Since in coded (normalized) coordinates x_1^0 = 0 and x_2^0 = 0, the new point has coordinates (m b_1; m b_2).

Having determined the direction of the gradient (b_1; b_2) and chosen the step size m, we carry out the experiment at the initial level x_1^0, x_2^0.


Then we take a step in the direction of the gradient, i.e., carry out an experiment at the point with coordinates (m b_1; m b_2). If the value of the response function has increased in comparison with its value at the initial level, we take one more step in the direction of the gradient, i.e., carry out an experiment at the point with coordinates (2m b_1; 2m b_2).

We continue moving along the gradient until the response function begins to decrease. In Fig. 4.3 the movement along the gradient corresponds to the straight line emanating from the point (x_1^0, x_2^0). Because of the nonlinearity of the response function, it gradually deviates from the true direction of the gradient, which is shown by the dashed line.

As soon as the value of the response function decreases in the next experiment, the movement along the gradient is stopped; the experiment with the maximum value of the response function is taken as the new initial level, a new symmetric two-level plan is drawn up, and the interpolation problem is solved again.

Having constructed a new linear model, we carry out regression analysis. If the check of the significance of the factors shows that at least one coefficient b_j is significant, this means that the region of the extremum of the response function (the optimum region) has not yet been reached. A new direction of the gradient is determined, and the movement toward the optimum region begins.

The refinement of the gradient direction and the movement along the gradient continue until, in the course of solving the next interpolation problem, the check of the significance of the factors shows that all factors are insignificant, i.e., all b_j ≈ 0. This means that the optimum region has been reached. At this point the solution of the optimization problem is stopped, and the experiment with the maximum value of the response function is taken as the optimum.

In general, the sequence of actions required to solve the optimization problem by the gradient method can be presented in the form of a block diagram (Fig. 4.4).

1) the initial levels of the factors (x_j^0) should be chosen as close as possible to the optimum point if some a priori information about its position is available;

2) the variation intervals (Δx_j) should be chosen so that the linear model is likely to be adequate. The lower boundary of Δx_j is the minimum value of the variation interval at which the response function remains significant;

3) the step size (m) when moving along the gradient is chosen so that the largest of the products m b_j does not exceed the difference between the upper and lower levels of the factors in normalized form:

m·max_j |b_j| ≤ 2, hence m ≤ 2 / max_j |b_j|.

With a smaller value of m, the difference between the response function at the initial level and at the point with coordinates (m b_1; m b_2) may turn out to be insignificant. With a larger value of the step, there is a danger of overshooting the optimum of the response function.
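A sketch of the whole Box-Wilson cycle in code. The hidden response function standing in for the real experiments, the significance threshold for the coefficients, and the variation intervals are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hidden response surface standing in for the real experiments (assumption):
response = lambda x: 10 - (x[0] - 3) ** 2 - (x[1] - 2) ** 2 + rng.normal(0, 0.05)

center = np.array([0.0, 0.0])       # initial factor levels x1^0, x2^0
delta = np.array([0.5, 0.5])        # variation intervals
for _ in range(20):
    # Symmetric two-level 2^2 design around the current center (coded -1/+1):
    design = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
    y = np.array([response(center + d * delta) for d in design])
    b = design.T @ y / 4.0                       # regression coefficients b1, b2
    if np.all(np.abs(b) < 0.05):                 # all factors "insignificant":
        break                                    # the optimum region is reached
    m = 2.0 / np.max(np.abs(b))                  # the largest m*b_j equals 2
    grad_step = m * b * delta                    # back to natural units
    y_best, x_best = response(center), center    # move while the response grows
    x_try = center + grad_step
    while (y_try := response(x_try)) > y_best:
        y_best, x_best = y_try, x_try
        x_try = x_try + grad_step
    center = x_best                              # new initial level
print(center)                                    # -> near the optimum (3, 2)
```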