Review of gradient methods in mathematical optimization. Gradient methods for unconstrained optimization

The unconstrained optimization problem imposes no restrictions on the variables.

Recall that the gradient of a multivariable function is the vector expressed analytically through its partial derivatives:

grad F(X) = (∂F/∂x1, ∂F/∂x2, …, ∂F/∂xn).

The gradient of a scalar function F(X) at some point is directed toward the fastest increase of the function and is orthogonal to the level line (the surface of constant value of F(X) passing through the point X(k)). The vector opposite to the gradient, the antigradient, is directed toward the fastest decrease of the function F(X). At an extremum point, grad F(X) = 0.

In gradient methods, the movement of the point when searching for the minimum of the target function is described by the iterative formula

X(k+1) = X(k) − λk·grad F(X(k)),

where λk is the step parameter on the k-th iteration along the antigradient. For ascent methods (maximum search), one has to move along the gradient.

Various variants of gradient methods differ from each other in the way the step parameter is selected, as well as in how the direction of movement on the previous step is taken into account. We consider the following variants of gradient methods: with a constant step, with a variable step parameter (step halving), the method of steepest descent, and the method of conjugate gradients.

Method with a constant step parameter. In this method, the step parameter is constant on each iteration. The question arises: how to choose the value of the step parameter in practice? A sufficiently small step parameter leads to an unacceptably large number of iterations needed to reach the minimum point. On the other hand, too large a step parameter can lead to overshooting the minimum point and to an oscillatory computational process near this point. These circumstances are the disadvantages of the method. Since it is impossible to guess an acceptable value of the step parameter λk in advance, there arises the need to use a gradient method with a variable step parameter.

As the search approaches the optimum, the gradient vector decreases in magnitude, tending to zero, so with λk = const the step length gradually decreases. Near the optimum, the length of the gradient vector tends to zero. The length, or norm, of a vector in n-dimensional Euclidean space is determined by the formula

||grad F(X)|| = sqrt( (∂F/∂x1)² + … + (∂F/∂xn)² ),

where n is the number of variables.

Options for stopping the optimum search process:

1) the change of the target function on successive iterations becomes small: |F(X(k+1)) − F(X(k))| < ε;

2) the norm of the gradient becomes small: ||grad F(X(k))|| < ε;

3) the change of the design variables becomes small: ||X(k+1) − X(k)|| < ε.

From a practical point of view it is more convenient to use the 3rd stopping criterion (since the values of the design parameters are of interest); however, to determine the proximity of the extremum point, one should rely on the 2nd criterion. Several criteria can be used together to stop the computational process.

Consider an example. Find the minimum of the target function F(X) = (x1 − 2)² + (x2 − 4)². The exact solution of the problem is X* = (2.0; 4.0). Expressions for the partial derivatives:

∂F/∂x1 = 2(x1 − 2),
∂F/∂x2 = 2(x2 − 4).

Choose the step λ = 0.1 and perform the search from a starting point X(1). The solution will be presented as a table.
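The constant-step iteration can be sketched as follows; the starting point (0, 0) and the iteration count are our own illustrative choices, not fixed by the text.

```python
# Constant-step gradient method for F(x1, x2) = (x1 - 2)^2 + (x2 - 4)^2,
# whose exact minimum is X* = (2, 4). Step lambda = 0.1 as in the example.

def grad_F(x1, x2):
    # partial derivatives: dF/dx1 = 2(x1 - 2), dF/dx2 = 2(x2 - 4)
    return 2 * (x1 - 2), 2 * (x2 - 4)

def constant_step_descent(x1, x2, lam=0.1, iters=100):
    for _ in range(iters):
        g1, g2 = grad_F(x1, x2)
        # step along the antigradient
        x1, x2 = x1 - lam * g1, x2 - lam * g2
    return x1, x2

x1, x2 = constant_step_descent(0.0, 0.0)
```

Each iteration shrinks the distance to the optimum by the constant factor (1 − 2λ), illustrating why too large a λ would make the process oscillate instead.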

Gradient method with step halving (fractional step). In this case, the step parameter λk is decreased in the course of optimization if, after another step, the target function increases (when searching for a minimum). The step length is then crushed (halved), and the step is repeated from the previous point. In this way a more accurate approach to the extremum point is provided.
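A minimal sketch of the step-halving rule just described; the test function, starting point, and initial step are our own illustrative assumptions.

```python
# Step-halving gradient descent: if a step increases F, the step is
# halved and retried from the previous point (minimum search).

def F(x):
    return (x[0] - 2) ** 2 + (x[1] - 4) ** 2

def grad(x):
    return [2 * (x[0] - 2), 2 * (x[1] - 4)]

def descent_with_halving(x, lam=1.0, tol=1e-8, max_iter=200):
    fx = F(x)
    for _ in range(max_iter):
        g = grad(x)
        trial = [x[0] - lam * g[0], x[1] - lam * g[1]]
        if F(trial) < fx:          # step accepted: function decreased
            x, fx = trial, F(trial)
        else:                      # overshoot: crush (halve) the step
            lam /= 2.0
        if lam < tol:              # step has become negligibly small
            break
    return x

x_min = descent_with_halving([0.0, 0.0])
```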

Method of steepest descent. Methods with a variable step are more economical in terms of the number of iterations. If the optimal step length λk along the antigradient direction is the solution of a one-dimensional minimization problem, the method is called the method of steepest descent. In this method, a one-dimensional minimization problem is solved on each iteration:

F(X(k+1)) = F(X(k) − λk·S(k)) = min over λ > 0 of F(X(k) − λ·S(k)), where S(k) = grad F(X(k)).

In this method, the movement in the direction of the antigradient continues until the minimum of the target function along that direction is reached (as long as the value of the target function decreases). Using an example, let us consider how the target function can be written analytically on each step as a function of the unknown parameter λ.

Example. min F(x1, x2) = 2x1² + 4x2³ − 3. Then grad F(X) = [4x1; 12x2²]. Let the point X(k) = (2; 1); hence grad F(X(k)) = [8; 12] and

F(X(k) − λ·grad F(X(k))) = 2(2 − 8λ)² + 4(1 − 12λ)³ − 3.

It is necessary to find the λ that gives the minimum of this function.

Algorithm of the method of steepest descent (for searching a minimum)

Initial step. Let ε > 0 be the stopping constant. Select a starting point X(1), put k = 1 and go to the main step.

Main step. If ||grad F(X(k))|| < ε, finish the search; otherwise determine S(k) = grad F(X(k)) and find λk, the optimal solution of the minimization problem F(X(k) − λ·S(k)) for λ ≥ 0. Put X(k+1) = X(k) − λk·S(k), assign k = k + 1 and repeat the main step.
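The algorithm above can be sketched as follows. The one-dimensional subproblem is solved here by simple interval narrowing; the quadratic test function, the search interval [0, 1] for λ, and the starting point are our own illustrative assumptions.

```python
import math

def F(x):
    return (x[0] - 2) ** 2 + 2 * (x[1] - 4) ** 2

def grad(x):
    return [2 * (x[0] - 2), 4 * (x[1] - 4)]

def line_min(phi, lo=0.0, hi=1.0, iters=60):
    # golden-section-style narrowing of [lo, hi] for a unimodal phi
    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        l, m = b - r * (b - a), a + r * (b - a)
        if phi(l) < phi(m):
            b = m
        else:
            a = l
    return (a + b) / 2

def steepest_descent(x, eps=1e-8, max_iter=100):
    for _ in range(max_iter):
        g = grad(x)
        if (g[0] ** 2 + g[1] ** 2) ** 0.5 < eps:
            break                       # ||grad F|| < eps: stop
        # one-dimensional minimization along the antigradient
        phi = lambda lam, g=g, x=x: F([x[0] - lam * g[0], x[1] - lam * g[1]])
        lam = line_min(phi)
        x = [x[0] - lam * g[0], x[1] - lam * g[1]]
    return x

x_star = steepest_descent([0.0, 0.0])
```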

To search for the minimum of a function of one variable in the method of steepest descent, one can use methods of unimodal optimization. From this large group of methods we consider the dichotomy (bisection) method and the golden section method. The essence of unimodal optimization methods is the successive narrowing of the interval of uncertainty containing the extremum.

Dichotomy method (bisection). Start. Select the distinguishability constant δ and the final length of the uncertainty interval l. The value δ should be as small as possible while still allowing the values F(λ) and F(μ) to be distinguished. Let [a1, b1] be the initial uncertainty interval. Put k = 1.

The main stage consists of a finite number of iterations of the same type.

k-th iteration.

Step 1. If bk − ak ≤ l, the calculations end; the solution is x* = (ak + bk)/2. Otherwise

λk = (ak + bk)/2 − δ,
μk = (ak + bk)/2 + δ.

Step 2. If F(λk) < F(μk), put a(k+1) = ak, b(k+1) = μk. Otherwise a(k+1) = λk and b(k+1) = bk. Assign k = k + 1 and go to step 1.
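The two steps above can be sketched directly; the test function, interval, and tolerance values are our own illustrative choices.

```python
# Dichotomy (bisection) method: two probes delta apart around the
# midpoint narrow the uncertainty interval on every iteration.

def dichotomy(F, a, b, l=1e-5, delta=1e-6):
    # delta: distinguishability constant; l: final interval length
    while b - a > l:
        mid = (a + b) / 2
        lam, mu = mid - delta, mid + delta
        if F(lam) < F(mu):
            b = mu                 # minimum lies in [a, mu]
        else:
            a = lam                # minimum lies in [lam, b]
    return (a + b) / 2

x_star = dichotomy(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
```

Note that each iteration costs two evaluations of F, which is what makes the golden section method below more economical.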

Golden section method. A more effective method than the dichotomy method: it reaches a specified length of the uncertainty interval in fewer iterations and requires fewer evaluations of the target function. In this method, only one new division point of the uncertainty interval is calculated on each iteration. The new point is placed at the distance

α = 0.618034 of the current interval length from one of its ends.

Algorithm of the golden section method

Start. Select the permissible final length of the uncertainty interval l > 0. Let [a1, b1] be the initial uncertainty interval. Put λ1 = a1 + (1 − α)(b1 − a1) and μ1 = a1 + α(b1 − a1), where α = 0.618. Calculate F(λ1) and F(μ1), put k = 1 and go to the main stage.

Step 1. If bk − ak ≤ l, the calculations end with x* = (ak + bk)/2. Otherwise, if F(λk) > F(μk), go to step 2; if F(λk) ≤ F(μk), go to step 3.

Step 2. Put a(k+1) = λk, b(k+1) = bk, λ(k+1) = μk, μ(k+1) = a(k+1) + α(b(k+1) − a(k+1)). Calculate F(μ(k+1)) and go to step 4.

Step 3. Put a(k+1) = ak, b(k+1) = μk, μ(k+1) = λk, λ(k+1) = a(k+1) + (1 − α)(b(k+1) − a(k+1)). Calculate F(λ(k+1)).

Step 4. Assign k = k + 1 and go to step 1.

On the first iteration two evaluations of the function are needed; on all subsequent iterations, only one.
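The algorithm above can be sketched as follows; the test function and interval are our own illustrative choices.

```python
# Golden section search: one point of the previous iteration is reused,
# so only one new evaluation of F is needed per iteration.

ALPHA = 0.618034  # golden ratio constant from the text

def golden_section(F, a, b, l=1e-6):
    lam = a + (1 - ALPHA) * (b - a)
    mu = a + ALPHA * (b - a)
    f_lam, f_mu = F(lam), F(mu)
    while b - a > l:
        if f_lam > f_mu:
            # Step 2: discard [a, lam], reuse mu as the new lam
            a, lam, f_lam = lam, mu, f_mu
            mu = a + ALPHA * (b - a)
            f_mu = F(mu)
        else:
            # Step 3: discard [mu, b], reuse lam as the new mu
            b, mu, f_mu = mu, lam, f_lam
            lam = a + (1 - ALPHA) * (b - a)
            f_lam = F(lam)
    return (a + b) / 2

x_star = golden_section(lambda x: (x - 1.5) ** 2 + 2, 0.0, 4.0)
```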

The method of conjugate gradients (Fletcher–Reeves). In this method, the choice of the direction of movement on step k + 1 takes into account the change of direction on step k. The descent direction vector is a linear combination of the antigradient direction and the previous search direction. As a result, when minimizing ravine functions (functions with narrow elongated valleys), the search proceeds not perpendicular to the ravine but along it, which allows the minimum to be reached quickly. The coordinates of the point when searching for the extremum by the method of conjugate gradients are calculated by the expression X(k+1) = X(k) + λ(k+1)·V(k+1), where V(k+1) is a vector calculated by the following expression:

V(k+1) = −grad F(X(k)) + ωk·V(k), with ωk = ||grad F(X(k))||² / ||grad F(X(k−1))||².

On the first iteration one usually takes V = 0 and performs a search along the antigradient, as in the method of steepest descent. Then the direction of movement deviates from the antigradient direction, the more strongly, the more the length of the gradient vector changed on the last iteration. After n steps, to correct the work of the algorithm, an ordinary step along the antigradient is made.

Algorithm of the method of conjugate gradients

Step 1. Enter the starting point X(0), the accuracy ε, and the dimension n.

Step 2. Put k = 1.

Step 3. Put the vector V(k) = 0.

Step 4. Calculate grad F(X(k)).

Step 5. Calculate the vector V(k+1).

Step 6. Perform a one-dimensional search along the vector V(k+1).

Step 7. If k < n, put k = k + 1 and go to step 4; otherwise go to step 8.

Step 8. If the length of the vector V is less than ε, finish the search; otherwise go to step 2.

The method of conjugate directions is one of the most effective for solving minimization problems. Combined with one-dimensional search, it is often used in practice in CAD. However, it should be noted that it is sensitive to round-off errors accumulated in the course of the computation.
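A minimal sketch of the Fletcher–Reeves scheme above, including the restart along the pure antigradient after every n inner steps. The crude golden-section line search and the ravine-like test function are our own assumptions.

```python
import math

def fr_minimize(F, grad, x, eps=1e-6, outer=50):
    n = len(x)
    def norm2(v):
        return sum(c * c for c in v)
    def line_search(x, d):
        # golden-section narrowing of [0, 1] for the step along d
        r = (math.sqrt(5) - 1) / 2
        a, b = 0.0, 1.0
        phi = lambda t: F([xi + t * di for xi, di in zip(x, d)])
        for _ in range(80):
            l, m = b - r * (b - a), a + r * (b - a)
            if phi(l) < phi(m):
                b = m
            else:
                a = l
        return (a + b) / 2
    for _ in range(outer):
        g = grad(x)
        if norm2(g) < eps ** 2:
            break
        d = [-c for c in g]              # restart: pure antigradient
        for _ in range(n):               # n conjugate-gradient steps
            t = line_search(x, d)
            x = [xi + t * di for xi, di in zip(x, d)]
            g_new = grad(x)
            omega = norm2(g_new) / norm2(g)   # Fletcher-Reeves coefficient
            d = [-gn + omega * di for gn, di in zip(g_new, d)]
            g = g_new
    return x

# ravine-like quadratic: a narrow valley along the x1 axis
F = lambda x: x[0] ** 2 + 25 * x[1] ** 2
grad = lambda x: [2 * x[0], 50 * x[1]]
x_star = fr_minimize(F, grad, [5.0, 1.0])
```

On a quadratic with exact line search this scheme finds the minimum in n steps, which is why the restart period equals the dimension.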

Disadvantages of gradient methods

    In problems with a large number of variables it is difficult or impossible to obtain the derivatives as analytical functions.

    When the derivatives are computed by difference schemes, the resulting error, especially in the vicinity of the extremum, limits the possibilities of such an approximation.

Lecture 8.

Gradient methods for solving nonlinear programming problems. Penalty functions. Applications of nonlinear programming to operations research problems.

Problems without constraints. Generally speaking, the gradient method can solve any nonlinear problem; however, only a local extremum is found. Therefore it is more expedient to apply this method to convex programming problems, in which any local extremum is also global (see Theorem 7.6).

We consider the problem of maximizing a nonlinear differentiable function f(x). The essence of the gradient search for the maximum point x* is extremely simple: take an arbitrary point x0 and, using the gradient calculated at this point, determine the direction in which f(x) increases at the highest rate (Fig. 7.4); then, making a small step in the found direction, move to a new point x1. Then determine the best direction again to move to the next point x2, and so on. In Fig. 7.4 the search trajectory is the broken line x0, x1, x2, ... Thus, one must build a sequence of points x0, x1, x2, ..., xk, ... converging to the maximum point x*, i.e. such that the conditions

lim (k→∞) xk = x*, lim (k→∞) f(xk) = f(x*)

are satisfied for the points of the sequence.

Gradient methods, as a rule, yield an exact solution only in an infinite number of steps, and only in some cases in a finite number. Accordingly, gradient methods are classified as approximate solution methods.

The movement from a point xk to a new point x(k+1) is carried out along the straight line passing through the point xk with the equation

x = xk + λk·grad f(xk),            (7.29)

where λk is a numeric parameter on which the step size depends. Once the value of the parameter in equation (7.29) is selected, λk = λk0, the next point on the search polyline is determined.

Gradient methods differ from each other in the way of selecting the step size, i.e. the value λk0 of the parameter λk. One can, for example, move from point to point with a constant step λk = λ, i.e. for any k

x(k+1) = xk + λ·grad f(xk).

If it turns out that f(x(k+1)) < f(xk), then one should return to the point xk and reduce the value of the parameter, for example to λ/2.

Sometimes the step is taken proportional to the gradient modulus.

If an approximate solution is sought, the search can be stopped based on the following considerations. After each series of a certain number of steps, the achieved values of the target function f(x) are compared. If after the next series the change of f(x) does not exceed some given small number, the search is stopped, the achieved value of f(x) is viewed as the desired approximate maximum, and the corresponding x is accepted as x*.



If the target function f(x) is concave (convex), then a necessary and sufficient condition for the optimality of the point x* is that the gradient of the function vanishes at this point.

A variant of the gradient search called the method of steepest ascent is widespread. Its essence is as follows. After the gradient is determined at the point xk, the movement along the straight line is performed up to the point x(k+1) at which the maximum value of the function f(x) in the direction of the gradient is attained. At this point the gradient is determined again, and the movement is performed along the straight line in the direction of the new gradient up to the point x(k+2), at which the maximum value of f(x) in this direction is attained. The movement continues until the point x* corresponding to the largest value of the target function f(x) is reached. Fig. 7.5 shows the scheme of movement to the optimal point x* by the method of steepest ascent. In this case the direction of the gradient at the point xk is tangent to the level line of the surface f(x) at the point x(k+1); therefore, the gradient at the point x(k+1) is orthogonal to the gradient grad f(xk) (compare with Fig. 7.4).

The movement from the point xk to the point x(k+1) is accompanied by an increase of the function f(x) by the amount

Δf = f(xk + λk·grad f(xk)) − f(xk).            (7.30)

From expression (7.30) it can be seen that the increment is a function of the variable λk, i.e. Δf = φ(λk). When finding the maximum of the function f(x) in the direction of the gradient, it is necessary to choose the movement step (the multiplier λk) that ensures the greatest increment of the function. The value λk at which the greatest value of φ is attained can be determined from the necessary condition for the extremum of the function:

dφ(λk)/dλk = 0.            (7.31)

We find the expression for the derivative by differentiating equality (7.30) with respect to λk as a composite function:

dφ/dλk = (grad f(x(k+1)), grad f(xk)).

Substituting this result into equality (7.31), we get

(grad f(x(k+1)), grad f(xk)) = 0.            (7.32)

This equality has a simple geometric interpretation: the gradient at the next point x(k+1) is orthogonal to the gradient at the previous point xk.


We construct the level lines of this surface (the function under consideration is f(x) = −2x1² − 2x2² + 4x1 + 8x2, as follows from the gradient below). To this end, the equation is reduced to the form (x1 − 1)² + (x2 − 2)² = 5 − 0.5f, from which it is clear that the intersections of the paraboloid with planes parallel to the plane x1Ox2 (the level lines) are circles of radius sqrt(5 − 0.5f). For f = −150, −100, −50 their radii are, respectively, sqrt(80), sqrt(55), sqrt(30), and the common center is at the point (1; 2). We find the gradient of this function:

grad f = (−4x1 + 4; −4x2 + 8).

Step I. Calculate grad f(x0) = (−4·5 + 4; −4·10 + 8) = (−16; −32).

In Fig. 7.6, with its origin at the point x0 = (5; 10), the vector (1/16)·grad f(x0) is shown, indicating the direction of the fastest increase of the function at the point x0. The next point is sought in this direction: x1 = x0 + λ·grad f(x0) = (5 − 16λ; 10 − 32λ).

Using condition (7.32), we get

(grad f(x1), grad f(x0)) = 1280(1 − 4λ) = 0,

or 1 − 4λ = 0, whence λ = 1/4. Since the second derivative d²φ/dλ² < 0, the value found is a maximum point. Find x1 = (5 − 16/4; 10 − 32/4) = (1; 2).

Step II. The starting point for the second step is x1 = (1; 2). Calculate grad f(x1) = (−4·1 + 4; −4·2 + 8) = (0; 0). Hence x1 = (1; 2) is a stationary point. But since this function is concave, the global maximum is attained at the found point (1; 2).
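The worked steps can be checked numerically. Since f is quadratic, φ′(λ) = (grad f(x + λg), g) is linear in λ, so condition (7.32) is solved exactly from two evaluations; the function here is the one recovered from the gradient of the example.

```python
# Steepest ascent step for f(x) = -2x1^2 - 2x2^2 + 4x1 + 8x2,
# starting from x0 = (5, 10); condition (7.32) gives lambda = 1/4.

def grad_f(x):
    return (-4 * x[0] + 4, -4 * x[1] + 8)

def ascent_step(x):
    g = grad_f(x)
    # dphi(lam) = (grad f(x + lam*g), g); linear in lam for quadratic f
    def dphi(lam):
        gl = grad_f((x[0] + lam * g[0], x[1] + lam * g[1]))
        return gl[0] * g[0] + gl[1] * g[1]
    d0, d1 = dphi(0.0), dphi(1.0)
    lam = d0 / (d0 - d1)          # root of the linear function dphi
    return lam, (x[0] + lam * g[0], x[1] + lam * g[1])

lam, x1 = ascent_step((5.0, 10.0))
```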

Problem with linear constraints. We note at once that if the target function f(x) of the constrained problem has a single extremum and it lies inside the permissible region, then to search for the extremum point x* the above methodology is applied without any changes.

Consider the convex programming problem with linear constraints:

f(x) → max,            (7.33)

Σj aij·xj ≤ bi, i = 1, ..., m,            (7.34)

xj ≥ 0, j = 1, ..., n.            (7.35)

It is assumed that f(x) is a concave function and has continuous partial derivatives at each point of the permissible region.

Let us begin with a geometric illustration of the solution of the problem (Fig. 7.7). Let the initial point x0 lie inside the permissible region. From the point x0 one can move in the direction of the gradient until f(x) reaches a maximum. In our case f(x) increases all the time, so one has to stop at the point x1 on the boundary straight line. As can be seen from the figure, it is impossible to move further in the direction of the gradient, since we would leave the permissible region. Therefore it is necessary to find another direction of movement which, on the one hand, does not lead out of the permissible region and, on the other hand, provides the greatest increase of f(x). This direction is defined by the vector r that makes the smallest acute angle with the gradient vector compared with any other vector emanating from the point x1 and lying in the permissible region. Analytically, such a vector is found from the condition of maximizing the scalar product (grad f(x1), r). In this case, the vector indicating the most advantageous direction coincides with the boundary straight line.


Thus, on the next step one has to move along the boundary straight line as long as f(x) increases; in our case, up to the point x2. From the figure it is clear that one should then move in the direction of the vector r2, which is found from the condition of maximizing the scalar product (grad f(x2), r), i.e. along the next boundary straight line. The movement ends at the point x3, since the optimization search is completed at this point: at it the function f(x) has a local maximum. Owing to concavity, at this point f(x) also reaches the global maximum in the permissible region. The gradient at the maximum point x3 = x* makes an obtuse angle with any vector r from the permissible region passing through x3, so the scalar product (grad f(x*), rk) is negative for any permissible rk, except r3, directed along the boundary straight line. For it the scalar product (grad f(x*), r3) = 0, since the two vectors are mutually perpendicular (the boundary straight line touches the level line of the surface f(x) passing through the maximum point x*). This equality serves as the analytic sign that at the point x3 the function f(x) has reached the maximum.

Let us now consider the analytical solution of the problem (7.33)–(7.35). If the optimization search begins from a point lying inside the permissible region (all constraints of the problem are satisfied as strict inequalities), then one should move in the direction of the gradient, as stated above. However, now the choice of λk in equation (7.29) is complicated by the requirement that the next point remain in the permissible region. This means that its coordinates must satisfy the constraints (7.34), (7.35), i.e. the inequalities

Σj aij·(xj(k) + λk·∂f(xk)/∂xj) ≤ bi, i = 1, ..., m,
xj(k) + λk·∂f(xk)/∂xj ≥ 0, j = 1, ..., n,            (7.36)

must be satisfied.

Solving the system of linear inequalities (7.36), we find the segment of permissible values of the parameter λk for which the point x(k+1) belongs to the permissible region.

The value λk* is determined by solving equation (7.32),

(grad f(x(k+1)), grad f(xk)) = 0,

at which f(x) has a local maximum in λk along the chosen direction; it must belong to the segment of permissible values. If the value λk found from (7.32) goes beyond this segment, then the right end point of the segment of permissible values of the parameter λk, obtained when solving the system (7.36), is accepted as λk*. In this case, the next point of the search trajectory lies on the boundary hyperplane corresponding to that inequality of the system (7.36) which determines the right end of the segment.

If the optimization search starts from a point lying on a boundary hyperplane, or the next point of the search trajectory turns out to be on a boundary hyperplane, then to continue the movement toward the maximum point it is first of all necessary to find the best direction of movement. For this purpose an auxiliary problem of mathematical programming must be solved, namely: maximize the function

T = Σj (∂f(xk)/∂xj)·rj            (7.37)

with the restrictions

Σj aij·rj ≤ 0            (7.38)

for those i for which

Σj aij·xj(k) = bi,            (7.39)

Σj rj² = 1,            (7.40)

where rj are the coordinates of the sought direction vector r.

As a result of solving the problem (7.37)–(7.40), the vector r making the smallest acute angle with the gradient will be found.

Condition (7.39) indicates that the point belongs to the boundary of the permissible region, and condition (7.38) means that the movement along the vector r will be directed inside the permissible region or along its boundary. The normalization condition (7.40) is necessary to bound the magnitude of r, since otherwise the value of the target function (7.37) could be made arbitrarily large. Various forms of normalization conditions are known, and depending on this, problem (7.37)–(7.40) may be linear or nonlinear.

After the direction is determined, the value λk* for the next point of the search trajectory is found. In doing so, the necessary extremum condition is used in a form similar to equation (7.32), but with grad f(xk) replaced by the vector r, i.e.

(grad f(x(k+1)), r) = 0.            (7.41)

The optimization search stops when the point xk* is reached at which the maximum of the auxiliary problem equals zero.

Example 7.5. Maximize the function subject to the given constraints.

Solution. For a visual presentation of the optimization process, we accompany it with a graphic illustration. Figure 7.8 shows several level lines of the surface and the permissible region OABC in which the point x* delivering the maximum of the function is to be found (see Example 7.4).

Let us start the optimization search, for example, from the point x0 = (4; 2.5) lying on the boundary straight line x1 + 4x2 = 14. At this point f(x0) = 4.55.

We find the value of the gradient grad f(x0) at the point x0. Moreover, in the figure it can be seen that level lines with marks higher than f(x0) = 4.55 pass through the permissible region. In short, one has to look for a direction r0 = (r01; r02) of movement to the next point x1 closer to the optimum. For this purpose we solve the problem (7.37)–(7.40): maximize the function T0 with the corresponding restrictions.


Since the point x0 lies on only one (the first) boundary straight line (i = 1), x1 + 4x2 = 14, condition (7.38) is written in the form of an equality: r01 + 4r02 = 0.

The system of constraint equations of this problem has only two solutions, (−0.9700; 0.2425) and (0.9700; −0.2425). By direct substitution into the function T0 we establish that the maximum of T0 is different from zero and is attained at the solution (−0.9700; 0.2425). Thus, one has to move from x0 in the direction of the vector r0 = (−0.9700; 0.2425), i.e. along the boundary straight line BA.

To determine the coordinates of the next point x1 = (x11; x12),

x1 = x0 + λ·r0,            (7.42)

it is necessary to find the value of the parameter λ at which the function f(x) at the point x1, regarded as a function of λ, attains its maximum; this gives λ = 2.0618. At the same time the second derivative equals −0.3999 < 0, so λ* = 2.0618. By formula (7.42) we find the coordinates of the new point x1 = (2; 3).

If the optimization search is continued, then, when solving the next auxiliary problem (7.37)–(7.40), it will be found that T1 = 0, and this indicates that the point x1 is the maximum point x* of the target function in the permissible region. It is seen from the figure that at the point x1 one of the level lines touches the boundary of the permissible region. Consequently, the point x1 is the maximum point x*. Here f max = f(x*) = 5.4.


Problem with nonlinear constraints. If in problems with linear constraints movement along the boundary straight lines is possible and even expedient, then with nonlinear constraints defining a convex region any arbitrarily small movement from a boundary point can immediately lead out of the permissible region, and the need to return to the permissible region will arise (Fig. 7.9). Such a situation is characteristic of problems in which the extremum of the function f(x) is attained on the boundary of the region. In this regard, various methods of movement are applied that ensure the construction of a sequence of points located near the boundary and inside the permissible region, or a zigzag movement along the boundary crossing it. As can be seen from the figure, the return from the point x1 to the permissible region should be carried out along the gradient of that boundary function which was violated. This deviates the next point x2 toward the extremum point x*. The sign of the extremum in this case is the collinearity of the vectors grad f and grad g.

Lecture 6.

Gradient methods for solving nonlinear programming problems.

Questions: 1. General characteristics of the methods.

2. The gradient method.

3. The method of steepest descent.

4. The Frank–Wolfe method.

5. Penalty functions.

1. General characteristics of methods.

Gradient methods are approximate (iterative) methods for solving nonlinear programming problems and allow one to solve almost any problem. However, they determine a local extremum. Therefore it is advisable to apply these methods to convex programming problems, in which every local extremum is also global. The process of solving the problem consists in the following: starting from some initial point x, a sequential transition is carried out in the direction grad f(x), if the maximum point is sought, or −grad f(x) (the antigradient), if the minimum point is sought, up to the point that is the solution of the problem. This point may lie inside the region of permissible values or on its boundary.

Gradient methods can be divided into two classes (groups). The first group includes methods in which all investigated points belong to the permissible region. Such methods include: the gradient method, the method of steepest descent, the Frank–Wolfe method, and others. The second group includes methods in which the investigated points may not belong to the permissible region. The most common of such methods is the method of penalty functions. All penalty-function methods differ from each other in the way the "penalty" is defined.

The main concept used in all gradient methods is the concept of the gradient of a function as the direction of the fastest increase of the function.

When determining the solution by gradient methods, the iterative process continues until:

either grad f(x*) = 0 (an exact solution is obtained);

or ||x(k+1) − x(k)|| < ε, where x(k+1) and x(k) are two successive points and ε is a small number characterizing the accuracy of the solution.

2. Gradient method.

Imagine a man standing on the slope of a ravine who needs to go down (to the bottom). The most natural direction, it would seem, is the direction of the greatest steepness of descent, i.e. the direction (−grad f(x)). The resulting strategy, called the gradient method, is a sequence of steps, each of which contains two operations:

a) determining the direction of the greatest steepness of descent (ascent);

b) moving in the selected direction by some step.

The right choice of the step is essential. The smaller the step, the more accurate the result, but the more computation. Various modifications of the gradient method consist in using various ways of determining the step. If at some step the value f(x) did not decrease, it means that the minimum point was "slipped past"; in this case it is necessary to return to the previous point and reduce the step, for example, by half.

Solution scheme.

1. Determine x0 = (x1, x2, ..., xn) belonging to the permissible region, and f(x0); set k = 0.

2. Determine grad f(x(k)) or −grad f(x(k)).

3. Select the step h.

4. Determine the next point by the formula

x(k+1) = x(k) ± h·grad f(x(k)), "+" if max, "−" if min.

5. Determine f(x(k+1)) and check:

if ||x(k+1) − x(k)|| < ε, the solution is found;

if not, go to step 2.

Comment. If grad f(x(k)) = 0, then the solution will be exact.

Example. f(x) = −6x1 + 2x1² − 2x1x2 + 2x2² → min,

x1 + x2 ≤ 2, x1 ≥ 0, x2 ≥ 0, ε = 0.1.

3. The method of steepest descent.

In contrast to the gradient method, in which the gradient is determined at each step, in the method of steepest descent the gradient is found at the starting point, and the movement in the found direction is continued with the same step as long as the value of the function decreases (increases). If at some step f(x) increased (decreased), the movement in this direction stops, the last step is retracted completely or by half, and a new value of the gradient and a new direction are calculated.

Solution scheme.

1. Determine x0 = (x1, x2, ..., xn) belonging to the permissible region, and f(x0); set k = 0.

2. Determine grad f(x0) or −grad f(x0).

3. Select the step h.

4. Determine the next point by the formula

x(k+1) = x(k) ± h·grad f(x(k)), "+" if max, "−" if min.

5. Determine f(x(k+1)) and check:

if ||x(k+1) − x(k)|| < ε, the solution is found;

if not:

a) when searching for min: if f(x(k+1)) < f(x(k)), go to step 4;

if f(x(k+1)) > f(x(k)), go to step 2;

b) when searching for max: if f(x(k+1)) > f(x(k)), go to step 4;

if f(x(k+1)) < f(x(k)), go to step 2.

Remarks: 1. If grad f(x(k)) = 0, then the solution will be exact.

2. The advantage of the method of steepest descent is its simplicity and the reduction of computations, since grad f(x) is not calculated at all points, which is important for problems of large dimension.

3. The disadvantage is that the steps should be small so as not to skip the optimum point.

Example. f(x) = 3x1 − 0.2x1² + x2 − 0.2x2² → max,

x1 + x2 ≤ 7, x1 ≥ 0,

x1 + 2x2 ≤ 10, x2 ≥ 0.

4. The Frank–Wolfe method.

The method is used to optimize a nonlinear target function with linear constraints. In the neighborhood of the investigated point, the nonlinear target function is replaced by a linear function, and the problem is reduced to the sequential solution of linear programming problems.

Solution scheme.

1. Determine x0 = (x1, x2, ..., xn) belonging to the permissible region, and f(x0); set k = 0.

2. Determine grad f(x(k)).

3. Build the linearized function

f̃(x) = Σj (∂f(x(k))/∂xj)·xj (min: "−"; max: "+").

4. Determine max (min) of f̃(x) subject to the original constraints. Let the solution be the point z(k).

5. Determine the step of the calculation: x(k+1) = x(k) + λ(k)·(z(k) − x(k)), where λ(k) is the step coefficient, 0 ≤ λ(k) ≤ 1. λ(k) is selected so that the value of the function f(x) is max (min) at the point x(k+1). To do this, solve the equation df(x(k+1))/dλ = 0 and choose the smallest (largest) of the roots, with 0 ≤ λ ≤ 1.

6. Determine f(x(k+1)) and check the need for further computation:

if ||x(k+1) − x(k)|| < ε or grad f(x(k+1)) = 0, the solution is found;

if not, go to step 2.

Example. f(x) = 4x1 + 10x2 − x1² − x2² → max,

x1 + x2 ≤ 4, x1 ≥ 0,

x2 ≤ 2, x2 ≥ 0.
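The Frank–Wolfe scheme on this example can be sketched as follows. Since the feasible region here is a small polygon, the LP subproblem (maximizing the linearized objective) is solved by checking its vertices, which is a simplification valid only for this illustrative region; the exact step formula uses the quadratic form of f.

```python
# Frank-Wolfe method for f(x) = 4x1 + 10x2 - x1^2 - x2^2 -> max,
# with x1 + x2 <= 4, x2 <= 2, x1, x2 >= 0 (vertices listed below).

VERTICES = [(0.0, 0.0), (4.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

def f(x):
    return 4 * x[0] + 10 * x[1] - x[0] ** 2 - x[1] ** 2

def grad(x):
    return (4 - 2 * x[0], 10 - 2 * x[1])

def frank_wolfe(x, iters=100):
    for _ in range(iters):
        g = grad(x)
        # LP subproblem: maximize g . z over the polygon -> best vertex
        z = max(VERTICES, key=lambda v: g[0] * v[0] + g[1] * v[1])
        d = (z[0] - x[0], z[1] - x[1])
        den = 2 * (d[0] ** 2 + d[1] ** 2)
        if den == 0:
            break                     # already at the LP solution
        # exact maximizing step for the quadratic f, clipped to [0, 1]
        lam = max(0.0, min(1.0, (g[0] * d[0] + g[1] * d[1]) / den))
        x = (x[0] + lam * d[0], x[1] + lam * d[1])
    return x

x_star = frank_wolfe((0.0, 0.0))
```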

5. Penalty functions.

Let it be required to find f(x1, x2, ..., xn) → max (min)

subject to gi(x1, x2, ..., xn) ≤ bi, i = 1, ..., m; xj ≥ 0, j = 1, ..., n.

The functions f and gi are convex or concave.

The idea of the penalty function method is to search for the optimal value of a new target function Q(x) = f(x) + H(x), which is the sum of the original target function and some function H(x) determined by the system of constraints and called the penalty function. Penalty functions are built in such a way as to ensure either a rapid return to the permissible region or the impossibility of leaving it. The penalty function method reduces the problem for a conditional extremum to the solution of a sequence of problems for the unconditional extremum, which is simpler. There are many ways to build a penalty function. Most often it has the form

H(x) = Σi ai·Gi,

where Gi = 0 if the i-th constraint is satisfied (gi(x) ≤ bi), Gi measures the violation of the i-th constraint otherwise, and the ai are some positive constants (penalty coefficients).

Notes:

the smaller the ai, the faster a solution is found, but the accuracy decreases;

the solution is started with small ai, which are increased on subsequent steps.

Using the penalty function, one moves sequentially from one point to another until an acceptable solution is obtained.

Solution scheme.

1. Determine the starting point x0 = (x1, x2, ..., xn), f(x0), and set k = 0.

2. Select the step of the calculation h.

3. Determine the partial derivatives ∂f/∂xj and ∂H/∂xj.

4. Determine the coordinates of the next point by the formula

xj(k+1) = xj(k) ± h·(∂f(x(k))/∂xj + ∂H(x(k))/∂xj).

5. If x(k+1) ∈ permissible region, check:

a) if ||x(k+1) − x(k)|| < ε, the solution is found; if not, go to step 2;

b) if grad f(x(k+1)) = 0, the exact solution is found.

If x(k+1) ∉ permissible region, set a new value of the penalty coefficients and go to step 4.

Example. f(x) = −x1² − x2² → max,

(x1 − 5)² + (x2 − 5)² ≤ 8, x1 ≥ 0, x2 ≥ 0.
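The penalty-function idea on this example can be sketched as follows. The quadratic penalty, the schedule of increasing coefficients, and the step rule are common choices of ours, not prescribed by the text; the nonnegativity constraints are inactive at the optimum and are omitted. The constrained maximum lies at (3, 3), the point of the disk closest to the origin.

```python
# Penalty method for f(x) = -x1^2 - x2^2 -> max subject to
# g(x) = (x1-5)^2 + (x2-5)^2 <= 8; Q(x) = f(x) - a * max(0, g(x)-8)^2.

def grad_Q(x, a):
    viol = (x[0] - 5) ** 2 + (x[1] - 5) ** 2 - 8
    gx, gy = -2 * x[0], -2 * x[1]          # grad of f
    if viol > 0:                           # outside the region: penalize
        gx -= a * 2 * viol * 2 * (x[0] - 5)
        gy -= a * 2 * viol * 2 * (x[1] - 5)
    return gx, gy

def penalty_maximize(x, coeffs=(1.0, 10.0, 100.0, 1000.0)):
    for a in coeffs:                       # growing penalty coefficient
        h = 1.0 / (40.0 * a)               # step shrinks as a grows
        for _ in range(20000):             # gradient ascent on Q
            gx, gy = grad_Q(x, a)
            x = (x[0] + h * gx, x[1] + h * gy)
    return x

x_star = penalty_maximize((4.0, 4.0))
```

Each stage solves an unconstrained problem; increasing a pulls the unconstrained maximizer toward the boundary point (3, 3), illustrating the accuracy-versus-speed trade-off noted above.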

Consider the problem of unconstrained minimization of a differentiable function of many variables. Let grad f(x) denote the gradient value at the point x. It was already noted above that in a small neighborhood of a point the direction of the fastest decrease of the function is given by the antigradient; this property is used essentially in a number of minimization methods. In the gradient method considered below, the descent direction from the current point is chosen exactly in this way:

x(k+1) = x(k) − αk·grad f(x(k)).            (10.22)

There are various ways of selecting the step αk, each of which specifies a specific variant of the gradient method.

1. The method of steepest descent.

Consider the function of one scalar variable φ(α) = f(x(k) − α·grad f(x(k))) and choose as αk the value for which

φ(αk) = min over α ≥ 0 of φ(α).            (10.23)

This method, proposed in 1845 by O. Cauchy, is customarily called the method of steepest descent.

Fig. 10.5 gives a geometric illustration of this method for minimizing a function of two variables. From the starting point, descent proceeds perpendicular to the level line (along the antigradient) and continues until the value of the function reaches its minimum along the ray. At the point found, the ray is tangent to a level line. Then descent from that point is carried out in the direction perpendicular to the level line passing through it, until the corresponding ray touches a level line at the next point, and so on.

Note that at each iteration the choice of the step requires solving the one-dimensional minimization problem (10.23). Sometimes this operation can be performed analytically, for example for a quadratic function.

Let us apply the method of steepest descent to minimize the quadratic function

f(x) = (1/2)(Ax, x) − (b, x) (10.24)

with a symmetric positive definite matrix A.

According to formula (10.8), in this case f′(x) = Ax − b; therefore formula (10.22) takes the form

x^(k+1) = x^(k) − α_k (Ax^(k) − b). (10.25)

Notice that

φ(α) = f(x^(k) − α f′(x^(k)))

is a quadratic function of the parameter α and reaches a minimum at the value α_k for which φ′(α_k) = 0.

Thus, as applied to the minimization of the quadratic function (10.24), the method of steepest descent is equivalent to calculation by formula (10.25), where

α_k = (q^(k), q^(k)) / (Aq^(k), q^(k)), q^(k) = Ax^(k) − b. (10.26)

Remark 1. Since the minimum point of function (10.24) coincides with the solution of the system Ax = b, the method of steepest descent (10.25), (10.26) can also be applied as an iterative method for solving systems of linear algebraic equations with symmetric positive definite matrices.
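As a sketch of this remark, the code below applies steepest descent with the exact step (10.26) to a small symmetric positive definite system; the matrix and right-hand side are illustrative, not taken from the text.

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    """Minimize f(x) = 0.5*(Ax,x) - (b,x) for symmetric positive definite A.
    The minimizer solves Ax = b, so this doubles as a linear solver."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        q = A @ x - b                   # residual = gradient f'(x)
        qq = q @ q
        if np.sqrt(qq) < tol:
            break
        alpha = qq / (q @ (A @ q))      # exact one-dimensional step (10.26)
        x = x - alpha * q               # descent step (10.25)
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = steepest_descent(A, b, [0.0, 0.0])
print(x, A @ x - b)  # x approximately solves Ax = b, residual near zero
```

For this well-conditioned matrix the eigenvalues are close, so the progression denominator is small and convergence takes only a few dozen iterations.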

Remark 2. Note that α_k = 1/R(q^(k)), where R(q) = (Aq, q)/(q, q) is the Rayleigh quotient (see § 8.1).

Example 10.1. Let us apply the method of steepest descent to minimize the quadratic function. Note that the exact position of the minimum point is therefore known in advance. We write this function in form (10.24), from which the matrix A and the vector b are not difficult to determine.

Let us take the initial approximation and carry out the calculation by formulas (10.25), (10.26).

I iteration.

II iteration.

It can be shown that the same values will be obtained at all subsequent iterations.

Note that in this way the sequence obtained by the method of steepest descent converges at the rate of a geometric progression whose denominator is determined by the eigenvalues of the matrix A.

Fig. 10.5 shows exactly the descent trajectory that was obtained in this example.

For the case of minimizing a quadratic function, the following general result holds.

Theorem 10.1. Let A be a symmetric positive definite matrix, and let x* minimize the quadratic function (10.24). Then, for any choice of the initial approximation, the method of steepest descent (10.25), (10.26) converges, and the following error estimate holds:

||x^(k) − x*|| ≤ q^k ||x^(0) − x*||, q = (λ_max − λ_min)/(λ_max + λ_min). (10.27)

Here λ_min and λ_max are the minimum and maximum eigenvalues of the matrix A.

It should be noted that this method converges at the rate of a geometric progression with denominator q = (λ_max − λ_min)/(λ_max + λ_min). If λ_min and λ_max are close, then q is small and the method converges quite quickly, as, for example, in Example 10.1. If, however, λ_max ≫ λ_min, then q is close to 1, and slow convergence of the method of steepest descent should be expected.

Example 10.2. Applying the method of steepest descent to minimize a quadratic function with a given initial approximation yields a sequence of approximations whose descent trajectory is depicted in Fig. 10.6.

Here the sequence converges at the rate of a geometric progression whose denominator is close to 1, i.e., essentially more slowly than in the previous example. Since the eigenvalues of the matrix differ greatly here, the result obtained is quite consistent with estimate (10.27).

Note 1. We formulated the convergence theorem for the method of steepest descent in the case when the objective function is quadratic. In the general case, if the minimized function is strictly convex and has a minimum point x*, then, regardless of the choice of the initial approximation, the sequence obtained by this method converges to x*. Moreover, after entering a sufficiently small neighborhood of the minimum point, the convergence becomes linear, and the denominator of the corresponding geometric progression is estimated from above by the quantity q = (λ_max − λ_min)/(λ_max + λ_min), where λ_min and λ_max are now the minimum and maximum eigenvalues of the Hessian matrix of the function at x*.

Remark 2. For the quadratic objective function (10.24), the solution of the one-dimensional minimization problem (10.23) can be found by the simple explicit formula (10.26). However, for most other nonlinear functions this cannot be done, and to compute the steepest descent step one has to apply the numerical methods of one-dimensional minimization considered in the previous chapter.
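For a non-quadratic function the step in (10.23) can be found, for instance, by golden-section search over a bracket. The sketch below combines such a search with a finite-difference gradient; the test function, the bracket [0, 10], and the tolerances are all illustrative assumptions.

```python
import numpy as np

def golden_section(phi, a, b, tol=1e-8):
    # golden-section search for the minimum of a unimodal phi on [a, b]
    t = (5.0 ** 0.5 - 1.0) / 2.0
    c, d = b - t * (b - a), a + t * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - t * (b - a)
        else:
            a, c = c, d
            d = a + t * (b - a)
    return 0.5 * (a + b)

def num_grad(f, x, eps=1e-6):
    # central-difference approximation of the gradient
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        g[j] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

def steepest_descent(f, x0, steps=200, alpha_max=10.0):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = num_grad(f, x)
        # numerical solution of the one-dimensional problem (10.23)
        a_k = golden_section(lambda a: f(x - a * g), 0.0, alpha_max)
        x = x - a_k * g
    return x

f = lambda x: (x[0] - 1.0) ** 4 + (x[1] + 2.0) ** 2  # non-quadratic test function
x_min = steepest_descent(f, [3.0, 0.0])
print(x_min)  # approaches the minimum point (1, -2)
```

The quartic term makes convergence along x1 noticeably slower than along the quadratic direction x2, which previews the "ravine" behavior discussed next.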

2. The problem of "ravines".

From the above discussion it follows that the gradient method converges quite quickly if the level surfaces of the minimized function are close to spheres (the level lines close to circles); for such functions q ≪ 1. Theorem 10.1, Note 1, as well as the result of Example 10.2, indicate that the rate of convergence drops sharply as the ratio of the extreme eigenvalues increases. Indeed, it is known that the gradient method converges very slowly if the level surfaces of the minimized function are strongly elongated in some directions. In the two-dimensional case the relief of the corresponding surface resembles terrain with a ravine (Fig. 10.7). Therefore such functions are usually called ravine functions. Along directions characterizing the "bottom of the ravine," a ravine function changes only slightly, while in directions characterizing the "slope of the ravine" the function changes sharply.

If the initial point falls on a "slope of the ravine," the direction of the gradient descent turns out to be almost perpendicular to the "bottom of the ravine," and the next approximation lands on the opposite "slope of the ravine." The next step toward the "bottom of the ravine" returns the approximation to the initial "slope of the ravine." As a result, instead of moving along the "bottom of the ravine" toward the minimum point, the descent trajectory makes zigzag jumps across the "ravine," hardly approaching the target (Fig. 10.7).

To accelerate the convergence of the gradient method when minimizing ravine functions, a number of special "ravine" methods have been developed. Let us give an idea of one of the simplest techniques. From two close starting points, gradient descent to the "bottom of the ravine" is performed. A straight line is drawn through the two points found, and a large "ravine" step is taken along it (Fig. 10.8). From the point found in this way, one step of gradient descent to the bottom is made again, and then a second "ravine" step is taken along the straight line passing through the two latest bottom points. As a result, movement along the "bottom of the ravine" toward the minimum point is significantly accelerated.
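The "ravine" technique just described can be sketched for the model ravine function f(x) = x1^2 + 100·x2^2, whose level lines are strongly elongated along x1. The step sizes, iteration counts, and the halving rule for an unsuccessful ravine step are illustrative choices, not part of the textbook procedure.

```python
import numpy as np

def f(x):
    return x[0] ** 2 + 100.0 * x[1] ** 2   # level lines strongly elongated along x1

def grad(x):
    return np.array([2.0 * x[0], 200.0 * x[1]])

def descend(x, h=0.004, steps=50):
    # plain gradient descent: quickly reaches the "bottom of the ravine" (x2 ~ 0),
    # but moves very slowly along the bottom itself
    for _ in range(steps):
        x = x - h * grad(x)
    return x

def ravine_method(x1, x2, step=2.0, cycles=60):
    p_prev = descend(np.array(x1, dtype=float))   # descent from the first point
    p = descend(np.array(x2, dtype=float))        # descent from a nearby point
    for _ in range(cycles):
        d = p - p_prev
        norm = np.linalg.norm(d)
        if norm < 1e-12:
            break
        d = d / norm                              # line through the two bottom points
        w = p + step * d if f(p + step * d) < f(p - step * d) else p - step * d
        while f(w) >= f(p) and step > 1e-10:
            step *= 0.5                           # ravine step too long: shorten it
            w = p + step * d if f(p + step * d) < f(p - step * d) else p - step * d
        p_prev, p = p, descend(w)                 # new gradient descent to the bottom
    return p

x_rav = ravine_method((10.0, 1.0), (9.0, -1.0))
print(x_rav)  # approaches the minimum at the origin
```

The big steps travel along the line connecting successive "bottom" points, so progress along the gently sloping direction no longer depends on the tiny gradient component there.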

More information about the problem of "ravines" and "ravine" methods can be found, for example, in the literature.

3. Other approaches to determining the descent step.

It is not difficult to see that at each iteration it would be desirable to choose a descent direction close to the direction along which movement leads from the point x^(k) to the minimum point x*. Unfortunately, the antigradient is usually an unsuccessful choice of descent direction; this is especially pronounced for ravine functions. Therefore doubt arises as to whether a thorough solution of the one-dimensional minimization problem (10.23) is expedient, and there is a desire to take, in the chosen direction, only a step that provides a "significant decrease" of the function. Moreover, in practice one is sometimes satisfied with determining a value of the step that simply ensures a decrease in the value of the objective function.
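One common way to formalize a "significant decrease" without solving (10.23) exactly is the backtracking (Armijo) rule: accept the first step for which the function drops by at least a fixed fraction of the expected first-order decrease. The sketch below is illustrative; the test function and the constants are assumptions, not taken from the text.

```python
import numpy as np

def armijo_step(f, x, g, alpha0=1.0, c=1e-4, shrink=0.5):
    # backtrack until f(x - a*g) <= f(x) - c*a*||g||^2 (a "significant decrease")
    a = alpha0
    fx = f(x)
    gg = g @ g
    while f(x - a * g) > fx - c * a * gg and a > 1e-16:
        a *= shrink
    return a

def gradient_descent(f, grad, x0, iters=200):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-10:
            break
        x = x - armijo_step(f, x, g) * g
    return x

f = lambda x: (x[0] - 2.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 2.0), 20.0 * (x[1] + 1.0)])
x_arm = gradient_descent(f, grad, [0.0, 0.0])
print(x_arm)  # near the minimum point (2, -1)
```

Each iteration costs only a handful of function evaluations instead of a full one-dimensional search, which is exactly the trade-off discussed above.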

As we have already noted, the optimization task is the task of finding those values of the factors x1 = x1*, x2 = x2*, ..., xk = xk* at which the response function (y) reaches an extreme value y = y_ext (the optimum).

Various methods for solving the optimization problem are known. One of the most widely used is the gradient method, also called the Box-Wilson method or the method of steepest ascent.

Let us consider the essence of the gradient method using the example of a two-factor response function y = f(x1, x2). In Fig. 4.3, curves of equal values of the response function (level curves) are depicted in the factor space. The point with coordinates x1*, x2* corresponds to the extreme value of the response function y_ext.

If we choose any point of the factor space as the initial one (x1^0, x2^0), then the shortest path to the top of the response function from this point is the path along the curve whose tangent at each point coincides with the normal to the level curve, i.e., the path in the direction of the gradient of the response function.

The gradient of the continuous single-valued function y = f(x1, x2) is the vector

grad y = i (∂y/∂x1) + j (∂y/∂x2),

where i, j are unit vectors in the directions of the coordinate axes x1 and x2. The partial derivatives ∂y/∂x1 and ∂y/∂x2 characterize the direction of the vector.

Since the form of the dependence y = f(x1, x2) is unknown to us, we cannot find the partial derivatives and thus determine the true direction of the gradient.

According to the gradient method, an initial point (the initial levels) x1^0, x2^0 is selected in some part of the factor space. A symmetric two-level experiment design is constructed around these initial levels. The variation interval is chosen so small that the linear model turns out to be adequate. It is known that any curve can be approximated by a linear model on a sufficiently small section.

After constructing the symmetric two-level design, the interpolation problem is solved, i.e., the linear model

y = b0 + b1 x1 + b2 x2

is built, and its adequacy is checked.

If the linear model has turned out to be adequate for the selected variation interval, then the direction of the gradient can be determined: grad y = i b1 + j b2.

Thus, the direction of the gradient of the response function is determined by the values of the regression coefficients. This means that we will be moving in the direction of the gradient if, from the point with coordinates (x1^0, x2^0), we pass to the point with coordinates (x1^0 + m b1, x2^0 + m b2),

where m is a positive number determining the size of the step in the direction of the gradient.

Since in normalized form x1^0 = 0 and x2^0 = 0, the new point has coordinates (m b1, m b2).

Having determined the direction of the gradient (the coefficients b1, b2) and chosen the step m, we carry out an experiment at the initial level x1^0, x2^0.


Then we take a step in the direction of the gradient, i.e., we carry out an experiment at the point with coordinates (m b1, m b2). If the value of the response function has increased compared with its value at the initial level, we take one more step in the direction of the gradient, i.e., we carry out an experiment at the point with coordinates (2m b1, 2m b2).

Movement along the gradient is continued until the response function begins to decrease. In Fig. 4.3, the movement along the gradient corresponds to the straight line emerging from the point (x1^0, x2^0). It gradually deviates from the true direction of the gradient, shown by the dashed line, because of the nonlinearity of the response function.

As soon as the value of the response function has decreased, the movement along the gradient is stopped, the experiment with the maximum value of the response function is taken as the new initial level, a new symmetric two-level design is constructed, and the interpolation problem is solved again.

Having built a new linear model, we carry out regression analysis. If checking the significance of the factors shows that at least one coefficient is significant, this means that the region of the extremum of the response function (the optimum region) has not yet been reached. A new direction of the gradient is determined, and the movement toward the optimum region begins anew.

Refinement of the direction of the gradient and movement along the gradient continue until the solution of the next interpolation problem shows that all factors are insignificant, i.e., all the coefficients b_j are insignificant. This means that the optimum region has been reached. The solution of the optimization task is then stopped, and the experiment with the maximum value of the response function is taken as the optimum.
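The Box-Wilson procedure described above can be sketched as follows. The "experiments" are simulated by an assumed response function unknown to the algorithm; the 2^2 design, the crude significance threshold on the coefficients, and all numeric parameters are illustrative assumptions, not a prescription from the text.

```python
import numpy as np

# True response surface, hidden from the algorithm (illustrative):
def response(x1, x2):
    return 10.0 - (x1 - 3.0) ** 2 - (x2 - 2.0) ** 2

# Coded 2^2 factorial design: all combinations of the levels -1 and +1
DESIGN = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)

def fit_linear(x0, dx):
    """Run the 2^2 design around x0 with variation intervals dx and
    fit y = b0 + b1*X1 + b2*X2 in the coded variables."""
    y = np.array([response(x0[0] + r[0] * dx[0], x0[1] + r[1] * dx[1]) for r in DESIGN])
    b0 = y.mean()
    b1 = (DESIGN[:, 0] @ y) / 4.0     # orthogonal design: coefficients by averaging
    b2 = (DESIGN[:, 1] @ y) / 4.0
    return b0, b1, b2

def box_wilson(x0, dx=(0.25, 0.25), m=0.5, rounds=20):
    x = np.array(x0, dtype=float)
    dx = np.array(dx)
    for _ in range(rounds):
        _, b1, b2 = fit_linear(x, dx)
        if abs(b1) < 1e-3 and abs(b2) < 1e-3:  # crude stand-in for a significance
            break                              # test: the optimum region is reached
        g = np.array([b1 * dx[0], b2 * dx[1]]) # steps proportional to b_j * dx_j
        y_best = response(*x)
        while True:                            # steep ascent while the response grows
            x_new = x + m * g
            if response(*x_new) <= y_best:
                break
            x = x_new
            y_best = response(*x)
    return x

x_bw = box_wilson((0.0, 0.0))
print(x_bw)  # near the true optimum (3, 2)
```

Each round performs one small factorial "experiment," estimates the local gradient from the regression coefficients, and climbs along it until the simulated response stops increasing, just as in the description above.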

In general, the sequence of actions necessary to solve the optimization problem by the gradient method can be represented as a flowchart (Fig. 4.4). When applying it, the following should be kept in mind:

1) the initial levels of the factors (x_j^0) should be chosen as close as possible to the optimum point if there is some prior information about its position;

2) the variation intervals (Δx_j) must be chosen so that the linear model is certain to turn out adequate. The lower bound on Δx_j is the minimum value of the variation interval at which the response function remains significant;

3) the step value (m) when moving along the gradient is chosen so that the largest of the products m·b_j·Δx_j does not exceed the difference between the upper and lower levels of the factors in normalized form:

max_j (m b_j Δx_j) ≤ 2.

Hence m ≤ 2 / max_j (b_j Δx_j). With a smaller value of m, the difference between the response function at the initial level and at the point with the new coordinates may turn out to be insignificant. With a larger value of the step, there is a danger of slipping past the optimum of the response function.