First-order gradient methods: gradient techniques for unconstrained optimization

The gradient method and its variants are among the most common methods for finding the extremum of a function of several variables. The idea of the gradient method is to move, at each step of the search for an extremum, in the direction of the fastest increase of the objective function (when seeking a maximum).

The gradient method requires computing the first derivatives of the objective function with respect to its arguments. Like the previous methods, it is approximate: as a rule, it does not reach the optimum point exactly, but only approaches it in a finite number of steps.

Fig. 4.11.

Fig. 4.12 (two-dimensional case).

First, a starting point is selected. In the one-dimensional case (see Subsection 4.2.6) it was possible to move from it only to the left or to the right (see Fig. 4.9); in the multidimensional case the number of possible directions of movement is infinite. Fig. 4.11, which illustrates the case of two variables, shows various possible directions as arrows starting from the starting point A. Movement along some of them increases the value of the objective function relative to point A (e.g., directions 1-3), while movement along others decreases it (directions 5-8). Since the position of the optimum point is unknown, the direction in which the objective function increases fastest is considered the best. This direction is called the gradient of the function. Note that at every point of the coordinate plane the direction of the gradient is perpendicular to the tangent to the level line drawn through that point.

In mathematical analysis, it is proved that the components of the gradient vector of the function u = f(x1, x2, ..., xn) are its partial derivatives with respect to the arguments, i.e.

grad f(x1, x2, ..., xn) = (du/dx1, du/dx2, ..., du/dxn). (4.20)

Thus, when searching for the maximum by the gradient method, at the first iteration the components of the gradient are calculated by formula (4.20) at the starting point x^(0), and a working step is taken in the direction found, i.e. a transition is made to a new point x^(1) with coordinates:

xi^(1) = xi^(0) + λ df(x^(0))/dxi, i = 1, 2, ..., n, (4.21)

or in vector form

x^(1) = x^(0) + λ grad f(x^(0)),

where λ is a constant or variable parameter that determines the length of the working step, λ > 0. At the second iteration the gradient vector is calculated again, now at the new point x^(1), after which the analogous formula gives the transition to the point x^(2), and so on (Fig. 4.12). For an arbitrary k-th iteration we have

x^(k+1) = x^(k) + λ grad f(x^(k)). (4.22)

If not the maximum but the minimum of the objective function is sought, then at each iteration a step is taken in the direction opposite to the gradient, called the antigradient direction. Instead of formula (4.22) we then have

x^(k+1) = x^(k) - λ grad f(x^(k)).

There are many varieties of the gradient method, differing in the choice of the working step. One can, for example, move to each subsequent point with a constant value of λ; the length of the working step, i.e. the distance between the adjacent points x^(k) and x^(k+1), will then be proportional to the modulus of the gradient vector. One can instead choose λ at each iteration so that the length of the working step remains constant.

Example. It is required to find the maximum of the function

y = 110 - 2(x1 - 4)^2 - 3(x2 - 5)^2.

Of course, using the necessary condition for an extremum, we immediately obtain the desired solution: x1 = 4; x2 = 5. However, this simple example is convenient for demonstrating the algorithm of the gradient method. Let us calculate the gradient of the objective function:

grad y = (dy/dx1, dy/dx2) = (4(4 - x1); 6(5 - x2)) and select the starting point

x^(0) = (x1^(0) = 0; x2^(0) = 0).

The value of the objective function at this point, as is easy to calculate, is y(x^(0)) = 3. Put λ = const = 0.1. The value of the gradient at the point x^(0) is grad y(x^(0)) = (16; 30). Then, at the first iteration, we obtain, according to formulas (4.21), the coordinates of the point

x1^(1) = 0 + 0.1 · 16 = 1.6; x2^(1) = 0 + 0.1 · 30 = 3.

y(x^(1)) = 110 - 2(1.6 - 4)^2 - 3(3 - 5)^2 = 86.48.

As can be seen, this is significantly higher than the previous value. At the second iteration, formulas (4.22) give:

x1^(2) = 1.6 + 0.1 · 4(4 - 1.6) = 2.56; x2^(2) = 3 + 0.1 · 6(5 - 3) = 4.2.
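The iterations above are easy to reproduce in code. Here is a minimal Python sketch of constant-step gradient ascent for this example (the function and variable names and the iteration count are illustrative; λ = 0.1 and the starting point (0, 0) are taken from the example):

```python
# Constant-step gradient ascent, as in formulas (4.21)-(4.22):
# x_(k+1) = x_(k) + lam * grad f(x_(k)).
def grad_ascent(grad, x0, lam=0.1, iters=100):
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi + lam * gi for xi, gi in zip(x, g)]
    return x

# Objective y = 110 - 2(x1 - 4)^2 - 3(x2 - 5)^2,
# so grad y = (4(4 - x1), 6(5 - x2)).
grad_y = lambda x: (4 * (4 - x[0]), 6 * (5 - x[1]))

x_opt = grad_ascent(grad_y, (0.0, 0.0))   # converges toward (4, 5)
```

A single iteration from (0, 0) reproduces the point (1.6; 3) computed above; after enough iterations the sequence approaches the true maximum point (4; 5).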

I'll throw in a little of my experience :)

Coordinate descent method

The idea of this method is that at each new iteration the search proceeds along one coordinate direction at a time. The descent is carried out successively along each coordinate; the number of coordinates equals the number of variables.
To demonstrate how the method works, first take the function z = f(x1, x2, ..., xn) and choose any point M0(x10, x20, ..., xn0) in n-dimensional space, where n is the number of arguments of the function. The next step is to fix all arguments of the function as constants except the very first one. This reduces the multidimensional search to solving a one-dimensional optimization problem on a certain segment, namely finding the argument x1.
To find the value of this variable, descend along this coordinate to a new point M1(x11, x20, ..., xn0). Then the function is differentiated, and the value of the next point can be found from this expression:

After finding the value of the variable, repeat the iteration, fixing all arguments except x2, and descend along the new coordinate to the next point M2(x11, x21, x30, ..., xn0). The value of the new point is found from the expression:

This iteration with fixed arguments is repeated until all arguments from x1 to xn have been processed. At the last iteration we will have passed sequentially through all coordinates, finding a local minimum along each, so on the last coordinate the objective function reaches its minimum. One advantage of this method is that the descent can be interrupted at any moment, and the last point found is taken as the minimum point. This is useful when the method falls into an infinite loop: the last coordinate found can then be taken as the result of the search. However, the goal of finding the global minimum in the region may not be achieved, precisely because we interrupted the search (see Figure 1).


Figure 1 - Canceling the execution of coordinate descent

The study of this method shows that if the function z = f(x1, x2, ..., xn) is convex and differentiable, then every limit point of the sequence M0(x10, x20, ..., xn0), M1, ... produced by coordinate descent is a global minimum point of the function (see Figure 2).


Figure 2 - Local minimum points on the coordinate axis

We can conclude that this algorithm copes well with simple multidimensional optimization tasks by successively solving n one-dimensional optimization problems, for example with the golden-section method.

The coordinate descent method proceeds according to the algorithm described in the block diagram (see Figure 3). The iterations of this method are:
First, several parameters must be entered: the precision Epsilon, which must be strictly positive; the starting point x1 from which the algorithm begins; and the value Lambda j;
The next step is to take the first starting point x1, after which an ordinary one-dimensional equation in one variable is solved; the formula for finding the minimum, with k = 1, j = 1, is:

After the extremum point is calculated, check the number of arguments of the function: if j is less than n, repeat the previous step with j = j + 1; in all other cases, go to the next step.
Now redefine the variable x by the formula x(k+1) = y(n+1) and check the convergence of the function to the given precision using the expression:

The finding of the extremum point now depends on this expression: if it is true, the extremum point is x* = x(k+1). Often, however, additional iterations are needed depending on the precision, in which case the argument values are reset to y(1) = x(k+1) and the indices to j = 1, k = k + 1.
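The steps above can be sketched as follows; this is an illustrative Python implementation, with golden-section search standing in for the one-dimensional solver mentioned earlier (all names, bounds, and iteration counts are assumptions, not part of the article's algorithm):

```python
def golden_min(phi, a, b, tol=1e-8):
    """Golden-section search for the minimum of a unimodal phi on [a, b]."""
    invphi = (5 ** 0.5 - 1) / 2
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

def coordinate_descent(f, x0, bounds, sweeps=50):
    """Minimize f by cycling through the coordinates, fixing all arguments
    except one and solving a 1-D problem along that coordinate."""
    x = list(x0)
    for _ in range(sweeps):
        for j in range(len(x)):
            def phi(t, j=j):
                y = list(x)
                y[j] = t          # only coordinate j varies; the rest are fixed
                return f(y)
            x[j] = golden_min(phi, *bounds[j])
    return x

# Convex test function with a cross term; minimum at (1.6, -2.4).
f = lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2 + 0.5 * x[0] * x[1]
xmin = coordinate_descent(f, [0.0, 0.0], [(-10, 10), (-10, 10)])
```

Because the test function is convex and differentiable, the sweeps converge to the global minimum, as the convergence statement above predicts.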


Figure 3 - Block diagram of the coordinate descent method

In total, we have an excellent and versatile multidimensional optimization algorithm that is capable of breaking a difficult task into several sequential one-dimensional iterations. The method is quite simple to implement and locates points in space easily, and it guarantees convergence to a local minimum point. But even with such significant advantages, the method can fall into endless cycles because it may land in a kind of ravine.
There are gully-shaped functions containing depressions. Having fallen into one of these depressions, the algorithm can no longer get out and finds its minimum point there. Likewise, a large number of consecutive runs of the same one-dimensional optimization method can weigh heavily on weak computers. Not only is convergence on such functions very slow, since all the variables must be computed and a high specified accuracy often increases the solution time severalfold, but the main disadvantage of this algorithm is its limited applicability.
When researching various algorithms for solving optimization problems, the quality of those algorithms plays a huge role. Other important characteristics include execution time and stability, the ability to find values that minimize or maximize the objective function, and ease of implementation for practical problems. The coordinate descent method is easy to use, but multidimensional optimization problems more often require complex calculations rather than a clean division of the whole problem into subtasks.

The Nelder-Mead Method

This algorithm is notably popular among researchers of multivariate optimization methods. The Nelder-Mead method is one of the few based on the successive transformation of a deformable simplex around the extremum point, rather than on an explicit movement toward the global minimum.
A simplex is represented as a polyhedron with equidistant vertices in N-dimensional space: in R2 a simplex maps to an equilateral triangle, and in R3 to a regular tetrahedron.
As mentioned above, the algorithm is a development of the simplex method of Spendley, Hext and Himsworth, but, unlike the latter, it allows the use of irregular simplexes. Most often, a simplex means a convex polyhedron with N + 1 vertices, where N is the number of model parameters in n-dimensional space.
To start using this method, you need to determine the base vertex of the entire available set of coordinates using the expression:

The most remarkable thing about this method is that the simplex can independently perform certain operations:
Reflection through the center of gravity;
Reflection with compression or expansion;
Stretching;
Compression.
Among these properties, preference is given to reflection, since it is the most functionally versatile. From any chosen vertex a reflection can be made relative to the center of gravity of the simplex by the expression:

Where xc is the center of gravity (see Figure 1).


Figure 1 - Reflection through the center of gravity

The next step is to calculate the values of the objective function at all vertices of the reflected simplex. After that, we have full information about how the simplex will behave in space, and hence about the behavior of the function.
To search for the minimum or maximum point of the objective function by simplex methods, adhere to the following sequence:
At each step a simplex is built; the function must be evaluated at all its vertices, and the results obtained sorted in ascending order;
The next step is reflection. An attempt is made to obtain the values of a new simplex; through reflection, we can discard unwanted values that pull the simplex away from the global minimum;
To obtain the new simplex, the two vertices with the worst values are taken from the sorted results. It may happen that suitable values cannot be picked immediately; then return to the first step and compress the simplex toward the point with the smallest value;
The search for the extremum point ends at the center of gravity, provided that the differences between the function values at the points of the simplex become smallest.
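A compact sketch of this loop in Python, using commonly cited Nelder-Mead coefficients (reflection α = 1, expansion γ = 2, contraction and shrink factors 0.5); the function and parameter names here are illustrative, not the article's notation:

```python
def nelder_mead(f, x0, step=1.0, alpha=1.0, gamma=2.0,
                rho=0.5, sigma=0.5, tol=1e-8, max_iter=500):
    """Minimize f by successively deforming a simplex of n + 1 vertices."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):                      # initial simplex: offset x0 along each axis
        p = list(x0)
        p[i] += step
        simplex.append(p)
    for _ in range(max_iter):
        simplex.sort(key=f)                 # sort vertices by function value
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:   # vertex values nearly equal: stop
            break
        # center of gravity of all vertices except the worst
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        # reflect the worst vertex through the center of gravity
        xr = [centroid[i] + alpha * (centroid[i] - worst[i]) for i in range(n)]
        if f(best) <= f(xr) < f(simplex[-2]):
            simplex[-1] = xr
        elif f(xr) < f(best):               # very good direction: try expansion
            xe = [centroid[i] + gamma * (xr[i] - centroid[i]) for i in range(n)]
            simplex[-1] = xe if f(xe) < f(xr) else xr
        else:                               # reflection failed: contract toward the worst
            xc = [centroid[i] + rho * (worst[i] - centroid[i]) for i in range(n)]
            if f(xc) < f(worst):
                simplex[-1] = xc
            else:                           # last resort: shrink toward the best vertex
                simplex = [best] + [[best[i] + sigma * (p[i] - best[i])
                                     for i in range(n)] for p in simplex[1:]]
    simplex.sort(key=f)
    return simplex[0]

best = nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0])
```

The termination test on the spread of function values corresponds to the stopping rule in the last list item above.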

The Nelder-Mead algorithm also uses these functions for working with a simplex according to the following formulas:

The reflection function through the center of gravity of the simplex is calculated using the following expression:

This reflection is performed strictly towards the extremum point and only through the center of gravity (see Figure 2).


Figure 2 - Reflection of the simplex occurs through the center of gravity

The function of contraction into the interior of the simplex is calculated using the following expression:

In order to carry out the compression, it is necessary to determine the point with the smallest value (see Figure 3).


Figure 3 - Compression of the simplex occurs to the smallest argument.

The reflection function with compression of the simplex is calculated using the following expression:

To carry out reflection with compression (see Figure 4), one must recall the work of two separate operations: reflection through the center of gravity and compression of the simplex toward the smallest value.


Figure 4 - Reflection with compression

Reflection with stretching of the simplex (see Figure 5) is performed using two operations: reflection through the center of gravity and stretching toward the largest value.


Figure 5 - Reflection with stretching.

To demonstrate how the Nelder-Mead method works, refer to the flowchart (see Figure 6).
First, as in the previous examples, set the accuracy parameter ε, which must be strictly greater than zero, and also set the parameters α, β and γ needed for the calculations. They are used to compute the function f(x0) and to construct the simplex itself.

Figure 6 - The first part of the Nelder-Mead method.

After constructing the simplex, calculate all the values of the objective function. As described above for searching an extremum with a simplex, the function f(x) must be evaluated at all the simplex points. Next, sort them to determine the base point:

Now that the base point is computed and all the others are sorted in the list, check the termination condition against the previously specified accuracy:

As soon as this condition becomes true, the point x(0) of the simplex is taken as the desired extremum point. Otherwise, we go to the next step, where the new value of the center of gravity must be determined using the formula:

If this condition is met, the point x(0) is the minimum point; otherwise, go to the next step, in which the smallest argument of the function must be found:

From the function we need to obtain the smallest value of the argument in order to proceed to the next step of the algorithm. Sometimes several arguments at once have the same value calculated from the function. The solution to this problem is to redefine the argument value to within one ten-thousandth.
After recalculating the minimum argument, the newly obtained values are stored again at the n argument positions.


Figure 7 - The second part of the Nelder-Mead method.

The value calculated from the previous function must be substituted into the condition fmin < f(xN). If this condition is true, the point x(N) is the minimum of the group stored in the sorted list, and we return to the step where the center of gravity was calculated; otherwise, the simplex is compressed by a factor of 2 and we return to the very beginning with a new set of points.
Studies of this algorithm show that methods with irregular simplexes (see Figure 8) are still poorly understood, but this does not prevent them from coping excellently with the tasks assigned to them.
Deeper tests show that the parameters of the expansion, compression and reflection functions best suited to the problem can be selected experimentally, but one can also use the commonly accepted parameters α = 1/2, β = 2, γ = 2 or α = 1/4, β = 5/2, γ = 2. Therefore, before discarding this method for a given problem, understand that for each new search for an unconstrained extremum you need to watch the behavior of the simplex closely during its operation and note the method's non-standard solutions.


Figure 8 - The process of finding the minimum.

Statistics show that this algorithm has one very common problem: degeneration of the deformable simplex. This happens when several vertices of the simplex fall into one subspace whose dimension does not satisfy the task at hand.
The working dimension and the given dimension then throw several vertices of the simplex onto one straight line, sending the method into an infinite loop. The algorithm in this modification is not yet equipped with a way to get out of this situation and shift one vertex aside, so a new simplex with new parameters has to be created so that this does not happen in the future.
Another feature of this method is incorrect operation with six or more simplex vertices. However, by modifying the method this problem can be removed without even losing execution speed, although the amount of allocated memory will noticeably increase. The method can be considered cyclic, since it is built entirely on loops, which is why incorrect operation is noticed with a large number of vertices.
The Nelder-Mead algorithm can rightfully be considered one of the best methods for finding an extremum point with a simplex, and it is excellent for various kinds of engineering and economic problems. Despite its cyclicity, it uses very little memory compared with, say, the coordinate descent method, and to find the extremum itself only the values of the center of gravity and of the function need to be computed. A small but sufficient number of complex parameters gives this method wide use in complex mathematical and practical production problems.
Simplex algorithms are a frontier whose horizons we will not exhaust soon, and already now they greatly simplify our lives with their visual component.

P.S. The text is completely mine. I hope someone will find this information useful.

1. The concept of gradient methods. A necessary condition for the existence of an extremum of a continuous differentiable function is a set of conditions of the form

where are the function arguments. More compactly, this condition can be written in the form

(2.4.1)

where is the designation of the gradient of the function at a given point.

Optimization methods that use the gradient in determining the extremum of the objective function are called gradient methods. They are widely used in optimal adaptive control systems for steady states, in which the optimal (in the sense of the chosen criterion) steady state of the system is sought as its parameters, structure or external influences change.

Equation (2.4.1) is generally nonlinear. A direct solution to it is either impossible or very difficult. Finding solutions to this kind of equations is possible by organizing a special procedure for finding an extremum point, based on the use of various kinds of recurrent formulas.

The search procedure is built in the form of a multi-step process, in which each subsequent step leads to an increase or decrease in the objective function, i.e., the conditions are satisfied in the case of searching for the maximum and minimum, respectively:

Here n and n - 1 denote the step numbers, and the corresponding vectors give the values of the objective function's arguments at the n-th and (n - 1)-th steps. After the r-th step one can obtain

i.e. after r steps the objective function will no longer increase (decrease) under any further change of its arguments. The latter means that a point has been reached with coordinates for which one can write

(2.4.2)
(2.4.3)

where is the extreme value of the objective function.

To solve (2.4.1) in the general case, the following procedure can be applied. We write the value of the coordinates of the objective function in the form

where is some coefficient (scalar) that is not equal to zero.

At the extremum point since

The solution of equation (2.4.1) in this way is possible if the condition of convergence of the iterative process is satisfied for any initial value.

The methods based on solving equation (2.4.1) differ from each other in the choice of the step of changing the objective function's arguments in the search for the extremum. This step can be constant or variable; in the second case, the law of variation of the step value can in turn be predetermined or can depend on the current value (it may be nonlinear).

2. Steepest descent method. The idea of the steepest descent method is that the search for the extremum should be carried out in the direction of the gradient or antigradient, i.e. the direction of the greatest change of the objective function, since this is the shortest path to the extremum point. To implement it, one must first of all calculate the gradient at the given point and select the step value.

Computing the gradient. Since, as a result of optimization, the coordinates of the extremum point are found, for which the following relation is valid:

then the computational procedure for determining the gradient can be replaced by the procedure for determining the components of the gradients at discrete points of the objective function space

(2.4.5)

where is a small change in the coordinate

Assuming that the point at which the gradient is determined lies in the middle of the segment, then

The choice of (2.4.5) or (2.4.6) depends on the steepness of the function on the segment Δx_i: if the steepness is not great, preference should be given to (2.4.5), since it requires fewer calculations; otherwise (2.4.6) gives more accurate results. The accuracy of determining the gradient can also be increased by averaging out random deviations.
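The trade-off can be seen directly in code. A Python sketch, assuming (2.4.5) is the one-sided difference and (2.4.6) the centered one, which matches the discussion of evaluation counts above (the step h and the test function are arbitrary choices):

```python
def grad_forward(f, x, h=1e-6):
    """One-sided difference (f(x + h e_i) - f(x)) / h.
    Needs only one extra evaluation of f per coordinate."""
    fx = f(x)
    g = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        g.append((f(xp) - fx) / h)
    return g

def grad_central(f, x, h=1e-6):
    """Centered difference (f(x + h e_i) - f(x - h e_i)) / (2h).
    Twice the evaluations, but truncation error O(h^2) instead of O(h)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

f = lambda x: x[0] ** 2 + 3 * x[1]   # true gradient at (1, 2) is (2, 3)
```

On a steep or strongly curved function the centered estimate is noticeably closer to the true gradient, at the cost of extra evaluations.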

Selecting a step value The difficulty in choosing the step value is that the direction of the gradient can change from point to point. In this case, too large a step will lead to a deviation from the optimal trajectory, i.e. from the direction along the gradient or antigradient, and too small a step will lead to very slow movement to the extremum due to the need to perform a large amount of calculations.

One of the possible methods for estimating the step value is the Newton-Raphson method. Let us consider it using the one-dimensional case as an example, under the assumption that the extremum is reached at the point determined by the solution of the equation (Fig. 2.4.2).

Let the search begin from a point, and let the function in the vicinity of this point be expandable in a convergent Taylor series. Then

The direction of the gradient at the point is the same as the direction of the tangent. When searching for the minimum point, the change of the coordinate x when moving along the gradient can be written as:

Fig. 2.4.2. Scheme for calculating the step by the Newton-Raphson method.

Substituting (2.4.7) into (2.4.8), we get:

Since, by the conditions of this example, the extremum value is reached at the point determined by the solution of the equation, one can try to take such a step that , i.e. so that

Substitute the new value into the objective function. If , then at the point the determination procedure is repeated, as a result of which the value is found:



and so on; the computation stops when the changes in the objective function are small, i.e.

where is the permissible error in determining the objective function.

Optimal gradient method. The idea behind this method is as follows. In the usual steepest descent method the step is, generally speaking, chosen arbitrarily, guided only by the fact that it should not exceed a certain value. In the optimal gradient method the step value is chosen from the requirement that one should move from the given point in the direction of the gradient (antigradient) until the objective function stops increasing (decreasing). If this requirement is not met, the movement must be stopped and a new direction of movement (gradient direction) determined, and so on, until the optimum point is found.

Thus, the optimal values ​​and for finding the minimum and maximum, respectively, are determined from the solution of the equations:

In (1) and (2), respectively

Therefore, the determination of the step at each stage consists in finding, from equation (1) or (2), the step value for each point of the trajectory of motion along the gradient, starting from the initial one.
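A sketch of the optimal gradient method in Python, where the step λ at every point is itself found by a one-dimensional golden-section minimization along the antigradient (the search interval for λ and all names are assumptions, and golden-section search stands in for solving equations (1)-(2) analytically):

```python
def line_min(phi, a=0.0, b=10.0, tol=1e-10):
    """Golden-section search for the step length minimizing phi on [a, b]."""
    invphi = (5 ** 0.5 - 1) / 2
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=200):
    """Optimal gradient method: the step along the antigradient is itself
    chosen by a one-dimensional minimization at every iteration."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:   # gradient ~ 0: extremum reached
            break
        lam = line_min(lambda t: f([xi - t * gi for xi, gi in zip(x, g)]))
        x = [xi - lam * gi for xi, gi in zip(x, g)]
    return x

f = lambda p: 2 * (p[0] - 4) ** 2 + 3 * (p[1] - 5) ** 2
grad_f = lambda p: [4 * (p[0] - 4), 6 * (p[1] - 5)]
x_min = steepest_descent(f, grad_f, [0.0, 0.0])     # approaches (4, 5)
```

Each outer iteration moves along the antigradient exactly until the objective function stops decreasing, which is the requirement stated above.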

Relaxation method

The algorithm of the method consists in finding the axial direction along which the objective function decreases most strongly (when searching for a minimum). Consider the unconstrained optimization problem

To determine the axial direction at the starting point of the search, the derivatives of the function with respect to all the independent variables are determined. The axial direction corresponds to the derivative that is largest in absolute value.

Let be the axial direction, i.e. .

If the sign of the derivative is negative, the function decreases in the direction of the axis, if positive, in the opposite direction:

The point is calculated. One step is made in the direction of decrease of the function; the criterion is evaluated and, if it improves, the steps continue until the minimum value in the selected direction is found. At this point the derivatives with respect to all variables are determined again, except the one along which the descent was carried out. The axial direction of fastest decrease is found again, along which further steps are made, and so on.

This procedure is repeated until the optimum point is reached, from which no further decrease occurs in any axial direction. In practice, the criterion for the end of the search is the condition

which at turns into the exact condition that the derivatives vanish at the extremum point. Naturally, condition (3.7) can be used only if the optimum lies inside the admissible range of variation of the independent variables. If the optimum falls on the boundary of the region, a criterion of type (3.7) is inapplicable; instead one should require the positivity of all derivatives with respect to the admissible axial directions.

The descent algorithm for the selected axial direction can be written as follows

(3.8)

where is the value of the variable at each step of the descent;

the value of the step at the (k + 1)-th iteration, which can vary depending on the step number;

- the sign function of z;

The vector of the point at which the derivatives were last calculated;



The "+" sign in algorithm (3.8) is taken when searching for max I, and the "-" sign when searching for min I. The smaller the step h, the greater the number of calculations on the way to the optimum; but if the value of h is too large, looping of the search process may occur near the optimum. Near the optimum the step h must satisfy the condition

The simplest algorithm for changing the step h is as follows. At the beginning of the descent a step is set equal to, for example, 10% of the range d; the descent is performed with this step in the chosen direction as long as the following condition holds for two successive calculations

If the condition is violated at any step, the direction of descent along the axis is reversed, and the descent continues from the last point with the step size halved.
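This step-halving rule can be sketched for a single axial direction as follows (an illustrative Python fragment; the initial step, the shrink factor 0.5, and the stopping threshold are assumptions):

```python
def axis_descent(f, x, j, h0, shrink=0.5, eps=1e-6, max_iter=200):
    """Descend along coordinate j: keep stepping while f decreases;
    on a violation, reverse the direction and halve the step."""
    x = list(x)
    h = h0
    for _ in range(max_iter):
        x_new = list(x)
        x_new[j] += h
        if f(x_new) < f(x):
            x = x_new            # improvement: keep going with the same step
        else:
            h = -h * shrink      # violation: reverse direction, halve the step
        if abs(h) < eps:         # step has shrunk below the threshold: stop
            break
    return x

f = lambda x: (x[0] - 0.37) ** 2   # 1-D test function, minimum at 0.37
x = axis_descent(f, [0.0], 0, h0=0.1)
```

The steps shrink automatically around the minimum along the chosen axis, which is exactly the behavior the strategy above describes.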

The formal notation for this algorithm is as follows:

(3.9)

As a result of this strategy, the descent steps decrease in the region of the optimum in the given direction, and the search along the direction can be stopped when the step becomes less than E.

Then a new axial direction is found, with an initial step for further descent that is usually smaller than the step traveled along the previous axial direction. The nature of the movement near the optimum in this method is shown in Figure 3.4.

Figure 3.5 - Trajectory of movement to the optimum in the relaxation method

An improvement of the search algorithm in this method can be achieved by applying one-parameter optimization methods. In this case, the following scheme for solving the problem can be proposed:

Step 1. - axial direction,

; , if ;

Step 2. - new axial direction;

Gradient method

This method uses the gradient of the function. The gradient of a function at a point is the vector whose projections onto the coordinate axes are the partial derivatives of the function with respect to the coordinates (Fig. 3.6)

Figure 3.6 - Gradient function

.

The direction of the gradient is the direction of the fastest increase in the function (the steepest “slope” of the response surface). The direction opposite to it (the direction of the antigradient) is the direction of the fastest decrease (the direction of the fastest "descent" of values).

The projection of the gradient onto the plane of variables is perpendicular to the tangent to the level line, i.e. the gradient is orthogonal to the lines of the constant level of the objective function (Fig. 3.6).

Figure 3.7 - Trajectory of movement to the optimum in the gradient method

In contrast to the relaxation method, in the gradient method, steps are taken in the direction of the fastest decrease (increase) in the function.

The search for the optimum is carried out in two stages. At the first stage, the values ​​of the partial derivatives with respect to all variables are found, which determine the direction of the gradient at the point under consideration. At the second stage, a step is made in the direction of the gradient when looking for a maximum or in the opposite direction - when looking for a minimum.

If the analytical expression is unknown, the direction of the gradient is determined by making trial movements on the object. Let be the starting point. The increment is given, while . Determine the increment and the derivative

The derivatives with respect to the rest of the variables are determined in a similar way. After the components of the gradient are found, the trial movements stop and working steps begin in the selected direction; the larger the absolute value of the gradient vector, the larger the step.

When a step is executed, the values of all the independent variables are changed simultaneously. Each of them receives an increment proportional to the corresponding component of the gradient

, (3.10)

or in vector form

, (3.11)

where is a positive constant;

“+” - when searching for max I;

“-” - when searching for min I.

The gradient search algorithm with normalization of the gradient (division by its modulus) is applied in the form

; (3.12)

‖grad I(x)‖ = √( (∂I/∂x₁)² + (∂I/∂x₂)² + … + (∂I/∂xₙ)² ). (3.13)

Here h determines the step size in the direction of the gradient.

Algorithm (3.10) has the advantage that the step length automatically decreases as the optimum is approached, since the gradient components tend to zero there. With algorithm (3.12), the strategy for changing h can be built independently of the absolute value of the gradient.
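The two update rules can be contrasted in a short sketch (the function names are illustrative; `h` is the step factor and `grad` the gradient vector):

```python
import numpy as np

def step_plain(x, grad, h, maximize=True):
    """Working step by algorithm (3.10)/(3.11): each variable receives
    an increment proportional to the corresponding gradient component,
    so the step length shrinks automatically as the gradient vanishes."""
    sign = 1.0 if maximize else -1.0
    return x + sign * h * grad

def step_normalized(x, grad, h, maximize=True):
    """Working step by algorithm (3.12): the gradient is divided by its
    modulus, so h alone fixes the step length regardless of |grad I|."""
    sign = 1.0 if maximize else -1.0
    return x + sign * h * grad / np.linalg.norm(grad)
```

With grad = (3, 4) and h = 0.1, `step_plain` moves by the vector (0.3, 0.4) of length 0.5, while `step_normalized` moves by exactly 0.1 in the same direction.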

In the gradient method, after each working step the derivatives are calculated again, a new direction of the gradient is determined, and the search process continues (Fig. 3.5).

If the step size is chosen too small, movement to the optimum will take too long because the function must be evaluated at very many points. If the step is chosen too large, looping (oscillation) around the optimum region may occur.

The search process continues until the partial derivatives ∂I/∂x₁, ∂I/∂x₂, …, ∂I/∂xₙ become close to zero, or until the boundary of the admissible range of the variables is reached.

In an algorithm with automatic step refinement, the step value h is adjusted according to the change in the direction of the gradient at the adjacent points x(k) and x(k+1).

Criteria for the end of the search for the optimum:

‖grad I(x(k))‖ ≤ ε ; (3.16)

‖x(k+1) − x(k)‖ ≤ ε , (3.17)

where ‖·‖ is the vector norm and ε is a given small positive number.

The search ends when one of the conditions (3.14) - (3.17) is fulfilled.
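The whole search loop with these stopping rules can be sketched as follows, under the assumption that an analytical gradient is available (the name `gradient_search` and the example function are illustrative, not from the source):

```python
import numpy as np

def gradient_search(grad_I, x0, h=0.1, eps=1e-6, max_iter=10_000, maximize=False):
    """Gradient search of the form (3.10) with the stopping rules
    (3.16)-(3.17): stop when the gradient norm or the displacement
    between adjacent points falls below eps."""
    x = np.asarray(x0, dtype=float)
    sign = 1.0 if maximize else -1.0
    for _ in range(max_iter):
        g = grad_I(x)
        if np.linalg.norm(g) <= eps:           # criterion (3.16)
            break
        x_new = x + sign * h * g
        if np.linalg.norm(x_new - x) <= eps:   # criterion (3.17)
            x = x_new
            break
        x = x_new
    return x

# Minimum of I(x) = (x1 - 3)^2 + (x2 + 1)^2 is at (3, -1).
grad = lambda x: np.array([2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)])
x_opt = gradient_search(grad, [0.0, 0.0])
```

On this quadratic example the iteration contracts the error by a constant factor each step, so the stopping criteria are reached after a few dozen iterations.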

The disadvantage of gradient search (as with the methods discussed above) is that it can find only a local extremum of the function. To find other local extrema, the search must be repeated from other starting points.

As we have already noted, the optimization problem is the problem of finding the factor values x₁ = x₁*, x₂ = x₂*, …, x_k = x_k* at which the response function (y) reaches an extreme value y = y_ext (the optimum).

There are various methods for solving the optimization problem. One of the most widely used is the gradient method, also called the Box-Wilson method or the method of steepest ascent.

Let us consider the essence of the gradient method using the example of a two-factor response function y = f(x₁, x₂). In Fig. 4.3, curves of equal values of the response function (level curves) are shown in the factor space. The point with coordinates x₁*, x₂* corresponds to the extreme value of the response function y_ext.

If we choose any point of the factor space as the initial one (x₁⁰, x₂⁰), then the shortest path to the top of the response function from this point is the path along the curve whose tangent at each point coincides with the normal to the level curve, i.e., the path in the direction of the gradient of the response function.

The gradient of the continuous single-valued function y = f(x₁, x₂) is the vector

grad y = (∂y/∂x₁)·i + (∂y/∂x₂)·j,

where i, j are unit vectors in the directions of the coordinate axes x₁ and x₂. The partial derivatives ∂y/∂x₁ and ∂y/∂x₂ determine the magnitude and direction of the vector.

Since we do not know the form of the dependence y = f(x₁, x₂), we cannot find the partial derivatives and thus determine the true direction of the gradient.

According to the gradient method, a starting point (initial levels) x₁⁰, x₂⁰ is selected in some part of the factor space. A symmetric two-level design of experiments is constructed around these initial levels, with the variation interval chosen so small that the linear model is adequate: any smooth surface can be approximated by a linear model in a sufficiently small region.

After the symmetric two-level plan has been constructed, the interpolation problem is solved, i.e., a linear model is built:

y = b₀ + b₁x₁ + b₂x₂,

and its adequacy is checked.
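For a symmetric two-level plan in normalized coordinates, the columns of the design matrix are orthogonal, so the regression coefficients reduce to simple averages. A small illustrative sketch (the response values are hypothetical):

```python
import numpy as np

# Symmetric two-level (2^2 factorial) design in normalized coordinates.
# Because the design columns are orthogonal, the coefficients of the
# linear model y = b0 + b1*x1 + b2*x2 are simple averages.
X = np.array([[-1, -1],
              [+1, -1],
              [-1, +1],
              [+1, +1]], dtype=float)   # design matrix (normalized levels)
y = np.array([4.1, 6.0, 5.2, 7.1])      # hypothetical measured responses

b0 = y.mean()                           # intercept
b1 = (X[:, 0] * y).mean()               # effect of x1
b2 = (X[:, 1] * y).mean()               # effect of x2
```

Here b₁ and b₂ estimate the partial derivatives of the response with respect to the normalized factors and so define the gradient direction.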

If the linear model turned out to be adequate for the selected interval of variation, then the direction of the gradient can be determined:

grad y = b₁·i + b₂·j.

Thus, the direction of the gradient of the response function is determined by the values of the regression coefficients. This means that we move in the direction of the gradient if from the point with coordinates (x₁⁰, x₂⁰) we go to the point with coordinates (x₁⁰ + m·b₁, x₂⁰ + m·b₂),

where m is a positive number that specifies the step size in the direction of the gradient.

Since x₁⁰ = 0 and x₂⁰ = 0 in normalized coordinates, the new point has coordinates (m·b₁, m·b₂).

Having determined the direction of the gradient (b₁, b₂) and chosen the step size m, we carry out an experiment at the initial level x₁⁰, x₂⁰.


Then we take a step in the direction of the gradient, i.e., we carry out an experiment at the point with coordinates (m·b₁, m·b₂). If the value of the response function has increased compared with its value at the initial level, we take another step in the direction of the gradient, i.e., we carry out an experiment at the point with coordinates (2m·b₁, 2m·b₂).

We continue to move along the gradient until the response function begins to decrease. In Fig. 4.3, movement along the gradient corresponds to the straight line outgoing from the point (x₁⁰, x₂⁰). It gradually deviates from the true direction of the gradient, shown by the dashed line, because of the non-linearity of the response function.

As soon as the value of the response function decreases in the next experiment, movement along the gradient is stopped; the experiment with the maximum value of the response function is taken as the new initial level, a new symmetric two-level plan is drawn up, and the interpolation problem is solved again.
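The movement along the gradient can be sketched as follows; here a known analytic response stands in for carrying out real experiments, and all names are illustrative:

```python
import numpy as np

def steep_ascent(response, b, m, x0, max_steps=50):
    """Movement along the gradient in the steep-ascent stage: starting
    from the base point x0, take steps proportional to the regression
    coefficients b until the measured response decreases; the best
    point found becomes the new base level."""
    x0 = np.asarray(x0, dtype=float)
    b = np.asarray(b, dtype=float)
    best_x, best_y = x0, response(x0)
    for k in range(1, max_steps + 1):
        x = x0 + k * m * b            # k-th step along the gradient
        y = response(x)
        if y <= best_y:               # response started to decrease: stop
            break
        best_x, best_y = x, y
    return best_x, best_y

# Hypothetical response surface with its maximum at (2, 3); in practice
# response(x) would be a real experiment, not a formula.
resp = lambda x: 10.0 - (x[0] - 2.0)**2 - (x[1] - 3.0)**2
x_new, y_new = steep_ascent(resp, b=[0.4, 0.6], m=1.0, x0=[0.0, 0.0])
```

The returned point would then serve as the initial level for the next two-level plan.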

Having constructed a new linear model y = b₀ + b₁x₁ + b₂x₂, we carry out regression analysis. Suppose checking the significance of the factors shows that at least one coefficient bⱼ is significant.

This means that the region of the extremum of the response function (the region of the optimum) has not yet been reached. A new direction of the gradient is determined and movement towards the optimum region begins.

Refinement of the direction of the gradient and movement along the gradient continue until, in the process of solving the next interpolation problem, checking the significance of the factors shows that all factors are insignificant, i.e., all coefficients bⱼ are insignificant. This means that the optimum region has been reached. At this point the solution of the optimization problem is stopped, and the experiment with the maximum value of the response function is taken as the optimum.

In general, the sequence of actions required to solve the optimization problem by the gradient method can be presented in the form of a block diagram (Fig. 4.4).

1) the initial levels of the factors (xⱼ⁰) should be chosen as close as possible to the optimum point if there is some a priori information about its position;

2) the intervals of variation (Δxⱼ) should be chosen so that the linear model is likely to be adequate; the lower boundary of Δxⱼ is the minimum value of the variation interval at which the response function remains significant;

3) the step size (m) when moving along the gradient is chosen so that the largest of the products m·|bⱼ| does not exceed the difference between the upper and lower levels of the factors in normalized form:

m·|bⱼ|max ≤ 1 − (−1) = 2.

Hence, m ≤ 2 / |bⱼ|max. With a smaller value of m, the difference between the response function at the initial level and at the point (m·b₁, m·b₂) may turn out to be insignificant. With a larger value of the step, there is a danger of overshooting the optimum of the response function.
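This bound on m is a one-line computation; the regression coefficients below are hypothetical:

```python
# Choosing the step m: the largest product m*|b_j| must not exceed the
# range of a normalized factor, i.e. (+1) - (-1) = 2.
b = [0.8, -1.6, 0.5]                      # hypothetical regression coefficients
m_max = 2.0 / max(abs(bj) for bj in b)    # largest admissible step, here 1.25
```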