Least squares theory in brief. The least squares method

If some physical quantity y depends on another quantity x, this dependence can be studied by measuring y at different values of x. The measurements yield a series of values:

x_1, x_2, ..., x_i, ..., x_n;

y_1, y_2, ..., y_i, ..., y_n.

From the data of such an experiment one can plot the dependence y = f(x). The resulting curve makes it possible to judge the form of the function f(x); however, the constant coefficients that enter this function remain unknown. The least squares method allows them to be determined. As a rule, the experimental points do not lie exactly on the curve. The least squares method requires that the sum of the squared deviations of the experimental points from the curve, Σ(y_i − f(x_i))², be the smallest.

In practice, this method is most often (and most simply) used in the case of a linear relationship, i.e. when

y = kx or y = a + bx.

Linear dependence is very widespread in physics, and even when a dependence is nonlinear, one usually tries to plot the graph so as to obtain a straight line. For example, if the refractive index of glass n is assumed to be related to the wavelength λ of the light wave by n = a + b/λ², then the dependence of n on 1/λ² is plotted.

Consider the dependence y = kx (a straight line passing through the origin). Form the quantity φ, the sum of the squared deviations of our points from the straight line:

φ = Σ(y_i − kx_i)².

The quantity φ is always positive and becomes smaller the closer our points lie to the straight line. The least squares method states that for k one should choose the value at which φ has a minimum:

dφ/dk = −2 Σ x_i(y_i − kx_i) = 0,

or

k = Σ x_i y_i / Σ x_i².  (19)

Calculation shows that the root-mean-square error in determining k is

S_k = √[ Σ(y_i − kx_i)² / ((n − 1) Σ x_i²) ],  (20)

where n is the number of measurements.
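As a sketch of how formulas (19) and (20) translate into code, here is a minimal Python/NumPy illustration (the function name is mine, not part of the manual):

```python
import numpy as np

def fit_through_origin(x, y):
    """Least-squares slope k for y = kx (formula (19)) and its
    root-mean-square error S_k (formula (20))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = np.sum(x * y) / np.sum(x**2)                                   # (19)
    s_k = np.sqrt(np.sum((y - k * x)**2) / ((n - 1) * np.sum(x**2)))   # (20)
    return k, s_k
```

For points lying exactly on a line through the origin, the residuals and hence S_k vanish.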

Let us now consider a somewhat more difficult case, when the points must satisfy the formula y = a + bx (a straight line not passing through the origin).

The task is to find the best values of a and b from the given set of values x_i, y_i.

Again we form the quadratic quantity φ, equal to the sum of the squared deviations of the points x_i, y_i from the straight line:

φ = Σ(y_i − a − bx_i)²,

and find the values of a and b for which φ has a minimum:

∂φ/∂a = −2 Σ(y_i − a − bx_i) = 0;

∂φ/∂b = −2 Σ x_i(y_i − a − bx_i) = 0.

The joint solution of these equations gives

b = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²,  (21)

a = ȳ − b x̄,  (22)

where x̄ and ȳ are the mean values of x and y.

The root-mean-square errors of determining a and b are

S_b = √[ Σ(y_i − a − bx_i)² / ((n − 2) Σ(x_i − x̄)²) ],  (23)

S_a = S_b √( Σ x_i² / n ).  (24)
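Formulas (21)-(24) can be sketched in Python/NumPy as follows (a minimal illustration of mine, not part of the original text):

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = a + b*x.
    Returns a, b (formulas (21), (22)) and their
    root-mean-square errors S_a, S_b (formulas (23), (24))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xm, ym = x.mean(), y.mean()
    b = np.sum((x - xm) * (y - ym)) / np.sum((x - xm)**2)              # (21)
    a = ym - b * xm                                                    # (22)
    resid = y - a - b * x
    s_b = np.sqrt(np.sum(resid**2) / ((n - 2) * np.sum((x - xm)**2)))  # (23)
    s_a = s_b * np.sqrt(np.sum(x**2) / n)                              # (24)
    return a, b, s_a, s_b
```

For points lying exactly on a line the errors S_a, S_b are zero.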

When processing measurement results by this method, it is convenient to collect all the data in a table in which all the sums entering formulas (19)-(24) are calculated first. The forms of these tables are shown in the examples below.

Example 1. The basic equation of the dynamics of rotational motion, ε = M/J (a straight line passing through the origin), was studied. For various values of the moment of force M, the angular acceleration ε of a certain body was measured. It is required to determine the moment of inertia of this body. The measured values of the moment of force and the angular acceleration are listed in the second and third columns of Table 5.

Table 5
n | M, N·m | ε, s⁻² | M² | Mε | ε − kM | (ε − kM)²
1 | 1.44 | 0.52 | 2.0736 | 0.7488 | 0.039432 | 0.001555
2 | 3.12 | 1.06 | 9.7344 | 3.3072 | 0.018768 | 0.000352
3 | 4.59 | 1.45 | 21.0681 | 6.6555 | −0.08181 | 0.006693
4 | 5.90 | 1.92 | 34.81 | 11.328 | −0.049 | 0.002401
5 | 7.45 | 2.56 | 55.5025 | 19.072 | 0.073725 | 0.005435
∑ | – | – | 123.1886 | 41.1115 | – | 0.016436

By formula (19) we determine:

k = Σ M_i ε_i / Σ M_i² = 41.1115 / 123.1886 = 0.3337 kg⁻¹·m⁻².

To determine the root-mean-square error, we use formula (20):

S_k = √[ 0.016436 / (4 · 123.1886) ] = 0.005775 kg⁻¹·m⁻².

Since J = 1/k, by the error-propagation formula (18) we have

J = 1/k = 1/0.3337 = 2.996 kg·m²;

S_J = S_k / k² = (2.996 · 0.005775)/0.3337 = 0.05185 kg·m².

Given the reliability P = 0.95, from the table of Student coefficients for n = 5 we find t = 2.78 and determine the absolute error ΔJ = 2.78 · 0.05185 = 0.1441 ≈ 0.2 kg·m².

We write the results in the form:

J = (3.0 ± 0.2) kg·m².
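The whole chain of Example 1 (slope by formula (19), its error by (20), and the transfer of the error to J = 1/k) can be checked numerically; this Python/NumPy check is my own, not part of the original text:

```python
import numpy as np

# Data of Example 1 (Table 5): moments of force M and angular accelerations eps.
M = np.array([1.44, 3.12, 4.59, 5.90, 7.45])
eps = np.array([0.52, 1.06, 1.45, 1.92, 2.56])

n = len(M)
k = np.sum(M * eps) / np.sum(M**2)                                   # formula (19)
s_k = np.sqrt(np.sum((eps - k * M)**2) / ((n - 1) * np.sum(M**2)))   # formula (20)

J = 1.0 / k        # since eps = M/J, the slope k equals 1/J
s_J = s_k / k**2   # error propagation for J = 1/k

dJ = 2.78 * s_J    # Student coefficient t(P = 0.95, n = 5) = 2.78
```

The computed J ≈ 2.996 kg·m² and ΔJ ≈ 0.144 kg·m² reproduce the values in the text.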


Example 2. We calculate the temperature coefficient of resistance of a metal using the least squares method. Resistance depends on temperature according to the linear law

R_t = R_0(1 + α t°) = R_0 + R_0 α t°.

The free term gives the resistance R_0 at a temperature of 0 °C, and the slope is the product of the temperature coefficient α and the resistance R_0.

The results of the measurements and calculations are given in Table 6.

Table 6
n | t, °C | r, Ohm | t − t̄ | (t − t̄)² | (t − t̄)·r | r − bt − a | (r − bt − a)², 10⁻⁶
1 | 23 | 1.242 | −62.8333 | 3948.028 | −78.039 | 0.007673 | 58.8722
2 | 59 | 1.326 | −26.8333 | 720.0278 | −35.581 | −0.00353 | 12.4959
3 | 84 | 1.386 | −1.83333 | 3.361111 | −2.541 | −0.00965 | 93.1506
4 | 96 | 1.417 | 10.16667 | 103.3611 | 14.40617 | −0.01039 | 107.898
5 | 120 | 1.512 | 34.16667 | 1167.361 | 51.66 | 0.021141 | 446.932
6 | 133 | 1.520 | 47.16667 | 2224.694 | 71.69333 | −0.00524 | 27.4556
∑ | 515 | 8.403 | – | 8166.833 | 21.5985 | – | 746.804
∑/n | 85.83333 | 1.4005 | – | – | – | – | –

By formulas (21), (22) we determine

b = R_0 α = Σ(t_i − t̄) r_i / Σ(t_i − t̄)² = 21.5985 / 8166.833 = 0.002645 Ohm/deg;

a = R_0 = r̄ − b t̄ = 1.4005 − 0.002645 · 85.83333 = 1.1735 Ohm.

Let us find the error in determining α. Since α = b/R_0, by the error-propagation formula (18) we have:

S_α = α √[ (S_b/b)² + (S_a/R_0)² ] = 0.000132 deg⁻¹.

Using formulas (23), (24) we have

S_b = √[ 746.804·10⁻⁶ / (4 · 8166.833) ] = 0.000151 Ohm/deg;

S_a = S_b √( Σ t_i² / n ) = 0.000151 · √(52371/6) = 0.014126 Ohm.

Given the reliability P = 0.95, from the table of Student coefficients for n = 6 we find t = 2.57 and determine the absolute error Δα = 2.57 · 0.000132 = 0.000338 deg⁻¹.

α = (23 ± 4)·10⁻⁴ deg⁻¹ at P = 0.95.
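All the numbers of Example 2 can be reproduced with a short Python/NumPy script (my own check, following formulas (21)-(24) and the error propagation for α = b/R_0):

```python
import numpy as np

# Data of Example 2 (Table 6): temperatures t and resistances r.
t = np.array([23.0, 59.0, 84.0, 96.0, 120.0, 133.0])
r = np.array([1.242, 1.326, 1.386, 1.417, 1.512, 1.520])

n = len(t)
tm, rm = t.mean(), r.mean()
b = np.sum((t - tm) * (r - rm)) / np.sum((t - tm)**2)              # (21): R0*alpha
a = rm - b * tm                                                    # (22): R0

resid = r - a - b * t
s_b = np.sqrt(np.sum(resid**2) / ((n - 2) * np.sum((t - tm)**2)))  # (23)
s_a = s_b * np.sqrt(np.sum(t**2) / n)                              # (24)

alpha = b / a                                       # alpha = b/R0
s_alpha = alpha * np.sqrt((s_b / b)**2 + (s_a / a)**2)  # error propagation
d_alpha = 2.57 * s_alpha                            # t(P = 0.95, n = 6) = 2.57
```

The script gives α ≈ 22.5·10⁻⁴ deg⁻¹ and Δα ≈ 3.4·10⁻⁴ deg⁻¹, matching the text.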


Example 3. It is required to determine the radius of curvature of a lens from Newton's rings. The radii r_m of Newton's rings were measured and the numbers m of these rings were determined. The radii of Newton's rings are related to the radius of curvature R of the lens and the ring number by the equation

r_m² = mλR − 2d_0 R,

where d_0 is the thickness of the gap between the lens and the plane-parallel plate (or the lens deformation), and λ is the wavelength of the incident light.

λ = (600 ± 6) nm. Setting

r_m² = y;
m = x;
λR = b;
−2d_0 R = a,

the equation takes the form y = a + bx.


The results of measurements and calculations are entered in table 7.

Table 7
n | x = m | y = r², 10⁻² mm² | m − m̄ | (m − m̄)² | (m − m̄)·y | y − bx − a, 10⁻⁴ | (y − bx − a)², 10⁻⁶
1 | 1 | 6.101 | −2.5 | 6.25 | −0.152525 | 12.01 | 1.44229
2 | 2 | 11.834 | −1.5 | 2.25 | −0.17751 | −9.6 | 0.930766
3 | 3 | 17.808 | −0.5 | 0.25 | −0.08904 | −7.2 | 0.519086
4 | 4 | 23.814 | 0.5 | 0.25 | 0.11907 | −1.6 | 0.0243955
5 | 5 | 29.812 | 1.5 | 2.25 | 0.44718 | 3.28 | 0.107646
6 | 6 | 35.760 | 2.5 | 6.25 | 0.894 | 3.12 | 0.0975819
∑ | 21 | 125.129 | – | 17.5 | 1.041175 | – | 3.12176
∑/n | 3.5 | 20.8548333 | – | – | – | – | –

The least squares method

The least squares method (OLS, Ordinary Least Squares) is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data. The method is based on minimizing the sum of squared regression residuals.

It should be noted that the least squares method itself can be called a method for solving a problem in any area if the solution minimizes the sum of squares of some functions of the unknown variables. Therefore the least squares method can also be used for the approximate representation (approximation) of a given function by other (simpler) functions, for finding a set of quantities satisfying equations or constraints whose number exceeds the number of those quantities, and so on.

The essence of OLS

Let a (parametric) model of a probabilistic (regression) dependence between the (explained) variable y and a set of factors (explanatory variables) x be given:

y = f(x, b) + ε,

where b is the vector of unknown model parameters

and ε is the random model error.

Let there also be sample observations of the values of these variables. Let t be the observation number (t = 1, ..., n). Then x_t, y_t are the values of the variables in the t-th observation. For given values of the parameters b one can then calculate the theoretical (model) values of the explained variable y and the residuals:

ŷ_t = f(x_t, b),  e_t = y_t − ŷ_t.

The values of the residuals depend on the values of the parameters b.

The essence of (ordinary, classical) LSM is to find the parameters b for which the sum of squared residuals (RSS, Residual Sum of Squares) is minimal:

RSS(b) = Σ e_t² = Σ (y_t − f(x_t, b))² → min.

In the general case this problem can be solved by numerical optimization (minimization) methods; one then speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, one finds the stationary points of the function by differentiating it with respect to the unknown parameters b, setting the derivatives to zero, and solving the resulting system of equations:

∂RSS/∂b = −2 Σ (y_t − f(x_t, b)) ∂f(x_t, b)/∂b = 0.

If the random errors of the model are normally distributed, have the same variance, and are uncorrelated with each other, the least squares parameter estimates coincide with the maximum likelihood (ML) estimates.

LSM in the case of a linear model

Let the regression dependence be linear:

y_t = Σ_j b_j x_tj + ε_t.

Let y be the column vector of observations of the explained variable, and X the matrix of observations of the factors (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vectors of values of a given factor in all observations). The matrix form of the linear model is:

y = Xb + ε.

Then the vector of estimates of the explained variable and the vector of regression residuals are

ŷ = Xb,  e = y − ŷ = y − Xb;

accordingly, the sum of squared regression residuals is

RSS = e′e = (y − Xb)′(y − Xb).

Differentiating this function with respect to the parameter vector and setting the derivatives to zero, we obtain a system of equations (in matrix form):

X′Xb = X′y.

The solution of this system of equations gives the general formula for the least squares estimates in the linear model:

b̂ = (X′X)⁻¹X′y = (X′X/n)⁻¹(X′y/n).

For analytical purposes the last representation of this formula is useful. If the data in the regression model are centered, then in this representation the first matrix is the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix is the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.
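The general formula b̂ = (X′X)⁻¹X′y is easy to try out numerically; the following Python/NumPy sketch (synthetic data of my own) recovers known coefficients from a noisy linear model:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy linear model y = X b + eps with a constant and two factors.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
b_true = np.array([1.0, 2.0, -0.5])
y = X @ b_true + 0.01 * rng.normal(size=n)

# OLS estimate via the normal equations X'X b = X'y
# (solving the system is numerically preferable to forming the inverse).
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With a small error variance the estimates land very close to the true coefficients.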

An important property of the OLS estimates for models with a constant: the line of the constructed regression passes through the center of gravity of the sample data, that is, the equality ȳ = x̄′b̂ holds.

In particular, in the extreme case when the only regressor is a constant, the OLS estimate of the single parameter (the constant itself) is equal to the mean of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it satisfies the criterion of the minimum sum of squared deviations from it.

Example: simple (paired) regression

In the case of simple (paired) linear regression y_t = a + bx_t + ε_t the calculation formulas simplify (one can do without matrix algebra):

b̂ = Σ(x_t − x̄)(y_t − ȳ) / Σ(x_t − x̄)²,  â = ȳ − b̂ x̄.

Properties of OLS estimates

First of all, note that for linear models the least squares estimates are linear estimates, as follows from the formula above. For the least squares estimates to be unbiased, it is necessary and sufficient that the key condition of regression analysis hold: the mathematical expectation of the random error, conditional on the factors, must be zero. This condition is satisfied, in particular, if

  1. the expected value of the random errors is zero, and
  2. the factors and the random errors are independent random variables.

The second condition, the exogeneity of the factors, is fundamental. If this property does not hold, then almost any estimates can be expected to be extremely unsatisfactory: they will not even be consistent (that is, even a very large volume of data does not allow one to obtain good estimates in this case). In the classical case a stronger assumption is made, that the factors are deterministic, in contrast to the random error, which automatically implies exogeneity. In the general case, for consistency of the estimates it is sufficient that the exogeneity condition hold together with convergence of the matrix X′X/n to some nonsingular matrix as the sample size increases to infinity.

In order that, besides consistency and unbiasedness, the (ordinary) least squares estimates also be efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold: constant (identical) variance of the random errors in all observations (no heteroscedasticity), Var(ε_t) = σ², and no correlation of the random errors of different observations, Cov(ε_t, ε_s) = 0 for t ≠ s.

These assumptions can be formulated for the covariance matrix of the random error vector: V(ε) = σ²I.

A linear model satisfying these conditions is called classical. The OLS estimates for classical linear regression are unbiased, consistent, and the most efficient estimates in the class of all linear unbiased estimates (in the English literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian literature the Gauss-Markov theorem is more often cited). As is easy to show, the covariance matrix of the vector of coefficient estimates equals

V(b̂) = σ²(X′X)⁻¹.

Generalized least squares

The method of least squares admits a broad generalization. Instead of minimizing the sum of squares of the residuals, one can minimize some positive definite quadratic form of the residual vector, e′We, where W is some symmetric positive definite weight matrix. Ordinary least squares is the special case of this approach in which the weight matrix is proportional to the identity matrix. As is known from the theory of symmetric matrices (operators), such matrices admit a decomposition W = P′P. Therefore the functional can be represented as e′We = (Pe)′(Pe), that is, as the sum of squares of certain transformed "residuals". Thus one can distinguish a whole class of least squares methods, the LS-methods (Least Squares).

It is proved (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors V), the most efficient estimates (in the class of linear unbiased estimates) are those of the so-called generalized least squares (GLS, Generalized Least Squares): the LS-method with the weight matrix equal to the inverse covariance matrix of the random errors, W = V⁻¹.

It can be shown that the formula for the GLS estimates of the parameters of a linear model has the form

b̂_GLS = (X′V⁻¹X)⁻¹X′V⁻¹y.

The covariance matrix of these estimates is accordingly

V(b̂_GLS) = (X′V⁻¹X)⁻¹.

In fact, the essence of GLS lies in a certain (linear) transformation P of the original data followed by application of ordinary least squares to the transformed data. The purpose of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.
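The GLS formulas above can be sketched in a few lines of Python/NumPy (a minimal illustration of mine; for W = V⁻¹ = I it reduces to ordinary least squares):

```python
import numpy as np

def gls(X, y, V):
    """Generalized least squares:
    b = (X' V^-1 X)^-1 X' V^-1 y, where V is the covariance
    matrix of the random errors; the covariance matrix of the
    estimates is (X' V^-1 X)^-1."""
    Vinv = np.linalg.inv(V)
    A = X.T @ Vinv @ X
    b = np.linalg.solve(A, X.T @ Vinv @ y)
    cov_b = np.linalg.inv(A)
    return b, cov_b
```

With V equal to the identity matrix, gls() returns exactly the OLS solution.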

Weighted least squares

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors) we have the so-called weighted least squares (WLS, Weighted Least Squares). In this case the weighted sum of squared residuals of the model is minimized, that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation: Σ e_t²/σ_t². In fact, the data are transformed by weighting the observations (dividing by an amount proportional to the assumed standard deviation of the random errors), and ordinary least squares is applied to the weighted data.
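The "weight, then apply ordinary least squares" recipe can be sketched as follows (my own minimal Python/NumPy illustration):

```python
import numpy as np

def wls(X, y, sigma):
    """Weighted least squares: divide each observation by (a value
    proportional to) the standard deviation of its error, then apply
    ordinary least squares to the transformed data."""
    w = 1.0 / np.asarray(sigma, dtype=float)
    b, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return b
```

When all the sigmas are equal, the weighting changes nothing and WLS coincides with OLS.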

Some special cases of application of LSM in practice

Linear Approximation

Consider the case when, as a result of studying the dependence of a certain scalar quantity y on a certain scalar quantity x (this can be, for example, the dependence of voltage on current: U = IR, where R is a constant, the resistance of the conductor), these quantities were measured, yielding the values x_1, ..., x_n and their corresponding values y_1, ..., y_n. The measurement data are recorded in a table.

Table. Measurement results (measurements No. 1-6).

The question is: what value of the coefficient k can be chosen so as to describe the dependence y = kx in the best way? According to least squares, this value should be such that the sum of the squared deviations of the values y_i from the values kx_i,

F(k) = Σ (kx_i − y_i)²,

is minimal.

The sum of squared deviations has a single extremum, a minimum, which allows us to use this formula. Let us find the value of the coefficient k from the condition F′(k) = 0. Transforming its left-hand side,

F′(k) = 2 Σ x_i(kx_i − y_i) = 2k Σ x_i² − 2 Σ x_i y_i = 0,

we obtain

k = Σ x_i y_i / Σ x_i².

The last formula allows us to find the value of the coefficient k, as required in the problem.

History

Until the beginning of the 19th century scientists had no definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until then particular methods were used, depending on the type of the equations and on the ingenuity of the calculators, so that different calculators, starting from the same observational data, arrived at different conclusions. Gauss (1795) is credited with the first application of the method, and Legendre (1805) discovered it independently and published it under its modern name (French: Méthode des moindres carrés). Laplace connected the method with probability theory, and the American mathematician Adrain (1808) considered its probabilistic applications. The method was spread and improved by the further research of Encke, Bessel, Hansen, and others.

Alternative uses of LSM

The idea of the least squares method can also be used in other cases not directly related to regression analysis. The point is that the sum of squares is one of the most common proximity measures for vectors (the Euclidean metric in finite-dimensional spaces).

One application is the "solution" of systems of linear equations in which the number of equations is greater than the number of variables:

Ax = b,

where the matrix A is not square but rectangular.

Such a system of equations generally has no solution (if the rank of the augmented matrix is greater than the rank of A). Therefore this system can be "solved" only in the sense of choosing a vector x so as to minimize the "distance" between the vectors Ax and b. To do this one can apply the criterion of minimizing the sum of squared differences of the left- and right-hand sides of the system equations, that is, ‖Ax − b‖² → min. It is easy to show that solving this minimization problem leads to solving the system of equations

A′Ax = A′b.
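A small Python/NumPy demonstration (the data are my own) shows that the normal equations A′Ax = A′b give the same answer as a library least-squares solver:

```python
import numpy as np

# An overdetermined system A x = b: four equations, two unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

# "Solving" via the normal equations A'A x = A'b ...
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# ... agrees with the library least-squares solver:
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Both give the same minimizer of ‖Ax − b‖².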

3. Approximation of functions by the least squares method

The least squares method is used, when processing the results of an experiment, to approximate experimental data by an analytical formula. The specific form of the formula is chosen, as a rule, from physical considerations; it may be, for example, a linear, quadratic, exponential, or other dependence.

The essence of the least squares method is as follows. Let the measurement results be presented in a table:

Table 4

x: x_1, x_2, ..., x_n
y: y_1, y_2, ..., y_n

and let the approximating function be sought in the form

y = f(x, a_0, a_1, ..., a_m),  (3.1)

where f is a known function and a_0, a_1, ..., a_m are unknown constant parameters whose values must be found. In the least squares method, the approximation of function (3.1) to the experimental dependence is considered best if the condition

Q = Σ [ y_i − f(x_i, a_0, a_1, ..., a_m) ]² → min  (3.2)

is satisfied, i.e. the sum of the squared deviations of the desired analytical function from the experimental dependence must be minimal.

Note that the quantity Q is called the residual (discrepancy).


Since the residual Q = Q(a_0, a_1, ..., a_m) ≥ 0 and is bounded below, it has a minimum. A necessary condition for the minimum of a function of several variables is that all partial derivatives of this function with respect to the parameters equal zero. Thus, finding the best values of the parameters of the approximating function (3.1), that is, the values for which Q = Q(a_0, a_1, ..., a_m) is minimal, reduces to solving the system of equations:

∂Q/∂a_k = 0,  k = 0, 1, ..., m.  (3.3)

The method of least squares can be given the following geometric interpretation: among an infinite family of lines of a given type, one finds the line for which the sum of the squared differences between the ordinates of the experimental points and the corresponding ordinates of the points found from the equation of this line is the smallest.

Finding the parameters of a linear function

Let the experimental data be approximated by a linear function:

y = ax + b.

It is required to choose the values of a and b for which the function

Q(a, b) = Σ (ax_i + b − y_i)²  (3.4)

is minimal. The necessary conditions for the minimum of function (3.4) reduce to the system of equations:

∂Q/∂a = 2 Σ x_i(ax_i + b − y_i) = 0;  ∂Q/∂b = 2 Σ (ax_i + b − y_i) = 0.

After transformations, we obtain a system of two linear equations with two unknowns:

a Σ x_i² + b Σ x_i = Σ x_i y_i;
a Σ x_i + b n = Σ y_i,  (3.5)

solving which we find the desired values of the parameters a and b.

Finding the parameters of a quadratic function

If the approximating function is the quadratic dependence

y = ax² + bx + c,

then its parameters a, b, c are found from the condition of the minimum of the function:

Q(a, b, c) = Σ (ax_i² + bx_i + c − y_i)².  (3.6)

The minimum conditions for function (3.6) reduce to the system of equations:

∂Q/∂a = 0,  ∂Q/∂b = 0,  ∂Q/∂c = 0.

After transformations, we obtain a system of three linear equations with three unknowns:

a Σ x_i⁴ + b Σ x_i³ + c Σ x_i² = Σ x_i² y_i;
a Σ x_i³ + b Σ x_i² + c Σ x_i = Σ x_i y_i;
a Σ x_i² + b Σ x_i + c n = Σ y_i,  (3.7)

solving which we find the desired values of the parameters a, b and c.
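System (3.7) can be formed and solved directly; the following Python/NumPy sketch (function name is mine) does exactly that:

```python
import numpy as np

def fit_quadratic(x, y):
    """Form and solve the normal system (3.7) for y = a*x^2 + b*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    M = np.array([[np.sum(x**4), np.sum(x**3), np.sum(x**2)],
                  [np.sum(x**3), np.sum(x**2), np.sum(x)],
                  [np.sum(x**2), np.sum(x),    float(n)]])
    v = np.array([np.sum(x**2 * y), np.sum(x * y), np.sum(y)])
    return np.linalg.solve(M, v)   # a, b, c
```

For points lying exactly on a parabola, the coefficients are recovered exactly.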

Example. Let the following table of values of x and y be obtained as a result of an experiment:

Table 5

y_i | 0.705 | 0.495 | 0.426 | 0.357 | 0.368 | 0.406 | 0.549 | 0.768

It is required to approximate the experimental data by linear and quadratic functions.

Solution. Finding the parameters of the approximating functions reduces to solving the systems of linear equations (3.5) and (3.7). To solve the problem, we use the Excel spreadsheet.

1. First we link Sheets 1 and 2. Enter the experimental values x_i and y_i into columns A and B, starting from the second row (in the first row we put the column headings). Then we calculate the sums over these columns and put them in the tenth row.

In columns C–G we place the calculation and summation of x_i², x_i³, x_i⁴, x_i y_i and x_i² y_i, respectively.

2. Unlink the sheets. Further calculations are carried out in the same way for the linear dependence on Sheet 1 and for the quadratic dependence on Sheet 2.

3. Under the resulting table, we form the matrix of coefficients and the column vector of free terms, and solve the system of linear equations.

To compute the inverse matrix and to multiply matrices, we use the Function Wizard and the functions MOBR and MUMNOZH (the Russian Excel names of MINVERSE and MMULT).

4. In the cell block H2:H9, based on the obtained coefficients, we calculate the values of the approximating polynomial y_i calc; in block I2:I9, the deviations Δy_i = y_i exp − y_i calc; in column J, the residual contributions (y_i exp − y_i calc)².

The resulting tables and the graphs built with the Chart Wizard are shown in Figures 6, 7, 8.


Fig. 6. Table for calculating the coefficients of a linear function approximating the experimental data.


Fig. 7. Table for calculating the coefficients of a quadratic function approximating the experimental data.


Fig. 8. Graphical representation of the results of approximating the experimental data by linear and quadratic functions.

Answer. The experimental data were approximated by the linear dependence y = 0.07881x + 0.442262 with residual Q = 0.165167 and by the quadratic dependence y = 3.115476x² − 5.2175x + 2.529631 with residual Q = 0.002103.

Tasks. Approximate the function given in tabular form by linear and quadratic functions.

Table 6

№ \ x | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8
0 | 3.030 | 3.142 | 3.358 | 3.463 | 3.772 | 3.251 | 3.170 | 3.665
1 | 3.314 | 3.278 | 3.262 | 3.292 | 3.332 | 3.397 | 3.487 | 3.563
2 | 1.045 | 1.162 | 1.264 | 1.172 | 1.070 | 0.898 | 0.656 | 0.344
3 | 6.715 | 6.735 | 6.750 | 6.741 | 6.645 | 6.639 | 6.647 | 6.612
4 | 2.325 | 2.515 | 2.638 | 2.700 | 2.696 | 2.626 | 2.491 | 2.291
5 | 1.752 | 1.762 | 1.777 | 1.797 | 1.821 | 1.850 | 1.884 | 1.944
6 | 1.924 | 1.710 | 1.525 | 1.370 | 1.264 | 1.190 | 1.148 | 1.127
7 | 1.025 | 1.144 | 1.336 | 1.419 | 1.479 | 1.530 | 1.568 | 1.248
8 | 5.785 | 5.685 | 5.605 | 5.545 | 5.505 | 5.480 | 5.495 | 5.510
9 | 4.052 | 4.092 | 4.152 | 4.234 | 4.338 | 4.468 | 4.599 |

The least squares method finds the widest application in various fields of science and practice: physics, chemistry, biology, economics, sociology, psychology, and so on. By the will of fate I often deal with economics, and therefore today I will arrange for you a trip to an amazing country called Econometrics =) …You don't want that?! It's very nice there, you just have to make up your mind! …But what you probably definitely do want is to learn to solve least squares problems. And especially diligent readers will learn to solve them not only accurately but also VERY FAST ;-) But first, the general statement of the problem and a related example:

Let indicators be studied in some subject area that have a quantitative expression. At the same time, there is every reason to believe that the indicator y depends on the indicator x. This assumption can be a scientific hypothesis or can rest on elementary common sense. Let us leave science aside, however, and explore more appetizing areas, namely grocery stores. Denote by:

x – the retail space of a grocery store, sq. m;
y – the annual turnover of a grocery store, million rubles.

It is quite clear that the larger the store's area, the greater its turnover in most cases.

Suppose that after carrying out observations/experiments/calculations/dances with a tambourine we have numerical data (x_i, y_i), i = 1, ..., n, at our disposal.

With grocery stores, I think everything is clear: x_1 is the area of the 1st store, y_1 its annual turnover; x_2 the area of the 2nd store, y_2 its annual turnover, and so on. By the way, it is not at all necessary to have access to classified materials: a fairly accurate estimate of the turnover can be obtained by means of mathematical statistics. However, let us not get distracted; the course in commercial espionage is paid separately =)

The tabular data can also be written as points and depicted in the Cartesian coordinate system familiar to us.

Let us answer an important question: how many points are needed for a qualitative study?

The more, the better. The minimum admissible set consists of 5-6 points. Moreover, with a small amount of data, "anomalous" results must not be included in the sample. For example, a small elite store may take in orders of magnitude more than "its colleagues", thereby distorting the general pattern that is to be found!

To put it simply, we need to choose a function whose graph passes as close as possible to the points. Such a function is called an approximating (from "approximation") or theoretical function. Generally speaking, an obvious "candidate" appears at once: a polynomial of high degree whose graph passes through ALL the points. But this option is complicated and often simply wrong (the graph will "wiggle" all the time and reflect the main trend poorly).

Thus, the desired function must be sufficiently simple and at the same time reflect the dependence adequately. As you might guess, one of the methods for finding such functions is called least squares. First let us analyze its essence in general form. Let some function f(x) approximate the experimental data:

How do we evaluate the accuracy of this approximation? Let us calculate the differences (deviations) between the experimental and the functional values. The first thought that comes to mind is to estimate how large the sum Σ(y_i − f(x_i)) is, but the problem is that the differences can be negative (for example, −1 and +1), and in such a summation the deviations will cancel each other out. Therefore, as an estimate of the accuracy of the approximation, the sum of the moduli of the deviations suggests itself:

|y_1 − f(x_1)| + |y_2 − f(x_2)| + ... + |y_n − f(x_n)|,

or in folded form: Σ|y_i − f(x_i)| (in case anyone doesn't know: Σ is the summation sign, and i is an auxiliary "counter" variable that takes values from 1 to n).

Approximating the experimental points with different functions, we will obtain different values of this sum, and obviously the function for which the sum is smaller approximates more accurately.

Such a method exists and is called the least modulus method. In practice, however, the least squares method has become far more widespread; in it, possible negative values are eliminated not by the modulus but by squaring the deviations:

Σ(y_i − f(x_i))²,

after which efforts are directed at selecting a function f such that the sum of the squared deviations is as small as possible. This is, in fact, where the name of the method comes from.

And now we return to another important point: as noted above, the selected function should be fairly simple, but there are also many such functions: linear, hyperbolic, exponential, logarithmic, quadratic, and so on. And of course, here one would immediately like to "narrow the field of activity". Which class of functions should be chosen for the study? A primitive but effective technique:

The easiest way is to plot the points on a drawing and analyze their arrangement. If they tend to lie along a straight line, then one should look for the equation of a straight line y = ax + b with optimal values of a and b. In other words, the task is to find SUCH coefficients that the sum of the squared deviations is smallest.

If the points are arranged, for example, along a hyperbola, then it is clear that a linear function will give a poor approximation. In that case we look for the most "favorable" coefficients of the hyperbola equation, those that give the minimum sum of squares.

Now note that in both cases we are talking about functions of two variables, whose arguments are the parameters of the sought dependences:

And in essence we need to solve a standard problem: to find the minimum of a function of two variables.

Recall our example: suppose that the "store" points tend to lie along a straight line and that there is every reason to believe a linear dependence of the turnover on the retail space. Let us find SUCH coefficients "a" and "b" that the sum of the squared deviations is smallest. Everything as usual: first the first-order partial derivatives. By the linearity rule, one can differentiate right under the summation sign:

If you want to use this information for an essay or a term paper, I will be very grateful for a link in your list of sources; you will hardly find such detailed calculations anywhere else:

Let us compose the standard system:

Σ 2(ax_i + b − y_i)x_i = 0;  Σ 2(ax_i + b − y_i) = 0.

We cancel the "two" in each equation and, in addition, "break apart" the sums:

Note: analyze on your own why "a" and "b" can be taken outside the summation sign. By the way, formally this can also be done with the sum.

Let us rewrite the system in an "applied" form:

a Σ x_i² + b Σ x_i = Σ x_i y_i;
a Σ x_i + b n = Σ y_i,

after which the algorithm for solving our problem begins to take shape:

Do we know the coordinates of the points? We do. Can we find the sums? Easily. We compose the simplest system of two linear equations with two unknowns ("a" and "b"). We solve the system, for example, by Cramer's method, and obtain a stationary point. Checking the sufficient condition for an extremum, we can verify that at this point the function attains precisely a minimum. The check involves additional calculations, so we leave it behind the scenes (if necessary, the missing frame can be viewed). We draw the final conclusion:
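The steps above (form the two sums system, solve it by Cramer's method) can be sketched in Python (a minimal illustration of mine, not from the article):

```python
import numpy as np

def fit_line_cramer(x, y):
    """Solve the normal system
        a*sum(x^2) + b*sum(x) = sum(x*y)
        a*sum(x)   + b*n      = sum(y)
    by Cramer's rule, as done in the text."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sx, sy = x.sum(), y.sum()
    sxx, sxy = (x**2).sum(), (x * y).sum()
    D = sxx * n - sx * sx            # main determinant (nonzero for >1 distinct x)
    a = (sxy * n - sx * sy) / D      # slope
    b = (sxx * sy - sx * sxy) / D    # intercept
    return a, b
```

For points lying exactly on a line, the exact slope and intercept are returned.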

The function y = ax + b best (at least compared with any other linear function) approximates the experimental points. Roughly speaking, its graph passes as close as possible to these points. In the tradition of econometrics, the resulting approximating function is also called the paired linear regression equation.

The problem under consideration is of great practical importance. In the situation of our example, the equation allows one to predict what turnover ("y") the store will have at one or another value of the retail space (one or another value of "x"). Yes, the resulting forecast will be only a forecast, but in many cases it will prove quite accurate.

I will analyze just one problem with "real" numbers, since there are no difficulties in it; all the calculations are at the level of the 7th-8th grade school curriculum. In 95 percent of cases you will be asked to find just the linear function, but at the very end of the article I will show that it is no harder to find the equations of the optimal hyperbola, exponential, and some other functions.

In fact, it remains to hand out the promised goodies, so that you learn to solve such examples not only accurately but also quickly. We carefully study the standard:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experimental) data. Make a drawing: in a Cartesian rectangular coordinate system, plot the experimental points and the graph of the approximating function. Find the sum of squared deviations between the empirical and theoretical values. Find out whether the proposed function approximates the experimental points better (in the sense of the least squares method).

Note that "x" values ​​are natural values, and this has a characteristic meaningful meaning, which I will talk about a little later; but they, of course, can be fractional. In addition, depending on the content of a particular task, both "X" and "G" values ​​can be completely or partially negative. Well, we have been given a “faceless” task, and we start it decision:

We find the coefficients of the optimal function as a solution to the system:

For a more compact notation, the "counter" variable can be omitted, since it is already clear that the summation runs from 1 to n.

It is more convenient to calculate the required sums in tabular form:


The calculations can be carried out on a calculator, but it is much better to use Excel - both faster and with fewer errors; watch the short video:
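If a spreadsheet is not at hand, the same table of sums can be produced in a few lines of Python. The points below are made up for illustration, not taken from the article's table:

```python
# Tabulate x, y, x*y, x**2 for each point and total each column,
# exactly as one would do in the spreadsheet table (invented data).
points = [(1, 4.0), (2, 3.1), (3, 2.9), (4, 1.8), (5, 1.2)]

sums = {"x": 0.0, "y": 0.0, "xy": 0.0, "xx": 0.0}
for x, y in points:
    sums["x"] += x
    sums["y"] += y
    sums["xy"] += x * y
    sums["xx"] += x * x

print(sums)
```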

Thus, we get the following system:

Here you could multiply the second equation by 3 and subtract the 2nd equation from the 1st term by term. But that is luck - in practice, systems are often not so accommodating, and in such cases Cramer's method saves the day:
Since the determinant is nonzero, the system has a unique solution.
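For a 2×2 system, Cramer's rule is short enough to write out in full. A minimal sketch (the example coefficients are invented):

```python
def cramer_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*u + a12*v = b1, a21*u + a22*v = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("determinant is zero: no unique solution")
    u = (b1 * a22 - a12 * b2) / det   # replace 1st column by the RHS
    v = (a11 * b2 - b1 * a21) / det   # replace 2nd column by the RHS
    return u, v

# Example: 2u + 3v = 8, u - v = -1
u, v = cramer_2x2(2, 3, 8, 1, -1, -1)
print(u, v)  # → 1.0 2.0
```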

Let's do a check. I understand that nobody wants to, but why let errors slip through where they can be avoided entirely? Substitute the found solution into the left-hand side of each equation of the system:

The right-hand sides of the corresponding equations are obtained, which means the system is solved correctly.

Thus, the desired approximating function is: - among all linear functions, it approximates the experimental data best.

Unlike the direct dependence of the store's turnover on its area, the found dependence is inverse (the principle of "the more, the less"), and this fact is immediately revealed by the negative slope. The function tells us that as a certain indicator increases by 1 unit, the dependent indicator decreases on average by 0.65 units. As they say, the higher the price of buckwheat, the less of it is sold.

To plot the approximating function, we find two of its values:

and execute the drawing:


The constructed line is called a trend line (namely, a linear trend line; in general, a trend is not necessarily a straight line). The expression "to be in trend" is familiar to everyone, and I think this term needs no additional comment.

We calculate the sum of squared deviations between the empirical and theoretical values. Geometrically, this is the sum of the squares of the lengths of the "crimson" segments (two of which are so small that they are not even visible).

Let's summarize the calculations in a table:


They can again be carried out manually; just in case, here is an example for the 1st point:

but it is much more efficient to do them in Excel:
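The deviation sum can equally well be computed in a few lines of Python. The data and the fitted line below are made up for illustration; substitute your own:

```python
def sse(points, f):
    """Sum of squared deviations between empirical y and theoretical f(x)."""
    return sum((y - f(x)) ** 2 for x, y in points)

points = [(1, 5.0), (2, 4.2), (3, 3.9)]   # invented data
line = lambda x: 5.5 - 0.55 * x           # a hypothetical fitted line

total = sse(points, line)
print(round(total, 4))
```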

Let's repeat: what does this result mean? Among all linear functions, our function has the smallest value of this sum, that is, it is the best approximation in its family. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function approximates the experimental points better?

Let's find the corresponding sum of squared deviations - to distinguish it, I will denote it by the letter "epsilon". The technique is exactly the same:


And again, just in case, the calculation for the 1st point:

In Excel, we use the standard EXP function (its syntax can be found in Excel Help).
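The same comparison can be sketched in Python with `math.exp`. All numbers here are invented; in practice both candidate functions come from your own fits:

```python
import math

def sse(points, f):
    """Sum of squared deviations between empirical y and theoretical f(x)."""
    return sum((y - f(x)) ** 2 for x, y in points)

points = [(1, 5.0), (2, 4.2), (3, 3.9)]       # invented data
line = lambda x: 5.5 - 0.55 * x               # hypothetical linear fit
expo = lambda x: 6.0 * math.exp(-0.15 * x)    # hypothetical exponential fit

eps_line = sse(points, line)
eps_expo = sse(points, expo)
# Whichever sum is smaller wins the comparison (in the LSM sense).
print(eps_line < eps_expo)
```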

Conclusion: the "epsilon" sum is larger, so the exponential function approximates the experimental points worse than the straight line.

But it should be noted that "worse" does not yet mean "bad". I have now plotted this exponential function, and it also passes close to the points - so close that without an analytical study it would be difficult to say which function is more accurate.

This completes the solution, and I return to the question of natural values of the argument. In various studies, usually economic or sociological ones, natural "x" values are used to number months, years, or other equal time intervals. Consider, for example, the following problem.

The essence of the method is that the quality criterion for a candidate solution is the sum of squared errors, which one seeks to minimize. Applying it requires as many measurements of the unknown random variable as possible (the more, the higher the accuracy of the solution) and some set of candidate solutions from which the best one is to be chosen. If the set of solutions is parameterized, then the optimal values of the parameters must be found.

Why are squared errors minimized, and not the errors themselves? The point is that in most cases errors occur in both directions: an estimate can be greater than a measurement or less than it. If errors of different signs are simply added, they cancel each other out, and the sum gives a misleading idea of the quality of the estimate. Often, so that the final estimate has the same dimension as the measured values, the square root of the sum of squared errors is taken.



LSM is used in mathematics, in particular in probability theory and mathematical statistics. This method finds its widest application in filtering problems, when a useful signal must be separated from the noise superimposed on it.

It is also used in mathematical analysis for the approximate representation of a given function by simpler functions. Another area of application of LSM is solving systems of equations with fewer unknowns than equations.
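That last use case is worth a small sketch: an overdetermined system A·z = b (more equations than unknowns) has no exact solution in general, but the least-squares solution comes from the normal equations AᵀA·z = Aᵀb. The matrix and right-hand side below are made up:

```python
# Least-squares solution of an overdetermined system (3 equations,
# 2 unknowns) via the normal equations, in pure Python (invented numbers).
A = [[1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]
b = [1.0, 2.0, 2.0]

# Build A^T A (2x2) and A^T b (2-vector).
ata = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
atb = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]

# Solve the 2x2 normal system by Cramer's rule.
det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
z0 = (atb[0] * ata[1][1] - ata[0][1] * atb[1]) / det
z1 = (ata[0][0] * atb[1] - atb[0] * ata[1][0]) / det
print(round(z0, 3), round(z1, 3))
```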

I have come up with a few more rather unexpected applications of LSM, which I would like to discuss in this article.

LSM and typos

Typos and spelling errors are the scourge of automatic translators and search engines. Indeed, if a word differs by just 1 letter, the program regards it as a different word and translates/searches for it incorrectly, or does not translate/find it at all.

I had a similar problem: there were two databases with addresses of Moscow houses, and they had to be merged into one. But the addresses were written in different styles. One database used the KLADR standard (the All-Russian address classifier), for example: "BABUSHKINA PILOT UL., D10K3". The other database used the postal style, for example: "St. Pilot Babushkin, house 10, building 3". It seems there are no errors in either case, yet automating the matching is incredibly difficult (each database has 40,000 records!). And there were plenty of typos as well... How can the computer understand that the two addresses above refer to the same house? This is where LSM came in handy.

What did I do? Having found the next letter in the first address, I looked for the same letter in the second address. If both were in the same position, I set the error for that letter to 0. If they were in adjacent positions, the error was 1. If there was a shift of 2 positions, the error was 2, and so on. If the letter was absent from the other address altogether, the error was set to n+1, where n is the number of letters in the 1st address. I then computed the sum of squared errors and linked those records for which this sum was minimal.
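A minimal sketch of this letter-position metric, assuming "the same letter" means the nearest occurrence of that letter in the second address (the author's exact matching rule may differ in details; function names here are my own):

```python
def address_error(a, b):
    """Sum of squared letter-position errors between two address strings."""
    n = len(a)
    total = 0
    for i, ch in enumerate(a):
        # positions where the same character occurs in the second address
        positions = [j for j, c in enumerate(b) if c == ch]
        if positions:
            err = min(abs(i - j) for j in positions)  # nearest occurrence
        else:
            err = n + 1                               # letter missing entirely
        total += err * err
    return total

def best_match(addr, candidates):
    """Link addr to the candidate with the minimal sum of squared errors."""
    return min(candidates, key=lambda c: address_error(addr, c))

print(best_match("abc", ["abd", "xyz"]))  # → abd
```

In a real run one would normalize case and punctuation first and, as the article notes, process house and building numbers separately.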

Of course, house and building numbers were processed separately. I don't know whether I reinvented the wheel here, but the problem was solved quickly and efficiently. I wonder whether search engines use this method? Perhaps they do, since every self-respecting search engine, on encountering an unfamiliar word, offers a replacement from familiar words ("perhaps you meant..."). However, they may well do this analysis differently.

LSM and searching by pictures, faces, and maps

This method can also be applied to searching by pictures, drawings, maps, and even people's faces.


Currently, all search engines, instead of truly searching by images, in fact search by image captions. This is undoubtedly a useful and convenient service, but I propose supplementing it with a genuine image search.

A sample picture is submitted, and all images are ranked by the sum of squared deviations of their characteristic points. Determining these characteristic points is itself a non-trivial task. However, it is quite solvable: for faces, for example, these are the corners of the eyes and lips, the tip of the nose, the nostrils, the edges and centers of the eyebrows, the pupils, and so on.

By comparing these parameters, you can find the face most similar to the sample. I have already seen sites where such a service works: you can find the celebrity most similar to a photo you submit, and even generate an animation that turns you into the celebrity and back. Surely the same method works in the databases of the Ministry of Internal Affairs that contain composite sketches of criminals.


Yes, fingerprints can be searched the same way. Map search relies on the natural irregularities of geographic features: the bends of rivers, mountain ranges, the outlines of coasts, forests, and fields.

Such is the wonderful and universal LSM method. I am sure that you, dear readers, will find many unusual and unexpected applications of it for yourselves.