The Least Squares Method. Paired Linear Regression Analysis

It finds the widest application in various fields of science and practice: physics, chemistry, biology, economics, sociology, psychology, and so on, and so on. By the will of fate I often have to deal with economics, and so today I will issue you a ticket to an amazing country called Econometrics =) ... How can you not want it?! It's very good there, you just have to make up your mind! ... But what you certainly do want is to learn how to solve problems by the least squares method. And especially diligent readers will learn to solve them not only faultlessly but also VERY FAST ;-) But first, the general statement of the problem plus a related example:

Suppose that in some subject area two indicators with a quantitative expression are investigated. At the same time, there is every reason to believe that one indicator depends on the other. This belief can be a scientific hypothesis or rest on elementary common sense. Let us leave science aside, however, and explore more appetizing areas, namely grocery stores. Let us denote by:

- x: the retail space of a grocery store, sq. m.,
- y: the annual turnover of the grocery store, million rubles.

It is quite clear that the larger the store's area, the greater, in most cases, its turnover will be.

Suppose that after observations / experiments / calculations / dancing with a tambourine we have numerical data at our disposal:

With grocery stores, I think everything is clear: x1 is the area of the 1st store, y1 is its annual turnover, x2 is the area of the 2nd store, y2 is its annual turnover, etc. By the way, it is not at all necessary to have access to classified materials: a fairly accurate estimate of the turnover can be obtained by means of mathematical statistics. However, let's not get distracted; the course in commercial espionage is paid separately =)

Tabular data can also be written as points and depicted in the Cartesian coordinate system that is familiar to us.

Let us answer an important question: how many points are needed for a qualitative study?

The bigger, the better. The minimum acceptable set consists of 5-6 points. In addition, with a small amount of data, "anomalous" results must not get into the sample. For example, a small elite store may earn orders of magnitude more than "its colleagues", thereby distorting the general pattern that you want to find!

To put it quite simply, we need to choose a function whose graph passes as close as possible to the points. Such a function is called an approximating (approximation = "bringing closer") or theoretical function. Generally speaking, an obvious "candidate" immediately appears here: a polynomial of high degree whose graph passes through ALL the points. But this option is complicated and often simply incorrect (since the graph will "wiggle" all the time and reflect the main trend poorly).

Thus, the sought function must be fairly simple and at the same time reflect the dependence adequately. As you might guess, one of the methods for finding such functions is called the least squares method. First, let us analyze its essence in a general form. Let some function f(x) approximate the experimental data:


How do we evaluate the accuracy of this approximation? Let us calculate the differences (deviations) yi − f(xi) between the experimental and the functional values (study the drawing). The first thought that comes to mind is to estimate how large the sum of these differences is, but the problem is that the differences can be negative (for example, y2 − f(x2) < 0), and with such a summation the deviations will cancel each other out. Therefore, as an estimate of the accuracy of the approximation, it is tempting to take the sum of the moduli of the deviations:

∑ᵢ₌₁ⁿ |yi − f(xi)| (in case anyone does not know: ∑ is the summation sign, and i is the auxiliary "counter" variable, which takes values from 1 to n).

Approximating the experimental points with different functions, we will obtain different values of this sum, and obviously, wherever the sum is smaller, that function is more accurate.

Such a method exists and is called the least moduli method. In practice, however, the least squares method has become much more widespread: here possible negative values are eliminated not by the modulus but by squaring the deviations:

σ = ∑ᵢ₌₁ⁿ (yi − f(xi))²,

after which efforts are directed at selecting a function f such that the sum of the squared deviations σ is as small as possible. That, in fact, is where the name of the method comes from.
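
For concreteness, here is a minimal Python sketch of this criterion (the data points and the trial line are invented for illustration; they are not the article's table):

    # Sum of squared deviations of a candidate function from experimental points.
    # The data and the candidate straight line below are made up, purely illustrative.

    def sum_of_squares(xs, ys, f):
        """Return sigma = sum of (y_i - f(x_i))**2 over all points."""
        return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))

    xs = [20, 35, 40, 60, 80, 100]        # e.g. retail space, sq. m (made-up)
    ys = [2.1, 3.0, 3.4, 4.9, 6.2, 7.8]   # e.g. annual turnover, mln rubles (made-up)

    candidate = lambda x: 0.07 * x + 0.7  # some trial straight line y = a*x + b
    print(sum_of_squares(xs, ys, candidate))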

And now we return to another important point: as noted above, the selected function should be fairly simple, but there are also many such functions: linear, hyperbolic, exponential, logarithmic, quadratic, etc. And, of course, one would immediately like to "narrow the field of activity". Which class of functions should be chosen for the study? A primitive but effective trick:

- The easiest way is to plot the points on a drawing and analyze their location. If they tend to lie along a straight line, then you should look for the equation of a straight line y = ax + b with optimal values of a and b. In other words, the task is to find SUCH coefficients a and b that the sum of the squared deviations is the smallest.

If the points are located, for example, along a hyperbola, then it is clear a priori that a linear function will give a poor approximation. In this case, we look for the most "favorable" coefficients for the hyperbola equation, those that give the minimum sum of squares.

Now note that in both cases we are talking about a function of two variables whose arguments are the parameters of the sought dependence:

And in essence we need to solve a standard problem: to find the minimum of a function of two variables.

Let's recall our example: suppose that the "store" points tend to lie along a straight line and there is every reason to believe in a linear dependence of the turnover on the retail space. Let us find SUCH coefficients "a" and "b" that the sum of the squared deviations σ(a, b) = ∑(yi − (a·xi + b))² is the smallest. Everything is as usual: first the 1st-order partial derivatives. According to the linearity rule, you can differentiate directly under the summation sign:

If you want to use this material for an essay or a term paper, I will be very grateful for a link in the list of sources; you will find such detailed calculations in few places:

Let us compose the standard system:

∂σ/∂a = ∑ 2·(a·xi + b − yi)·xi = 0,
∂σ/∂b = ∑ 2·(a·xi + b − yi) = 0.

We cancel the "two" in each equation and, in addition, "break up" the sums:

Note: analyze on your own why "a" and "b" can be taken out in front of the summation sign. By the way, formally this can also be done with the sum ∑b, which simply equals n·b.

Let us rewrite the system in an "applied" form:

a·∑xi² + b·∑xi = ∑xi·yi,
a·∑xi + b·n = ∑yi,

after which the algorithm for solving our problem begins to take shape:

Do we know the coordinates of the points? We do. Can we find the sums? Easily. We compose the simplest system of two linear equations in two unknowns ("a" and "b"). We solve the system, for example, by Cramer's method, and obtain a stationary point. Checking the sufficient condition for an extremum, one can verify that at this point the function σ attains precisely a minimum. The verification involves additional calculations, so we will leave it behind the scenes (if necessary, the missing frame can be viewed). We draw the final conclusion:

The function y = ax + b with the found coefficients approximates the experimental points in the best way (at least compared with any other linear function). Roughly speaking, its graph passes as close as possible to these points. In the tradition of econometrics, the resulting approximating function is also called the paired linear regression equation.

The problem under consideration is of great practical importance. In the situation of our example, the equation allows you to predict what turnover ("y") a store will have at one or another value of its retail space (one or another value of "x"). Yes, the resulting forecast is only a forecast, but in many cases it turns out to be quite accurate.
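
Here is a small Python sketch of the whole algorithm: assemble the sums, solve the normal system by Cramer's rule, and make a forecast (the "store" numbers are invented for illustration):

    # Fit y = a*x + b by solving the normal equations with Cramer's rule.
    xs = [20, 35, 40, 60, 80, 100]        # made-up retail spaces, sq. m
    ys = [2.1, 3.0, 3.4, 4.9, 6.2, 7.8]   # made-up turnovers, mln rubles
    n = len(xs)

    sx  = sum(xs)
    sy  = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Normal system:  a*sxx + b*sx = sxy
    #                 a*sx  + b*n  = sy
    det = sxx * n - sx * sx            # main determinant
    a = (sxy * n - sy * sx) / det      # Cramer's rule
    b = (sxx * sy - sx * sxy) / det

    print(f"y = {a:.4f}*x + {b:.4f}")
    print("forecast for x = 50:", a * 50 + b)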

I will analyze just one problem with "real" numbers, since there are no difficulties in it: all calculations are at the level of the 7th-8th grade school curriculum. In 95 percent of cases you will be asked to find just the linear function, but at the very end of the article I will show that it is no more difficult to find the equations of the optimal hyperbola, exponential and some other functions.

In fact, it remains to hand out the promised goodies, so that you learn to solve such examples not only accurately but also quickly. We carefully study the standard:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experimental) data. Make a drawing: in a Cartesian rectangular coordinate system, plot the experimental points and the graph of the approximating function. Find the sum of the squared deviations between the empirical and theoretical values. Find out whether another proposed function (an exponential, as it turns out below) would approximate the experimental points better (from the point of view of the least squares method).

Note that the "x" values are natural numbers, and this has a characteristic meaningful interpretation, which I will talk about a little later; but they, of course, can also be fractional. In addition, depending on the content of a particular problem, both the "x" and the "y" values can be fully or partially negative. Well, we have been given a "faceless" task, and we begin its solution:

We find the coefficients of the optimal function as the solution of the system:

a·∑xi² + b·∑xi = ∑xi·yi,
a·∑xi + b·n = ∑yi.

For the sake of a more compact notation, the "counter" variable i can be omitted, since it is already clear that the summation is carried out from 1 to n.

It is more convenient to calculate the required sums in tabular form:


The calculations can be carried out on a calculator, but it is much better to use Excel: both faster and without errors; watch a short video:

Thus, we obtain the following system:

Here you can multiply the second equation by 3 and subtract the 2nd equation from the 1st term by term. But that is luck: in practice the systems are often no gift, and in such cases Cramer's method saves the day: the main determinant Δ ≠ 0, which means that the system has a unique solution.

Let's check. I understand that you don't want to, but why let errors slip through where they can be completely avoided? We substitute the found solution into the left-hand side of each equation of the system:

The right-hand sides of the corresponding equations are obtained, which means that the system is solved correctly.

Thus, the required approximating function has been found: of all linear functions, it is the one that approximates the experimental data in the best way.

Unlike the direct dependence of a store's turnover on its area, the dependence found here is inverse (the principle "the more, the less"), and this fact is immediately revealed by the negative slope. The function tells us that when a certain indicator increases by 1 unit, the value of the dependent indicator decreases on average by 0.65 units. As they say, the higher the price of buckwheat, the less of it is sold.

To plot the graph of the approximating function, we find two of its values:

and execute the drawing:


The constructed straight line is called a trend line (namely, a linear trend line, i.e. in the general case a trend is not necessarily a straight line). Everyone is familiar with the expression "to be in trend", and I think this term needs no additional comment.

Let's calculate the sum of the squares of the deviations between empirical and theoretical values. Geometrically, it is the sum of the squares of the lengths of the "crimson" segments (two of which are so small that you can't even see them).

Let's summarize the calculations in a table:


They can again be done manually; just in case, I will give an example for the 1st point:

but it is much more efficient to proceed in the already familiar way:

Let's repeat: what is the meaning of the obtained result? Of all linear functions, it is this function that has the smallest value of σ, that is, within its family it is the best approximation. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function turns out to approximate the experimental points better?

Let's find the corresponding sum of squared deviations; to distinguish them, I will denote it by the letter "epsilon". The technique is exactly the same:


And again, just in case, the calculations for the 1st point:

In Excel we use standard function EXP (see the Excel Help for the syntax).

Conclusion: ε > σ, which means that the exponential function approximates the experimental points worse than the straight line.

But it should be noted here that "worse" does not yet mean "bad". I have now plotted this exponential function, and it also passes close to the points, so much so that without an analytical investigation it is difficult to say which function is more accurate.
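
Such an analytical comparison is a one-liner once both candidate functions are written down; a hedged Python sketch (the data, the line and the exponential below are placeholders, not the article's actual numbers):

    import math

    # Compare two ready-made approximations by their sums of squared deviations.
    xs = [1, 2, 3, 4, 5, 6]
    ys = [7.2, 6.7, 5.9, 5.4, 4.9, 4.1]               # made-up "empirical" values

    linear      = lambda x: -0.6 * x + 7.8            # placeholder fitted line
    exponential = lambda x: 8.0 * math.exp(-0.1 * x)  # placeholder exponential

    sse = lambda f: sum((y - f(x)) ** 2 for x, y in zip(xs, ys))
    sigma, epsilon = sse(linear), sse(exponential)
    print(f"sigma = {sigma:.4f}, epsilon = {epsilon:.4f}")
    print("straight line is better" if sigma < epsilon else "exponential is better")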

This completes the solution, and I return to the question of natural values of the argument. In various studies, as a rule economic or sociological ones, natural "x's" are used to number months, years or other equal time intervals. Consider, for example, the following problem.

If some physical quantity y depends on another quantity x, then this dependence can be investigated by measuring y at different values of x. As a result of the measurements, a series of values is obtained:

x1, x2, ..., xi, ..., xn;

y1, y2, ..., yi, ..., yn.

Based on the data of such an experiment, it is possible to construct a graph of the dependence y = ƒ(x). The resulting curve makes it possible to judge the form of the function ƒ(x), but the constant coefficients that enter this function remain unknown. The least squares method allows one to determine them. The experimental points, as a rule, do not lie exactly on the curve. The least squares method requires that the sum of the squares of the deviations of the experimental points from the curve, i.e. ∑[yi − ƒ(xi)]², be the smallest.

In practice, this method is most often (and most simply) used in the case of a linear relationship, i.e. when

y = kx or y = a + bx.

A linear dependence is very widespread in physics. And even when the relationship is non-linear, one usually tries to plot the graph in such a way as to get a straight line. For example, if it is assumed that the refractive index of glass n is related to the wavelength λ of the light wave by the relation n = a + b/λ², then the dependence of n on λ⁻² is plotted on the graph.

Consider the dependence y = kx (a straight line passing through the origin). Let us compose the quantity φ, the sum of the squares of the deviations of our points from the straight line:

φ = ∑ᵢ₌₁ⁿ (yi − k·xi)².

The value of φ is always positive and turns out to be the smaller, the closer our points lie to the straight line. The least squares method states that one should choose for k the value at which φ has a minimum:

dφ/dk = −2·∑xi·(yi − k·xi) = 0,

or

k = ∑xi·yi / ∑xi².   (19)

The calculation shows that the root-mean-square error in determining the value of k is equal to

S_k = √( ∑(yi − k·xi)² / ((n − 1)·∑xi²) ),   (20)

where n is the number of measurements.

Let us now consider a slightly more complicated case, when the points must satisfy the formula y = a + bx (a straight line not passing through the origin).

The task is to find the best values of a and b from the available set of values xi, yi.

Again we compose the quadratic form φ equal to the sum of the squares of the deviations of the points xi, yi from the straight line:

φ = ∑ᵢ₌₁ⁿ (yi − a − b·xi)²,

and find the values of a and b for which φ has a minimum:

∂φ/∂a = −2·∑(yi − a − b·xi) = 0;

∂φ/∂b = −2·∑xi·(yi − a − b·xi) = 0.

The joint solution of these equations gives

b = ∑(xi − x̄)·yi / ∑(xi − x̄)²,   (21)

a = ȳ − b·x̄.   (22)

The root-mean-square errors in determining a and b are equal

S_b = √( ∑(yi − a − b·xi)² / ((n − 2)·∑(xi − x̄)²) ),   (23)

S_a = S_b · √( ∑xi² / n ).   (24)

When processing measurement results by this method, it is convenient to summarize all the data in a table in which all the sums entering formulas (19)-(24) are calculated in advance. The forms of these tables are shown in the examples discussed below.
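
The table-based calculation is easy to automate. The following Python sketch implements the fit of y = a + bx together with the error estimates in the form reconstructed above (the data vector is purely illustrative):

    import math

    # Least-squares fit of y = a + b*x with root-mean-square errors of a and b,
    # in the form of formulas (21)-(24) as written above.
    def fit_line(xs, ys):
        n     = len(xs)
        xbar  = sum(xs) / n
        ybar  = sum(ys) / n
        sxx   = sum((x - xbar) ** 2 for x in xs)
        b     = sum((x - xbar) * y for x, y in zip(xs, ys)) / sxx
        a     = ybar - b * xbar
        resid = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
        s_b   = math.sqrt(resid / ((n - 2) * sxx))
        s_a   = s_b * math.sqrt(sum(x * x for x in xs) / n)
        return a, b, s_a, s_b

    # Illustrative data (not from the manual):
    xs = [10.0, 20.0, 30.0, 40.0, 50.0]
    ys = [1.02, 1.94, 3.05, 3.96, 5.01]
    print(fit_line(xs, ys))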

Example 1. The basic equation of the dynamics of rotational motion, ε = M/J (a straight line passing through the origin of coordinates), was investigated. For various values of the moment of force M, the angular acceleration ε of a certain body was measured. It is required to determine the moment of inertia of this body. The results of the measurements of the moment of force and of the angular acceleration are entered in the second and third columns of Table 5.

Table 5
n | M, N·m | ε, s⁻² | M² | M·ε | ε − kM | (ε − kM)²
1 | 1.44 | 0.52 | 2.0736 | 0.7488 | 0.039432 | 0.001555
2 | 3.12 | 1.06 | 9.7344 | 3.3072 | 0.018768 | 0.000352
3 | 4.59 | 1.45 | 21.0681 | 6.6555 | −0.08181 | 0.006693
4 | 5.90 | 1.92 | 34.81 | 11.328 | −0.049 | 0.002401
5 | 7.45 | 2.56 | 55.5025 | 19.072 | 0.073725 | 0.005435
∑ | – | – | 123.1886 | 41.1115 | – | 0.016436

Using the formula (19), we determine:

k = ∑Mi·εi / ∑Mi² = 41.1115 / 123.1886 = 0.3337 kg⁻¹·m⁻².

To determine the mean square error, we use the formula (20)

S_k = 0.005775 kg⁻¹·m⁻².

By formula (18), we have

J = 1/k = 2.996 kg·m²;  S_J = J·S_k/k.

S_J = (2.996 · 0.005775) / 0.3337 = 0.05185 kg·m².

Setting the reliability P = 0.95, from the table of Student's coefficients for n = 5 we find t = 2.78 and determine the absolute error ΔJ = 2.78 · 0.05185 = 0.1441 ≈ 0.2 kg·m².

We will write the results in the form:

J = (3.0 ± 0.2) kg·m²;
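
For what it is worth, the numbers of Table 5 can be reproduced with a few lines of Python (k and S_k follow formulas (19) and (20) as written above; the Student coefficient 2.78 is taken from the example):

    import math

    # Example 1 (sketch): straight line through the origin, eps = k*M, J = 1/k.
    M   = [1.44, 3.12, 4.59, 5.90, 7.45]      # moments of force, N*m (Table 5)
    eps = [0.52, 1.06, 1.45, 1.92, 2.56]      # angular accelerations, s^-2 (Table 5)
    n   = len(M)

    k   = sum(m * e for m, e in zip(M, eps)) / sum(m * m for m in M)
    J   = 1 / k
    s_k = math.sqrt(sum((e - k * m) ** 2 for m, e in zip(M, eps))
                    / ((n - 1) * sum(m * m for m in M)))
    s_J = s_k / k ** 2                         # error propagation for J = 1/k
    dJ  = 2.78 * s_J                           # Student coefficient for n = 5, P = 0.95
    print(f"k = {k:.4f}, J = {J:.3f} kg*m^2, S_J = {s_J:.5f}, dJ = {dJ:.3f}")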


Example 2. Let us calculate the temperature coefficient of resistance of a metal using the least squares method. The resistance depends linearly on the temperature:

R_t = R0·(1 + α·t°) = R0 + R0·α·t°.

The free term determines the resistance R0 at a temperature of 0 °C, and the slope is the product of the temperature coefficient α and the resistance R0.

The results of measurements and calculations are shown in the table ( see table 6).

Table 6
n | t, °C | r, Ohm | t − t̄ | (t − t̄)² | (t − t̄)·r | r − bt − a | (r − bt − a)², 10⁻⁶
1 | 23 | 1.242 | −62.8333 | 3948.028 | −78.039 | 0.007673 | 58.8722
2 | 59 | 1.326 | −26.8333 | 720.0278 | −35.581 | −0.00353 | 12.4959
3 | 84 | 1.386 | −1.83333 | 3.361111 | −2.541 | −0.00965 | 93.1506
4 | 96 | 1.417 | 10.16667 | 103.3611 | 14.40617 | −0.01039 | 107.898
5 | 120 | 1.512 | 34.16667 | 1167.361 | 51.66 | 0.021141 | 446.932
6 | 133 | 1.520 | 47.16667 | 2224.694 | 71.69333 | −0.00524 | 27.4556
∑ | 515 | 8.403 | – | 8166.833 | 21.5985 | – | 746.804
∑/n | 85.83333 | 1.4005 | – | – | – | – | –

Using formulas (21), (22), we determine

α·R0 = ∑(t − t̄)·r / ∑(t − t̄)² = 21.5985 / 8166.833 = 0.002645 Ohm/°C,
R0 = r̄ − α·R0·t̄ = 1.4005 − 0.002645 · 85.83333 = 1.1735 Ohm.

Let us find the error in the determination of α. Since α = (α·R0)/R0, by formula (18) we have:

S_α = 0.000132 deg⁻¹.

Using formulas (23), (24), we have

S_R0 = 0.014126 Ohm.

Given the reliability P = 0.95, according to the table of Student's coefficients for n = 6, we find t = 2.57 and determine the absolute error Δα = 2.57 0.000132 = 0.000338 deg -1.

α = (23 ± 4) · 10 -4 hail-1 at P = 0.95.


Example 3. It is required to determine the radius of curvature of a lens using Newton's rings. The radii r_m of Newton's rings were measured and the numbers m of these rings were determined. The radii of Newton's rings are related to the radius of curvature R of the lens and the ring number by the equation

r²_m = mλR − 2d0·R,

where d0 is the thickness of the gap between the lens and the plane-parallel plate (or the deformation of the lens),

λ is the wavelength of the incident light.

λ = (600 ± 6) nm.

If we set
r²_m = y;
m = x;
λR = b;
−2d0·R = a,

then the equation takes the form y = a + bx.


The results of measurements and calculations are recorded in Table 7.

Table 7
n | x = m | y = r², 10⁻² mm² | m − m̄ | (m − m̄)² | (m − m̄)·y | y − bx − a, 10⁻⁴ | (y − bx − a)², 10⁻⁶
1 | 1 | 6.101 | −2.5 | 6.25 | −0.152525 | 12.01 | 1.44229
2 | 2 | 11.834 | −1.5 | 2.25 | −0.17751 | −9.6 | 0.930766
3 | 3 | 17.808 | −0.5 | 0.25 | −0.08904 | −7.2 | 0.519086
4 | 4 | 23.814 | 0.5 | 0.25 | 0.11907 | −1.6 | 0.0243955
5 | 5 | 29.812 | 1.5 | 2.25 | 0.44718 | 3.28 | 0.107646
6 | 6 | 35.760 | 2.5 | 6.25 | 0.894 | 3.12 | 0.0975819
∑ | 21 | 125.129 | – | 17.5 | 1.041175 | – | 3.12176
∑/n | 3.5 | 20.8548333 | – | – | – | – | –
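
The example breaks off here, but the remaining arithmetic is straightforward. A hedged Python sketch of how Table 7 could be finished (it uses formulas (21), (22) from above and the substitution λR = b; note the 10⁻² mm² scale of the y column):

    # Example 3 (sketch): finish the Newton's-rings calculation from Table 7.
    m = [1, 2, 3, 4, 5, 6]                                   # ring numbers
    y = [6.101, 11.834, 17.808, 23.814, 29.812, 35.760]      # r_m^2 in units of 10^-2 mm^2

    n    = len(m)
    mbar = sum(m) / n
    ybar = sum(y) / n
    b    = sum((mi - mbar) * yi for mi, yi in zip(m, y)) / sum((mi - mbar) ** 2 for mi in m)
    a    = ybar - b * mbar

    lam_mm = 600e-9 * 1e3          # wavelength 600 nm expressed in mm
    b_mm2  = b * 1e-2              # slope converted from 10^-2 mm^2 to mm^2
    R      = b_mm2 / lam_mm        # since lambda * R = b
    print(f"b = {b:.4f} (10^-2 mm^2 per ring), a = {a:.4f}, R = {R:.1f} mm")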

The least squares method is used to estimate the parameters of the regression equation.

One of the methods for studying stochastic relationships between features is regression analysis.
Regression analysis is the derivation of a regression equation, which is used to find the average value of a random variable (the result feature) when the value of another variable (or other variables, the factor features) is known. It includes the following steps:

  1. choice of the form of the relationship (the type of the analytical regression equation);
  2. estimation of the parameters of the equation;
  3. assessment of the quality of the analytical regression equation.
Most often, a linear form is used to describe the statistical relationship of features. Attention to the linear relationship is explained by a clear economic interpretation of its parameters, the limited variation of the variables, and the fact that in most cases nonlinear forms of the relationship are converted (by taking logarithms or changing variables) into a linear form in order to perform the calculations.
In the case of a linear pairwise relationship, the regression equation takes the form: yi = a + b·xi + ui. The parameters a and b of this equation are estimated from the data of statistical observations of x and y. The result of such an estimation is the equation ŷi = â + b̂·xi, where â, b̂ are the estimates of the parameters a and b, and ŷi is the value of the result variable obtained from the regression equation (the calculated value).

The least squares method (OLS) is most often used to estimate the parameters.
The least squares method gives the best (consistent, efficient and unbiased) estimates of the parameters of the regression equation, but only if certain prerequisites concerning the random term (u) and the independent variable (x) are met (see the OLS assumptions).

The problem of estimating the parameters of a linear paired equation by the least squares method is as follows: to obtain such parameter estimates at which the sum of the squares of the deviations of the actual values of the result feature yi from the calculated values ŷi is minimal.
Formally, the OLS criterion can be written like this: S = ∑(yi − ŷi)² → min.

Classification of least squares methods

  1. Ordinary least squares (OLS).
  2. The maximum likelihood method (for the normal classical linear regression model, normality of the regression residuals is postulated).
  3. Generalized least squares (GLS), used in the case of autocorrelation of the errors and in the case of heteroscedasticity.
  4. Weighted least squares (a special case of GLS for heteroscedastic residuals).

Let us illustrate the essence of the classical least squares method graphically. To do this, we construct a scatter plot from the observation data (xi, yi), i = 1, …, n, in a rectangular coordinate system (such a scatter plot is called the correlation field). Let us try to find a straight line that is closest to the points of the correlation field. According to the least squares method, the line is chosen so that the sum of the squares of the vertical distances between the points of the correlation field and this line is minimal.

The mathematical record of this problem: S(â, b̂) = ∑(yi − â − b̂·xi)² → min.
We know the values yi and xi, i = 1, …, n; these are observational data. In the function S they are constants. The variables in this function are the required parameter estimates â and b̂. To find the minimum of a function of 2 variables, it is necessary to calculate the partial derivatives of this function with respect to each of the parameters and set them equal to zero: ∂S/∂â = 0, ∂S/∂b̂ = 0.
As a result, we obtain a system of 2 normal linear equations:

n·â + b̂·∑xi = ∑yi,
â·∑xi + b̂·∑xi² = ∑xi·yi.

Solving this system, we find the required parameter estimates:

b̂ = (∑xi·yi − n·x̄·ȳ) / (∑xi² − n·x̄²),   â = ȳ − b̂·x̄.

The correctness of the calculation of the parameters of the regression equation can be checked by comparing the sums ∑yi and ∑ŷi (there may be some discrepancy because of rounding in the calculations).
To calculate the parameter estimates, you can build table 1.
The sign of the regression coefficient b indicates the direction of the relationship (if b > 0, the relationship is direct; if b < 0, the relationship is inverse). The value of b shows by how many units, on average, the result feature y changes when the factor feature x changes by 1 unit of its own measurement.
Formally, the value of parameter a is the average value of y at x equal to zero. If the attribute factor does not and cannot have a zero value, then the above interpretation of the parameter a does not make sense.

The tightness of the relationship between the features is assessed with the help of the linear pair correlation coefficient r_xy. It can be calculated by the formula: r_xy = (∑xi·yi/n − x̄·ȳ)/(s_x·s_y), where s_x and s_y are the standard deviations of x and y. In addition, the linear pairwise correlation coefficient can be determined through the regression coefficient b: r_xy = b·s_x/s_y.
The range of admissible values of the linear pair correlation coefficient is from −1 to +1. The sign of the correlation coefficient indicates the direction of the relationship. If r_xy > 0, the relationship is direct; if r_xy < 0, the relationship is inverse.
If this coefficient is close to one in absolute value, then the relationship between the features can be interpreted as a rather close linear one. If its modulus equals one, |r_xy| = 1, then the relationship between the features is functional linear. If the features x and y are linearly independent, then r_xy is close to 0.
To calculate r x, y, you can also use table 1.

Table 1

N observation | xi | yi | xi·yi
1 | x1 | y1 | x1·y1
2 | x2 | y2 | x2·y2
... | ... | ... | ...
n | xn | yn | xn·yn
Column sum | ∑x | ∑y | ∑x·y
Mean | x̄ | ȳ | ∑x·y / n
To assess the quality of the obtained regression equation, the theoretical coefficient of determination R²_yx is calculated:

R²_yx = d²/s²_y = 1 − e²/s²_y,

where d² is the variance of y explained by the regression equation;
e² is the residual variance of y (not explained by the regression equation);
s²_y is the total variance of y.
The coefficient of determination characterizes the share of the variation (variance) of the result feature y that is explained by the regression (and hence by the factor x) in the total variation (variance) of y. The coefficient of determination R²_yx takes values from 0 to 1. Accordingly, the value 1 − R²_yx characterizes the share of the variance of y caused by the influence of other factors not taken into account in the model and by specification errors.
For paired linear regression, R²_yx = r²_yx.
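
As a sketch, here is how the sums of Table 1 turn into the estimates and the correlation / determination coefficients in Python (the data are illustrative, not from the text):

    import math

    # Paired linear regression from raw sums (layout of Table 1), plus r and R^2.
    xs = [2.0, 3.0, 5.0, 7.0, 9.0]     # illustrative factor values
    ys = [4.1, 5.0, 7.2, 8.9, 11.1]    # illustrative result values
    n  = len(xs)

    xbar, ybar = sum(xs) / n, sum(ys) / n
    xybar      = sum(x * y for x, y in zip(xs, ys)) / n
    x2bar      = sum(x * x for x in xs) / n
    y2bar      = sum(y * y for y in ys) / n

    b = (xybar - xbar * ybar) / (x2bar - xbar ** 2)
    a = ybar - b * xbar

    sx = math.sqrt(x2bar - xbar ** 2)
    sy = math.sqrt(y2bar - ybar ** 2)
    r  = b * sx / sy                   # r expressed through the regression coefficient
    print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.4f}, R^2 = {r*r:.4f}")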

Example.

Experimental data on the values of the variables x and y are given in the table.

As a result of smoothing (aligning) these data, a certain function has already been obtained.

Using the least squares method, approximate the data with a linear dependence y = ax + b (find the parameters a and b). Find out which of the two lines better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

The essence of the method of least squares (OLS).

The task is to find the coefficients of the linear dependence for which the function of two variables a and b, F(a, b) = ∑(yi − (a·xi + b))², takes the smallest value. That is, for such a and b the sum of the squares of the deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

Thus, the solution of the example is reduced to finding the extremum of a function of two variables.

Derivation of formulas for finding coefficients.

A system of two equations in two unknowns is composed and solved. We find the partial derivatives of the function F(a, b) with respect to the variables a and b and equate these derivatives to zero.

We solve the resulting system of equations by any method (for example, by the substitution method or by Cramer's method) and obtain the formulas for finding the coefficients by the least squares method (OLS).

For these a and b the function F(a, b) takes the smallest value. The proof of this fact is given below.

That is the whole least squares method. The formula for finding the parameter a contains the sums ∑xi, ∑yi, ∑xi·yi, ∑xi² and the quantity n, the amount of experimental data. We recommend calculating the values of these sums separately. The coefficient b is calculated after a.

It's time to remember the original example.

Solution.

In our example n = 5. We fill in the table for the convenience of calculating the sums that enter the formulas for the desired coefficients.

The values ​​in the fourth row of the table are obtained by multiplying the values ​​of the 2nd row by the values ​​of the 3rd row for each number i.

The values ​​in the fifth row of the table are obtained by squaring the values ​​of the 2nd row for each number i.

The values ​​in the last column of the table are the row sums of the values.

We use the formulas of the least squares method to find the coefficients a and b... We substitute in them the corresponding values ​​from the last column of the table:

Hence, y = 0.165x + 2.184 is the required approximating straight line.

It remains to find out which of the lines, y = 0.165x + 2.184 or the one obtained earlier, approximates the original data better, that is, to make an estimate using the least squares method.

Estimation of the error of the least squares method.

To do this, you need to calculate the sums of the squares of the deviations of the initial data from each of these lines; the lower value corresponds to the line that better approximates the original data in the sense of the least squares method.

Since the first of these sums turned out to be smaller, the straight line y = 0.165x + 2.184 approximates the original data better.

Graphical illustration of the least squares method (LSM).

Everything is clearly visible on the graphs. The red line is the found straight line y = 0.165x + 2.184, the blue line is the other approximating function, and the pink dots are the raw data.

What is it for, what are all these approximations for?

I personally use it for solving problems of data smoothing and for interpolation and extrapolation problems (in the original example, you might have been asked to find the value of the observed quantity y at x = 3 or at x = 6 using the OLS method). But we will talk about this in more detail later in another section of the site.

Proof.

In order for the function F(a, b) to take the smallest value at the found a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function F(a, b) be positive definite. Let us show this.
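
A minimal sketch of that check in LaTeX, in my own wording (the standard argument via the leading principal minors of the Hessian of F(a, b) = ∑(yi − a·xi − b)²):

    % Hessian of F(a,b) = \sum_i (y_i - a x_i - b)^2 and its leading principal minors
    \[
    H = \begin{pmatrix}
          \frac{\partial^2 F}{\partial a^2} & \frac{\partial^2 F}{\partial a\,\partial b} \\
          \frac{\partial^2 F}{\partial b\,\partial a} & \frac{\partial^2 F}{\partial b^2}
        \end{pmatrix}
      = \begin{pmatrix} 2\sum_i x_i^2 & 2\sum_i x_i \\ 2\sum_i x_i & 2n \end{pmatrix},
    \qquad \Delta_1 = 2\sum_i x_i^2 > 0,
    \]
    \[
    \Delta_2 = \det H = 4\Bigl(n\sum_i x_i^2 - \bigl(\textstyle\sum_i x_i\bigr)^2\Bigr) > 0
    \quad \text{(Cauchy--Schwarz; strict as long as not all } x_i \text{ coincide)},
    \]
    \[
    \text{hence } H \text{ is positive definite and the stationary point is indeed the minimum of } F.
    \]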

Ordinary least squares (OLS) is a mathematical method used to solve various problems, based on minimizing the sum of the squares of the deviations of certain functions from the sought variables. It can be used to "solve" overdetermined systems of equations (when the number of equations exceeds the number of unknowns), to find a solution of ordinary (not overdetermined) nonlinear systems of equations, and to approximate point values by some function. OLS is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data.


History

Until the beginning of the 19th century, scientists did not have definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until that time, particular methods were used that depended on the type of the equations and on the wit of the calculators, and therefore different calculators, starting from the same observational data, came to different conclusions. Gauss (1795) was the first to apply the method, and Legendre (1805) independently discovered and published it under its modern name (French: Méthode des moindres quarrés). Laplace connected the method with probability theory, and the American mathematician Adrain (1808) considered its probability-theoretic applications. The method was spread and improved by the further research of Encke, Bessel, Hansen and others.

The essence of the least squares method

Let x be a set of n unknown variables (parameters) and let f_i(x), i = 1, …, m, with m > n, be a set of functions of these variables. The task is to choose values of x so that the values of these functions are as close as possible to certain values y_i. Essentially, we are talking about "solving" the overdetermined system of equations f_i(x) = y_i, i = 1, …, m, in the indicated sense of maximum closeness of the left- and right-hand sides of the system. The essence of the least squares method is to take as the "measure of closeness" the sum of the squares of the deviations of the left- and right-hand sides |f_i(x) − y_i|. Thus, the essence of OLS can be expressed as follows:

∑_i e_i² = ∑_i (y_i − f_i(x))² → min over x.

If the system of equations has a solution, then the minimum of the sum of squares equals zero and exact solutions of the system of equations can be found analytically or, for example, by various numerical optimization methods. If the system is overdetermined, that is, loosely speaking, the number of independent equations is greater than the number of sought variables, then the system has no exact solution and the least squares method allows one to find some "optimal" vector x in the sense of maximum closeness of the vectors y and f(x), or maximum closeness of the deviation vector e to zero (closeness understood in the sense of Euclidean distance).

Example - a system of linear equations

In particular, the least squares method can be used to "solve" a system of linear equations

Ax = b,

where A is a rectangular matrix of size m × n, m > n (that is, the number of rows of the matrix A is greater than the number of sought variables).

In the general case, such a system of equations has no solution. Therefore, this system can be "solved" only in the sense of choosing a vector x that minimizes the "distance" between the vectors Ax and b. To do this, one can apply the criterion of minimizing the sum of the squares of the differences between the left- and right-hand sides of the equations of the system, that is, (Ax − b)ᵀ(Ax − b) → min. It is easy to show that the solution of this minimization problem leads to the solution of the following system of equations:

AᵀAx = Aᵀb  ⇒  x = (AᵀA)⁻¹Aᵀb.
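
As an illustration, a minimal numpy sketch of such a "solution" of an overdetermined system (the matrix and the right-hand side are invented; numpy.linalg.lstsq performs the same minimization internally):

    import numpy as np

    # Overdetermined system A x = b, m > n: least-squares "solution".
    A = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0]])          # 4 equations, 2 unknowns (made-up)
    b = np.array([2.1, 2.9, 4.2, 4.8])

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # x = (A^T A)^(-1) A^T b
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # same result, numerically safer
    print(x_normal, x_lstsq)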

OLS in regression analysis (data fit)

Let there be n values of some variable y (these may be the results of observations, experiments, etc.) and corresponding values of variables x. The task is to approximate the relationship between y and x by some function known up to certain unknown parameters b, that is, in effect, to find the best values of the parameters b that bring the values f(x, b) as close as possible to the actual values y. In fact, this reduces to the case of "solving" an overdetermined system of equations with respect to b:

f(x_t, b) = y_t,  t = 1, …, n.

In regression analysis, and in econometrics in particular, probabilistic models of the relationship between the variables are used:

y_t = f(x_t, b) + ε_t,

where ε_t are the so-called random errors of the model.

Accordingly, the deviations of the observed values of y from the model values f(x, b) are already assumed in the model itself. The essence of OLS (ordinary, classical) is to find parameters b for which the sum of the squares of the deviations (errors; for regression models they are often called regression residuals) e_t is minimal:

b̂_OLS = arg min_b RSS(b),

where RSS (Residual Sum of Squares) is defined as:

RSS(b) = eᵀe = ∑_{t=1}^{n} e_t² = ∑_{t=1}^{n} (y_t − f(x_t, b))².

In the general case, this problem can be solved by numerical optimization (minimization) methods. In this case, one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, it is necessary to find the stationary points of the function RSS(b) by differentiating it with respect to the unknown parameters b, equating the derivatives to zero and solving the resulting system of equations:

∑_{t=1}^{n} (y_t − f(x_t, b)) · ∂f(x_t, b)/∂b = 0.

OLS for Linear Regression

Let the regression dependence be linear:

y_t = ∑_{j=1}^{k} b_j x_{tj} + ε_t = x_tᵀ b + ε_t.

Let y be the column vector of observations of the explained variable, and let X be the (n × k) matrix of observations of the factors (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vector of values of a given factor in all observations). The matrix representation of the linear model is:

y = Xb + ε.

Then the vector of estimates of the explained variable and the vector of regression residuals are equal to

ŷ = Xb,  e = y − ŷ = y − Xb.

Accordingly, the sum of the squares of the regression residuals is

RSS = eᵀe = (y − Xb)ᵀ(y − Xb).

Differentiating this function with respect to the parameter vector b and equating the derivatives to zero, we obtain a system of equations (in matrix form):

(XᵀX)b = Xᵀy.

In expanded matrix form, this system of equations looks like this:

\begin{pmatrix} \sum x_{t1}^2 & \sum x_{t1}x_{t2} & \sum x_{t1}x_{t3} & \ldots & \sum x_{t1}x_{tk} \\ \sum x_{t2}x_{t1} & \sum x_{t2}^2 & \sum x_{t2}x_{t3} & \ldots & \sum x_{t2}x_{tk} \\ \sum x_{t3}x_{t1} & \sum x_{t3}x_{t2} & \sum x_{t3}^2 & \ldots & \sum x_{t3}x_{tk} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sum x_{tk}x_{t1} & \sum x_{tk}x_{t2} & \sum x_{tk}x_{t3} & \ldots & \sum x_{tk}^2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_k \end{pmatrix} = \begin{pmatrix} \sum x_{t1}y_t \\ \sum x_{t2}y_t \\ \sum x_{t3}y_t \\ \vdots \\ \sum x_{tk}y_t \end{pmatrix},

where all the sums are taken over all admissible values of t.

If a constant is included in the model (as usual), then x_{t1} = 1 for all t; therefore, in the upper left corner of the matrix of the system of equations stands the number of observations n, and the remaining elements of the first row and first column are simply the sums of the values of the variables, ∑x_{tj}, while the first element of the right-hand side of the system is ∑y_t.

The solution of this system of equations gives the general formula of the OLS estimates for the linear model:

b̂_OLS = (XᵀX)⁻¹Xᵀy = ((1/n)·XᵀX)⁻¹ · (1/n)·Xᵀy = V_x⁻¹ C_xy.

For analytical purposes, the last representation of this formula turns out to be useful (in the system of equations, when divided by n, arithmetic means appear instead of sums). If in the regression model the data are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors and the second vector is the vector of sample correlations of the factors with the dependent variable.

An important property of OLS estimates for models with a constant: the line of the constructed regression passes through the center of gravity of the sample data, that is, the equality holds:

ȳ = b̂_1 + ∑_{j=2}^{k} b̂_j x̄_j.

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the mean value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also an OLS estimate: it satisfies the criterion of the minimum of the sum of squared deviations from it.

The simplest special cases

In the case of paired linear regression y_t = a + b·x_t + ε_t, when the linear dependence of one variable on another is estimated, the calculation formulas are simplified (one can do without matrix algebra). The system of equations has the form:

\begin{pmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \bar{y} \\ \overline{xy} \end{pmatrix}.

Hence, it is easy to find estimates of the coefficients:

\hat{b} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}.
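
These paired-regression formulas translate directly into code; a short sketch (the sample data are placeholders):

    import numpy as np

    # Paired regression y = a + b*x via the covariance formulas above.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.2, 2.9, 4.1, 4.8, 6.1])        # made-up observations

    b_hat = ((x * y).mean() - x.mean() * y.mean()) / ((x ** 2).mean() - x.mean() ** 2)
    a_hat = y.mean() - b_hat * x.mean()
    print(a_hat, b_hat)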

Although in the general case the model with a constant is preferable, in some cases it is known from theoretical considerations that the constant a must be zero. For example, in physics the relationship between voltage and current has the form U = I·R; measuring the voltage and the current strength, it is necessary to estimate the resistance. In this case, we are talking about the model y = bx. In this case, instead of a system of equations, we have a single equation

(∑ x_t²)·b = ∑ x_t y_t.

Consequently, the formula for estimating a single coefficient has the form

\hat{b} = \frac{\sum_{t=1}^{n} x_t y_t}{\sum_{t=1}^{n} x_t^2} = \frac{\overline{xy}}{\overline{x^2}}.

The case of a polynomial model

If the data are fitted by a polynomial regression function of one variable, f(x) = b_0 + ∑_{i=1}^{k} b_i xⁱ, then, treating the powers xⁱ as separate factors for each i, one can estimate the parameters of the model on the basis of the general formula for estimating the parameters of a linear model. To do this, it suffices to take into account in the general formula that with such an interpretation x_{ti}·x_{tj} = x_tⁱ·x_tʲ = x_t^(i+j) and x_{tj}·y_t = x_tʲ·y_t. Consequently, the matrix equations in this case take the form:

\begin{pmatrix} n & \sum_n x_t & \ldots & \sum_n x_t^k \\ \sum_n x_t & \sum_n x_t^2 & \ldots & \sum_n x_t^{k+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_n x_t^k & \sum_n x_t^{k+1} & \ldots & \sum_n x_t^{2k} \end{pmatrix} \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{bmatrix} = \begin{bmatrix} \sum_n y_t \\ \sum_n x_t y_t \\ \vdots \\ \sum_n x_t^k y_t \end{bmatrix}.
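
A sketch of the polynomial case using the same normal-equation machinery (numpy; the degree and the data are illustrative):

    import numpy as np

    # Polynomial least squares: treat x^i as separate factors of a linear model.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 0.9, 1.8, 4.2, 8.9, 15.8])   # made-up data
    k = 2                                           # fit b0 + b1*x + b2*x^2

    X = np.vander(x, k + 1, increasing=True)        # columns 1, x, x^2
    coeffs = np.linalg.solve(X.T @ X, X.T @ y)      # b = (X^T X)^(-1) X^T y
    print(coeffs)                                   # [b0, b1, b2]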

Statistical properties of OLS estimates

First of all, we note that for linear models the OLS estimates are linear estimates, as follows from the formula above. For the OLS estimates to be unbiased, it is necessary and sufficient that the most important condition of regression analysis be fulfilled: the mathematical expectation of the random error, conditional on the factors, must be equal to zero. This condition is satisfied, in particular, if

  1. the mathematical expectation of random errors is zero, and
  2. factors and random errors are independent random variables.

The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not satisfied, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow obtaining qualitative estimates in this case). In the classical case, a stronger assumption is made, namely the determinism of the factors, as opposed to the randomness of the error, which automatically implies that the exogeneity condition is satisfied. In the general case, for the consistency of the estimates it is sufficient that the exogeneity condition hold together with the convergence of the matrix V_x to some nondegenerate matrix as the sample size increases to infinity.

In order for the (ordinary) least squares estimates to be, in addition to consistent and unbiased, also efficient (the best in the class of linear unbiased estimates), additional properties of the random error must be satisfied: the variance of the random errors must be the same in all observations (homoscedasticity), and the errors in different observations must be uncorrelated.

These assumptions can be formulated for the covariance matrix of the vector of random errors: V(ε) = σ²I.

A linear model satisfying these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian literature the Gauss-Markov theorem is more often cited). As it is easy to show, the covariance matrix of the vector of coefficient estimates is equal to:

V(b̂_OLS) = σ²(XᵀX)⁻¹.

Efficiency means that this covariance matrix is "minimal" (any linear combination of the coefficients, and in particular the coefficients themselves, has minimal variance), that is, in the class of linear unbiased estimates the OLS estimates are the best. The diagonal elements of this matrix, the variances of the coefficient estimates, are important parameters of the quality of the obtained estimates. However, it is impossible to calculate this covariance matrix, because the variance of the random errors is unknown. It can be proved that an unbiased and consistent (for the classical linear model) estimate of the variance of the random errors is the quantity:

s² = RSS/(n − k).

Substituting this value into the formula for the covariance matrix, we obtain an estimate of the covariance matrix. The resulting estimates are also unbiased and consistent. It is also important that the estimate of the error variance (and hence of the variances of the coefficients) and the estimates of the model parameters are independent random variables, which makes it possible to obtain test statistics for testing hypotheses about the coefficients of the model.

It should be noted that if the classical assumptions are not satisfied, the OLS estimates of the parameters are not the most efficient ones. There is a more general approach: minimize the quadratic form eᵀWe, where W is some symmetric positive definite weight matrix. Ordinary OLS is the special case of this approach in which the weight matrix is proportional to the identity matrix. As is known, for symmetric matrices (or operators) there exists a decomposition W = PᵀP. Therefore, this functional can be represented as eᵀPᵀPe = (Pe)ᵀ(Pe) = e*ᵀe*, that is, it can be represented as the sum of the squares of certain transformed "residuals". Thus, one can distinguish a whole class of least squares methods: LS methods (Least Squares).

It has been proved (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors) the most efficient estimates (in the class of linear unbiased estimates) are the estimates of so-called generalized least squares (GLS, Generalized Least Squares): the LS method with a weight matrix equal to the inverse of the covariance matrix of the random errors, W = V_ε⁻¹.

It can be shown that the formula for the GLS estimates of the parameters of a linear model has the form

b̂_GLS = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y.

The covariance matrix of these estimates, accordingly, is equal to

V(b̂_GLS) = (XᵀV⁻¹X)⁻¹.

In fact, the essence of GLS consists in a certain (linear) transformation (P) of the original data and the application of ordinary OLS to the transformed data. The purpose of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.

Weighted OLS

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors) we have so-called weighted least squares (WLS). In this case, the weighted sum of the squares of the residuals of the model is minimized, that is, each observation receives a "weight" that is inversely proportional to the variance of the random error in this observation: eᵀWe = ∑_{t=1}^{n} e_t²/σ_t². In fact, the data are transformed by weighting the observations (dividing by a quantity proportional to the assumed standard deviation of the random errors), and ordinary OLS is applied to the weighted data.
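
A hedged sketch of weighted least squares in this spirit (the weights are the reciprocals of assumed error variances; all numbers are invented):

    import numpy as np

    # Weighted least squares: minimize sum_t e_t^2 / sigma_t^2.
    x     = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y     = np.array([2.0, 4.1, 5.9, 8.3, 9.8])
    sigma = np.array([0.1, 0.1, 0.2, 0.5, 0.5])     # assumed error standard deviations

    X = np.column_stack([np.ones_like(x), x])       # model y = a + b*x
    W = np.diag(1.0 / sigma ** 2)                   # diagonal weight matrix

    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    print(beta)                                     # [a_hat, b_hat]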

ISBN 978-5-7749-0473-0.

  • Econometrics. Textbook / Ed. Eliseeva I.I. - 2nd ed. - M.: Finance and statistics, 2006 .-- 576 p. - ISBN 5-279-02786-3.
  • Alexandrova N.V. History of mathematical terms, concepts, designations: reference dictionary. - 3rd ed .. - M.: LKI, 2008 .-- 248 p. - ISBN 978-5-382-00839-4. I.V. Mitin, Rusakov V.S. Analysis and processing of experimental data - 5th edition - 24s.