Estimation of the parameters of the regression equation. Linear regression and correlation: meaning and estimation of parameters


Fig. 2.1. The regression line

The first expression allows us, for given values of the factor x, to calculate theoretical values of the resulting feature by substituting the actual values of the factor x. On the graph, the theoretical values lie on a straight line, which is the regression line (Fig. 2.1).

The construction of linear regression reduces to the estimation of its parameters a and b. The classical approach to estimating the parameters of linear regression is based on the method of least squares (OLS).

OLS allows us to obtain estimates of the parameters a and b at which the sum of squared deviations of the actual values of the resulting feature from the theoretical ones is minimal:

To find the minimum, it is necessary to calculate the partial derivatives of (4) with respect to each of the parameters, a and b, and equate them to zero.

∂S/∂a = −2·Σ(y − a − b·x) = 0,
∂S/∂b = −2·Σ x·(y − a − b·x) = 0. (5)

Transforming, we obtain the system of normal equations:

n·a + b·Σx = Σy,
a·Σx + b·Σx² = Σxy. (6)

In this system n is the sample size, and the sums are easily calculated from the source data. Solving the system for a and b, we obtain:

b = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²), (7)

a = ȳ − b·x̄. (8)

Expression (7) can be written in another form:

b = cov(x, y) / σ²x, (9)

where cov(x, y) is the covariance of the features and σ²x is the variance of the factor x.
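As an illustration of formulas (8) and (9), here is a minimal Python sketch; the data are hypothetical illustration values, not from the text:

```python
# OLS estimates of the paired regression y = a + b*x:
# b as cov(x, y) / var(x), formula (9), and a from formula (8).

def ols_pair(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # covariance of the features and variance of the factor x
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    b = cov_xy / var_x          # formula (9)
    a = mean_y - b * mean_x     # formula (8)
    return a, b

x = [1, 2, 3, 4, 5]             # hypothetical factor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical result values
a, b = ols_pair(x, y)
print(round(a, 3), round(b, 3))
```

The same b is obtained from the normal-equation form (7); (9) is just an algebraic rearrangement of it.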

The parameter b is called the regression coefficient. Its value shows the average change in the result when the factor changes by one unit. The possibility of a clear economic interpretation of the regression coefficient has made the linear equation of paired regression quite common in econometric studies.

Formally, a is the value of y at x = 0. If x does not and cannot take a zero value, such an interpretation of the free term a makes no sense. The parameter a may have no economic content, and attempts to interpret it economically can lead to absurdity, especially when a < 0. Only the sign of the parameter a can be interpreted: if a > 0, the relative change in the result is slower than the change in the factor. Compare these relative changes:

Δy / y < Δx / x for a > 0, x > 0, y > 0.

Sometimes the linear paired regression equation is written for deviations from the mean values:

where x′ = x − x̄ and y′ = y − ȳ. In this case the free term is equal to zero, as reflected in expression (10). This fact follows from geometric considerations: the regression equation corresponds to the same straight line (3), but when the regression is estimated in deviations, the origin moves to the point with coordinates (x̄, ȳ). In this case both sums in expression (8) are zero, which entails the equality of the free term to zero.

As an example, consider, for a group of enterprises producing one type of product, the regression dependence of costs y on output x.

Table 2.1

Output, thousand units (x)    Production costs, million rubles (y)
1    31.1
2    67.9
4    141.6
3    104.7
5    178.4
3    104.7
4    141.6
TOTAL: 22    770.0

The system of normal equations takes the form:

Solving it, we obtain a = −5.79, b = 36.84.

The regression equation has the form:

Substituting the values of x into the equation, we find the theoretical values of y (the last column of the table).
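The calculation for Table 2.1 can be checked with a short Python sketch. The y column is from the table; the x values are an assumption reconstructed from the stated total Σx = 22 (the x column did not survive in this copy), so the result reproduces the published a ≈ −5.8 and b ≈ 36.8 only up to rounding:

```python
# Regression of production costs y on output x for the Table 2.1 example.
x = [1, 2, 4, 3, 5, 3, 4]                            # output, thousand units (assumed values)
y = [31.1, 67.9, 141.6, 104.7, 178.4, 104.7, 141.6]  # costs, million rubles (from the table)

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # formula (7)
a = sy / n - b * sx / n                        # formula (8)
print(round(a, 2), round(b, 2))
```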

The value of a has no economic meaning here. If the variables x and y are expressed through deviations from their mean levels, the regression line on the graph will pass through the origin of coordinates. The estimate of the regression coefficient will not change:

where x′ = x − x̄, y′ = y − ȳ.

With linear regression, the tightness of the connection is measured by the linear correlation coefficient r:

The value 1 − r² characterizes the share of the variance of y caused by the influence of factors not accounted for in the model.
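A minimal sketch of computing r and r² on hypothetical illustration data:

```python
# Linear correlation coefficient r and determination coefficient r**2
# (the share of the variance of y explained by the factor x).
import math

x = [1, 2, 3, 4, 5]             # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
sdx = math.sqrt(sum((u - mx) ** 2 for u in x) / n)
sdy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
r = cov / (sdx * sdy)
print(round(r, 4), round(r * r, 4))
# 1 - r**2 is the share of the variance of y due to factors not in the model
```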

2.3. OLS prerequisites (Gauss-Markov conditions)

The connection between y and x in paired regression is not functional but correlational. Therefore, the estimates of the parameters a and b are random variables whose properties depend significantly on the properties of the random component ε. To obtain the best results from OLS, the following prerequisites about the random deviation must hold (the Gauss-Markov conditions):

1. The expected value of the random deviation is zero for all observations: M(εi) = 0.

2. The variance of the random deviations is constant: D(εi) = σ² = const.

Fulfillment of this prerequisite is called homoscedasticity (constancy of the variance of the deviations). Its violation is called heteroscedasticity (non-constancy of the variance of the deviations).

3. The random deviations εi and εj are independent of each other for i ≠ j:

Fulfillment of this condition is called the absence of autocorrelation.

4. The random deviation must be independent of the explanatory variables. Usually this condition is satisfied automatically if the explanatory variables in the model are not random. In addition, fulfillment of this prerequisite is not as critical for econometric models as that of the first three.

When these prerequisites are fulfilled, the Gauss-Markov theorem holds: the estimates (7) and (8) obtained by OLS have the smallest variance in the class of all linear unbiased estimates.

Thus, when the Gauss-Markov conditions are fulfilled, the estimates (7) and (8) are not only unbiased estimates of the regression coefficients but also the most efficient ones, i.e., they have the smallest variance compared with any other estimates of these parameters that are linear in the values yi.

It is the understanding of the importance of the Gauss-Markov conditions that distinguishes a competent researcher using regression analysis from an incompetent one. If these conditions are not fulfilled, the researcher must be aware of this. If corrective actions are possible, the analyst must be able to carry them out. If the situation cannot be corrected, the researcher must be able to assess how seriously it may affect the results.
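The unbiasedness claim of the Gauss-Markov theorem can be illustrated with a small Monte Carlo sketch: when the deviations are generated with zero mean, constant variance, and independence, the OLS estimates average out to the true parameters. The true values and sample design below are hypothetical:

```python
# Monte Carlo illustration of unbiasedness of OLS under the Gauss-Markov conditions.
import random

random.seed(1)
a_true, b_true = 2.0, 0.5
x = [float(i) for i in range(1, 21)]

def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

a_hats, b_hats = [], []
for _ in range(2000):
    # deviations: zero mean, constant variance, independent draws
    y = [a_true + b_true * xi + random.gauss(0, 1) for xi in x]
    a_hat, b_hat = ols(x, y)
    a_hats.append(a_hat)
    b_hats.append(b_hat)

# the averages of the estimates are close to the true a and b
print(round(sum(a_hats) / len(a_hats), 2), round(sum(b_hats) / len(b_hats), 3))
```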

2.4. Assessing the significance of the parameters of linear
regression and correlation

After the linear regression equation (3) is found, the significance of the equation as a whole and of its individual parameters is assessed.

The significance of the regression equation as a whole is assessed with Fisher's F-test. The null hypothesis is put forward that the regression coefficient is zero and, therefore, the factor x does not affect the result y.

Before calculating the test statistic, an analysis of variance is carried out. It can be shown that the total sum of squared deviations (SS) of y from its mean value decomposes into two parts, explained and unexplained:


(total SS) = (factor SS) + (residual SS)

Two extreme cases are possible here: when the total SS is exactly equal to the residual SS, and when the total SS is equal to the factor SS.

In the first case, the factor x does not affect the result, all the variance of y is due to the impact of other factors, and the regression line is parallel to the Ox axis.

In the second case, other factors do not affect the result, y is related to x functionally, and the residual SS is zero.

In practice, however, both components are present on the right-hand side of (13). The suitability of the regression line for forecasting depends on what part of the total variation of y is accounted for by the explained variation. If the factor SS exceeds the residual SS, the regression equation is statistically significant and the factor x has a substantial impact on the result y. This is equivalent to the determination coefficient approaching one.

The number of degrees of freedom (df, degrees of freedom) is the number of independently varying values of a feature.

The total SS requires (n − 1) independent deviations: only (n − 1) deviations can vary freely, and the last one is determined from the condition that their total sum equals zero. Therefore df_total = n − 1.

The factor SS can be expressed as follows:

This SS depends on only one parameter, b, since the expression under the summation sign does not involve the values of the resulting feature. Consequently, the factor SS has one degree of freedom: df_fact = 1.

To determine df_resid, we use the analogy with the balance equality (11). Just as in equality (11), an equality also holds between the numbers of degrees of freedom:

So we can write n − 1 = 1 + df_resid. From this balance we determine that df_resid = n − 2.

Dividing each SS by its number of degrees of freedom, we obtain the mean square of deviations, or the variance per one degree of freedom:

S²total = Σ(y − ȳ)² / (n − 1), (15)

S²fact = Σ(ŷ − ȳ)² / 1, (16)

S²resid = Σ(y − ŷ)² / (n − 2). (17)

Comparing the factor and residual variances per one degree of freedom, we obtain the F-statistic for testing the null hypothesis, which in this case is written as H0: S²fact = S²resid.

If H0 is valid, the variances do not differ from each other. For H0 to be rejected, the factor variance must exceed the residual variance several times.
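The decomposition of the sum of squares and the F-test can be traced numerically; the sketch below uses the regional data from the worked example later in the chapter (x: per capita subsistence minimum, y: average daily salary):

```python
# Variance decomposition (total SS = factor SS + residual SS) and the F-statistic.
x = [78, 82, 87, 79, 89, 106, 67, 88, 73, 87, 76, 115]
y = [133, 148, 134, 154, 162, 195, 139, 158, 152, 162, 159, 173]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
y_hat = [a + b * xi for xi in x]

ss_total = sum((yi - my) ** 2 for yi in y)
ss_factor = sum((yh - my) ** 2 for yh in y_hat)                  # explained by the regression
ss_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))       # unexplained

# variances per one degree of freedom: df_factor = 1, df_resid = n - 2
F = (ss_factor / 1) / (ss_resid / (n - 2))
print(round(F, 2))   # compare with the table value F(0.05; 1; 10) = 4.96
```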

The American statistician Snedecor developed tables of critical values of F for various significance levels and various numbers of degrees of freedom. The table value of the F-test is the maximum ratio of the variances that may occur by chance for a given probability level of the null hypothesis.

When finding the table value of the F-test, the significance level is set (usually 0.05 or 0.01) together with two numbers of degrees of freedom: that of the numerator (here equal to one) and that of the denominator, equal to n − 2.

The calculated F is recognized as reliable (different from one) if it is greater than the table value, i.e., F > F_table(α; 1; n − 2). In this case H0 is rejected and a conclusion is drawn about the significance of the statistical connection between y and x.

If F < F_table, the probability of the null hypothesis is higher than the specified level (for example, 0.05), and it cannot be rejected without serious risk of drawing a wrong conclusion about the presence of a connection between y and x. In this case the regression equation is considered statistically insignificant: H0 is not rejected.

The value of the F-test is related to the determination coefficient R²:

F = R² / (1 − R²) · (n − 2), (19)

In linear regression, the significance is usually assessed not only of the equation as a whole but also of its individual parameters.

The standard error of the regression coefficient is determined by the formula:

m_b = √( S²resid / Σ(x − x̄)² ), (20)

where S²resid is the residual variance per one degree of freedom (the same as in (17)).

The standard error, together with Student's t-distribution with (n − 2) degrees of freedom, is used to verify the significance of the regression coefficient and to calculate its confidence intervals.

The magnitude of the regression coefficient is compared with its standard error; the actual value of Student's t-test is determined:

which is then compared with the table value at a given significance level α and the number of degrees of freedom (n − 2). Here the null hypothesis, in the form H0: b = 0, is also tested; it likewise implies the absence of a statistical connection between y and x, but relies on b rather than on the ratio between the factor and residual variances in the overall balance of the variance of the resulting feature. The general meaning of the hypotheses is the same, however: testing for the presence or absence of a statistical connection between y and x.

If t > t_table(α; n − 2), the hypothesis H0 must be rejected, and the statistical connection of y with x is considered established. If t < t_table(α; n − 2), the null hypothesis cannot be rejected, and the influence of x on y is recognized as insignificant.

There is a connection between t_b and F: t_b² = F.

Hence it follows that t_b = √F.

The confidence interval for b is defined as b ± t_table · m_b,

where b is the value of the regression coefficient calculated (estimated) by OLS.
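A sketch of formula (20) and of the t-test and confidence interval for b, again on the regional data of the worked example (n = 12, df = n − 2 = 10):

```python
# Standard error of b, Student's t-test, and a 95% confidence interval for b.
import math

x = [78, 82, 87, 79, 89, 106, 67, 88, 73, 87, 76, 115]
y = [133, 148, 134, 154, 162, 195, 139, 158, 152, 162, 159, 173]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx

s2_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
m_b = math.sqrt(s2_resid / sxx)       # standard error of b, formula (20)
t_b = b / m_b                         # actual value of Student's t-test

t_table = 2.228                       # t(0.05; 10)
ci = (b - t_table * m_b, b + t_table * m_b)
print(round(t_b, 2), tuple(round(v, 2) for v in ci))
```

Since the interval does not contain zero, the coefficient b is recognized as significant.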

The standard error of the parameter a is determined by the formula: m_a = √( S²resid · Σx² / (n · Σ(x − x̄)²) ).

The significance-testing procedure for a does not differ from that for the parameter b. In this case the actual value of the t-test is calculated by the formula: t_a = a / m_a.

The procedure for verifying the significance of the linear correlation coefficient differs from those above. This is explained by the fact that r, as a random variable, is distributed according to the normal law only for a large number of observations and small values of |r|. In this case the hypothesis of the absence of correlation between y and x is tested on the basis of the statistic

t_r = r·√(n − 2) / √(1 − r²), (26)

which, when H0 is true, is approximately distributed according to Student's law with (n − 2) degrees of freedom. If t_r > t_table, the hypothesis is rejected with a probability of error not exceeding α. From (19) it can be seen that in paired linear regression t_r² = F. In addition, t_b² = F, and therefore t_r = t_b. Thus, testing the hypotheses about the significance of the regression and correlation coefficients is equivalent to testing the hypothesis about the significance of the linear regression equation.

With small samples and values of r close to ±1, however, it should be borne in mind that the distribution of r as a random variable differs from the normal one, and confidence intervals for r cannot be built in the standard way. In this case it is easy to arrive at a contradiction: the confidence interval would contain values exceeding one.

To get around this difficulty, the so-called Fisher z-transformation is used:

z = (1/2)·ln((1 + r) / (1 − r)), (27)

which gives a normally distributed quantity z whose values, as r changes from −1 to +1, vary from −∞ to +∞. The standard error of this quantity is:

m_z = 1 / √(n − 3). (28)

For the quantity z there are tables that give its values for the corresponding values of r.

For z, the null hypothesis is put forward that there is no correlation. In this case the value of the statistic

which is distributed according to Student's law with (n − 2) degrees of freedom, does not exceed the table value at the appropriate significance level.

For each value of z one can calculate critical values of r. Tables of critical values of r have been developed for significance levels 0.05 and 0.01 and the corresponding numbers of degrees of freedom. If the calculated value of r exceeds the table value in absolute value, this value of r is considered significant; otherwise the actual value is insignificant.
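A short sketch of the z-transformation (27)-(28): r = 0.72 and n = 12 are the values of the worked example below, and the mapping back from z to r uses the hyperbolic tangent, the inverse of (27):

```python
# Fisher z-transformation, its standard error, and an approximate
# confidence interval for r built on the z scale.
import math

r, n = 0.72, 12
z = 0.5 * math.log((1 + r) / (1 - r))   # formula (27)
m_z = 1 / math.sqrt(n - 3)              # formula (28)

z_lo, z_hi = z - 1.96 * m_z, z + 1.96 * m_z
r_lo = math.tanh(z_lo)                  # inverse of the z-transformation
r_hi = math.tanh(z_hi)
print(round(z, 3), round(m_z, 3), round(r_lo, 2), round(r_hi, 2))
```

Note that the interval mapped back to the r scale always stays inside (−1, +1), which is exactly the point of the transformation.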

2.5. Nonlinear regression models
and their linearization

So far we have considered only the linear model of the regression dependence of y on x (3). At the same time, many important connections in the economy are nonlinear. Examples of such regression models are production functions (dependencies between the volume of output and the main production factors: labor, capital, etc.) and demand functions (dependencies between the demand for some type of good or service, on the one hand, and income and the prices of this and other goods, on the other).

When analyzing nonlinear regression dependencies, the most important issue in applying classical OLS is the method of their linearization. Once a nonlinear dependence is linearized, we obtain a linear regression equation of type (3), whose parameters are estimated by ordinary OLS, after which the original nonlinear relation can be written back.

The polynomial model of arbitrary degree stands somewhat apart in this sense:

to which ordinary OLS can be applied without any preliminary linearization.

Consider this procedure for a parabola of the second degree:

y = a + b·x + c·x² + ε. (31)

This dependence is suitable when, over a certain interval of factor values, an increasing dependence turns into a decreasing one or vice versa. In this case one can determine the value of the factor at which the maximum or minimum value of the result is achieved. If the initial data do not reveal a change in the direction of the connection, the parameters of the parabola become difficult to interpret, and the form of the connection is better replaced by other nonlinear models.

Applying OLS to estimate the parameters of the second-degree parabola reduces to differentiating the sum of squared regression residuals with respect to each of the estimated parameters and equating the resulting expressions to zero. This yields a system of normal equations whose number equals the number of estimated parameters, i.e., three:

(32)

This system can be solved in any way, in particular by the method of determinants.

The extreme value of the function is observed at the factor value x = −b / (2c).

If c < 0, there is a maximum, i.e., the dependence first grows and then falls. This kind of dependence is observed in labor economics when studying the wages of manual workers, with age acting as the factor. At c > 0 the parabola has a minimum, which usually manifests itself in unit production costs as a function of output volume.
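The fit of the parabola (31) by the method of determinants mentioned above can be sketched as follows. The data are exact hypothetical values of y = 1 + 4x − 0.5x², so the true coefficients and the maximum point x = −b/(2c) = 4 must be recovered:

```python
# Fitting y = a + b*x + c*x**2 by solving the system of normal equations (32)
# with Cramer's rule (the "method of determinants").
x = [0, 1, 2, 3, 4, 5]
y = [1 + 4 * xi - 0.5 * xi ** 2 for xi in x]   # exact hypothetical data
n = len(x)

s1 = sum(x); s2 = sum(xi ** 2 for xi in x)
s3 = sum(xi ** 3 for xi in x); s4 = sum(xi ** 4 for xi in x)
ty = sum(y); txy = sum(xi * yi for xi, yi in zip(x, y))
tx2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))

def det3(m):
    # determinant of a 3x3 matrix by cofactor expansion along the first row
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

M = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]   # normal-equation matrix
rhs = [ty, txy, tx2y]
D = det3(M)
coeffs = []
for col in range(3):
    Mc = [row[:] for row in M]
    for i in range(3):
        Mc[i][col] = rhs[i]                      # replace one column by the RHS
    coeffs.append(det3(Mc) / D)
a, b, c = coeffs
print(round(a, 6), round(b, 6), round(c, 6))
print(round(-b / (2 * c), 6))   # extremum; here c < 0, so it is a maximum
```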

For nonlinear dependencies that are not classical polynomials, a preliminary linearization is necessarily carried out; it consists in transforming either the variables, or the model parameters, or a combination of these transformations. Consider some classes of such dependencies.

Dependencies of the hyperbolic type have the form:

y = a + b / x + ε. (33)

An example of such a dependence is the Phillips curve, which describes the inverse dependence of the rate of wage growth on the unemployment rate. In this case the value of the parameter b will be greater than zero.

Another example of dependence (33) is the Engel curves, which express the following pattern: as income grows, the share of income spent on food decreases, while the share of income spent on non-food goods increases. In this case the resulting feature in (33) is the share of expenditures on non-food goods.

Linearization of equation (33) reduces to replacing the factor z = 1/x, and the regression equation takes the form (3), in which the factor z is used instead of x:

The semilogarithmic curve reduces to the same linear equation:

y = a + b·ln(x) + ε, (35)

which can be used to describe Engel curves. Here ln(x) is replaced by z, and equation (34) is obtained.
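A sketch of the substitution z = 1/x for the hyperbola (33), with exact hypothetical data y = 2 + 3/x, so that the true a and b must be recovered:

```python
# Linearizing y = a + b/x by the replacement z = 1/x, then ordinary OLS of y on z.
x = [1, 2, 4, 5, 10]
y = [2 + 3 / xi for xi in x]              # exact hypothetical data

z = [1 / xi for xi in x]                  # the replaced factor
n = len(z)
mz, my = sum(z) / n, sum(y) / n
b = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / sum((zi - mz) ** 2 for zi in z)
a = my - b * mz
print(round(a, 6), round(b, 6))
```

The semilogarithmic case (35) is handled identically, with z = ln(x) instead of z = 1/x.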

A fairly wide class of economic indicators is characterized by an approximately constant rate of relative growth over time. This corresponds to dependencies of the indicative (exponential) type, which are written in the form:

or in the form

y = a·e^(b·x)·ε. (37)

The following dependence is also possible:

y = a·b^x·ε. (38)

In regressions of type (36)-(38) the same linearization method is applied: taking logarithms. Equation (36) is brought to the form:

ln y = ln a + t·ln b + ln ε. (39)

Replacing the variables reduces it to linear form:

Y = A + B·t + E, (40)

where Y = ln y, A = ln a, B = ln b, E = ln ε. If E satisfies the Gauss-Markov conditions, the parameters of equation (36) are estimated by OLS from equation (40). Equation (37) is brought to the form:

which differs from (39) only in the type of the free term; the linear equation looks like this:

Y = A + b·x + E, (42)

where Y = ln y, A = ln a, E = ln ε. The parameters A and b are obtained by ordinary OLS; the parameter a of dependence (37) is then obtained as the antilogarithm of A. Taking the logarithm of (38), we obtain a linear dependence:

Y = A + B·x + E, (43)

where A = ln a and B = ln b, and the remaining designations are the same as above. OLS is likewise applied to the transformed data, and the parameter b for (38) is obtained as the antilogarithm of the coefficient B.
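A sketch of estimating an exponential dependence of the form y = a·e^(b·x) by taking logarithms; the form and the true parameter values are hypothetical illustration choices:

```python
# Log-linearization of an exponential dependence: ln y = ln a + b*x,
# then ordinary OLS of ln y on x; a is recovered as the antilogarithm.
import math

a_true, b_true = 0.5, 0.2
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [a_true * math.exp(b_true * xi) for xi in x]   # exact hypothetical data

ly = [math.log(yi) for yi in y]                    # logarithm of the result
n = len(x)
mx, ml = sum(x) / n, sum(ly) / n
b = sum((xi - mx) * (li - ml) for xi, li in zip(x, ly)) / sum((xi - mx) ** 2 for xi in x)
ln_a = ml - b * mx
a = math.exp(ln_a)                                 # antilogarithm of the free term
print(round(a, 6), round(b, 6))
```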

Power dependencies are widespread in the practice of socio-economic research. They are used to build and analyze production functions. In functions of the form:

especially valuable is the fact that the parameter b is equal to the elasticity coefficient of the resulting feature with respect to the factor x. Transforming (44) by taking logarithms, we obtain the linear regression:

ln y = ln a + b·ln x + ln ε, (45)
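A sketch of the log-log linearization (45) for the power function, with exact hypothetical data y = 2·x^0.7; the recovered b is the elasticity:

```python
# Power function y = a * x**b linearized as ln y = ln a + b * ln x.
import math

x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi ** 0.7 for xi in x]          # exact hypothetical data

lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]
n = len(x)
mlx, mly = sum(lx) / n, sum(ly) / n
b = sum((u - mlx) * (v - mly) for u, v in zip(lx, ly)) / sum((u - mlx) ** 2 for u in lx)
ln_a = mly - b * mlx
a = math.exp(ln_a)                          # antilogarithm of the free term
print(round(a, 6), round(b, 6))             # b is the elasticity coefficient
```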

Another type of nonlinearity that reduces to the linear form is the inverse dependence:

y = 1 / (a + b·x). (46)

Making the replacement z = 1/y, we obtain z = a + b·x.

The data below are given for the territories of a region for year 200x.

Region number    Average per capita subsistence minimum per day per able-bodied person, rub., x    Average daily salary, rub., y
1 78 133
2 82 148
3 87 134
4 79 154
5 89 162
6 106 195
7 67 139
8 88 158
9 73 152
10 87 162
11 76 159
12 115 173

The task:

1. Build the correlation field and formulate a hypothesis about the form of the connection.

2. Calculate the parameters of the linear regression equation

4. Using the average (general) elasticity coefficient, give a comparative assessment of the strength of the connection between the factor and the result.

7. Calculate the forecast value of the result if the forecast value of the factor increases by 10% from its average level. Determine the confidence interval of the forecast for significance level α = 0.05.

Solution:

We will solve this task using Excel.

1. Comparing the available data on x and y, for example by ranking them in ascending order of the factor x, one can observe a direct relationship between the features: as the average per capita subsistence minimum increases, the average daily salary increases. Based on this, it can be assumed that the connection between the features is direct and can be described by the equation of a straight line. The same conclusion is confirmed by graphical analysis.

To build the correlation field you can use Excel. Enter the source data in the sequence: first x, then y.

Select the area of the cells containing the data.

Then choose: Insert / Scatter chart / Scatter with markers, as shown in Figure 1.

Figure 1. Correlation field

Analysis of the correlation field shows the presence of a rectilinear dependence, since the points lie almost on a straight line.

2. To calculate the parameters of the linear regression equation
we use the built-in statistical function LINEST.

For this:

1) open the existing file containing the analyzed data;
2) select an area of empty cells of size 5×2 (5 rows, 2 columns) for the output of the regression statistics;
3) activate the Function Wizard: in the main menu choose Formulas / Insert Function;
4) in the Category window select Statistical, in the Function window select LINEST. Click OK, as shown in Figure 2;

Figure 2. The Function Wizard dialog box

5) Fill in the function arguments:

Known_y's: the range containing the data of the resulting feature;

Known_x's: the range containing the data of the factor feature;

Const: a logical value indicating the presence or absence of a free term in the equation; if Const = 1, the free term is calculated in the usual way; if Const = 0, the free term is 0;

Stats: a logical value indicating whether additional regression statistics should be output; if Stats = 1, the additional information is output; if Stats = 0, only the estimates of the equation parameters are displayed.

Click OK;

Figure 3. The LINEST function arguments dialog box

6) The first element of the resulting table will appear in the upper-left cell of the selected area. To reveal the whole table, press F2 and then the key combination CTRL+SHIFT+ENTER.

Additional regression statistics will be displayed in the order indicated in the following scheme:

Value of the coefficient b    Value of the coefficient a
Standard error of b    Standard error of a
Determination coefficient R²    Standard error of y
F-statistic    Number of degrees of freedom
Regression sum of squares    Residual sum of squares

Figure 4. The result of calculating the LINEST function

The regression equation obtained:

Conclusion: with an increase in the average per capita subsistence minimum by 1 rub., the average daily salary increases on average by 0.92 rub.

This means that 52% of the variation in wages (y) is explained by the variation of the factor x (the average per capita subsistence minimum), and 48% by the action of other factors not included in the model.

From the calculated determination coefficient one can find the correlation coefficient: r = √0.52 ≈ 0.72.

The connection is assessed as close.

4. Using the average (general) elasticity coefficient, we determine the strength of the influence of the factor on the result.

For the equation of a straight line, the average (general) elasticity coefficient is determined by the formula: E = b · x̄ / ȳ.

We find the average values by selecting the area of cells with the x values and choosing Formulas / AutoSum / Average, and then doing the same with the y values.

Figure 5. Calculation of the mean values of the function and the argument

Thus, when the average per capita subsistence minimum changes by 1% from its average value, the average daily wage changes on average by 0.51%.
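The elasticity calculation can be verified directly in Python on the data of this example:

```python
# Average (general) elasticity coefficient E = b * mean(x) / mean(y),
# where b is the OLS regression coefficient.
x = [78, 82, 87, 79, 89, 106, 67, 88, 73, 87, 76, 115]
y = [133, 148, 134, 154, 162, 195, 139, 158, 152, 162, 159, 173]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
E = b * mx / my
print(round(E, 2))   # matches the 0.51% interpretation in the text
```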

Using the Regression data analysis tool you can obtain:
- regression statistics results,
- analysis of variance results,
- confidence intervals,
- residuals and regression line charts,
- residuals and a normal probability plot.

The procedure is as follows:

1) Check access to the Analysis ToolPak. In the main menu sequentially select: File / Options / Add-ins.

2) In the Manage drop-down list, select Excel Add-ins and click Go.

3) In the Add-ins window, check the Analysis ToolPak box and then click OK.

If the Analysis ToolPak is missing from the Available add-ins list, click Browse to search for it.

If a message is displayed saying that the Analysis ToolPak is not installed on the computer, click Yes to install it.

4) In the main menu select: Data / Data Analysis / Analysis Tools / Regression, and then click OK.

5) Fill in the data input and output parameters in the dialog box:

Input Y Range: the range containing the data of the resulting feature;

Input X Range: the range containing the data of the factor feature;

Labels: a checkbox indicating whether the first row contains column names;

Constant is Zero: a checkbox indicating the presence or absence of a free term in the equation;

Output Range: it is enough to indicate the upper-left cell of the future range;

6) New Worksheet Ply: you can set an arbitrary name for the new sheet.

Then click OK.

Figure 6. The Regression tool input parameters dialog box

The results of the regression analysis for this task are shown in Figure 7.

Figure 7. The result of applying the Regression tool

5. We assess the quality of the equation using the average approximation error. We use the results of the regression analysis presented in Figure 8.

Figure 8. The result of applying the Regression tool: Residual Output

We make a new table, as shown in Figure 9, and in its column calculate the relative approximation error by the formula:

Figure 9. Calculation of the average approximation error

The average approximation error is calculated by the formula: Ā = (1/n) · Σ|y − ŷ| / y · 100%.

The quality of the constructed model is assessed as good, since the average approximation error does not exceed 8-10%.
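The average approximation error for this example can be recomputed directly:

```python
# Average approximation error A = (1/n) * sum(|y - y_hat| / y) * 100%.
x = [78, 82, 87, 79, 89, 106, 67, 88, 73, 87, 76, 115]
y = [133, 148, 134, 154, 162, 195, 139, 158, 152, 162, 159, 173]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

A = sum(abs(yi - (a + b * xi)) / yi for xi, yi in zip(x, y)) / n * 100
print(round(A, 1))   # below the 8-10% threshold: the model quality is good
```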

6. From the table with regression statistics (Figure 4), we write out the actual value of Fisher's F-test:

Since F_fact > F_table at the 5% significance level, we conclude that the regression equation is statistically significant (the connection is proven).

8. We assess the statistical significance of the regression parameters using Student's t-statistics and by calculating the confidence interval of each of the indicators.

We put forward the hypothesis H0 of a statistically insignificant difference of the indicators from zero:

a = b = r_xy = 0.

For the number of degrees of freedom df = n − 2 = 10 and α = 0.05, t_table = 2.23.

Figure 7 gives the actual values of the t-statistics:

The t-test for the correlation coefficient can be calculated in two ways:

Method I:

where m_r = √((1 − r²) / (n − 2)) is the random error of the correlation coefficient.

We take the data for the calculation from the table in Figure 7.

Method II:

The actual values of the t-statistics exceed the table values:

Therefore, the hypothesis H0 is rejected: the regression parameters and the correlation coefficient do not differ from zero by chance; they are statistically significant.

The confidence interval for the parameter a is defined as a ± t_table · m_a.

For the parameter a, the 95% boundaries, as shown in Figure 7, were:

The confidence interval for the regression coefficient is defined as b ± t_table · m_b.

For the regression coefficient b, the 95% boundaries, as shown in Figure 7, were:

Analysis of the upper and lower boundaries of the confidence intervals leads to the conclusion that with probability 0.95 the parameters a and b, lying within the specified boundaries, do not take zero values, i.e., they are not statistically insignificant and differ significantly from zero.

7. The obtained estimates of the regression equation allow it to be used for forecasting. If the forecast value of the subsistence minimum is:

then the forecast value of the average daily salary will be:

The forecast error is calculated by the formula:

where

The variance is also calculated using Excel. For this:

1) Activate the Function Wizard: in the main menu choose Formulas / Insert Function.

3) Fill in the range containing the numeric data of the factor. Click OK.

Figure 10. Variance calculation

We obtain the value of the variance:

To calculate the residual variance per one degree of freedom, we use the results of the analysis of variance shown in Figure 7.

The confidence intervals of the forecast of individual values of y with probability 0.95 are determined by the expression:

The interval is quite wide, primarily because of the small number of observations. On the whole, the forecast of the average daily salary proved reliable.
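The point forecast and its confidence interval for this example can be recomputed as follows; the forecast-error expression used is the standard formula for the forecast of an individual value:

```python
# Point forecast at x_p = 1.10 * mean(x) and its 95% confidence interval,
# with m_p = sqrt(S2 * (1 + 1/n + (x_p - mean_x)**2 / sum((x - mean_x)**2))).
import math

x = [78, 82, 87, 79, 89, 106, 67, 88, 73, 87, 76, 115]
y = [133, 148, 134, 154, 162, 195, 139, 158, 152, 162, 159, 173]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx

x_p = 1.10 * mx                        # forecast value of the factor
y_p = a + b * x_p                      # point forecast of the salary
s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
m_p = math.sqrt(s2 * (1 + 1 / n + (x_p - mx) ** 2 / sxx))

t_table = 2.228                        # t(0.05; 10)
lo, hi = y_p - t_table * m_p, y_p + t_table * m_p
print(round(y_p, 1), (round(lo, 1), round(hi, 1)))
```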

The task condition is taken from: Workshop on Econometrics: textbook / I.I. Eliseeva, S.V. Kuryscheva, N.M. Gordenko et al.; ed. I.I. Eliseeva. Moscow: Finance and Statistics, 2003. 192 p.: ill.

When estimating the parameters of the regression equation, the least squares method (OLS) is applied, and certain prerequisites are made about the random component ε in the model; the random component ε is an unobservable quantity. After the model parameters have been estimated, by calculating the differences between the actual and theoretical values of the resulting feature y it is possible to determine estimates of the random component. Since they are not the real random residuals, they can be considered a sample realization of the unknown residual for the given equation, i.e., ei.

When the specification of the model is changed or new observations are added to it, the sample estimates of the residuals ei may vary. Therefore, the task of regression analysis includes not only the construction of the model itself but also the study of the random deviations ei, i.e., the residual values.

When the Fisher and Student tests are used, assumptions are made regarding the behavior of the residuals ei: the residuals are independent random variables with mean 0; they have the same (constant) variance and obey the normal distribution.

Statistical tests of the regression parameters and correlation indicators are based on unverifiable prerequisites about the distribution of the random component ei; they are only preliminary in character. After the regression equation is built, a check is carried out of whether the estimates ei (the random residuals) possess the properties that were assumed. This is because the estimates of the regression parameters must meet certain criteria: they must be unbiased, consistent, and efficient. These properties of OLS estimates are of great practical importance in the use of regression and correlation results.

Unbiasedness of the estimates means that the mathematical expectation of the residuals is zero. If the estimates possess unbiasedness, they can be compared across different studies.

Estimates are considered efficient if they have the smallest variance. In practical studies this means the possibility of moving from point estimation to interval estimation.

Consistency of the estimates characterizes the increase in their accuracy as the sample size grows. Of great practical interest are regression results for which the confidence interval of the expected value of the regression parameter bi has a probability limit equal to one. In other words, the probability of obtaining an estimate at a given distance from the true value of the parameter is close to one.

These evaluation criteria (unbiasedness, consistency, and efficiency) are necessarily taken into account in different estimation methods. The least squares method builds regression estimates based on minimizing the sum of squared residuals. Therefore it is very important to investigate the behavior of the residual values ei. The conditions necessary for obtaining unbiased, consistent, and efficient estimates are the OLS prerequisites, whose observance is desirable for obtaining reliable regression results.

Studies of the residuals ei involve checking the presence of the following five OLS prerequisites:

1. The random nature of the residuals;

2. Zero mean value of the residuals, independent of xi;

3. Homoscedasticity: the variance of each deviation ei is the same for all values of x;

4. Absence of autocorrelation of the residuals: the values ei are distributed independently of each other;

5. The residuals follow the normal distribution.

If the distribution of the random residuals ei does not satisfy some of the prerequisites of OLS, the model should be adjusted.

First of all, the random character of the residuals ei is checked, which is the first prerequisite of OLS. For this purpose a plot of the residuals ei against the theoretical (fitted) values of the dependent variable is constructed.
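As a minimal numerical sketch of this check (the data below are made up for illustration): fit the paired regression by least squares and inspect the residuals. OLS forces the residual mean to zero by construction, and for an adequate model the residuals scatter around zero with no systematic trend in the fitted values.

```python
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2]  # roughly y = 2x plus noise

# least squares estimates of a and b for y = a + b*x
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a = (sy - b * sx) / n

# residuals e_i = y_i - y_hat_i; their sum is zero up to rounding error
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_resid = sum(residuals) / n
```

Plotting `residuals` against the fitted values `a + b*x` gives the graphical version of the same check.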



When a > 0 (and b > 0), the relative change in the result is slower than the relative change in the factor: Δy/ȳ < Δx/x̄.

Sometimes the linear paired regression equation is written in deviations from the mean values:

y′ = b·x′ + ε, (10)

where y′ = y − ȳ and x′ = x − x̄. In this form the free term equals zero, which is reflected in expression (10). This follows from geometrical considerations: the regression equation corresponds to the same straight line (3), but when the regression is estimated in deviations, the origin moves to the point with coordinates (x̄, ȳ). In this case both sums in expression (8) are zero, which entails the equality of the free term to zero.
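This property is easy to verify numerically. A sketch with made-up data: the slope estimated from the raw variables coincides with the slope estimated from the deviations x − x̄, y − ȳ, and the free term of the centered regression is zero.

```python
def ols(xs, ys):
    """Return (a, b) for y = a + b*x fitted by least squares."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - b * sx) / n, b

xs = [2.0, 4.0, 5.0, 7.0, 9.0]
ys = [1.1, 2.3, 2.8, 4.2, 5.1]

xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
xc = [x - xbar for x in xs]   # deviations x' = x - x_bar
yc = [y - ybar for y in ys]   # deviations y' = y - y_bar

a, b = ols(xs, ys)
a_c, b_c = ols(xc, yc)
# b == b_c, and a_c == 0: the centered line passes through the origin
```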

Consider as an example, for a group of enterprises producing one type of product, the regression dependence of production costs y on the volume of output x.

Table 1

Output, thousand units (x)    Production costs, million rub. (y)
                              31.1
                              67.9
                              141.6
                              104.7
                              178.4
                              104.7
                              141.6
Total: 22                     770.0

The system of normal equations takes the form:

n·a + b·Σx = Σy,
a·Σx + b·Σx² = Σxy.

Solving it, we obtain a = -5.79, b = 36.84.

The regression equation therefore has the form:

ŷ = -5.79 + 36.84x.

Substituting the values of x into the equation, we find the theoretical values of y (the last column of the table).

The value of a has no economic meaning here. If the variables x and y are expressed through deviations from their mean levels, the regression line on the plot passes through the origin of coordinates. The estimate of the regression coefficient does not change:

b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)².

As another example, consider the consumption function in the form:

C = K + L·y,

where C is consumption, y is income, and K and L are parameters. This linear regression equation is usually used together with the balance identity:

y = C + I + r,

where I is the size of investment and r is savings.

For simplicity, suppose that income is spent only on consumption and investment, so that the following system of equations is considered:

C = K + L·y,
y = C + I.

The presence of the balance identity imposes a restriction on the magnitude of the regression coefficient, which cannot exceed one, i.e. L ≤ 1.

Suppose that the estimated consumption function is:

C = K + 0.65·y.

The regression coefficient characterizes the propensity to consume. It shows that out of every thousand rubles of income, on average 650 rubles are spent on consumption and 350 rubles are invested. If we regress the size of investment on income, i.e. estimate I = K′ + L′·y, the regression coefficient will be L′ = 0.35. This equation need not be estimated separately, because it follows from the consumption function. The regression coefficients of the two equations are connected by the equality:

L + L′ = 1 (0.65 + 0.35 = 1).

If the regression coefficient in the consumption function turns out to be greater than one, then not only income but also savings are being spent on consumption.

The regression coefficient in the consumption function is used to calculate the multiplier:

m = 1 / (1 − L).

Here m = 1/(1 − 0.65) ≈ 2.86; consequently, additional investment of 1 thousand rubles will, in the long run and other things being equal, lead to additional income of 2.86 thousand rubles.
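The multiplier arithmetic from the example can be checked in a couple of lines (a sketch; 0.65 is the propensity to consume taken from the text above):

```python
L = 0.65           # propensity to consume (regression coefficient from the example)
m = 1 / (1 - L)    # Keynesian multiplier
print(round(m, 2)) # -> 2.86
```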

With linear regression, the closeness of the connection is assessed by the linear correlation coefficient r:

r = b·(σx/σy) = cov(x, y)/(σx·σy). (11)

Its values lie within the boundaries −1 ≤ r ≤ 1. If b > 0, then 0 ≤ r ≤ 1; if b < 0, then −1 ≤ r ≤ 0. In the example r ≈ 0.991 (since r² = 0.982), which means a very close dependence of production costs on the volume of output.

To assess the quality of fit of the linear function, the coefficient of determination is calculated as the square of the linear correlation coefficient, r². It characterizes the share of the variance of the dependent variable y explained by the regression in the total variance of y:

r² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)². (12)

The value 1 − r² characterizes the share of the variance of y caused by the influence of other factors not taken into account in the model.

In the example r² = 0.982: the regression equation explains 98.2% of the variance of the dependent variable, and the remaining 1.8% (the residual variance) falls on other factors.
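The two routes to r² agree, which is a useful sanity check: squaring the correlation coefficient from formula (11) gives the same number as the variance ratio in formula (12). A sketch with made-up data:

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.1, 5.9, 8.3, 9.8, 12.1]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / n)

r = cov / (sx * sy)            # formula (11)

b = cov / sx ** 2              # formula (9): slope as cov / dispersion of x
a = ybar - b * xbar
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - ybar) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot       # formula (12)

# r**2 and r2 coincide up to rounding error
```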


1.3. Prerequisites of OLS (Gauss-Markov conditions)

As mentioned above, the connection between y and x in paired regression is not functional but correlational. Therefore the estimates of the parameters a and b are random values whose properties depend significantly on the properties of the random component ε. To obtain the best results from OLS, the following prerequisites on the random deviation ε (the Gauss-Markov conditions) must hold:

1°. The mathematical expectation of the random deviation is zero for all observations: M(εi) = 0.

2°. The dispersion of the random deviations is constant: D(εi) = σ² for all i.

The fulfilment of this prerequisite is called homoscedasticity (constancy of the dispersion of the deviations); its violation is called heteroscedasticity (non-constancy of the dispersion of the deviations).

3°. The random deviations εi and εj are independent of each other for i ≠ j: cov(εi, εj) = 0.

The fulfilment of this condition is called the absence of autocorrelation.

4°. The random deviation must be independent of the explanatory variables.

Usually this condition is fulfilled automatically if the explanatory variables in the model are not random. Moreover, its fulfilment is not as critical for econometric models as that of the first three conditions.

When these prerequisites hold, the Gauss-Markov theorem applies: the estimates (7) and (8) obtained by OLS have the smallest dispersion in the class of all linear unbiased estimates.

Thus, when the Gauss-Markov conditions are fulfilled, estimates (7) and (8) are not only unbiased estimates of the regression coefficients but also the most efficient ones, i.e. they have the smallest dispersion compared with any other estimates of these parameters that are linear with respect to the values yi.
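Unbiasedness under the Gauss-Markov conditions can be illustrated with a small simulation (a sketch, not a proof): generate many samples from y = 1 + 2x + ε with ε satisfying the conditions, re-estimate the slope by the least squares formula each time, and average the estimates.

```python
import random

random.seed(42)
true_a, true_b = 1.0, 2.0
xs = [float(i) for i in range(10)]  # fixed, non-random regressors

def slope(xs, ys):
    """OLS slope estimate for y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

estimates = []
for _ in range(2000):
    # ε: zero mean, constant variance, independent draws (conditions 1-3)
    ys = [true_a + true_b * x + random.gauss(0.0, 1.0) for x in xs]
    estimates.append(slope(xs, ys))

mean_b = sum(estimates) / len(estimates)
# mean_b is close to the true value 2.0: the OLS slope is unbiased
```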

It is the understanding of the importance of the Gauss-Markov conditions that distinguishes a competent researcher using regression analysis from an incompetent one. If these conditions are not fulfilled, the researcher must be aware of it. If corrective actions are possible, the analyst must be able to carry them out. If the situation cannot be corrected, the researcher must be able to assess how seriously it may affect the results.

Economic phenomena are usually determined by a large number of simultaneously acting factors. This raises the task of studying the dependence of a variable y on several explanatory variables (x1, x2, …, xk), which can be solved with multiple correlation and regression analysis.

When studying dependence by the methods of multiple regression, the task is formulated in the same way as for paired regression: it is required to determine an analytical expression for the form of the connection between the dependent variable y and the factor variables x1, x2, …, xk, i.e. to find a function ŷ = f(x1, x2, …, xk), where k is the number of factor variables.

Multiple regression is widely used in solving problems of demand, of the profitability of shares, in studying the function of production costs, in macroeconomic calculations, and in a number of other issues of econometrics. At present multiple regression is one of the most common methods in econometrics. The main goal of multiple regression is to build a model with a large number of factors, determining the effect of each of them separately as well as their cumulative impact on the modelled indicator.

Owing to the peculiarities of the least squares method, in multiple regression, as in the paired case, only linear equations and equations reducible to linear form by transformation of variables are used. Most often a linear equation is used, which can be written as follows:

ŷ = a0 + a1·x1 + a2·x2 + … + ak·xk + ε,

where

a0, a1, …, ak are the model parameters (regression coefficients);

ε is a random value (the residual).

The regression coefficient aj shows by what magnitude the dependent variable y will change on average if the variable xj is increased by one unit of measurement with the values of the other factors included in the regression equation held fixed (constant). The parameters attached to the x's are called the coefficients of "pure" regression.
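A sketch of fitting such an equation for two factors via the normal equations (the data are made up and noise-free, generated from y = 1 + 2·x1 + 3·x2, so the known coefficients should be recovered up to rounding):

```python
def solve(A, v):
    """Solve the linear system A * coef = v by Gaussian elimination
    with partial pivoting."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (M[r][n] - s) / M[r][r]
    return coef

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
ys = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

rows = [[1.0, u, v] for u, v in zip(x1, x2)]  # design matrix with intercept
# normal equations: (X'X) coef = X'y
XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
coef = solve(XtX, Xty)
print([round(c, 6) for c in coef])  # -> [1.0, 2.0, 3.0]
```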

Example.

Suppose that the dependence of family expenditures on food on income and family size is characterized by the following equation:

ŷ = a0 + 0.35·x1 + 0.73·x2,

where

y is the family's monthly expenditure on food, thousand rubles;

x1 is monthly income per family member, thousand rubles;

x2 is family size, persons.

Analysis of this equation allows conclusions to be drawn: with an increase of income per family member by 1 thousand rubles, food expenditures will increase by an average of 350 rubles with the same average family size. In other words, 35% of additional family income is spent on food. An increase in family size by one person with the same income implies an additional increase in food expenditures of 730 rubles. The parameter a0 is not subject to economic interpretation.

The reliability of each model parameter is assessed using Student's t-criterion. For any parameter aj the t-criterion value is calculated by the formula

t_aj = aj / S_aj,

where S_aj is the standard error of the coefficient aj, and Sε is the standard (mean square) deviation of the regression equation, determined by the formula

Sε = sqrt( Σ(y − ŷ)² / (n − k − 1) ).

The regression coefficient aj is considered reliable enough if the calculated value of the t-criterion with (n − k − 1) degrees of freedom exceeds the tabular one, i.e. t_aj > t(α; n − k − 1). If the reliability of the regression coefficient is not confirmed, a conclusion follows about the insignificance of the factor xj in the model and the need to eliminate it from the model or to replace it with another factor variable.
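A sketch of this check for the paired case (k = 1), where the standard error of the slope has the simple closed form S_b = Sε / sqrt(Σ(x − x̄)²); the data are invented for illustration:

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.3, 3.8, 6.1, 7.9, 10.2, 12.1, 13.8, 16.2]

n, k = len(xs), 1
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
a = ybar - b * xbar

ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s_eps = math.sqrt(ss_res / (n - k - 1))  # standard error of the equation
s_b = s_eps / math.sqrt(sxx)             # standard error of the slope
t_b = b / s_b
# compare t_b with the tabular t(alpha; n - k - 1),
# e.g. about 2.45 at alpha = 0.05 with 6 degrees of freedom
```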

An important role in assessing the influence of the factors is played by the coefficients of the regression model. However, the factors cannot be compared directly by these coefficients according to the degree of their influence on the dependent variable, because of differences in units of measurement and in degrees of variability. To eliminate such differences, partial elasticity coefficients Ej and beta coefficients βj are applied.

The formula for calculating the coefficient of elasticity:

Ej = aj · x̄j / ȳ,

where

aj is the regression coefficient of factor j;

ȳ is the mean value of the dependent variable;

x̄j is the mean value of factor j.

The elasticity coefficient shows by how many percent the dependent variable y changes when factor j changes by 1%.
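A sketch using the food-expenditure example above; only the coefficient 0.35 comes from the text, while the mean values x̄1 = 10 and ȳ = 7 are invented for illustration:

```python
a1 = 0.35        # regression coefficient of income (from the example)
x1_mean = 10.0   # assumed mean income per family member, thousand rub.
y_mean = 7.0     # assumed mean food expenditure, thousand rub.

# elasticity of food expenditure with respect to income
E1 = a1 * x1_mean / y_mean
print(round(E1, 2))  # -> 0.5, i.e. a 1% rise in income raises expenditure by 0.5%
```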

The formula for determining the beta coefficient:

βj = aj · Sxj / Sy,

where

Sxj is the mean square deviation of factor j;

Sy is the mean square deviation of the dependent variable y.

The beta coefficient shows by what part of the mean square deviation Sy the dependent variable y will change when the corresponding independent variable xj changes by the value of its own mean square deviation, with the other independent variables held fixed.

The share of the influence of a particular factor in the total influence of all factors can be estimated by the delta coefficients Δj. These coefficients allow the factors to be ranked according to the degree of their influence on the dependent variable.

The formula for determining the delta coefficient:

Δj = r_yj · βj / R²,

where

r_yj is the paired correlation coefficient between factor j and the dependent variable;

R² is the multiple determination coefficient.

The multiple determination coefficient is used for quality assessment of multiple regression models.

The formula for determining the coefficient of multiple determination:

R² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)².

The determination coefficient shows the proportion of the variation of the dependent variable that is under the influence of the factor variables, i.e. it determines which share of the variation of y is taken into account in the model and is due to the factors included in it. The closer R² is to one, the higher the quality of the model.

When independent variables are added, the value of R² increases, so the coefficient must be adjusted for the number of independent variables by the formula

R̄² = 1 − (1 − R²)·(n − 1)/(n − k − 1).
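The adjustment is a one-line correction. A sketch showing that, for a fixed sample and a fixed R², a larger number of factors k pulls the adjusted coefficient down:

```python
def adjusted_r2(r2, n, k):
    # penalize R^2 for the number of independent variables k
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.90, 20, 3), 3))  # -> 0.881
print(round(adjusted_r2(0.90, 20, 8), 3))  # -> 0.827 (more factors, lower value)
```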

To check the significance of the regression model, Fisher's F-criterion is used. It is determined by the formula

F = (R²/k) / ((1 − R²)/(n − k − 1)).

If the calculated value of the criterion with γ1 = k and γ2 = n − k − 1 degrees of freedom is greater than the tabular value at a given significance level, the model is considered significant.

As a measure of accuracy, the standard error is applied, which is the square root of the ratio of the sum of squared residuals to (n − k − 1):

Sε = sqrt( Σ(y − ŷ)² / (n − k − 1) ).
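A sketch of the significance check via the F-criterion in its R² form (the numbers are illustrative; the tabular value used below is the approximate F(0.05; 3, 16) quantile):

```python
def f_stat(r2, n, k):
    # F-criterion expressed through the multiple determination coefficient
    return (r2 / k) / ((1 - r2) / (n - k - 1))

F = f_stat(0.90, n=20, k=3)
print(round(F, 1))   # -> 48.0
F_table = 3.24       # approximate tabular F(0.05; 3, 16)
# F > F_table, so the model is considered significant at the 5% level
```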

The classic approach to estimating the parameters of the linear multiple regression model is based on the method of least squares (OLS). The system of normal equations has the form:

Σy = n·a0 + a1·Σx1 + … + ak·Σxk,
Σx1·y = a0·Σx1 + a1·Σx1² + … + ak·Σx1·xk,
…
Σxk·y = a0·Σxk + a1·Σxk·x1 + … + ak·Σxk².

The system can be solved by one of the known methods: the Gauss method, Cramer's method, etc.

Example 15.

Using data for four enterprises of the region (Table 41), the dependence of output per employee y (thousand rubles) on the commissioning of new fixed assets x1 (% of the value of assets at the end of the year) and on the share of highly qualified workers in the total number of workers x2 (%) is studied. It is required to write the multiple regression equation.

Table 41 - Output of products per employee