Assessing the statistical significance of a regression equation and its parameters. Checking the significance of the regression equation

Assessment of the significance of the parameters of the regression equation

The significance of the parameters of a linear regression equation is assessed using Student's t-criterion. For each parameter, the calculated value is the ratio of the estimate to its standard error, $t_a = a / m_a$ and $t_b = b / m_b$, and it is compared with the critical value $t_{cr}$:

if $|t_{calc}| > t_{cr}$, the null hypothesis $H_0$ (the parameter is equal to zero) is rejected, and the regression parameter is considered statistically significant;

if $|t_{calc}| < t_{cr}$, the null hypothesis $H_0$ is not rejected, which indicates that the regression parameter is statistically insignificant.

Here $m_a$ and $m_b$ are the standard errors of the parameters a and b:

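$$m_a = \sqrt{\frac{S_{res}^2 \sum x_i^2}{n \sum (x_i - \bar{x})^2}}, \qquad (2.19)$$

$$m_b = \sqrt{\frac{S_{res}^2}{\sum (x_i - \bar{x})^2}}, \quad \text{where } S_{res}^2 = \frac{\sum (y_i - \hat{y}_i)^2}{n - 2}. \qquad (2.20)$$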

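The critical (table) value of the criterion is found from statistical tables of Student's distribution (Appendix B) or with the Excel Function Wizard (category "Statistical"):

$t_{cr}$ = TINV(α = 1 − P; k = n − 2), (2.21)

where TINV is the Excel function for the inverse Student distribution (СТЬЮДРАСПОБР in Russian-language Excel), α is the significance level, P is the confidence probability, and k = n − 2 is the number of degrees of freedom.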
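The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's own calculation: the sample `x`, `y` is hypothetical, the standard errors use the usual textbook expressions for paired linear regression, and the critical value 3.182 is the two-sided table value of Student's distribution for α = 0.05 and k = 3.

```python
import math

def paired_regression_t_tests(x, y):
    """Return (a, b, m_a, m_b, t_a, t_b) for the model y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Residual variance with n - 2 degrees of freedom
    s2_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # Standard errors of the slope and of the intercept
    m_b = math.sqrt(s2_res / sxx)
    m_a = math.sqrt(s2_res * sum(xi ** 2 for xi in x) / (n * sxx))
    return a, b, m_a, m_b, a / m_a, b / m_b

# Hypothetical sample (assumption, not data from the text), n = 5
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, m_a, m_b, t_a, t_b = paired_regression_t_tests(x, y)

# Table value of Student's distribution for alpha = 0.05, k = n - 2 = 3
T_CRITICAL = 3.182
for name, t in (("a", t_a), ("b", t_b)):
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"t_{name} = {t:.3f}: {verdict}")
```

With so few points, neither parameter passes the test, which illustrates why small samples rarely give significant parameters.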

The same assessment of statistical significance can be applied to the linear correlation coefficient:

$$t_r = \frac{r_{yx}}{m_r}, \qquad (2.22)$$

where $m_r$ is the standard error of the correlation coefficient $r_{yx}$:

$$m_r = \sqrt{\frac{1 - r_{yx}^2}{n - 2}}. \qquad (2.23)$$
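A small numerical sketch of this check follows. It assumes the commonly used expression for the standard error of the correlation coefficient with n − 2 degrees of freedom, $m_r = \sqrt{(1 - r^2)/(n - 2)}$; the function name and the input values are illustrative only.

```python
import math

def correlation_t_statistic(r, n):
    """t-statistic of a linear correlation coefficient r for a sample of size n."""
    m_r = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r (assumed form)
    return r / m_r

# Illustrative values: r = 0.85 estimated from n = 30 observations
t_r = correlation_t_statistic(0.85, 30)
print(f"t_r = {t_r:.2f}")
```

The result is then compared with the table value of Student's distribution for k = n − 2 degrees of freedom, in the same way as for the regression parameters.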

Below are options for assignments for practical and laboratory work on the subject of the second section.

Self-test questions for Section 2

1. Name the main components of an econometric model and explain their essence.

2. The main content of the stages of an econometric study.

3. The essence of the approaches to determining the parameters of linear regression.

4. The essence and features of applying the least squares method to determine the parameters of a regression equation.

5. Which indicators are used to assess the closeness of the relationship between the factors under study?

6. The essence of the linear correlation coefficient.

7. The essence of the coefficient of determination.

8. The essence and main features of the procedures for assessing the adequacy (statistical significance) of regression models.

9. Assessing the adequacy of linear regression models by the approximation coefficient.

10. The essence of assessing the adequacy of regression models by Fisher's criterion. Determining the empirical and critical values of the criterion.

11. The essence of the concept of "analysis of variance" as applied to econometric studies.

12. The essence and main features of the procedure for assessing the significance of the parameters of a linear regression equation.

13. Features of using Student's distribution when assessing the significance of the parameters of a linear regression equation.

14. What is the task of forecasting single values of the socio-economic phenomenon under study?

1. Construct the correlation field and formulate an assumption about the form of the relationship between the factors under study;

2. Write down the basic equations of the least squares method, carry out the necessary transformations, draw up a table of intermediate calculations and determine the parameters of the linear regression equation;

3. Check the correctness of the calculations using standard procedures and functions of Excel spreadsheets;

4. Analyze the results and formulate conclusions and recommendations.

1. Calculate the value of the linear correlation coefficient;

2. Construct the analysis of variance table;

3. Evaluate the coefficient of determination;

4. Check the correctness of the calculations using standard procedures and functions of Excel spreadsheets;

5. Analyze the results and formulate conclusions and recommendations.

4. Carry out a general assessment of the adequacy of the selected regression equation;

1. Assess the adequacy of the equation by the value of the approximation coefficient;

2. Assess the adequacy of the equation by the value of the coefficient of determination;

3. Assess the adequacy of the equation by Fisher's criterion;

4. Carry out a general assessment of the adequacy of the parameters of the regression equation;

5. Check the correctness of the calculations using standard procedures and functions of Excel spreadsheets;

6. Analyze the results and formulate conclusions and recommendations.

1. Use of standard procedures of the Excel Function Wizard (from the "Math" and "Statistical" categories);

2. Data preparation and features of using the LINEST function;

3. Data preparation and features of using the FORECAST function.

1. Use of standard procedures of the Excel data analysis package (Analysis ToolPak);

2. Data preparation and features of applying the "Regression" procedure;

3. Interpretation and generalization of the regression statistics table;

4. Interpretation and generalization of the analysis of variance table;

5. Interpretation and generalization of the table assessing the significance of the parameters of the regression equation;

When performing laboratory work for one of the options, the following individual tasks must be completed:

1. Select the form of the equation relating the factors under study;

2. Determine the parameters of the regression equation;

3. Assess the closeness of the relationship between the factors under study;

4. Assess the adequacy of the selected regression equation;

5. Assess the statistical significance of the parameters of the regression equation;

6. Check the correctness of the calculations using standard procedures and Excel spreadsheet functions;

7. Analyze the results and formulate conclusions and recommendations.

Tasks for practical and laboratory work on the topic "Paired linear regression and correlation in econometric studies".

Options 1 to 10: each option is a two-column table of x and y values (the numerical values are not reproduced here).

Once the regression equation has been built and its accuracy estimated with the coefficient of determination, the question remains open as to what this accuracy is due to and, accordingly, whether the equation can be trusted. The point is that the regression equation was built not from the population, which is unknown, but from a sample of it. Points from the population fall into the sample at random, so, in accordance with probability theory, it is possible among other cases that a sample from a "wide" population turns out to be "narrow" (Fig. 15).

Fig. 15. A possible way for points from the population to fall into the sample.

In this case:

a) the regression equation built on the sample may differ significantly from the regression equation for the population, which will lead to forecast errors;

b) the coefficient of determination and other accuracy characteristics will be unjustifiably high and will give a misleading picture of the predictive qualities of the equation.

In the limiting case, it is not excluded that from a population forming a cloud whose main axis is parallel to the horizontal axis (there is no relationship between the variables), random selection yields a sample whose main axis is inclined to that axis. Thus, attempts to predict further values of the population from sample data are fraught not only with errors in assessing the strength and direction of the relationship between the dependent and independent variables, but also with the danger of finding a relationship between variables where in fact there is none.

In the absence of information on all points of the population, the only way to reduce errors in the first case is to estimate the coefficients of the regression equation with a method that ensures their unbiasedness and efficiency. The probability of the second case can be reduced significantly because one property of a population with two mutually independent variables is known a priori: this relationship is absent in it. The reduction is achieved by checking the statistical significance of the obtained regression equation.

One of the most frequently used checks is as follows. For the obtained regression equation, the F-statistic is determined: a characteristic of the accuracy of the regression equation equal to the ratio of the part of the variance of the dependent variable that is explained by the regression equation to the unexplained (residual) part of the variance. For multidimensional regression the statistic has the form:

$$F = \frac{S_{expl}^2}{S_{res}^2} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / m}{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - m - 1)},$$

where $S_{expl}^2$ is the explained variance, the part of the variance of the dependent variable y that is explained by the regression equation;

$S_{res}^2$ is the residual variance, the part of the variance of the dependent variable y that is not explained by the regression equation; its presence is a consequence of the action of the random component;

n is the number of points in the sample;

m is the number of independent variables in the regression equation.

As can be seen from the formula above, a variance is defined as the quotient of the corresponding sum of squares and the number of degrees of freedom. The number of degrees of freedom is the minimum number of values of the dependent variable that is sufficient to obtain the desired sample characteristic and that can vary freely, given that all other quantities used to calculate this characteristic are known for the sample.

To obtain the residual variance, the coefficients of the regression equation are needed. In paired linear regression there are two coefficients, so, in accordance with the formula (taking m = 1), the number of degrees of freedom is n − 2. This means that to determine the residual variance it is enough to know the coefficients of the regression equation and only n − 2 values of the dependent variable from the sample. The remaining two values can be calculated from these data and therefore do not vary freely.

To calculate the explained variance, the values of the dependent variable are not required at all, since it can be calculated from the regression coefficients of the independent variables and the variance of the independent variable. To see this, it is enough to recall the expression given earlier. Consequently, the number of degrees of freedom for the explained variance is equal to the number of independent variables in the regression equation (for paired linear regression, m = 1).

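As a result, the F-criterion for the paired linear regression equation is determined by the formula:

$$F = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - 2)}.$$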
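The calculation of the F-criterion for paired linear regression can be illustrated with a short Python sketch. The sample is hypothetical, and the comparison value 10.13 is the table value of Fisher's distribution for α = 0.05 with 1 and 3 degrees of freedom; both are assumptions made for the illustration.

```python
def f_statistic_paired(x, y):
    """F = explained sum of squares / (residual sum of squares / (n - 2))."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    fitted = [a + b * xi for xi in x]
    explained = sum((fi - y_mean) ** 2 for fi in fitted)          # 1 degree of freedom
    residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # n - 2 degrees of freedom
    return explained / (residual / (n - 2))

# Hypothetical sample (assumption, not data from the text)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
f_fact = f_statistic_paired(x, y)

F_TABLE = 10.13  # Fisher table value for alpha = 0.05, k1 = 1, k2 = 3
print(f"F = {f_fact:.2f}; equation significant: {f_fact > F_TABLE}")
```

For paired regression this F value is the square of the t-statistic of the slope, so the two checks always agree in that case.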

In probability theory it is proved that the F-criterion of a regression equation obtained for a sample from a population in which there is no relationship between the dependent and independent variables follows the Fisher distribution, which is well studied. Owing to this, for any value of the F-criterion one can calculate the probability of its occurrence and, conversely, determine the value that the F-criterion cannot exceed with a given probability.

To carry out a statistical check of the significance of the regression equation, a null hypothesis of no relationship between the variables is formulated (all coefficients of the variables are zero) and a significance level α is chosen.

The significance level is the permissible probability of a type I error: rejecting a true null hypothesis as a result of the check. In the case under consideration, making a type I error means concluding from the sample that a relationship between the variables exists in the population when in fact there is none.

Typically, the significance level is taken to be 5% or 1%. The stricter the significance level (the smaller α), the higher the reliability of the test, equal to 1 − α, i.e., the greater the chance of avoiding the error of recognizing from the sample a relationship between variables that are in fact unrelated in the population. But as α is reduced, the danger of a type II error grows: failing to reject a false null hypothesis, i.e., not noticing in the sample a relationship between the variables that actually exists in the population. Therefore, the significance level is chosen depending on which error has the more serious negative consequences.

For the chosen significance level, a table value $F_{table}$ is determined from Fisher's distribution; the probability of exceeding it in a sample of size n obtained from a population with no relationship between the variables does not exceed the significance level. $F_{table}$ is compared with the actual value of the criterion for the regression equation, $F_{fact}$.

If the condition $F_{fact} > F_{table}$ is satisfied, then an erroneous detection of a relationship with an F-criterion value equal to or greater than $F_{fact}$, in a sample from a population with unrelated variables, will occur with a probability lower than the significance level. In accordance with the rule that "very rare events do not happen", we conclude that the relationship between the variables established from the sample also exists in the population from which the sample was obtained.

If it turns out that $F_{fact} < F_{table}$, the regression equation is not statistically significant. In other words, there is a real chance that the sample shows a relationship between the variables that does not exist in reality. An equation that has failed the check for statistical significance is treated in the same way as a medicine past its expiry date: such medicines are not necessarily spoiled, but since there is no confidence in their quality, people prefer not to use them. This rule does not protect against all errors, but it avoids the grossest ones, which is also quite important.

The second version of the check is more convenient when spreadsheets are used: the probability of obtaining the observed value of the F-criterion is compared with the significance level. If this probability is lower than the significance level, the equation is statistically significant; otherwise it is not.

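After checking the statistical significance of the regression equation as a whole, it is useful, especially for multidimensional dependencies, to check the statistical significance of the obtained regression coefficients. The idea of the check is the same as for the equation as a whole, but the criterion used is Student's t-criterion, which for paired linear regression is determined by the formulas:

$$t_{a_1} = \frac{a_1}{\sqrt{S_{res}^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

and

$$t_{a_0} = \frac{a_0}{\sqrt{S_{res}^2 \sum_{i=1}^{n} x_i^2 / \left(n \sum_{i=1}^{n} (x_i - \bar{x})^2\right)}},$$

where $t_{a_0}$, $t_{a_1}$ are the values of Student's criterion for the coefficients $a_0$ and $a_1$, respectively;

$S_{res}^2$ is the residual variance of the regression equation, calculated with n − m − 1 degrees of freedom;

n is the number of points in the sample;

m is the number of independent variables; for paired linear regression, m = 1.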

The actual values of Student's criterion obtained in this way are compared with the table values $t_{table}$ taken from Student's distribution. If it turns out that $|t_{fact}| > t_{table}$, the corresponding coefficient is statistically significant; otherwise it is not. The second way to check the statistical significance of the coefficients is to determine the probability of obtaining the observed value of Student's criterion and compare it with the significance level.

For variables whose coefficients turned out to be statistically insignificant, there is a high probability that their influence on the dependent variable is absent in the population altogether. Therefore, either the number of points in the sample should be increased, in which case the coefficient may become statistically significant and its value will also be refined, or other independent variables, more closely related to the dependent variable, should be found. In both cases the accuracy of prediction will increase.

As a quick method for assessing the significance of the coefficients of the regression equation, the following rule can be applied: if Student's criterion is greater than 3, the coefficient usually turns out to be statistically significant. In general, it is believed that obtaining statistically significant regression equations also requires a further condition to be satisfied (its formula is not reproduced here).

The standard error of predicting an unknown value from the obtained regression equation is denoted below by $S_{pred}$ (its formula is not reproduced here).

Thus, the forecast with a confidence probability of 68% can be represented as $\hat{y} \pm S_{pred}$.

If a different confidence probability P is required, Student's criterion t must be found for the significance level α = 1 − P, and the confidence interval for the forecast with reliability P will be equal to $\hat{y} \pm t \cdot S_{pred}$.
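The interval construction described above amounts to simple arithmetic, sketched below. The function name, the point forecast of 100, the standard error of 5 and the table value t = 2.447 (α = 0.05, 6 degrees of freedom) are all illustrative assumptions.

```python
def forecast_interval(y_hat, s_pred, t_value=1.0):
    """Interval y_hat +/- t_value * s_pred.

    t_value = 1.0 corresponds to the roughly 68% interval mentioned in the text;
    for another confidence level, pass the table value of Student's criterion.
    """
    half_width = t_value * s_pred
    return y_hat - half_width, y_hat + half_width

# Hypothetical point forecast and standard prediction error
low68, high68 = forecast_interval(100.0, 5.0)
low95, high95 = forecast_interval(100.0, 5.0, t_value=2.447)
print(f"68%: [{low68:.2f}, {high68:.2f}]")
print(f"95%: [{low95:.2f}, {high95:.2f}]")
```

The higher the required reliability, the wider the interval becomes for the same standard error.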

Prediction of multidimensional and nonlinear dependencies

If the predicted value depends on several independent variables, then there is a multidimensional regression of the form:

$$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m,$$

where $a_0, a_1, \dots, a_m$ are the regression coefficients describing the influence of the variables $x_1, \dots, x_m$ on the predicted value.

The method of determining the regression coefficients does not differ from that of paired linear regression, especially when a spreadsheet is used, since the same function serves both paired and multidimensional linear regression. It is desirable that there be no relationships between the independent variables, i.e., that a change in one variable does not affect the values of the others. This requirement is not mandatory, however; what matters is that there are no functional linear dependencies between the variables. The procedures described above for checking the statistical significance of the regression equation and of its individual coefficients, and for estimating the accuracy of prediction, remain the same as for paired linear regression. At the same time, using multidimensional regression instead of paired regression usually makes it possible to improve significantly the accuracy with which the behavior of the dependent variable is described, and hence the accuracy of forecasting.

In addition, multidimensional linear regression equations make it possible to describe a nonlinear dependence of the predicted value on the independent variables. The procedure of bringing a nonlinear equation to linear form is called linearization. In particular, if the dependence is described by a polynomial of degree other than 1, then, by replacing the variables raised to powers other than one with new variables in the first power, we obtain a multidimensional linear regression problem instead of a nonlinear one. For example, if the effect of an independent variable is described by the parabola

$$y = a_0 + a_1 x + a_2 x^2,$$

then the replacement $x_1 = x$, $x_2 = x^2$ transforms the nonlinear problem into the multidimensional linear form

$$y = a_0 + a_1 x_1 + a_2 x_2.$$
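The substitution can be tried out with a small least-squares sketch in plain Python. It is only an illustration under stated assumptions: the helper names are hypothetical, the data are generated from a known parabola so that the recovered coefficients can be checked, and the normal equations are solved by ordinary Gaussian elimination.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def fit_linear(columns, y):
    """Least-squares fit of y = a0 + a1*x1 + ... + am*xm via the normal equations."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated from the parabola y = 1 + 2x + 3x^2
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]

# Linearization: x1 = x, x2 = x^2, then an ordinary two-variable linear fit
x1 = x
x2 = [xi ** 2 for xi in x]
a0, a1, a2 = fit_linear([x1, x2], y)
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}, a2 = {a2:.3f}")
```

The same `fit_linear` helper would serve for a product term: a new column equal to the product of two variables is simply added to `columns`.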

Nonlinear problems in which the predicted value depends on the product of independent variables can also be converted easily. To account for such an influence, a new variable equal to this product is introduced.

In cases where the nonlinearity is described by more complex dependencies, linearization is possible through a transformation of coordinates. For this purpose, transformed values of the variables are calculated and graphs of the source points are plotted in various combinations of transformed variables. The combination of transformed coordinates, or of transformed and untransformed coordinates, in which the dependence is closest to a straight line suggests the replacement of variables that will convert the nonlinear dependence into linear form. With such a replacement, a nonlinear dependence of a suitable form turns into a linear one.

The regression coefficients obtained for the transformed equation remain unbiased and efficient, but checking the statistical significance of the equation and of the coefficients is impossible.

Verification of the validity of the application of the least squares method

The least squares method ensures efficient and unbiased estimates of the coefficients of the regression equation under the following conditions on the random deviations ε (the Gauss-Markov conditions):

1. The expected value of ε is zero.

2. The variance of ε is constant.

3. The values of ε do not depend on each other.

4. The values of ε do not depend on the independent variables.

The simplest way to check whether these conditions hold is to plot the residuals first against the dependent variable and then against the independent variables. If the points on these graphs lie in a corridor located symmetrically about the abscissa axis and show no regular pattern, then the Gauss-Markov conditions are fulfilled and there is no possibility of improving the accuracy of the regression equation. If this is not the case, the accuracy of the equation can be improved significantly, and the specialized literature should be consulted.

Student's t-criterion is used to assess the significance of the correlation coefficient.

First the mean error of the correlation coefficient, $m_r$, is found. Then, on the basis of this error, the t-criterion is calculated as $t = r / m_r$.

The calculated value of the t-criterion is compared with the table value found in the table of Student's distribution at a significance level of 0.05 or 0.01 and with n − 1 degrees of freedom. If the calculated value of the t-criterion is greater than the table value, the correlation coefficient is recognized as significant.

For a curvilinear relationship, the F-criterion is used to assess the significance of the correlation ratio and of the regression equation. It is calculated by the formula:

$$F = \frac{\eta^2}{1 - \eta^2} \cdot \frac{n - m}{m - 1}$$

or, equivalently,

$$F = \frac{\eta^2 / (m - 1)}{(1 - \eta^2) / (n - m)},$$

where η is the correlation ratio; n is the number of observations; m is the number of parameters in the regression equation.

The calculated value of F is compared with the table value for the adopted significance level α (0.05 or 0.01) and the numbers of degrees of freedom $k_1 = m - 1$ and $k_2 = n - m$. If the calculated value of F exceeds the table value, the relationship is recognized as significant.
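As a quick numerical illustration of this check, the sketch below computes F from a correlation ratio using the form $F = \frac{\eta^2}{1-\eta^2} \cdot \frac{n-m}{m-1}$, which matches the degrees of freedom $k_1 = m - 1$ and $k_2 = n - m$ stated in the text. The input values and the table value 3.35 (α = 0.05, $k_1$ = 2, $k_2$ = 27) are assumptions chosen for the example.

```python
def f_from_correlation_ratio(eta, n, m):
    """F-criterion for a correlation ratio eta, n observations, m equation parameters."""
    eta_sq = eta ** 2
    return (eta_sq / (1 - eta_sq)) * ((n - m) / (m - 1))

# Hypothetical case: a parabola (m = 3 parameters) fitted to n = 30 observations
f_calc = f_from_correlation_ratio(0.8, 30, 3)

F_TABLE = 3.35  # Fisher table value for alpha = 0.05, k1 = 2, k2 = 27
print(f"F = {f_calc:.2f}; relationship significant: {f_calc > F_TABLE}")
```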

The significance of a regression coefficient is established using Student's t-criterion, which is calculated by the formula:

$$t_{a_i} = \frac{a_i}{\sigma_{a_i}},$$

where $\sigma_{a_i}^2$ is the variance of the regression coefficient.

The formula for this variance involves k, the number of factor variables in the regression equation.

The regression coefficient is recognized as significant if $t_{a_1} \ge t_{cr}$. The value $t_{cr}$ is found in the table of critical points of Student's distribution for the adopted significance level and the number of degrees of freedom k = n − 1.

4.3. Correlation and regression analysis in Excel

Let us carry out a correlation and regression analysis of the relationship between grain yield and labor costs per centner of grain. To do this, open an Excel sheet and enter the values of the factor variable (grain yield) in cells A1:A30 and the values of the resulting variable (labor costs per centner of grain) in cells B1:B30. In the Tools menu, select Data Analysis. In the list that opens, choose the Regression tool and click OK; the Regression dialog box appears on the screen. In the "Input Y Range" field, enter the values of the resulting variable (by selecting cells B1:B30); in the "Input X Range" field, enter the values of the factor variable (by selecting cells A1:A30). Set the confidence level to 95% and choose output to a new worksheet. Click OK. The "SUMMARY OUTPUT" table appears on the worksheet; it contains the parameters of the regression equation, the correlation coefficient, and the other indicators needed to determine the significance of the correlation coefficient and of the parameters of the regression equation.

SUMMARY OUTPUT

Regression statistics (rows): Multiple R; R Square; Adjusted R Square; Standard Error; Observations.

Analysis of variance (ANOVA): the row "Regression" and the column "Significance F", among others.

Coefficients table (columns): Coefficients; Standard Error; t Stat; P-value; Lower 95%; Upper 95%; Lower 95.0%; Upper 95.0%. Rows: Intercept; X Variable 1.

In this table, "Multiple R" is the correlation coefficient and "R Square" is the coefficient of determination. In the "Coefficients" column, "Intercept" is the free term of the regression equation, 2.836242, and "X Variable 1" is the regression coefficient, −0.06654. The table also contains the value of Fisher's F-criterion, 74.9876, Student's t-criterion, 14.18042, and the standard error, 0.112121, which are needed to assess the significance of the correlation coefficient, of the parameters of the regression equation, and of the equation as a whole.

Based on the data of the table, we construct the regression equation: $\hat{y}_x = 2.836 - 0.067x$. The regression coefficient $a_1 = -0.067$ means that when grain yield increases by 1 centner/ha, labor costs per centner of grain decrease by 0.067 person-time units.

The correlation coefficient r = 0.85 > 0.7; therefore, the relationship between the studied variables in this data set is close. The coefficient of determination $R^2 = 0.73$ shows that 73% of the variation in the resulting variable (labor costs per centner of grain) is caused by the action of the factor variable (grain yield).

In the table of critical points of the Fisher-Snedecor distribution we find the critical value of the F-criterion at a significance level of 0.05 and the numbers of degrees of freedom $k_1 = m - 1 = 2 - 1 = 1$ and $k_2 = n - m = 30 - 2 = 28$; it is equal to 4.20. Since the calculated value of the criterion is greater than the table value (F = 74.9896 > 4.20), the regression equation is recognized as significant.

To assess the significance of the correlation coefficient, we calculate Student's t-criterion.

In the table of critical points of Student's distribution we find the critical value of the t-criterion at a significance level of 0.05 and the number of degrees of freedom n − 1 = 30 − 1 = 29; it is 2.0452. Since the calculated value is greater than the table value, the correlation coefficient is significant.

Regression analysis is a statistical research method that shows the dependence of a parameter on one or several independent variables. In the pre-computer era it was difficult to apply, especially when large amounts of data were involved. Today, having learned how to build a regression in Excel, you can solve complex statistical problems in literally a couple of minutes. Specific examples from the field of economics are presented below.

Types of regression

The concept itself was introduced into mathematics in 1886. Regression can be:

  • linear;
  • parabolic;
  • power;
  • exponential;
  • hyperbolic;
  • exponential with an arbitrary base;
  • logarithmic.

Example 1.

Consider the problem of determining how the number of employees who resigned depends on the average salary at 6 industrial enterprises.

Task. At six enterprises, the average monthly salary and the number of employees who resigned of their own accord were analyzed. In tabular form we have:

Number of employees who resigned: values not reproduced here.

Salary: 30,000 rubles; 35,000 rubles; 40,000 rubles; 45,000 rubles; 50,000 rubles; 55,000 rubles; 60,000 rubles.

For the problem of determining how the number of employees who resigned depends on the average salary at 6 enterprises, the regression model has the form of the equation y = a_0 + a_1 x_1 + ... + a_k x_k, where x_i are the influencing variables, a_i are the regression coefficients, and k is the number of factors.

For this task, y is the number of employees who resigned, and the influencing factor is the salary, denoted by x.

Using the capabilities of the "Excel" table processor

Regression analysis in Excel has to be preceded by applying built-in functions to the existing tabular data. For these purposes, however, it is better to use the very useful "Analysis ToolPak" add-in. To activate it, you need to:

  • from the File tab, go to the "Options" section;
  • in the window that opens, select the "Add-ins" line;
  • click the "Go" button at the bottom, to the right of the "Manage" line;
  • tick the box next to "Analysis ToolPak" and confirm by clicking OK.

If everything is done correctly, the required button will appear on the right side of the "Data" tab, above the Excel worksheet.

Linear regression in Excel

Now that all the necessary tools for econometric calculations are at hand, we can proceed to solve our task. To do this:

  • click the "Data Analysis" button;
  • in the window that opens, select "Regression";
  • in the dialog that appears, enter the range of values for y (the number of employees who resigned) and for x (their salaries);
  • confirm by clicking "OK".

As a result, the program automatically fills a new worksheet with the regression analysis data. Note! Excel lets you specify the output location yourself. For example, it can be the same sheet that holds the y and x values, or even a new workbook created specifically for storing such data.

Analysis of the regression results: R-square

In Excel, the results obtained from processing the data of the example look as follows:

First of all, pay attention to the value of R-square, which is the coefficient of determination. In this example R-square = 0.755 (75.5%), i.e., the calculated parameters of the model explain 75.5% of the relationship between the parameters under consideration. The higher the coefficient of determination, the more applicable the chosen model is considered to be for the task at hand. The model is believed to describe the real situation correctly when R-square is above 0.8. If R-square < 0.5, such a regression analysis in Excel cannot be considered reasonable.

Analysis of coefficients

The number 64.1428 shows what y will be if all the variables x_i in the model are equal to zero. In other words, the value of the analyzed parameter is also influenced by other factors that are not described in the model.

The next coefficient, −0.16285, located in cell B18, shows the weight of the influence of the variable x on y. This means that, within the model under consideration, the average monthly salary of employees affects the number of resignations with a weight of −0.16285, i.e., the degree of its influence is quite small. The "−" sign indicates that the coefficient is negative. This is to be expected, since the higher the salary at an enterprise, the fewer people wish to terminate their employment contract or resign.

Multiple regression

This term denotes an equation of relationship with several independent variables, of the form:

y = f(x_1, x_2, ..., x_m) + ε, where y is the resulting variable (dependent variable) and x_1, x_2, ..., x_m are the factor variables (independent variables).

Evaluation of parameters

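For multiple regression (MR), parameter estimation is carried out using the method of least squares (OLS). For linear equations of the form y = a + b_1 x_1 + ... + b_m x_m + ε, a system of normal equations is built.

To understand the principle of the method, consider the two-factor case. Then we have the situation described by the formula

$$y = a + b_1 x_1 + b_2 x_2 + \varepsilon.$$

From here we get:

$$b_1 = \frac{\sigma_y}{\sigma_{x_1}} \cdot \frac{r_{yx_1} - r_{yx_2} r_{x_1 x_2}}{1 - r_{x_1 x_2}^2}, \quad b_2 = \frac{\sigma_y}{\sigma_{x_2}} \cdot \frac{r_{yx_2} - r_{yx_1} r_{x_1 x_2}}{1 - r_{x_1 x_2}^2}, \quad a = \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2,$$

where σ is the standard deviation of the variable indicated in the subscript and r is the pairwise correlation coefficient of the variables indicated in the subscript.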
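The two-factor case can be checked numerically. The sketch below assumes the standard closed-form solution for two regressors, in which the coefficients are expressed through standard deviations and pairwise correlation coefficients; the data are hypothetical and generated from y = 1 + 2·x1 + 3·x2 so that the result is known in advance.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std(values):
    """Population standard deviation."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def corr(u, v):
    """Pairwise linear correlation coefficient."""
    mu, mv = mean(u), mean(v)
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)
    return cov / (std(u) * std(v))

def two_factor_regression(x1, x2, y):
    """Coefficients (a, b1, b2) of y = a + b1*x1 + b2*x2 via correlations."""
    r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    b1 = (std(y) / std(x1)) * (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    b2 = (std(y) / std(x2)) * (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2

# Hypothetical data built from y = 1 + 2*x1 + 3*x2 (no random component)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2 = two_factor_regression(x1, x2, y)
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The denominator 1 − r² shows why functional linear dependence between the factors (|r| = 1) makes the estimates impossible to compute.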

OLS is also applicable to the MR equation on a standardized scale. In this case, we get the equation:

$$t_y = \beta_1 t_{x_1} + \dots + \beta_m t_{x_m} + \varepsilon,$$

in which $t_y, t_{x_1}, \dots, t_{x_m}$ are standardized variables, whose mean values are 0 and whose standard deviations are 1, and $\beta_i$ are the standardized regression coefficients.

Note that all $\beta_i$ in this case are normalized and centered, so comparing them with each other is considered correct and admissible. In addition, it is customary to screen the factors, discarding those with the smallest values of $\beta_i$.

A task using the linear regression equation

Suppose there is a table showing the price dynamics of a specific product N over the past 8 months. It is necessary to decide whether it is advisable to purchase a batch of it at a price of 1850 rubles/t.

Month number    Price of product N
1               1750 rubles per ton
2               1755 rubles per ton
3               1767 rubles per ton
4               1760 rubles per ton
5               1770 rubles per ton
6               1790 rubles per ton
7               1810 rubles per ton
8               1840 rubles per ton

To solve this problem in the Excel spreadsheet processor, use the "Data Analysis" tool presented above. Next, choose the "Regression" section and set the parameters. Remember that the range of values of the dependent variable (in this case, the price of the product in specific months) must be entered in the "Input Y Range" field, and the range of the independent variable (the month number) in the "Input X Range" field. Confirm by clicking OK. On a new sheet (if that option was selected) we obtain the regression data.

We build a linear equation of the form y = ax + b, where the parameters a and b are the coefficient in the row named after the month-number variable and the coefficient in the "Y-intercept" row of the sheet with the regression analysis results. Thus, the linear regression equation for task 3 is written as:

Price of product N = 11.714 * month number + 1727.54

or in algebraic notation

y = 11.714 x + 1727.54
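The coefficients can be cross-checked outside Excel. The following is a minimal Python sketch, assuming the months are numbered 1 to 8 and using the eight prices listed in the task; the function name `fit_line` is hypothetical.

```python
prices = [1750, 1755, 1767, 1760, 1770, 1790, 1810, 1840]  # rubles per ton
months = list(range(1, len(prices) + 1))                    # assumed numbering 1..8


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, prices)
print(f"y = {slope:.3f} x + {intercept:.2f}")  # prints: y = 11.714 x + 1727.54
```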

Analysis of the results

To decide whether the resulting linear regression equation is adequate, the coefficients of multiple correlation and determination are used, as well as Fisher's criterion and Student's criterion. In the Excel table with the regression results, they appear as Multiple R, R Square, F-statistic and t-statistic, respectively.

The multiple correlation coefficient R makes it possible to evaluate the closeness of the probabilistic relationship between the independent and dependent variables. Its high value indicates a fairly strong relationship between the variables "month number" and "price of product N in rubles per ton". However, the nature of this relationship remains unknown.

The coefficient of determination $R^2$ (R Square) is a numerical characteristic of the share of the total scatter: it shows what part of the scatter of the experimental data, i.e. of the values of the dependent variable, is accounted for by the linear regression equation. In the problem under consideration this value is 84.8%, i.e. the statistical data are described by the resulting regression equation with a high degree of accuracy.

The F-statistic, also called Fisher's criterion, is used to assess the significance of the linear dependence, refuting or confirming the hypothesis of its existence.

The t-statistic (Student's criterion) helps assess the significance of the coefficient of the unknown or of the free term of the linear dependence. If the value of the t-criterion exceeds the critical value $t_{cr}$, the hypothesis that the free term of the linear equation is insignificant is rejected.

In the problem under consideration, the Excel tools give t = 169.20903 and p = 2.89E-12 for the free term, i.e. the probability of rejecting a correct hypothesis that the free term is insignificant is practically zero. For the coefficient of the unknown, t = 5.79405 and p = 0.001158. In other words, the probability of rejecting a correct hypothesis that the coefficient of the unknown is insignificant is 0.12%.

Thus, it can be argued that the resulting linear regression equation is adequate.
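The R-square and t-statistic figures quoted in this analysis can be reproduced with the textbook formulas for simple linear regression. This is a minimal sketch in plain Python, assuming the month numbers are 1 to 8; the function name `regression_summary` is hypothetical, and p-values are left out because the standard library has no Student distribution.

```python
from math import sqrt

prices = [1750, 1755, 1767, 1760, 1770, 1790, 1810, 1840]  # rubles per ton
months = list(range(1, len(prices) + 1))                    # assumed numbering 1..8


def regression_summary(x, y):
    """Fit y = slope * x + intercept and return basic diagnostics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = syy - slope * sxy   # residual sum of squares
    s2 = ss_res / (n - 2)        # residual variance per degree of freedom
    se_slope = sqrt(s2 / sxx)
    se_intercept = sqrt(s2 * sum(xi ** 2 for xi in x) / (n * sxx))
    return {
        "r_squared": 1 - ss_res / syy,
        "t_slope": slope / se_slope,
        "t_intercept": intercept / se_intercept,
    }


summary = regression_summary(months, prices)
for name, value in summary.items():
    print(f"{name}: {value:.5f}")
```

The printed values should agree with the Excel output cited above (R-square of about 0.848, t of about 5.794 for the coefficient and about 169.209 for the free term).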

Task on the feasibility of buying a package of shares

Multiple regression in Excel is performed using the same "Data Analysis" tool. Consider a specific applied problem.

The management of the company "NNN" must decide whether it is advisable to buy a 20% stake in MMM JSC. The price of the block of shares (SP) is 70 million US dollars. The specialists of "NNN" have collected data on similar transactions. It was decided to estimate the value of the block of shares from the following parameters, expressed in millions of US dollars:

  • accounts payable (VK);
  • volume of annual turnover (VO);
  • receivables (VD);
  • the cost of fixed assets (SOF).

In addition, the enterprise's wage settlements (VZP), in thousands of US dollars, are used.

Solution using the Excel spreadsheet processor

First of all, you need to compile a table of source data. Then:

  • open the "Data Analysis" window;
  • select the "Regression" section;
  • in the "Input Y Range" field, enter the range of values of the dependent variable from column G;
  • click the icon with the red arrow to the right of the "Input X Range" field and select the range of all values from columns B, C, D, F.

Select the "New Worksheet Ply" option and click "OK".

The regression analysis results for this problem are obtained.

Study of the results and conclusions

From the rounded values presented on the Excel worksheet, we "assemble" the regression equation:

SP = 0.103 * SOF + 0.541 * VO - 0.031 * VK + 0.405 * VD + 0.691 * VZP - 265.844.

In a more familiar mathematical form, it can be written as:

y = 0.103 * x1 + 0.541 * x2 - 0.031 * x3 + 0.405 * x4 + 0.691 * x5 - 265.844

The data for MMM JSC are presented in the table:

Substituting them into the regression equation gives 64.72 million US dollars. This means that the shares of MMM JSC should not be purchased, since their price of 70 million US dollars is considerably overstated.
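The substitution step can be sketched in Python as follows. The coefficients are the rounded ones from the regression equation; the input values for the target company are illustrative assumptions, because the table with the MMM JSC data is not reproduced in the text, and they were chosen so that the result lands near the quoted figure. The names `COEFFICIENTS`, `INTERCEPT` and `estimate_package_value` are hypothetical.

```python
# Rounded coefficients of the regression equation for the block-of-shares price (SP).
COEFFICIENTS = {"SOF": 0.103, "VO": 0.541, "VK": -0.031, "VD": 0.405, "VZP": 0.691}
INTERCEPT = -265.844


def estimate_package_value(features):
    """Evaluate SP = sum(b_i * x_i) + intercept for one company."""
    return INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)


# Hypothetical inputs (NOT taken from the source table).
example_company = {"SOF": 102.5, "VO": 535.5, "VK": 45.2, "VD": 41.5, "VZP": 21.55}
asking_price = 70.0  # million US dollars, from the problem statement

estimate = estimate_package_value(example_company)
print(f"model estimate: {estimate:.2f}, asking price: {asking_price:.2f}")
print("overpriced relative to the model" if asking_price > estimate else "not overpriced")
```

With these assumed inputs the model estimate falls below the asking price, which is the comparison the decision rests on.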

As we can see, using the Excel spreadsheet processor and the regression equation made it possible to make a well-founded decision on the advisability of a very specific transaction.

Now you know what regression is. The Excel examples discussed above will help you solve practical problems in econometrics.

In socio-economic research, it is often necessary to work with a limited population or with sample data. Therefore, after the parameters of the regression equation have been determined mathematically, both they and the equation as a whole must be assessed for statistical significance, i.e. it is necessary to make sure that the resulting equation and its parameters were formed under the influence of non-random factors.

First of all, the statistical significance of the equation as a whole is assessed. The assessment is usually carried out using Fisher's F-criterion. The calculation of the F-criterion is based on the rule of addition of variances, namely: total variance of the result variable = factor variance + residual variance.

[Figure: actual price vs. theoretical price]

Having built the regression equation, one can calculate the theoretical values of the result variable, i.e. the values computed from the regression equation with its estimated parameters.

These values characterize the result variable as formed under the influence of the factors included in the analysis.

Between the actual values of the result variable and those calculated from the regression equation there are always discrepancies (residuals), caused by the influence of other factors not included in the analysis.

The differences between the theoretical and actual values of the result variable are called residuals. The total variation of the result variable is:

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

where $y_i$ are the actual values, $\hat{y}_i$ are the theoretical values and $\bar{y}$ is the mean of the result variable.

The variation of the result variable due to the variation of the factors included in the analysis is estimated by comparing the theoretical values of the result variable with its mean value. The residual variation is estimated by comparing the theoretical and actual values of the result variable. The total, factor and residual variances have different numbers of degrees of freedom.

Total: $n - 1$, where n is the number of units in the population under study.

Factor: p, where p is the number of factors included in the analysis.

Residual: $n - p - 1$.

Fisher's F-criterion is calculated as the ratio of the factor variance to the residual variance, each calculated per one degree of freedom:

$$F = \frac{D_{fact}}{D_{res}} = \frac{\sum (\hat{y}_i - \bar{y})^2 / p}{\sum (y_i - \hat{y}_i)^2 / (n - p - 1)}$$

Using Fisher's F-criterion to assess the statistical significance of the regression equation is quite logical. $D_{fact}$ is the variation of the result variable due to the factors included in the analysis, i.e. the explained share of the variation of the result variable. $D_{res}$ is the variation of the result variable due to factors whose influence is not taken into account, i.e. factors not included in the analysis.

Thus, the F-criterion is designed to assess whether $D_{fact}$ significantly exceeds $D_{res}$. If $D_{res}$ is only insignificantly lower than $D_{fact}$, and all the more so if it exceeds it, then the analysis includes the wrong factors, not those that really affect the result variable.

Fisher's F-criterion is tabulated, and the actual value is compared with the tabular one. If $F_{actual} > F_{table}$, the regression equation is recognized as statistically significant. If, on the contrary, $F_{actual} < F_{table}$, the equation is not statistically significant and cannot be used in practice. The significance of the equation as a whole indicates the statistical significance of the correlation indicators.
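The variance decomposition and the F-criterion can be illustrated on the eight monthly prices from the earlier Excel problem. This is a minimal sketch under stated assumptions: the function name `f_criterion` is hypothetical, the months are assumed to be numbered 1 to 8, and the tabular value 5.99 is taken from a standard F table for a 0.05 significance level with 1 and 6 degrees of freedom.

```python
def f_criterion(actual, fitted, p):
    """Return (total, factor, residual) sums of squares and the F value.

    p is the number of factors; fitted must come from a least-squares fit
    with an intercept, otherwise the decomposition does not hold exactly.
    """
    n = len(actual)
    mean_y = sum(actual) / n
    ss_total = sum((y - mean_y) ** 2 for y in actual)
    ss_factor = sum((f - mean_y) ** 2 for f in fitted)
    ss_resid = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    f_value = (ss_factor / p) / (ss_resid / (n - p - 1))
    return ss_total, ss_factor, ss_resid, f_value


prices = [1750, 1755, 1767, 1760, 1770, 1790, 1810, 1840]
months = list(range(1, len(prices) + 1))  # assumed numbering 1..8

# Least-squares line through the data (one factor).
n = len(prices)
mean_x, mean_y = sum(months) / n, sum(prices) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(months, prices))
         / sum((x - mean_x) ** 2 for x in months))
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in months]

ss_total, ss_factor, ss_resid, f_value = f_criterion(prices, fitted, p=1)
F_TABLE = 5.99  # assumed tabular value: alpha = 0.05, degrees of freedom (1, 6)

print(f"total = {ss_total:.2f}, factor = {ss_factor:.2f}, residual = {ss_resid:.2f}")
print(f"F = {f_value:.3f}, significant: {f_value > F_TABLE}")
```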

After evaluating the equation as a whole, it is necessary to assess the statistical significance of the parameters of the equation. This assessment is carried out using Student's t-statistic. The t-statistic is calculated as the ratio of a parameter of the equation (in absolute value) to its standard (root-mean-square) error. If a single-factor model is being assessed, two statistics are calculated: $t_a = |a| / m_a$ and $t_b = |b| / m_b$.

In all computer programs, the standard errors and t-statistics of the parameters are calculated together with the parameters themselves. The t-statistic is tabulated. If the calculated value exceeds the tabular one, the parameter is recognized as statistically significant, i.e. formed under the influence of non-random factors.

Calculating the t-statistic essentially means testing the null hypothesis that the parameter is insignificant, i.e. that it equals zero. For a single-factor model, two hypotheses are tested: $H_0: a = 0$ and $H_0: b = 0$.

The significance level for accepting the null hypothesis depends on the chosen confidence probability. Thus, if the researcher sets a confidence probability of 95%, the significance level is 0.05; therefore, if the calculated significance level is ≥ 0.05, the null hypothesis is accepted and the parameters are considered statistically insignificant. If it is below 0.05, the null hypothesis is rejected and the alternative is accepted: $a \neq 0$ and $b \neq 0$.

Statistical software packages also report the significance level for accepting the null hypotheses. Assessing the significance of the regression equation and its parameters can give the following results:

Firstly, the equation as a whole is significant (by the F-criterion) and the parameters of the equation are also statistically significant. This means that the resulting equation can be used both for making management decisions and for forecasting.

Secondly, the equation is statistically significant by the F-criterion, but at least one of its parameters is not significant. The equation can be used to make management decisions concerning the analysed factors, but cannot be used for forecasting.

Thirdly, the equation is not statistically significant, or the equation is significant by the F-criterion but none of its parameters is significant. The equation cannot be used for any purpose.

For a regression equation to be recognized as a model of the relationship between the result variable and the factors, all the most important factors determining the result must be included in it, and the substantive interpretation of the parameters of the equation must correspond to theoretically substantiated relationships in the phenomenon under study. The coefficient of determination $R^2$ should be greater than 0.5.

When constructing a multiple regression equation, it is advisable to assess the so-called adjusted coefficient of determination ($\bar{R}^2$). The value of $R^2$ (as well as of the correlation coefficient) increases as the number of factors included in the analysis grows. The coefficients are especially overestimated for small populations. To offset this negative effect, $R^2$ and the correlation coefficient are adjusted for the number of degrees of freedom, i.e. the number of freely varying elements when particular factors are included.

Adjusted coefficient of determination:

$$\bar{R}^2 = 1 - (1 - R^2) \cdot \frac{n - 1}{n - k - 1}$$

n is the size of the population (the number of observations)

k is the number of factors included in the analysis

n - 1 is the number of degrees of freedom

$(1 - R^2)$ is the share of the residual (unexplained) variance of the result variable

$\bar{R}^2$ is always less than $R^2$ (when $k \ge 1$ and $R^2 < 1$). On the basis of $\bar{R}^2$, it is possible to compare estimates of equations with different numbers of analysed factors.
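A short Python sketch of the adjustment, assuming the common textbook form 1 - (1 - R^2)(n - 1)/(n - k - 1); the function name `adjusted_r2` is hypothetical, and the sample inputs reuse the figure of 84.8% for 8 observations and one factor from the earlier price example.

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of determination.

    r2: ordinary coefficient of determination
    n:  number of observations
    k:  number of factors included in the analysis
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


one_factor = adjusted_r2(0.848, n=8, k=1)
# Same R^2 but with three factors: the penalty for extra factors is larger.
three_factors = adjusted_r2(0.848, n=8, k=3)
print(f"{one_factor:.4f} {three_factors:.4f}")
```

With a small population the penalty grows quickly as factors are added, which is why the adjusted value is the one to compare across equations with different numbers of factors.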

34. Objectives of studying dynamic series.

Series of dynamics are also called time series or dynamic series. A dynamic series is a time-ordered sequence of indicators characterizing some phenomenon (for example, the volume of GDP from 1990 to 1998). The purpose of studying series of dynamics is to identify the regularities in the development of the phenomenon under study (the main trend) and to forecast on that basis. From the definition of a dynamic series it follows that any series consists of two elements: time t and the level of the series (the specific values of the indicator from which the series is built).

Dynamic series can be 1) moment series, whose indicators are recorded at a point in time, on a specific date, or 2) interval series, whose indicators are obtained over a period of time (for example, 1. the population of St. Petersburg, 2. GDP for a period). The division into moment and interval series is necessary because it determines how some indicators of dynamic series are calculated. Summing the levels of an interval series gives a meaningfully interpretable result, which cannot be said of summing the levels of a moment series, since the latter contain repeated counting.

The most important problem in analysing series of dynamics is the comparability of the levels of the series. This concept is very broad. The levels must be comparable in calculation method, in territory and in coverage of the units of the population. If a dynamic series is built from value indicators, all levels must be presented or recalculated in comparable prices. When constructing interval series, the levels must characterize identical time spans. When constructing moment series, the levels must be recorded on the same date.

Dynamic series can be complete or incomplete. Incomplete series are used in official publications (1980, 1985, 1990, 1995, 1996, 1997, 1998, 1999, ...). A comprehensive analysis of a dynamic series includes the study of the following points:

1. Calculation of indicators of change in the levels of the dynamic series

2. Calculation of average indicators of the dynamic series

3. Identification of the main trend of the series, building trend models

4. Assessment of autocorrelation in the dynamic series, building autoregressive models

5. Correlation of dynamic series (study of the relationships between dynamic series)

6. Forecasting of the dynamic series.

35. Indicators of change in the levels of time series.

In general form, a dynamic series can be represented as:

$$y_t, \quad t = 1, 2, \dots, n$$

where y is the level of the dynamic series, t is the moment or period of time to which the level (indicator) refers, and n is the length of the dynamic series (the number of periods). In studying a series of dynamics, the following indicators are calculated: 1. absolute increase; 2. growth coefficient (growth rate); 3. acceleration; 4. increment rate; 5. absolute value of a 1% increase. The calculated indicators may be: 1. chain indicators, obtained by comparing each level of the series with the immediately preceding one; 2. base indicators, obtained by comparing with the level chosen as the comparison base (unless otherwise stipulated, the first level of the series is taken as the base).

1. Chain absolute increases: $\Delta_t = y_t - y_{t-1}$. This shows by how much a given level is greater or less than the preceding one. Chain absolute increases are called indicators of the rate of change of the levels of a dynamic series. Base absolute increase: $\Delta_t^{b} = y_t - y_1$. If the levels of the series are relative indicators expressed in %, the absolute increase is expressed in percentage points of change.

2. Growth coefficient (growth rate). It is calculated as the ratio of a level of the series to the immediately preceding one (chain growth coefficients), $K_t = y_t / y_{t-1}$, or to the level taken as the comparison base (base growth coefficients), $K_t^{b} = y_t / y_1$. It characterizes how many times each level of the series is greater or less than the preceding or base level. Growth rates are calculated from the growth coefficients; they are growth coefficients expressed in %: $T_t = K_t \cdot 100\%$.

3. Based on the absolute increases, the acceleration of absolute increases is calculated: $\Delta'_t = \Delta_t - \Delta_{t-1}$. Acceleration is the absolute increase of the absolute increases. It assesses how the increases themselves change: whether they are stable or accelerating (growing).

4. Increment rate. This is the ratio of the absolute increase to the comparison base, expressed in %: $T_t^{inc} = \Delta_t / y_{t-1} \cdot 100\%$ (chain) or $\Delta_t^{b} / y_1 \cdot 100\%$ (base). The increment rate equals the growth rate minus 100%. It shows by how many percent a given level of the series is greater or less than the preceding or base level.

5. Absolute value of a 1% increase. It is calculated as the ratio of the absolute increase to the increment rate, i.e. $A_t = \Delta_t / T_t^{inc} = 0.01 \cdot y_{t-1}$, one hundredth of the previous level.

All these indicators are calculated to assess the degree of change in the levels of the series. Chain coefficients and growth rates are called indicators of the intensity of change of the levels of dynamic series.
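The chain and base indicators can be computed in a few lines of Python. The sketch below is illustrative only: the function name `dynamics_indicators` and the dictionary keys are hypothetical, the first level is assumed to be the comparison base, and the eight monthly prices from the earlier regression problem are reused as the series.

```python
def dynamics_indicators(levels):
    """Chain and base indicators for each level after the first one."""
    base = levels[0]  # assumption: the first level is the comparison base
    rows = []
    for t in range(1, len(levels)):
        prev, cur = levels[t - 1], levels[t]
        rows.append({
            "chain_increase": cur - prev,
            "base_increase": cur - base,
            "chain_growth_coeff": cur / prev,
            "base_growth_coeff": cur / base,
            "chain_growth_rate_pct": cur / prev * 100,
            "chain_increment_rate_pct": (cur - prev) / prev * 100,
            "abs_value_of_1pct": prev / 100,  # one hundredth of the previous level
        })
    # Acceleration: the absolute increase of the chain absolute increases.
    accelerations = [
        rows[i]["chain_increase"] - rows[i - 1]["chain_increase"]
        for i in range(1, len(rows))
    ]
    return rows, accelerations


prices = [1750, 1755, 1767, 1760, 1770, 1790, 1810, 1840]
rows, accelerations = dynamics_indicators(prices)

for month, row in enumerate(rows, start=2):
    print(month, row["chain_increase"], row["base_increase"],
          f"{row['chain_growth_rate_pct']:.2f}%")
print("accelerations:", accelerations)
```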

2. Calculation of average indicators of a dynamic series. The average levels of the series, the average absolute increases, the average growth rates and the average increment rates are calculated. Average indicators are calculated in order to summarize information and to make it possible to compare levels and indicators of their change across different series.

1. Average level of the series. a) For interval time series it is calculated as a simple arithmetic mean: $\bar{y} = \frac{1}{n} \sum y_t$, where n is the number of levels in the time series; b) for moment series the average level is calculated by a special formula called the chronological mean: $\bar{y} = \frac{\frac{1}{2} y_1 + y_2 + \dots + y_{n-1} + \frac{1}{2} y_n}{n - 1}$.

2. Average absolute increase. It is calculated from the chain absolute increases as a simple arithmetic mean: $\bar{\Delta} = \frac{\sum \Delta_t}{n - 1} = \frac{y_n - y_1}{n - 1}$.

3. Average growth coefficient. It is calculated from the chain growth coefficients by the geometric mean formula: $\bar{K} = \sqrt[n-1]{K_2 \cdot K_3 \cdots K_n} = \sqrt[n-1]{y_n / y_1}$. When commenting on the average indicators of a dynamic series, two things must be stated: the period that the analysed indicator characterizes and the time interval over which the series is built.

4. Average growth rate: $\bar{T} = \bar{K} \cdot 100\%$.

5. Average increment rate: $\bar{T}^{inc} = \bar{T} - 100\%$.
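The average indicators can be sketched in Python as follows. The function names and the three-level series are hypothetical, and the chronological mean is written in its simple form, which assumes equally spaced dates.

```python
def mean_level_interval(levels):
    """Simple arithmetic mean, used for interval series."""
    return sum(levels) / len(levels)


def mean_level_moment(levels):
    """Chronological mean for a moment series with equally spaced dates."""
    inner = sum(levels[1:-1])
    return (levels[0] / 2 + inner + levels[-1] / 2) / (len(levels) - 1)


def mean_absolute_increase(levels):
    """Mean of the chain absolute increases, equal to (y_n - y_1) / (n - 1)."""
    return (levels[-1] - levels[0]) / (len(levels) - 1)


def mean_growth_coefficient(levels):
    """Geometric mean of the chain growth coefficients."""
    return (levels[-1] / levels[0]) ** (1 / (len(levels) - 1))


levels = [100, 110, 121]  # hypothetical series growing by 10% per period

k_mean = mean_growth_coefficient(levels)
mean_growth_rate = k_mean * 100            # in percent
mean_increment_rate = mean_growth_rate - 100

print(mean_level_interval(levels), mean_level_moment(levels))
print(mean_absolute_increase(levels))
print(f"{k_mean:.4f} {mean_growth_rate:.2f}% {mean_increment_rate:.2f}%")
```

The geometric mean is used for growth coefficients because they compound: applying the average coefficient n - 1 times to the first level reproduces the last level.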