Pair correlation coefficient in Excel. Build a matrix of paired correlation coefficients

Examination No. 2

Option number 5

Task 1. Using computer technology, conduct a correlation-regression analysis of the studied economic indicators and build a regression model………………………..…..3

1.1 Construction of the correlation field …………………………………………4

1.2 Construction of a matrix of pair correlation coefficients……………6


1.4 Building a linear one-factor regression model……….10

1.5 Conclusions……………………………………………………………………… 15

Task 2. Using computer technology, solve linear programming problems………………………………………………….18

a) The problem of optimal production planning……………….19

1. Mathematical statement of the problem……………………………………..19

2. Placement of the initial data on the MS Excel worksheet; calculation of the constraint values and of the objective function value……………...19

3. Formulation of the mathematical model of the problem in terms of the cells of the MS Excel worksheet……………………………………………………..20

4. Search for the optimal solution by means of the Solver ("Search for a Solution") add-in…………………………………………………..20

5. Analysis of the results………………………………………………………….21

b) The problem of optimizing the transportation plan (the transport problem) ... 23

1. Mathematical statement of the problem……………………………………..23

2. Placement of the data on the MS Excel worksheet …………………...24

3. Statement of the problem in terms of the cells of the Excel worksheet for use with the Solver ("Search for a Solution") add-in….……………………………25

4. Analysis of the results………………………………………………………….26

List of used literature………………………………………..28

Task 1. Using computer technology, conduct a correlation and regression analysis of the studied economic indicators and build a regression model.

As a research tool, use:



The Analysis ToolPak add-in of MS Excel;

The built-in functions of the stats (Statistics) library of the Maple computer mathematics system.

Task 1 conditions:

Based on sample data, investigate the influence of the factors X1, X2 and X3 on the resultant attribute Y.

Build a correlation field and make an assumption about the presence and type of relationship between the studied factors;

Having assessed the closeness of the relationship between the factors under study, construct a multifactorial (or single-factor) linear regression model of the form Y = f(X1, X2, X3) or of the form Y = f(X).

Estimate:

The adequacy of the regression equation by the value of the coefficient of determination R²;

The significance of the coefficients of the regression equation according to Student's t-test at the given significance level p = 0.05;

The degree of randomness of the relationship between each factor X and the attribute Y (Fisher's F-test);

The relationship between the indicators X1, X2, X3 of fixed assets and the volume of gross output of an enterprise in one of the industries is characterized by the following data:

Option 5

x1 1.5 2.6 3.5 4.8 5.9 6.3 7.2 8.9 9.5 11.1 15.0
x2 10.2 15.3 18.4 20.5 24.7 25.6 27.3 28.3 29.6 30.1 31.0
x3 1.1 2.3 3.5 4.1 5.7 6.6 7.3 8.5 9.8 10.1 12.0
Y

Solution of task 1.

The solution of task 1 involves:

1. Construction of the correlation field.

2. Construction of a matrix of pair correlation coefficients.

3. Construction and analysis of single-factor regression models of linear and exponential type using the built-in functions of MS Excel.

4. Construction of linear one-factor regression models using the "Analysis ToolPak" add-in.

5. Conclusions.

Construction of the correlation field.

Let's place the table with the source data in cells A3:D15 of the Excel worksheet.

Appendix 1.1

Y    X1    X2    X3
?    1.5   10.2  1.1
?    2.6   15.3  2.3
?    3.5   18.4  3.5
?    4.8   20.5  4.1
?    5.9   24.7  5.7
?    6.3   25.6  6.6
?    7.2   27.3  7.3
?    8.9   28.3  8.5
?    9.5   29.6  9.8
?    11.1  30.1  10.1
?    15.0  31.0  12.0

Using the MS Excel Chart Wizard, we will build a correlation field, that is, graphically represent the relationship between the resultant attribute Y and each of the factors X. The graphs show that between the resultant attribute Y and each of the factors X there is a direct, nearly linear proportional dependence.


We investigate the closeness and nature of the relationship between the factors.

Construction of a matrix of pair correlation coefficients.

Using the "Analysis ToolPak" add-in of MS Excel (Tools → Data Analysis → Correlation), we build the matrix of pair correlation coefficients. The window of the "Correlation" tool is shown in Figure 1; the matrix of pair correlation coefficients is shown in Figure 2.

Fig.1. – Correlation window

Fig.2. – Matrix of pair correlation coefficients.

It can be seen from this matrix that all the considered factors X1-X3 have a close relationship with the resultant attribute Y. In addition, all the factors X are multicollinear with one another. Therefore, a multifactorial model of the form Y = f(X1, X2, X3) cannot be built.
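The multicollinearity of the factors can be cross-checked outside Excel. Below is a minimal sketch in Python (numpy assumed), using the X1-X3 values from the task table; Y is omitted since its values are not given above:

```python
import numpy as np

# X1-X3 values from the Option 5 data table above
x1 = [1.5, 2.6, 3.5, 4.8, 5.9, 6.3, 7.2, 8.9, 9.5, 11.1, 15.0]
x2 = [10.2, 15.3, 18.4, 20.5, 24.7, 25.6, 27.3, 28.3, 29.6, 30.1, 31.0]
x3 = [1.1, 2.3, 3.5, 4.1, 5.7, 6.6, 7.3, 8.5, 9.8, 10.1, 12.0]

# 3x3 matrix of pair correlation coefficients (each list is one variable)
m = np.corrcoef([x1, x2, x3])
print(np.round(m, 3))

# every off-diagonal coefficient exceeds 0.8, i.e. the factors are multicollinear
assert (np.abs(m[np.triu_indices(3, 1)]) > 0.8).all()
```

The assertion mirrors the |r| > 0.8 multicollinearity criterion used later in the text.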

1. Calculate the matrix of paired correlation coefficients; analyze the closeness and direction of the relationship of the resultant attribute Y with each of the factors X; evaluate the statistical significance of the correlation coefficients r(Y, Xi); choose the most informative factor.

2. Build a paired regression model with the most informative factor; give an economic interpretation of the regression coefficient.

3. Evaluate the quality of the model using the average relative approximation error, the coefficient of determination and Fisher's F-test (take the significance level α = 0.05).

4. With a confidence probability γ = 80%, predict the average value of the indicator Y (the forecast values of the factors are given in Appendix 6). Present graphically the actual and model values of Y and the prediction results.

5. Using the inclusion method, build two-factor models, keeping the most informative factor in them; build a three-factor model with the complete list of factors.

6. Choose the best of the built multiple models. Give an economic interpretation of its coefficients.

7. Check the significance of the multiple regression coefficients using Student's t-test (accept significance level α = 0.05). Has the quality of the multiple model improved compared to the paired model?

8. Assess the influence of the factors on the result using elasticity coefficients, beta coefficients and delta coefficients.
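The influence measures in item 8 are conventionally computed as E_i = b_i·x̄_i/ȳ (elasticity), β_i = b_i·S_xi/S_y (beta) and Δ_i = r_yxi·β_i/R² (delta). A sketch in Python; all numeric inputs below are made up for illustration, not taken from the task:

```python
# Hypothetical inputs: regression coefficient b, means and standard deviations
# of the factor and the result, pair correlation r_yx, and the model's R^2.
def influence_measures(b, x_mean, y_mean, s_x, s_y, r_yx, r2):
    e = b * x_mean / y_mean          # elasticity: % change of Y per 1% change of X
    beta = b * s_x / s_y             # beta: influence measured in standard deviations
    delta = r_yx * beta / r2         # delta: the factor's share in R^2
    return e, beta, delta

e, beta, delta = influence_measures(b=2.0, x_mean=3.0, y_mean=6.0,
                                    s_x=1.5, s_y=4.0, r_yx=0.8, r2=0.64)
print(e, beta, delta)  # 1.0 0.75 0.9375
```

With real data, b comes from the fitted regression and the means/deviations from the sample.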

Task 2. Modeling a one-dimensional time series

Appendix 7 shows the time series Y(t) of socio-economic indicators for the Altai Territory for the period from 2000 to 2011. It is required to study the dynamics of the indicator corresponding to the task variant.

Option  Designation, name, unit of measurement of the indicator
Y1      Average consumer spending per capita (per month), rub.
Y2      Emissions of pollutants into the atmospheric air, thousand tons
Y3      Average prices in the secondary housing market (at the end of the year, per square metre of total area), rub.
Y4      Volume of paid services per capita, rub.
Y5      Average annual number of people employed in the economy, thousand people
Y6      Number of privately owned cars per 1000 people (at the end of the year), units
Y7      Average per capita cash income (per month), rub.
Y8      Consumer price index (December to December of the previous year), %
Y9      Investments in fixed assets (in actual prices), million rub.
Y10     Retail turnover per capita (in actual prices), rub.


Work order

1. Build a linear model of the time series, the parameters of which are estimated by the method of least squares. Explain the meaning of the regression coefficient.

2. Assess the adequacy of the constructed model using the properties of randomness, independence, and the correspondence of the residual component to the normal distribution law.

3. Evaluate the accuracy of the model based on the use of the average relative approximation error.

4. Forecast the indicator under consideration for a year ahead (calculate the forecast interval with a confidence level of 70%).

5. Present graphically the actual values ​​of the indicator, the results of modeling and forecasting.

6. Calculate the parameters of the logarithmic, polynomial (2nd-degree polynomial), power, exponential and hyperbolic trends. Based on the graphs and the values of the determination index, choose the most suitable type of trend.

7. Using the best non-linear model, carry out point forecasting of the considered indicator for the year ahead. Compare the result obtained with the predictive confidence interval built using the linear model.
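Step 1 of the work order (a linear trend fitted by least squares) can be sketched in Python; the indicator values here are synthetic, since the Appendix 7 series is not reproduced in this text:

```python
import numpy as np

# Years 2000..2011, as in the task; made-up indicator values y(t)
t = np.arange(2000, 2012)
y = 3.0 * (t - 2000) + 10.0          # synthetic series growing by 3 per year

# np.polyfit with degree 1 returns the least-squares slope and intercept
a1, a0 = np.polyfit(t, y, 1)
print(round(a1, 3))                  # regression coefficient: average change per year
```

On real data the slope a1 is interpreted as the average absolute change of the indicator per year.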

EXAMPLE

Execution of the control work

Task 1

The company sells used cars. The names of indicators and initial data for econometric modeling are presented in the table:

Selling price, thousand c.u. (Y)   Price of a new car, thousand c.u. (X1)   Service life, years (X2)   Left-hand drive = 1, right-hand drive = 0 (X3)
8.33    13.99   3.8
10.40   19.05   2.4
10.60   17.36   4.5
16.58   25.00   3.5
20.94   25.45   3.0
19.13   31.81   3.5
13.88   22.53   3.0
8.80    16.24   5.0
13.89   16.54   2.0
11.03   19.04   4.5
14.88   22.61   4.6
20.43   27.56   4.0
14.80   22.51   3.3
26.05   31.75   2.3

Required:

1. Calculate the matrix of paired correlation coefficients; analyze the closeness and direction of the relationship of the resultant attribute Y with each of the factors X; evaluate the statistical significance of the correlation coefficients r(Y, Xi); choose the most informative factor.

Using Excel (Data / Data Analysis / CORRELATION):

Let's get a matrix of pair correlation coefficients between all available variables:

      Y          X1        X2        X3
Y     1
X1    0.910987   1
X2    -0.4156    -0.2603   1
X3    0.190785   0.221927  -0.30308  1

Let's analyze the correlation coefficients between the resultant attribute Y and each of the factors Xj:

r(Y, X1) = 0.911 > 0; therefore, there is a direct correlation between the variables Y and X1: the higher the price of a new car, the higher the selling price.

|r(Y, X1)| = 0.911 > 0.7, so this dependence is close.

r(Y, X2) = -0.416 < 0, which means that an inverse correlation is observed between the variables Y and X2: the selling price is lower for automobiles with a longer service life.

|r(Y, X2)| = 0.416: this dependence is moderate, closer to weak.

r(Y, X3) = 0.191 > 0, so a direct correlation is shown between the variables Y and X3: the selling price is higher for left-hand-drive cars.

|r(Y, X3)| = 0.191 < 0.4, so this dependence is weak.

To check the significance of the found correlation coefficients, we use Student's test.

For each correlation coefficient we compute the t-statistic by the formula t = r·√(n − 2) / √(1 − r²) and enter the results in an additional column of the correlation table:

      Y          X1        X2        X3   t-statistic
Y     1
X1    0.910987   1                        7.651524603
X2    -0.4156    -0.2603   1              1.582847988
X3    0.190785   0.221927  -0.30308  1    0.673265587

According to the table of critical points of Student's distribution, at significance level α = 0.05 and n − 2 = 12 degrees of freedom, we determine the critical value t_cr ≈ 2.18 (Appendix 1, or the built-in TINV function).

Since t1 = 7.65 > t_cr, the coefficient r(Y, X1) is significant: the dependence between the selling price Y and the price of a new car X1 is reliable.

Since t2 = 1.58 < t_cr, the coefficient is not significant; based on the sample data, there is no reason to assert that the dependence between the selling price Y and the service life X2 is reliable.

Since t3 = 0.67 < t_cr, this coefficient is not significant either; based on the sample data, there is no reason to assert that the dependence between the selling price Y and the steering-wheel position X3 is reliable.

Thus, the closest and most significant relationship is observed between the selling price Y and the price of a new car X1; the factor X1 is the most informative.
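The t-statistics in the table follow from the formula t = r·√(n − 2)/√(1 − r²); a quick Python check for r(Y, X1) with the n = 14 observations of this example:

```python
import math

r, n = 0.910987, 14                       # r(Y, X1) from the matrix; 14 cars in the sample
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
print(round(t, 4))                        # ~7.6515, as in the t-statistics column
```

The same formula with r = -0.4156 and r = 0.1908 reproduces the other two table entries.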

Matrix of pairwise correlation coefficients

      Y         X1        X2        X3        X4        X5
Y     1
X1    0.732705  1
X2    0.785156  0.706287  1
X3    0.179211  -0.29849  0.208514  1
X4    0.667343  0.924333  0.70069   0.299583  1
X5    0.709204  0.940488  0.691809  0.326602  0.992945  1

The nodes of the matrix contain paired correlation coefficients that characterize the closeness of the relationship between the attributes. Analyzing these coefficients, we note that the greater their absolute value, the greater the impact of the corresponding factor attribute on the resultant one. The analysis of the resulting matrix is carried out in two stages:

1. If the first column of the matrix contains correlation coefficients for which |r| < 0.5, the corresponding attributes are excluded from the model. In this case, the factor excluded from the first column of the correlation matrix is X3 (r = 0.179), the inflation level growth coefficient. This factor has less influence on the resultant attribute than the remaining four attributes.

2. Analyzing the pair correlation coefficients of the factor attributes with each other (r_XiXj), which characterize the closeness of their relationship, it is necessary to evaluate their independence from each other, since this is a necessary condition for further regression analysis. Since there are no absolutely independent attributes in the economy, it is necessary to single out, as far as possible, the most independent ones. Factor attributes that are in close correlation with each other are called multicollinear. The inclusion of multicollinear attributes in the model makes it impossible to interpret the regression model economically, since a change in one factor entails a change in the factors associated with it, which can lead to a "breakdown" of the model as a whole.

The criterion for multicollinearity of factors is as follows:

|r_XiXj| > 0.8

In the resulting matrix of paired correlation coefficients, this criterion is met by the coefficients located at the intersections of the rows X4 and X5 with the column X1 (and with each other). Of each pair of these attributes, the one with the greater impact on the resultant attribute should be left in the model. As a result, the factors X4 and X5, the growth rate of the cost of products sold and the growth rate of the volume of its sales, are excluded from the model.

So, we introduce factors X1 and X2 into the regression model.
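The two-stage screening described above can be expressed directly in code. A sketch in Python using the coefficients from the matrix (variable names are illustrative):

```python
# Pair correlations with Y and between factors, taken from the matrix above
r_y = {"X1": 0.732705, "X2": 0.785156, "X3": 0.179211,
       "X4": 0.667343, "X5": 0.709204}
r_xx = {("X1", "X2"): 0.706287, ("X1", "X3"): -0.29849, ("X1", "X4"): 0.924333,
        ("X1", "X5"): 0.940488, ("X2", "X3"): 0.208514, ("X2", "X4"): 0.70069,
        ("X2", "X5"): 0.691809, ("X3", "X4"): 0.299583, ("X3", "X5"): 0.326602,
        ("X4", "X5"): 0.992945}

# Stage 1: drop factors weakly related to Y (|r| < 0.5)
kept = [f for f in r_y if abs(r_y[f]) >= 0.5]
# Stage 2: in each multicollinear pair (|r| > 0.8) keep the factor closer to Y
for (a, b), r in r_xx.items():
    if a in kept and b in kept and abs(r) > 0.8:
        kept.remove(a if abs(r_y[a]) < abs(r_y[b]) else b)

print(kept)  # ['X1', 'X2'] - the factors entered into the regression model
```

The result reproduces the selection made in the text: X3 falls at stage 1, X4 and X5 at stage 2.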

Next, the regression analysis is carried out (Tools → Data Analysis → Regression). A table of initial data with the factors X1 and X2 is compiled again. Regression as a whole is used to analyze the impact of the independent variables (factors) on a separate dependent variable, and allows the correlation between attributes to be presented in the form of a functional dependence called the regression equation, or correlation-regression model.

As a result of regression analysis, we obtain the results of calculating multivariate regression. Let's analyze the obtained results.

All regression coefficients are significant according to Student's t-test. The multiple correlation coefficient R was 0.925; the square of this value (the coefficient of determination) means that the variation of the resultant attribute is on average 85.5% due to the variation of the factor attributes included in the model. The coefficient of determination characterizes the closeness of the relationship between the set of factor attributes and the performance indicator. The closer the value of R-squared is to 1, the closer the relationship. In our case, the value of 0.855 indicates a correct selection of factors and the presence of a relationship between the factors and the performance indicator.

The considered model is adequate, since the calculated value of Fisher's F-test significantly exceeds its tabulated value (F_calc = 52.401; F_table = 1.53).

As a general result of the conducted correlation-regression analysis, we obtain a multiple regression equation, which has the form Cm = 0.95·X1 + 27.9·X3 − 19.4.

The resulting regression equation meets the purpose of correlation and regression analysis and is a linear model of the dependence of the company's balance sheet profit on two factors: the growth rate of labor productivity and the coefficient of industrial property.

Based on the obtained model, we can conclude that with an increase in the level of labor productivity by 1% compared to the previous period, the balance sheet profit will increase by 0.95 percentage points; an increase in the coefficient of industrial property by 1% will lead to an increase in the effective indicator by 27.9 percentage points. Consequently, the dominant influence on the growth of balance sheet profit is the increase in the value of industrial property (renovation and growth of fixed assets of the enterprise).

According to the multiple regression model, a multifactorial forecast of the resultant attribute is performed. Let it be known that X1 = 3.0 and X3 = 0.7. Substituting the values of the factor attributes into the model, we get Cm = 0.95·3.0 + 27.9·0.7 − 19.4 = 2.98. Thus, with an increase in labor productivity and modernization of fixed assets at the enterprise, the balance-sheet profit in the 1st quarter of 2005 relative to the previous period (4th quarter of 2004) will increase by 2.98%.


      Y       X1      X2      X3      X4      X5      X6
Y     1
X1    0.519   1
X2    -0.273  0.030   1
X3    0.610   0.813   -0.116  1
X4    -0.572  -0.013  -0.022  -0.091  1
X5    0.297   0.043   -0.461  0.120   -0.359  1
X6    0.118   -0.366  -0.061  -0.329  -0.100  -0.290  1

Analysis of the interfactor correlation coefficients (between the "x"s) shows that only the correlation coefficient between the pair of factors X1-X3 (highlighted in bold) exceeds 0.8 in absolute value. The factors X1 and X3 are thus recognized as collinear.

2. As shown in paragraph 1, the factors X1 and X3 are collinear, meaning that they effectively duplicate each other, and including both of them in the model at the same time would distort the interpretation of the respective regression coefficients. The factor X3 has a larger correlation coefficient in absolute value with the result Y than the factor X1: r(y, x1) = 0.519; r(y, x3) = 0.610 (see Table 1). This indicates a stronger influence of the factor X3 on the change of Y. The factor X1 is thus excluded from consideration.

To construct the regression equation, the values of the variables used (Y, X2, X3, X4, X5, X6) are copied to a blank worksheet (Appendix 3). We build the regression equation using the "Regression" tool of the Data Analysis add-in (menu "Tools" → "Data Analysis…" → "Regression"). The regression analysis panel with filled fields is shown in Fig. 2.

The results of the regression analysis are given in Appendix 4 and transferred to Table 2. The regression equation has the form (see "Coefficients" in Table 2):

The regression equation is recognized as statistically significant, since the probability of its random formation in the form in which it was obtained is 8.80×10⁻⁶ (see "Significance F" in Table 2), which is significantly lower than the accepted significance level α = 0.05.

The probability of random formation of the coefficients at the factors X3, X4, X6 is below the accepted significance level α = 0.05 (see "P-value" in Table 2), which indicates the statistical significance of these coefficients and the significant impact of these factors on the change in annual profit Y.

The probability of random formation of the coefficients at the factors X2 and X5 exceeds the accepted significance level α = 0.05 (see "P-value" in Table 2), and these coefficients are not considered statistically significant.

Fig. 2. Regression analysis panel of the model Y(X2, X3, X4, X5, X6)

Table 2

Model regression analysis results Y(X2, X3, X4, X5, X6)

Regression statistics
Multiple R           0.868
R-square             0.753
Normalized R-square  0.694
Standard error       242.3
Observations         27
Analysis of variance
            df   SS         MS        F      Significance F
Regression  5    3749838.2  749967.6  12.78  8.80E-06
Remainder   21   1232466.8  58688.9
Total       26   4982305.0
Regression equation
              Coefficients  Standard error  t-statistic  P-value
Y-intercept   487.5         641.4           0.760        0.456
X2            -0.0456       0.0373          -1.224       0.235
X3            0.1043        0.0194          5.375        0.00002
X4            -0.0965       0.0263          -3.674       0.001
X5            2.528         6.323           0.400        0.693
X6            248.2         113.0           2.197        0.039

3. Based on the results of checking the statistical significance of the coefficients of the regression equation, carried out in the previous paragraph, we build a new regression model containing only informative factors, which include:

factors whose coefficients are statistically significant;

factors whose coefficients' t-statistics exceed one in absolute value (in other words, the absolute value of the coefficient is greater than its standard error).

The first group includes the factors X3, X4, X6; the second, the factor X2. The factor X5 is excluded from consideration as uninformative, and the final regression model will contain the factors X2, X3, X4, X6.
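The selection rule of this step (keep a factor if its coefficient's P-value is below α, or if |t| > 1) can be sketched in Python using the t-statistics and P-values from Table 2:

```python
# (t-statistic, P-value) per factor, from Table 2 of the regression output
coef_stats = {"X2": (-1.224, 0.235), "X3": (5.375, 0.00002),
              "X4": (-3.674, 0.001), "X5": (0.400, 0.693),
              "X6": (2.197, 0.039)}
alpha = 0.05

# informative = significant coefficient OR |t| greater than one
informative = [f for f, (t, p) in coef_stats.items()
               if p < alpha or abs(t) > 1]
print(informative)  # ['X2', 'X3', 'X4', 'X6'] - X5 is dropped as uninformative
```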

To build the regression equation, we copy the values of the variables used to a blank worksheet (Appendix 5) and perform a regression analysis (Fig. 3). Its results are given in Appendix 6 and transferred to Table 3. The regression equation looks like:

(see "Coefficients" in Table 3).

Fig. 3. Regression analysis panel of the model Y(X2, X3, X4, X6)

Table 3

Model regression analysis results Y(X2, X3, X4, X6)

Regression statistics
Multiple R           0.866
R-square             0.751
Normalized R-square  0.705
Standard error       237.6
Observations         27
Analysis of variance
            df   SS         MS        F      Significance F
Regression  4    3740456.2  935114.1  16.57  2.14E-06
Remainder   22   1241848.7  56447.7
Total       26   4982305.0
Regression equation
              Coefficients  Standard error  t-statistic  P-value
Y-intercept   712.2         303.0           2.351        0.028
X2            -0.0541       0.0300          -1.806       0.085
X3            0.1032        0.0188          5.476        0.00002
X4            -0.1017       0.0223          -4.560       0.00015
X6            227.5         98.5            2.310        0.031

The regression equation is statistically significant: the probability of its random formation is below the accepted significance level α = 0.05 (see "Significance F" in Table 3).

The coefficients at the factors X3, X4, X6 are also statistically significant: the probability of their random formation is below the accepted significance level α = 0.05 (see "P-value" in Table 3). This indicates a significant impact of the annual amount of insurance premiums X3, the annual insurance payments X4 and the form of ownership X6 on the change in annual profit Y.

The coefficient at the factor X2 (the annual amount of insurance reserves) is not statistically significant. However, this factor can still be considered informative, since the t-statistic of its coefficient exceeds one in absolute value, although further conclusions regarding the factor X2 should be treated with some caution.

4. Let us evaluate the quality and accuracy of the last regression equation using some statistical characteristics obtained during the regression analysis (see "Regression statistics" in Table 3):

the multiple coefficient of determination R² = 0.751 shows that the regression model explains 75.1% of the variation of the annual profit Y, and that this variation is due to changes in the factors X2, X3, X4 and X6 included in the model;

the regression standard error S = 237.6 thousand roubles shows that the values of annual profit predicted by the regression equation differ from the actual values by an average of 237.6 thousand roubles.

The average relative approximation error is determined by the approximate formula E_rel ≈ S / ȳ · 100%,

where ȳ (thousand roubles) is the average value of the annual profit (determined using the built-in "AVERAGE" function; Appendix 1).

E_rel shows that the values of annual profit predicted by the regression equation differ from the actual values by an average of 26.7%. The model has unsatisfactory accuracy (depending on the value of E_rel, the accuracy of the model is rated as high, good, satisfactory or unsatisfactory).

5. For the economic interpretation of the coefficients of the regression equation, we tabulate the average values and standard deviations of the variables in the original data (Table 4). The mean values were determined using the built-in "AVERAGE" function, the standard deviations using the built-in "STDEV" function (see Appendix 1).

The matrix of pairwise correlation coefficients is a matrix whose elements are paired correlation coefficients. For example, for three variables this matrix looks like:

      y        x1       x2       x3
y     1        r_yx1    r_yx2    r_yx3
x1    r_x1y    1        r_x1x2   r_x1x3
x2    r_x2y    r_x2x1   1        r_x2x3
x3    r_x3y    r_x3x1   r_x3x2   1


Example. Based on the data of 154 agricultural enterprises of the Kemerovo region in 2003, we study the efficiency of grain production (Table 13).

  1. Determine the factors that form the profitability of grain in agricultural enterprises in 2003.
  2. Plot a matrix of pairwise correlation coefficients. Determine which factors are multicollinear.
  3. Build a regression equation that characterizes the dependence of the profitability of cereals on all factors.
  4. Assess the significance of the resulting regression equation. What factors significantly affect the formation of profitability of grain crops in this model?
  5. Assess the value of the profitability of grain production in agricultural enterprise No. 3.

We obtain the solution using the "Multiple regression equation" calculator:

1. Estimation of the regression equation.
Let us define the vector of estimates of the regression coefficients. According to the method of least squares, the vector is obtained from the expression:
s = (XᵀX)⁻¹XᵀY
Matrix X

1 0.43 2.02 0.29
1 0.87 1.29 0.55
1 1.01 1.09 0.7
1 0.63 1.68 0.41
1 0.52 0.3 0.37
1 0.44 1.98 0.3
1 1.52 0.87 1.03
1 2.19 0.8 1.3
1 1.8 0.81 1.17
1 1.57 0.84 1.06
1 0.94 1.16 0.64
1 0.72 1.52 0.44
1 0.73 1.47 0.46
1 0.77 1.41 0.49
1 1.21 0.97 0.88
1 1.25 0.93 0.91
1 1.31 0.91 0.94
1 0.38 2.08 0.27
1 0.41 2.05 0.28
1 0.48 1.9 0.32
1 0.58 1.73 0.38
1 0 0 0

Matrix Y
0.22
0.67
0.79
0.42
0.32
0.24
0.95
1.05
0.99
0.96
0.73
0.52
2.1
0.58
0.87
0.89
0.91
0.14
0.18
0.27
0.37
0

The Xᵀ matrix
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0.43 0.87 1.01 0.63 0.52 0.44 1.52 2.19 1.8 1.57 0.94 0.72 0.73 0.77 1.21 1.25 1.31 0.38 0.41 0.48 0.58 0
2.02 1.29 1.09 1.68 0.3 1.98 0.87 0.8 0.81 0.84 1.16 1.52 1.47 1.41 0.97 0.93 0.91 2.08 2.05 1.9 1.73 0
0.29 0.55 0.7 0.41 0.37 0.3 1.03 1.3 1.17 1.06 0.64 0.44 0.46 0.49 0.88 0.91 0.94 0.27 0.28 0.32 0.38 0

We multiply the matrices: (XᵀX).
We find the determinant det(XᵀX) = 34.35.
We find the inverse matrix (XᵀX)⁻¹:
0.6821 0.3795 -0.2934 -1.0118
0.3795 9.4402 -0.133 -14.4949
-0.2934 -0.133 0.1746 0.3204
-1.0118 -14.4949 0.3204 22.7272

The vector of estimates of the regression coefficients is
s = (XᵀX)⁻¹XᵀY =
0.1565
0.3375
0.0043
0.2986

Regression equation (estimate of the regression equation):
Y = 0.1565 + 0.3375·X1 + 0.0043·X2 + 0.2986·X3
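The least-squares formula s = (XᵀX)⁻¹XᵀY can be reproduced with numpy. The sketch below uses a small synthetic data set with a known exact relation (the coefficients 0.5, 2, −1 are made up for the check, not taken from this example):

```python
import numpy as np

# Synthetic factors and an exactly linear response y = 0.5 + 2*x1 - 1*x2
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
X = np.column_stack([np.ones_like(x1), x1, x2])   # first column is the unit vector
Y = 0.5 + 2.0 * x1 - 1.0 * x2

s = np.linalg.inv(X.T @ X) @ X.T @ Y              # vector of coefficient estimates
print(np.round(s, 6))                             # recovers [0.5, 2, -1]
```

With real data, `np.linalg.lstsq(X, Y, rcond=None)` is preferable numerically, but the explicit normal-equations form mirrors the formula in the text.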

Matrix of pairwise correlation coefficients

The number of observations is n = 22. The number of independent variables in the model is 3, and the number of regressors, taking into account the unit vector, is equal to the number of unknown coefficients. Taking the attribute Y into account, the dimension of the matrix becomes 5. The matrix of independent variables X has dimension (22 × 5). The matrix XᵀX is determined by direct multiplication or from the pre-calculated sums.
Matrix composed of Y and X
1 0.22 0.43 2.02 0.29
1 0.67 0.87 1.29 0.55
1 0.79 1.01 1.09 0.7
1 0.42 0.63 1.68 0.41
1 0.32 0.52 0.3 0.37
1 0.24 0.44 1.98 0.3
1 0.95 1.52 0.87 1.03
1 1.05 2.19 0.8 1.3
1 0.99 1.8 0.81 1.17
1 0.96 1.57 0.84 1.06
1 0.73 0.94 1.16 0.64
1 0.52 0.72 1.52 0.44
1 2.1 0.73 1.47 0.46
1 0.58 0.77 1.41 0.49
1 0.87 1.21 0.97 0.88
1 0.89 1.25 0.93 0.91
1 0.91 1.31 0.91 0.94
1 0.14 0.38 2.08 0.27
1 0.18 0.41 2.05 0.28
1 0.27 0.48 1.9 0.32
1 0.37 0.58 1.73 0.38
1 0 0 0 0

The transposed matrix.
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0.22 0.67 0.79 0.42 0.32 0.24 0.95 1.05 0.99 0.96 0.73 0.52 2.1 0.58 0.87 0.89 0.91 0.14 0.18 0.27 0.37 0
0.43 0.87 1.01 0.63 0.52 0.44 1.52 2.19 1.8 1.57 0.94 0.72 0.73 0.77 1.21 1.25 1.31 0.38 0.41 0.48 0.58 0
2.02 1.29 1.09 1.68 0.3 1.98 0.87 0.8 0.81 0.84 1.16 1.52 1.47 1.41 0.97 0.93 0.91 2.08 2.05 1.9 1.73 0
0.29 0.55 0.7 0.41 0.37 0.3 1.03 1.3 1.17 1.06 0.64 0.44 0.46 0.49 0.88 0.91 0.94 0.27 0.28 0.32 0.38 0

The AᵀA matrix.
22 14.17 19.76 27.81 13.19
14.17 13.55 15.91 16.58 10.56
19.76 15.91 23.78 22.45 15.73
27.81 16.58 22.45 42.09 14.96
13.19 10.56 15.73 14.96 10.45

The resulting matrix has the following correspondence:

Let us find the paired correlation coefficients. For each pair of variables (y and x1; y and x2; y and x3; x1 and x2; x1 and x3; x2 and x3) we compute the averages, the variances and the standard deviations, and from them the correlation coefficient by the formula r_uv = (mean(u·v) − mean(u)·mean(v)) / (s_u·s_v). (The intermediate numerical calculations for each pair are omitted.)

Matrix of paired correlation coefficients.

      y      x1     x2     x3
y     1      0.62   -0.24  0.61
x1    0.62   1      -0.39  0.99
x2    -0.24  -0.39  1      -0.41
x3    0.61   0.99   -0.41  1
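The same matrix can be recomputed with numpy from the (y, x1, x2, x3) data listed above; a sketch:

```python
import numpy as np

# Columns y, x1, x2, x3 of the source data (22 observations)
data = np.array([
    [0.22, 0.43, 2.02, 0.29], [0.67, 0.87, 1.29, 0.55], [0.79, 1.01, 1.09, 0.70],
    [0.42, 0.63, 1.68, 0.41], [0.32, 0.52, 0.30, 0.37], [0.24, 0.44, 1.98, 0.30],
    [0.95, 1.52, 0.87, 1.03], [1.05, 2.19, 0.80, 1.30], [0.99, 1.80, 0.81, 1.17],
    [0.96, 1.57, 0.84, 1.06], [0.73, 0.94, 1.16, 0.64], [0.52, 0.72, 1.52, 0.44],
    [2.10, 0.73, 1.47, 0.46], [0.58, 0.77, 1.41, 0.49], [0.87, 1.21, 0.97, 0.88],
    [0.89, 1.25, 0.93, 0.91], [0.91, 1.31, 0.91, 0.94], [0.14, 0.38, 2.08, 0.27],
    [0.18, 0.41, 2.05, 0.28], [0.27, 0.48, 1.90, 0.32], [0.37, 0.58, 1.73, 0.38],
    [0.00, 0.00, 0.00, 0.00]])

m = np.corrcoef(data, rowvar=False)     # 4x4 matrix over the columns y, x1, x2, x3
print(np.round(m, 2))

# x1 and x3 are almost perfectly correlated (0.99 in the matrix above)
assert abs(m[1, 3]) > 0.9
```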

Analysis of the first row of this matrix allows the selection of factor attributes that can be included in the multiple correlation model. Factor attributes with |r_yxi| < 0.5 are excluded from the model.
Collinearity is the relationship between factors. The following inequalities can be taken as a criterion for multicollinearity:
r(x_j, y) > r(x_k, x_j); r(x_k, y) > r(x_k, x_j).
If one of the inequalities is not satisfied, then the parameter x_k or x_j whose relationship with the resultant indicator Y is the least close is excluded.
3. Analysis of the parameters of the regression equation.
Let us move on to the statistical analysis of the resulting regression equation: checking the significance of the equation and its coefficients, and studying the absolute and relative approximation errors.
For an unbiased estimate of the variance, we perform the following calculations:
Unbiased error e = Y − X·s (the absolute approximation error):
-0.18
0.05
0.08
-0.08
-0.12
-0.16
-0.03
-0.24
-0.13
-0.05
0.06
-0.02
1.55
0.01
0.04
0.04
0.03
-0.23
-0.21
-0.15
-0.1
-0.16

s_e² = (Y − X·s)ᵀ(Y − X·s)
The unbiased estimate of the variance is s² = s_e² / (n − m − 1).

The estimate of the standard deviation is S = √s².

We find the estimate of the covariance matrix of the vector of coefficients: K = s²·(XᵀX)⁻¹
0.26 0.15 -0.11 -0.39
0.15 3.66 -0.05 -5.61
-0.11 -0.05 0.07 0.12
-0.39 -5.61 0.12 8.8

The variances of the model parameters are determined by the relation S²_i = K_ii, i.e., they are the elements on the main diagonal of K.
In order to expand the possibilities of meaningful analysis of the regression model, partial elasticity coefficients are used, determined by the formula E_i = b_i · x̄_i / ȳ.

The partial elasticity coefficient |E1| < 1; consequently, its influence on the resultant attribute Y is insignificant.

The partial elasticity coefficient |E2| < 1; consequently, its influence on the resultant attribute Y is insignificant.

The partial elasticity coefficient |E3| < 1; consequently, its influence on the resultant attribute Y is insignificant.
The closeness of the joint influence of the factors on the result is estimated by the multiple correlation index (from 0 to 1).

The relationship between the attribute Y and the factors X is moderate.
Coefficient of determination:
R² = 0.62² = 0.38
i.e., in 38.0855% of cases changes in x lead to a change in y; in other words, the accuracy of the fitted regression equation is average.
Significance of the correlation coefficient

From the table of critical points of Student's distribution we find t_table:
t_table(n − m − 1; α) = t(18; 0.05) = 1.734
Since t_obs > t_table, we reject the hypothesis that the correlation coefficient is equal to 0; in other words, the correlation coefficient is statistically significant.
Interval estimate for the correlation coefficient (confidence interval):

r ∈ (0.3882; 0.846)
5. Testing hypotheses regarding the coefficients of the regression equation (testing the significance of the parameters of the multiple regression equation).
1) t-statistics

The statistical significance of the regression coefficient b0 is not confirmed.

The statistical significance of the regression coefficient b1 is not confirmed.

The statistical significance of the regression coefficient b2 is not confirmed.

The statistical significance of the regression coefficient b3 is not confirmed.
Confidence intervals for the coefficients of the regression equation
Let us define the confidence intervals of the regression coefficients, which with 95% reliability will be as follows:
(b_i − t·S_i; b_i + t·S_i)
b0: (-0.7348; 1.0478)
b1: (-2.9781; 3.6531)
b2: (-0.4466; 0.4553)
b3: (-4.8459; 5.4431)
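A quick check of the b0 interval in Python, using the rounded values quoted above (b0 = 0.1565, K00 = 0.26 from the covariance matrix, t_table = 1.734); because the inputs are rounded, the endpoints agree with the quoted interval only to about two decimal places:

```python
import math

b0 = 0.1565                 # estimate of the intercept
S0 = math.sqrt(0.26)        # S_0 = sqrt(K_00), rounded diagonal element of K
t = 1.734                   # t_table(18; 0.05)

lo, hi = b0 - t * S0, b0 + t * S0
print(round(lo, 2), round(hi, 2))   # close to the quoted (-0.7348; 1.0478)
```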

2) F-statistic (Fisher's criterion)

F_kp = 2.93
Since F < F_kp, the coefficient of determination is not statistically significant and the regression equation is statistically unreliable.
6. Check for the presence of heteroscedasticity by graphical analysis of the residuals.
In this case, the values of the explanatory variable X_i are plotted along the abscissa, and the squared deviations e_i² along the ordinate.

y      y(x)   e = y − y(x)   e²
0.22 0.4 -0.18 0.03
0.67 0.62 0.05 0
0.79 0.71 0.08 0.01
0.42 0.5 -0.08 0.01
0.32 0.44 -0.12 0.02
0.24 0.4 -0.16 0.03
0.95 0.98 -0.03 0
1.05 1.29 -0.24 0.06
0.99 1.12 -0.13 0.02
0.96 1.01 -0.05 0
0.73 0.67 0.06 0
0.52 0.54 -0.02 0
2.1 0.55 1.55 2.41
0.58 0.57 0.01 0
0.87 0.83 0.04 0
0.89 0.85 0.04 0
0.91 0.88 0.03 0
0.14 0.37 -0.23 0.05
0.18 0.39 -0.21 0.04
0.27 0.42 -0.15 0.02
0.37 0.47 -0.1 0.01
0 0.16 -0.16 0.02