Analytical smoothing of a time series. Trend equation

According to formula (9.29), the parameters of the linear trend are a = 1894/11 = 172.2 c/ha and b = 486/110 = 4.418 c/ha per year. The linear trend equation has the form:

$\hat{y} = 172.2 + 4.418t$, where t = 0 in 1991. This means that the average actual and smoothed level, referred to the middle of the period, i.e. to 1991, is 172 c/ha, and the average annual increase is 4.418 c/ha per year.
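The arithmetic above can be checked with a short sketch. It assumes only the two sums quoted in the text (1894 and 486) and eleven annual levels for 1986-1996 numbered from the middle of the period; the variable names are illustrative, and the individual yield levels are not reproduced here.

```python
# Centered time index for 11 annual levels, 1986..1996: t = -5..5, so sum(t) = 0.
years = list(range(1986, 1997))
mid = years[len(years) // 2]            # the middle year of the period
t = [y - mid for y in years]

sum_t2 = sum(ti * ti for ti in t)       # denominator of b

# Sums quoted in the text (assumed to be the sum of levels and the sum of t*y).
sum_y = 1894.0
sum_ty = 486.0

a = sum_y / len(t)                      # mean level, attached to t = 0
b = sum_ty / sum_t2                     # average annual absolute increase

print(mid, sum_t2, round(a, 1), round(b, 3))
```

With the origin in the middle of the period the two parameters are found independently of each other, which is why the text can compute each with a single division.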

The parameters of the parabolic trend according to (9.23) are b = 4.418, a = 177.75, c = -0.5571. The parabolic trend equation has the form $\hat{y} = 177.75 + 4.418t - 0.5571t^2$, with t = 0 in 1991. This means that the absolute increase in yield slows down on average by 2 · 0.56 c/ha per year each year. The absolute increase itself is no longer a constant of the parabolic trend; b is its average value for the period. In the year taken as the origin, i.e. 1991, the trend passes through the point with ordinate 177.75 c/ha; the free term of a parabolic trend is not the average level for the period.

The parameters of the exponential trend are calculated by formulas (9.32) and (9.33): ln a = 56.5658/11 = 5.1423, and exponentiating gives a = 171.1; ln k = 2.853/110 = 0.025936, and exponentiating gives k = 1.02628.

The exponential trend equation has the form: $\hat{y} = 171.1 \cdot 1.02628^{t}$.

This means that the average annual growth rate of yield over the period was 102.63%. At the origin, the trend passes through the point with ordinate 171.1 c/ha.
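As a rough check of the figures quoted for the parabolic and exponential trends, the sketch below locates the maximum of the parabola and exponentiates the logarithmic parameters. It uses only the numbers given in the text; the variable names are illustrative.

```python
import math

# Parabolic trend from the text: y = a + b*t + c*t**2, with t = 0 in 1991.
a, b, c = 177.75, 4.418, -0.5571
t_max = -b / (2 * c)                    # where the first derivative b + 2*c*t is zero
year_max = 1991 + t_max                 # calendar position of the maximum

# Exponential trend: the parameters are found in logarithms, then exponentiated.
ln_a = 56.5658 / 11
ln_k = 2.853 / 110
a_exp = math.exp(ln_a)
k = math.exp(ln_k)

print(round(t_max, 2), round(year_max), round(a_exp, 1), round(k, 5))
```

The maximum falls about four years after the origin, so the fitted parabola stops growing shortly before the end of the observed period.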

The levels calculated from the trend equations are recorded in the last three columns of Table 9.5. As these data show, the calculated levels for all three types of trend differ only slightly, since both the acceleration of the parabola and the growth rate of the exponential are small. The parabola differs significantly in one respect: the growth of its levels stops from 1995, whereas with the linear trend the levels keep growing, and with the exponential trend their growth accelerates. Therefore these three trends are not equivalent for forecasting: when the parabola is extrapolated to future years, its levels diverge sharply from the straight line and the exponential, as can be seen from Table 9.6.

That table is an abridged printout of the solution for the same three trends obtained on a PC with the Statgraphics program. Its free terms differ from those given above because the program numbers the years not from the middle but from the beginning, so that the free terms of the trends refer to 1986, for which t = 0. The exponential equation is left in logarithmic form on the printout. The forecast is made 5 years ahead, i.e. up to 2001. When the origin of time is changed, the average absolute increase in the parabola equation, the parameter b, also changes, because with a negative acceleration the increase declines all the time and its maximum is at the beginning of the period. The only constant of the parabola is the acceleration.


In the "Data" line the levels of the original series are given; "Forecast Summary" means summary data for the forecast. The following lines give the equations of the straight line, the parabola and the exponential, the last in logarithmic form. The ME column is the mean discrepancy between the levels of the original series and the trend levels. For the straight line and the parabola this discrepancy is always zero. The levels of the exponential are on average 0.48852 below the levels of the original series. Exact coincidence is possible only if the true trend is an exponential; in this case there is no coincidence, but the difference is small. The MSE column is the variance s², the measure of the variability of the actual levels about the trend, as discussed in section 9.7. The MAE column is the mean absolute deviation of the levels from the trend (see section 5.8); the MAPE column is the relative absolute deviation in percent. Here they are given as indicators of how well the chosen type of trend fits. The parabola has the smallest variance and absolute deviation: over 1986-1996 it is closer to the actual levels. But the choice of the type of trend cannot be reduced to this criterion alone. In fact, the slowdown of growth is the result of a large negative deviation, i.e. a crop failure, in 1996.
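The four summary measures described above (mean error, the variance-type measure, mean absolute deviation and relative deviation in percent) can be sketched as follows. This is a minimal illustration on hypothetical numbers; the divisor n is an assumption, and a given package may divide by the degrees of freedom instead.

```python
def fit_measures(actual, fitted):
    """Return ME, MSE, MAE and MAPE (%) of fitted values against actual ones."""
    n = len(actual)
    errors = [y - f for y, f in zip(actual, fitted)]
    me = sum(errors) / n                                   # mean error
    mse = sum(e * e for e in errors) / n                   # mean squared error (divisor n assumed)
    mae = sum(abs(e) for e in errors) / n                  # mean absolute error
    mape = 100 * sum(abs(e) / abs(y) for e, y in zip(errors, actual)) / n
    return me, mse, mae, mape

# Hypothetical levels and trend values, for illustration only.
actual = [100.0, 110.0, 120.0, 130.0]
fitted = [102.0, 108.0, 121.0, 129.0]
me, mse, mae, mape = fit_measures(actual, fitted)
print(me, mse, mae, round(mape, 3))
```

A mean error of zero says nothing about the size of the deviations, which is why the squared and absolute measures are reported alongside it.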

The second half of the table is the forecast of yield levels for the three types of trend for the years with t = 12, 13, 14, 15 and 16 from the origin (1986). Up to the 16th year the predicted levels of the exponential are only slightly higher than those of the straight line. The levels of the parabolic trend decrease, diverging more and more from the other trends.
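The divergence of the three extrapolations can be reproduced approximately from the centered equations quoted earlier in the text. The sketch below is not the Statgraphics printout: it assumes t = 0 in 1991, so the forecast years 1997-2001 correspond to t = 6..10, and the function names are illustrative.

```python
# Three trends from the text, all with t = 0 in 1991 (centered time).
def linear(t):
    return 172.2 + 4.418 * t

def parabola(t):
    return 177.75 + 4.418 * t - 0.5571 * t ** 2

def exponential(t):
    return 171.1 * 1.02628 ** t

# 1997..2001 correspond to t = 6..10 when 1991 is the origin.
forecast = {1991 + t: (linear(t), parabola(t), exponential(t)) for t in range(6, 11)}

for year, (lin, par, exp_) in forecast.items():
    print(year, round(lin, 1), round(par, 1), round(exp_, 1))
```

In every forecast year the parabola lies below the straight line and the exponential slightly above it, and the gap between the parabola and the other two widens with the horizon.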

As can be seen from Table 9.4, when the trend parameters are calculated, the levels of the original series enter with different weights: the values $t_i$ and their squares. Therefore the effect of fluctuations in the levels on the trend parameters depends on which year number a good-harvest or crop-failure year falls on. If a sharp deviation falls on the year with number zero ($t_i = 0$), it does not affect the trend parameters, and if it falls at the beginning or the end of the series, it affects them strongly. Consequently, a single analytical alignment does not completely free the trend parameters from the influence of fluctuations, and with strong fluctuations they can be badly distorted, which is what happened with the parabola in our example. To further eliminate the distorting effect of fluctuations on the trend parameters, the method of multiple sliding alignment should be applied.

In this technique the trend parameters are not calculated for the whole series at once but by a sliding method: first for the first m time periods or moments, then for the period from the 2nd to the (m + 1)-th level, from the 3rd to the (m + 2)-th level, and so on. If the number of initial levels of the series is n and the length of each sliding base for calculating the parameters is m, then the number L of such sliding bases, and of the individual parameter values determined from them, is:

$L = n + 1 - m$.

As the above calculation shows, multiple sliding alignment can be applied only when the number of levels in the series is large enough, usually 15 or more. Let us consider this technique using the data of Table 9.4, the dynamics of prices for non-fuel goods of developing countries, which again invites the reader to take part in a small scientific study. On the same example we will continue with forecasting methods in section 9.10.

If the parameters in our series are calculated over 11-year periods (11 levels), then L = 17 + 1 - 11 = 7. The point of multiple sliding alignment is that, with successive shifts of the base, levels with deviations from the trend of different sign and magnitude turn up at its ends and in its middle. Therefore with some shifts of the base the parameters are overestimated and with others underestimated, and when the parameter values are subsequently averaged over all shifts of the base, the distortions of the trend parameters caused by fluctuations further cancel each other out.
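The procedure can be sketched on a hypothetical series of 17 levels with 11-level bases. The data below are invented for illustration (an exact linear trend with one sharp negative deviation at the last level), and the function names are not from the source.

```python
def slope(levels):
    """Least-squares slope b of a linear trend, with time centered on the base."""
    n = len(levels)
    t = [i - (n - 1) / 2 for i in range(n)]      # sum(t) == 0
    return sum(ti * yi for ti, yi in zip(t, levels)) / sum(ti * ti for ti in t)

def sliding_slopes(series, m):
    """Slopes for every base of length m; there are len(series) + 1 - m of them."""
    return [slope(series[i:i + m]) for i in range(len(series) + 1 - m)]

# Hypothetical series: growth of exactly 2 per period, with a sharp drop at the end.
series = [50 + 2 * i for i in range(17)]
series[-1] -= 20

slopes = sliding_slopes(series, 11)
b_mean = sum(slopes) / len(slopes)               # averaged over all shifts of the base
b_single = slope(series)                         # one-time alignment of the whole series

print(len(slopes), [round(s, 3) for s in slopes])
print(round(b_mean, 3), round(b_single, 3))
```

In this constructed case the averaged slope is closer to the true increase of 2 than the slope from a single alignment, because the deviating level sits at the end of only one of the seven bases.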

Multiple sliding alignment not only gives a more accurate and reliable estimate of the trend parameters, but also makes it possible to check whether the type of trend equation was chosen correctly. If the main parameter of the trend, its constant, does not fluctuate randomly when calculated over the sliding bases but changes its value systematically and substantially, then the type of trend was chosen incorrectly: this parameter is not a constant.

As for the free term, in multiple alignment it is unnecessary, and indeed simply incorrect, to calculate its value as an average over all shifts of the base: with that method the individual levels of the original series would enter the average with different weights, and the sum of the aligned levels would differ from the sum of the members of the original series. The free term of the trend is the average level for the period, provided that time is counted from the middle of the period. When time is counted from the beginning, the free term equals the average level minus b times the mean of t: with the first level numbered $t_1 = 0$ this gives $a_0 = \bar{y} - b\,(N - 1)/2$, and with $t_1 = 1$ the shift is $(N + 1)/2$.

It is recommended to choose the length of the sliding base for calculating the trend parameters to be at least 9-11 levels, so that fluctuations of the levels are sufficiently damped. If the original series is very long, the base can be up to 0.7-0.8 of its length. To eliminate the effect of long-period (cyclical) fluctuations on the trend parameters, the number of shifts of the base should be equal to, or a multiple of, the length of the cycle. Then the beginning and the end of the base successively pass through all phases of the cycle, and when the parameter is averaged over all shifts, its distortions due to cyclical fluctuations cancel out. Another way is to take the length of the sliding base equal to the length of the cycle, so that the beginning and the end of the base always fall on the same phase of the cycle.

Since it has already been established from Table 9.4 that the trend is linear, we calculate the average annual absolute increase, i.e. the parameter b of the linear trend equation, by the sliding method over 11-year bases (see Table 9.7). That table also contains the calculation of the data required for the subsequent study of variability in section 9.7. Let us dwell on the method of multiple alignment over sliding bases. We calculate the parameter b for all bases:


Series. Trend equation.

Growth curves, which describe the patterns of development of phenomena over time, are the result of analytical alignment of time series. Aligning a series with certain functions (i.e. fitting them to the data) is in most cases a convenient way of describing empirical data. Subject to a number of conditions, this tool can also be used for forecasting. The alignment process consists of the following main stages:

Selection of the type of curve, the form of which corresponds to the nature of the change in the dynamic series;

Determination of the numerical values (estimation) of the parameters of the curve;

A posteriori quality control of the selected trend.

In modern application software packages all the listed stages are, as a rule, implemented simultaneously within one procedure.

Analytical smoothing with one function or another yields aligned or, as they are sometimes not quite legitimately called, theoretical values of the levels of the time series, i.e. the levels that would have been observed if the dynamics of the phenomenon had coincided completely with the curve. The same function, with or without some adjustment, is used as the model for extrapolation (forecasting).

The choice of the type of curve is the main question in aligning a series. All other things being equal, an error in this choice has more serious consequences (especially for forecasting) than an error in the statistical estimation of the parameters.

Since the form of the trend exists objectively, identifying it should start from the material nature of the phenomenon under study, examining the internal causes of its development as well as the external conditions and factors that affect it. Only after a thorough substantive analysis can one proceed to the special techniques developed by statistics.

Plotting the time series is a very common technique for detecting the form of the trend. But the influence of the subjective factor is great here, even when aligned levels are plotted.

The most reliable methods for choosing the trend equation are based on the properties of the various curves used in analytical alignment. This approach makes it possible to link the type of trend to particular qualitative properties of the phenomenon. It seems to us that in most cases a practically acceptable method is one based on comparing the characteristics of the changes in the increments of the series under study with the corresponding characteristics of the growth curves. The curve chosen for alignment is the one whose law of change of increments is closest to the pattern of change in the actual data.
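Comparing increments can be illustrated with a small sketch: for levels lying exactly on a straight line the first differences are constant, and for levels lying on a second-degree parabola the second differences are constant. The two series below are hypothetical and generated from known equations.

```python
def differences(levels, order=1):
    """Return the differences of the given order of a list of levels."""
    for _ in range(order):
        levels = [b - a for a, b in zip(levels, levels[1:])]
    return levels

# Hypothetical levels generated by y = 10 + 3t and by y = 10 + 3t - 0.5t^2.
line = [10 + 3 * t for t in range(6)]
parab = [10 + 3 * t - 0.5 * t ** 2 for t in range(6)]

print(differences(line, 1), differences(line, 2))     # constant, then zero
print(differences(parab, 2), differences(parab, 3))   # constant, then zero
```

With real data the differences are never exactly constant, so in practice one looks at whether they fluctuate around a constant or change systematically.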

Table 4 lists the types of curves most commonly used in the analysis of economic series and gives the corresponding "symptoms" by which one can determine which type of curve is suitable for alignment.

When choosing a curve, one more circumstance should be kept in mind. Increasing the complexity of the curve can in some cases really improve the accuracy with which the past trend is described. However, because more complex curves contain more parameters and higher powers of the independent variable, their confidence intervals will generally be considerably wider than those of simpler curves for the same forecast lead time.

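Table 4.

Nature of the change in indicators based on average increments, for various types of curves

Indicator | Nature of the change of the indicator over time | Type of curve
(formula not preserved) | Approximately constant | Straight line
(formula not preserved) | Changes linearly | Second-degree parabola
(formula not preserved) | Changes linearly | Third-degree parabola
(formula not preserved) | Approximately constant | Exponential
(formula not preserved) | Changes linearly | Logarithmic parabola
(formula not preserved) | Changes linearly | Modified exponential
(formula not preserved) | Changes linearly | Gompertz curve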

At present, when special programs make it easy to build several types of equations at once, formal statistical criteria are widely used to determine the best trend equation.

From the above it can apparently be concluded that choosing the form of the alignment curve is a task that has no unique solution but comes down to obtaining a number of alternatives. The final choice cannot lie in the field of formal analysis, especially if the alignment is meant not only to describe statistically the behaviour of the levels in the past but also to extrapolate the pattern found into the future. At the same time, various statistical techniques for processing the observations can be of considerable benefit: at the very least, they make it possible to reject obviously unsuitable options and thereby substantially narrow the field of choice.

Consider the most used types of trend equations:

1. Linear trend form:

$\hat{y}_t = a_0 + a_1 t$,

where $\hat{y}_t$ is the level of the series obtained by alignment along a straight line;

$a_0$ is the initial level of the trend;

$a_1$ is the average absolute increase, the trend constant.

The linear form of the trend is characterized by equal first differences (absolute increases) and zero second differences, i.e. zero acceleration.

2. Parabolic (second-degree polynomial) trend form:

$\hat{y}_t = a_0 + a_1 t + a_2 t^2$.

For this type of curve the second differences (acceleration) are constant and the third differences are zero.

The parabolic form of the trend corresponds to an accelerating or slowing change in the levels of the series with constant acceleration. If $a_2 < 0$ and $a_1 > 0$, the quadratic parabola has a maximum; if $a_2 > 0$ and $a_1 < 0$, a minimum. To find the extremum, the first derivative of the parabola with respect to t is set equal to 0 and the equation is solved for t.

3. Exponential trend form:

$\hat{y}_t = a_0 a_1^{t}$,

where $a_1$ is the trend constant, the average rate of change of the level of the series.

For $a_1 > 1$ this trend may reflect a tendency of accelerating, ever faster growth of the levels of the series. For $a_1 < 1$ it reflects a tendency of steady, increasingly slowing decline of the levels of the time series.

4. Hyperbolic trend form (type 1):

This trend form can describe the tendency of processes bounded by a limiting level.

5. Logarithmic trend form:

where is the trend constant.

The logarithmic trend can describe a tendency in which the growth of the levels of the time series slows down while no limiting value exists. For sufficiently large t, the logarithmic curve becomes hard to distinguish from a straight line.

6. Inverse logarithmic trend form:

7. Multiplicative (power) trend form:

8. Inverse (hyperbolic, type 2) trend form:

9. Hyperbolic trend form (type 3):

10. Polynomial of the 3rd degree:
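For orientation, forms 4 to 10 are commonly written as shown below. These are standard textbook parameterisations, not the formulas of the source: the coefficient names $a_0, a_1, a_2, a_3$ and the exact way each hyperbola is written are assumptions, chosen to agree with the auxiliary variables 1/T, LN T, 1/Y and LN Y that the text creates later.

```latex
\begin{align*}
\text{hyperbolic, type 1:}\quad & \hat{y}_t = a_0 + \frac{a_1}{t} \\
\text{logarithmic:}\quad & \hat{y}_t = a_0 + a_1 \ln t \\
\text{inverse logarithmic:}\quad & \hat{y}_t = \frac{1}{a_0 + a_1 \ln t} \\
\text{multiplicative (power):}\quad & \hat{y}_t = a_0\, t^{a_1} \\
\text{inverse (hyperbolic, type 2):}\quad & \hat{y}_t = \frac{1}{a_0 + a_1 t} \\
\text{hyperbolic, type 3:}\quad & \hat{y}_t = \frac{1}{a_0 + a_1 / t} \\
\text{polynomial of degree 3:}\quad & \hat{y}_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3
\end{align*}
```

Under these assumed forms, every model except the polynomials becomes linear in its coefficients after taking the logarithm or the reciprocal of the dependent variable, of time, or of both.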

All models (regression equations) that are nonlinear in the original variables, and these are the majority, require the auxiliary transformations presented in the table below.

Table 5.

Models reducible to a linear trend

Model | Equation | Transformation
Multiplicative (power)
Hyperbolic, type 1
Hyperbolic, type 2
Hyperbolic, type 3
Logarithmic
Inverse logarithmic

In the formulas listed in the table, as in all formulas describing trend models, the lettered constants are the coefficients of the equations.

However, when linearization by transforming the variables is used in practice, it should be borne in mind that the parameter estimates obtained by linearization with the least squares method (OLS) minimize the sum of squared deviations for the transformed variables, not the original ones. Therefore the estimates of the dependencies obtained by linearization need refinement.
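The caveat can be seen on a small hypothetical example. The sketch below fits an exponential trend by ordinary least squares on logarithms and shows that the residuals balance out in logarithms but not in the original units; the data, the helper `ols` and the form y = a0 * a1**t are assumptions made for illustration.

```python
import math

def ols(x, y):
    """Intercept and slope of a least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Hypothetical levels, for illustration only.
t = [1, 2, 3, 4, 5, 6]
y = [12.0, 15.0, 17.0, 24.0, 27.0, 38.0]

# Exponential trend y = a0 * a1**t, linearized as ln y = ln a0 + t * ln a1.
ln_a0, ln_a1 = ols(t, [math.log(v) for v in y])
a0, a1 = math.exp(ln_a0), math.exp(ln_a1)
fitted = [a0 * a1 ** ti for ti in t]

# OLS on logarithms makes the log-residuals sum to zero, not the residuals in original units.
log_resid_sum = sum(math.log(v) - math.log(f) for v, f in zip(y, fitted))
resid_sum = sum(v - f for v, f in zip(y, fitted))
print(round(a0, 3), round(a1, 4), round(log_resid_sum, 10), round(resid_sum, 4))
```

If the fit in original units matters, the linearized estimates can serve as starting values for a nonlinear least-squares refinement.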

To solve the problem of analytical smoothing of time series in the Statistica system, we need to create several additional variables required for further work, and to carry out some auxiliary operations that transform nonlinear trend models into linear ones.

So, we have to construct the trend equation, which is essentially a regression equation in which time acts as the factor. First of all, we create the variable "T", which contains the time points of the fourth period. Since the fourth period covers 12 years, the variable "T" consists of the natural numbers from 1 to 12, corresponding to the years of the period.

In addition, we need several more variables for some of the trend models; their content is clear from their names. These are the variables derived from the time variable, "T^2", "T^3", "1/T" and "LN T", and the variables derived from the source data for the fourth period, "1/Import4" and "LN Import4". The same table must also be created for exports. All this should be done on a new worksheet, copying the data for the 4th period there.
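Outside Statistica, the same auxiliary columns could be built as in the sketch below. The import figures are hypothetical placeholders, not the data of the manual, and the dictionary layout is only one possible way to hold the table.

```python
import math

# Hypothetical import values for a 12-year period (billion $), for illustration only.
import4 = [41.0, 44.5, 47.2, 52.9, 60.1, 75.6, 97.4, 125.4, 164.3, 223.5, 291.9, 192.0]

T = list(range(1, len(import4) + 1))     # time variable 1..12

table = {
    "T": T,
    "T^2": [t ** 2 for t in T],
    "T^3": [t ** 3 for t in T],
    "1/T": [1 / t for t in T],
    "LN T": [math.log(t) for t in T],
    "Import4": import4,
    "1/Import4": [1 / y for y in import4],
    "LN Import4": [math.log(y) for y in import4],
}

# Print the first four values of every column.
for name, column in table.items():
    print(f"{name:>10}: " + " ".join(f"{v:.4g}" for v in column[:4]) + " ...")
```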

To do this, we use the Workbook / Insert menu already known to us.

As a result, we obtain the following spreadsheets.

Fig. 38. Table with auxiliary variables for imports

Fig. 39. Table with auxiliary variables for export

For analytical alignment of the time series we will use the Multiple Regression module in the Statistics menu. Let us consider an example of plotting a trend expressed by a linear dependence and determining its numerical parameters.

Fig. 40. Multiple Regression module in the Statistics menu

To select the dependent and independent variables, we use the Variables button.

In the left-hand field of the window that opens we choose the dependent variable $Y_t$ (in our case Import 4, the data for the fourth period). The numbers of the selected dependent variables are displayed at the bottom in the Dependent var. (or list for batch) field. Accordingly, in the right-hand field we choose the independent variables (in our case only one, the time "T"). The numbers of the selected independent variables are displayed in the Independent variable list field.

After the variables have been selected, click OK. The system displays a window with summary results of the calculation of the trend parameters (they are considered in more detail below) and with options for the direction of further detailed analysis. Note that values highlighted in red indicate that the results are statistically significant.

Fig. 41. Advanced tab

The tab contains a button that gives the most detailed information on the direction of analysis that interests us. Pressing it produces two tables with the results of the regression analysis. The first presents the results of calculating the parameters of the regression equation, the second the main indicators of the equation.

Fig. 42. Main indicators of the equation for import data for the fourth period (linear trend)

Here N is the number of observations of the dependent variable. The top field contains the indicators R, R², Adjusted R², F, p and Std. Error of Estimate: respectively the theoretical correlation ratio, the coefficient of determination, the adjusted coefficient of determination, the calculated value of Fisher's criterion (with the numbers of degrees of freedom in brackets), the significance level, and the standard error of the equation (the same indicators can be seen in the second table). In the table itself we are interested in column B, which contains the coefficients of the equation, and in the columns t and p-level, which give the calculated value of the t-criterion and the calculated significance level needed to assess the significance of the parameters of the equation. The system helps the user here: when a procedure involves a significance test, Statistica highlights significant elements in red (i.e. the null hypothesis that the parameter equals zero is rejected). In our case $|t_{fact}| > t_{table}$ for both parameters, so they are significant.

Fig. 43. Parameters of the regression equation for import data for the fourth period (linear trend)

To assess the statistical significance of the equation as a whole, on the Advanced tab we use the ANOVA (Goodness of Fit) button, which produces the analysis of variance table and the value of Fisher's F-criterion.

Fig. 44. Analysis of variance table

Sums of Squares is the sum of squared deviations. At the intersection with the Regression row it is the sum of squared deviations of the theoretical values (obtained from the regression equation) from the mean; this sum of squares is used to calculate the factor (explained) variance of the dependent variable. At the intersection with the Residual row it is the sum of squared deviations between the theoretical and actual values of the variable (used to calculate the residual, unexplained variance); in the Total row it is the sum of squared deviations of the actual values of the variable from the mean (used to calculate the total variance). The df column is the number of degrees of freedom. Mean Squares is the variance: factor variance in the Regression row and residual variance in the Residual row. F is Fisher's criterion, used to assess the overall significance of the equation and of the coefficient of determination; p-level is the significance level.
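The quantities in the analysis of variance table can be reproduced by hand for a linear trend with a single factor. The sketch below uses hypothetical data and illustrative names; it is not Statistica output.

```python
def anova(t, y):
    """Sums of squares, mean squares, F and R^2 for a linear trend with one factor."""
    n, k = len(y), 1
    mt, my = sum(t) / n, sum(y) / n
    b = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / sum((ti - mt) ** 2 for ti in t)
    a = my - b * mt
    fitted = [a + b * ti for ti in t]
    ss_reg = sum((f - my) ** 2 for f in fitted)              # Regression row
    ss_res = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # Residual row
    ss_tot = sum((yi - my) ** 2 for yi in y)                 # Total row
    ms_reg, ms_res = ss_reg / k, ss_res / (n - k - 1)
    return {"SS": (ss_reg, ss_res, ss_tot), "df": (k, n - k - 1, n - 1),
            "MS": (ms_reg, ms_res), "F": ms_reg / ms_res, "R2": ss_reg / ss_tot}

# Hypothetical levels, for illustration only.
t = [1, 2, 3, 4, 5, 6]
y = [10.0, 13.0, 15.0, 20.0, 22.0, 25.0]
result = anova(t, y)
print(result)
```

The regression and residual sums of squares add up to the total, which is what makes the ratio of the regression sum to the total interpretable as the explained share of the variance.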

In Statistica, as in most other programs, the parameters of the trend equation are calculated by the method of least squares (OLS).

The method yields the parameter values that minimize the sum of squared deviations of the actual levels from the smoothed ones, i.e. those obtained by analytical alignment.

The mathematical apparatus of the least squares method is described in most works on mathematical statistics, so there is no need to dwell on it in detail. Let us recall only a few points. To find the parameters of the linear trend (2.10), the following system of equations must be solved:

$$\begin{cases} n a_0 + a_1 \sum t = \sum y, \\ a_0 \sum t + a_1 \sum t^2 = \sum y t. \end{cases}$$

This system is simplified if the values of t are chosen so that their sum is zero, i.e. the origin of time is moved to the middle of the period under consideration. Obviously, moving the origin makes sense only when the time series is processed by hand.

If $\sum t = 0$, then $a_0 = \sum y / n$ and $a_1 = \sum y t / \sum t^2$.

In general, the system of equations for finding the parameters of a polynomial of degree k can be written as

$$\sum_{j=0}^{k} a_j \sum t^{\,i+j} = \sum y\, t^{\,i}, \qquad i = 0, 1, \dots, k.$$

When a time series is smoothed with an exponential (which is often done in economic studies), the least squares method should be applied to the logarithms of the source data to determine the parameters.

After moving the origin of time to the middle of the series:

$$n \ln a_0 = \sum \ln y, \qquad \ln a_1 \sum t^2 = \sum t \ln y,$$

hence:

$$\ln a_0 = \frac{\sum \ln y}{n}, \qquad \ln a_1 = \frac{\sum t \ln y}{\sum t^2}.$$

If more complex changes in the levels of the time series are observed and the alignment is carried out with a function of exponential form, the parameters are determined by solving the following system of equations:

In the practice of studying socio-economic phenomena, one very rarely meets time series whose characteristics fully match those of reference mathematical functions. This is due to the large number of factors of different kinds that affect the levels of the series and the tendency of their change.

In practice, a whole range of functions describing the trend is usually built, and the best one is then chosen on the basis of some formal criterion.

Fig. 45. Residuals / Assumptions / Prediction tab

Here we use the Perform residual analysis button, which opens the residual analysis module. Residuals here mean the deviations of the original values of the time series from the values predicted by the selected trend equation. We go straight to the Advanced tab.

Fig. 46. Advanced tab in Perform residual analysis

We use the Summary: Residuals & Predicted button, which produces the table of the same name. It contains the original values of the time series (Observed Value), the values predicted by the selected trend model (Predicted Value), the deviations of the predicted values from the original ones (Residual Value), as well as various special indicators and standardized values. The table also shows the maximum, minimum, mean and median of each column.

Fig. 47. Table containing indicators and special values for the linear trend

In this table the most interesting columns are Residual Value, whose values are later used to characterize the quality of the trend fit, and Predicted Value, which contains the values of the time series predicted by the selected trend model (in our case linear).

Next, we plot the original time series together with the predicted values calculated from the linear equation for the fourth period. The best way to do this is to copy the values from the Predicted Value column into the table in which the variables for building the trends were created.

Fig. 48. Third period of the import time series (billion $) and the linear trend

So, we have obtained all the necessary results of calculating the parameters of the trend expressed by the linear model for the fourth period of the original time series, and have also plotted this series together with the trend line. The remaining trend models are presented next.

It should be noted that, when the power and exponential functions are linearized, Statistica returns the values of the linearized function, equal to $\ln \hat{y}$, so for further use, including plotting, they must be converted back by the elementary operation $\hat{y} = e^{\ln \hat{y}}$. For the hyperbolic functions fitted to 1/Y, as well as for the inverse logarithmic function, the reciprocal conversion $\hat{y} = 1 / (1/\hat{y})$ is needed.
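The two back-transformations amount to one line each. The sketch below applies them to hypothetical predicted values, assuming the models were fitted to LN Y and to 1/Y respectively; the list names are illustrative.

```python
import math

# Hypothetical predicted values returned for the transformed dependent variable.
predicted_ln_y = [3.71, 3.85, 3.99]        # from a model fitted to LN Y
predicted_inv_y = [0.025, 0.020, 0.016]    # from a model fitted to 1/Y

y_from_log = [math.exp(v) for v in predicted_ln_y]     # undo the logarithm
y_from_inverse = [1 / v for v in predicted_inv_y]      # undo the reciprocal

print([round(v, 2) for v in y_from_log], [round(v, 2) for v in y_from_inverse])
```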

It is also advisable to create additional variables and obtain them using formulas based on existing variables.

So, when fitting the power model with the Multiple Regression procedure, the natural logarithms of the source series and of the time axis must be selected as the variables.

Fig. 49. Main indicators of the equation for import data over the third period (power model)

Fig. 50. Parameters of the regression equation for import data over the third period (power model)

Fig. 51. Analysis of variance table

Fig. 52. Table containing indicators and special values for the power model

Then, as in the case of the linear trend, we copy the values from the Predicted Value column into the table, but this time we create one more variable there, in which the predicted values of the power function are obtained by back-transformation.

Fig. 53. Creating an additional variable

Fig. 54. Table with all variables

Fig. 55. Third period of the import time series (billion $) and the power model

Fig. 56. Main indicators of the equation for import data over the third period (exponential model)

Fig. 57. Third period of the import time series (billion $) and the exponential model

Fig. 58. Main indicators of the equation for import data over the third period (inverse model)

Fig. 59. Third period of the import time series (billion $) and the inverse model

Fig. 60. Main indicators of the equation for import data over the third period (second-degree polynomial)

Fig. 61. Third period of the import time series (billion $) and the second-degree polynomial

Fig. 62. Main indicators of the equation for import data over the third period (third-degree polynomial)

Fig. 63. Third period of the import time series (billion $) and the third-degree polynomial

Fig. 64. Main indicators of the equation for import data over the third period (type 1 hyperbola)

Fig. 65. Third period of the import time series (billion $) and the type 1 hyperbola

Fig. 66. Main indicators of the equation for import data over the third period (type 3 hyperbola)

Fig. 67. Third period of the import time series and the type 3 hyperbola

Fig. 68. Main indicators of the equation for import data over the third period (logarithmic model)

Fig. 69. Third period of the import time series (billion $) and the logarithmic model

Fig. 70. Main indicators of the equation for import data over the third period (inverse logarithmic model)

Fig. 71. Third period of the import time series (billion $) and the inverse logarithmic model


Then we construct a table with auxiliary variables for building the trends for exports.

Fig. 72. Table with auxiliary variables

We perform the same operations as for the fourth period of imports.

Fig. 73. Main indicators of the equation for export data over the third period (linear model)

Fig. 74. Third period of the export time series (billion $) and the linear model

Fig. 75. Main indicators of the equation for export data over the third period (power model)

Fig. 76. Third period of the export time series and the power model

Fig. 77. Main indicators of the equation for export data over the third period (exponential model)

Fig. 78. Third period of the export time series (billion $) and the exponential model

Fig. 79. Main indicators of the equation for export data over the third period (inverse model)

Fig. 80. Third period of the export time series (billion $) and the inverse model

Fig. 81. Main indicators of the equation for export data over the third period (second-degree polynomial)

Fig. 82. Third period of the export time series (billion $) and the second-degree polynomial

Fig. 83. Main indicators of the equation for export data over the third period (third-degree polynomial)

Fig. 84. Third period of the export time series (billion $) and the third-degree polynomial

Fig. 85. Main indicators of the equation for export data over the third period (type 1 hyperbola)

Fig. 86. Third period of the export time series and the type 1 hyperbola

Fig. 87. Main indicators of the equation for export data over the third period (type 3 hyperbola)

Fig. 88. Third period of the export time series (billion $) and the type 3 hyperbola

Fig. 89. Main indicators of the equation for export data over the third period (logarithmic model)

Fig. 90. Third period of the export time series (billion $) and the logarithmic model

Fig. 91. Main indicators of the equation for export data over the third period (inverse logarithmic model)

Fig. 92. Third period of the export time series (billion $) and the inverse logarithmic model


Choosing the best trend

As already noted, the problem of choosing a curve form is one of the main problems that are faced with the alignment of a number of speakers. The solution to this problem largely determines the results of the trend extrapolation. In most specialized programs to select the best trend equation, it is possible to take advantage of the following criteria:

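- the minimum value of the root-mean-square error of the trend:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p - 1}},$$

where $y_i$ are the actual levels of the time series;

$\hat{y}_i$ are the levels of the series given by the trend equation;

$n$ is the number of levels in the series;

$p$ is the number of factors in the trend equation;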

- the minimum value of the residual variance;

- the minimum value of the mean approximation error;

- the minimum value of the mean absolute error;

- the maximum value of the coefficient of determination;

- the maximum value of Fisher's F-criterion:

$$F = \frac{R^2}{1 - R^2} \cdot \frac{n - k - 1}{k},$$

where $k$ is the number of degrees of freedom of the factor variance, equal to the number of independent variables (factors) in the equation;

$n - k - 1$ is the number of degrees of freedom of the residual variance.

A formal criterion for choosing the form of the curve is likely to give practically useful results only if the selection is carried out in two stages. At the first stage, candidate dependences are chosen on substantive grounds, which narrows the range of potentially acceptable functions. At the second stage, the value of the criterion is calculated for these functions, and the curve with the minimum value of an error criterion (or the maximum value of a criterion such as the coefficient of determination) is selected.

This manual uses a formal method of identifying the trend, based on a numerical criterion. The criterion used is the maximum coefficient of determination:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.$$

The notation and formulas for these indicators are given in the previous sections. The coefficient of determination shows what share of the total variance of the response variable is explained by the variation of the factor variable. In Statistica tables it is denoted R².
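The two criteria discussed here can be computed directly from the actual and fitted levels of a series. The following Python sketch is illustrative only: the function names and the sample data are assumptions that do not come from the manual, and the F formula assumes k factors and n - k - 1 residual degrees of freedom, as defined for the F-criterion above.

```python
def r_squared(actual, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(actual) / len(actual)
    sse = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    sst = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - sse / sst


def f_statistic(r2, n, k):
    """Fisher's F for a model with k factors fitted to n levels."""
    return (r2 / (1.0 - r2)) * ((n - k - 1) / k)


# Hypothetical data: five levels and the values of some fitted trend.
actual = [10.0, 12.0, 15.0, 19.0, 24.0]
fitted = [9.6, 12.3, 15.4, 18.9, 23.8]

r2 = r_squared(actual, fitted)
f_value = f_statistic(r2, n=len(actual), k=1)
print(round(r2, 4), round(f_value, 1))
```

A trend would then be judged adequate when the computed F exceeds the tabulated value for the chosen significance level and degrees of freedom.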

The following table lists the equations of the trend models and their determination coefficients for the import data.

Table 6.

Equations of the trend models and determination coefficients for imports.

Comparing the determination coefficients for the different types of curves, we can conclude that for the third period the best form of trend is a third-degree polynomial, both for imports and for exports.

Next, the selected trend model must be checked for adequacy to the actual tendency of the time series by assessing the reliability of the trend equation with Fisher's F-criterion. Here the calculated value of Fisher's criterion is 16.573 for imports and 13.098 for exports, while the tabulated value at the chosen significance level is 3.07. Consequently, this trend model is recognized as adequately reflecting the tendency of the phenomenon under study.

The three previous notes described regression models that make it possible to predict a response from the values of explanatory variables. In this note we show how these models and other statistical methods can be used to analyze data collected over consecutive time intervals. In accordance with the characteristics of each company mentioned in the scenario, we consider three alternative approaches to time series analysis.

The material is illustrated by a running example: forecasting the income of three companies. Imagine that you work as an analyst at a large financial company. To assess the investment prospects of your clients, you need to forecast the income of three companies. For this purpose you have collected data on the three companies of interest: Eastman Kodak, Cabot Corporation and Wal-Mart. Since the companies differ in the nature of their business activity, each time series has its own unique features. Consequently, different models must be used for forecasting. How do you choose the best forecasting model for each company? How do you evaluate investment prospects on the basis of the forecasts?

The discussion begins with the analysis of annual data. Two methods of smoothing such data are demonstrated: the moving average and exponential smoothing. Then the procedure for calculating a trend by the least squares method is described, followed by more complex forecasting methods. Finally, these models are applied to time series built from monthly or quarterly data.


Forecasting in business

Since economic conditions change over time, managers should predict the influence that these changes will have on their company. One of the methods to ensure accurate planning is forecasting. Despite the large number of developed methods, they all pursue the same goal - to predict the events that will occur in the future to take them into account when developing plans and strategies for the development of the company.

Modern society constantly needs forecasts. For example, to develop sound policy, members of government must forecast unemployment, inflation, industrial production, and the income tax paid by individuals and corporations. To determine equipment and staffing needs, airline executives must correctly forecast the volume of air traffic. To provide enough places in dormitories, college or university administrators want to know how many students will enroll at their institution next year.

There are two generally accepted approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are especially important when quantitative data are not available to the researcher. As a rule, these methods are highly subjective. If historical data on the object of study are available, quantitative forecasting methods should be applied. These methods make it possible to predict the future state of the object from data on its past. Quantitative forecasting methods fall into two categories: time series analysis and methods for analyzing causal dependencies.

A time series is a set of numerical data obtained over consecutive periods of time. Time series analysis makes it possible to predict the value of a numerical variable from its past and present values. For example, the daily stock quotes on the New York Stock Exchange form a time series. Other examples of time series are the monthly values of the consumer price index, the quarterly values of gross domestic product, and the annual sales revenues of a company.

Methods for analyzing causal dependencies make it possible to determine which factors affect the values of the predicted variable. They include multiple regression analysis with lagged variables, econometric modeling, analysis of leading indicators, and analysis of diffusion indices and other economic indicators. Here we discuss only forecasting methods based on time series analysis.

Components of the classical multiplicative time series model

The main assumption underlying time series analysis is that the factors affecting the object under study in the present and the past will also affect it in the future. Thus, the main goal of time series analysis is to identify and isolate the factors that matter for forecasting. To achieve this goal, many mathematical models have been developed to study the fluctuations of the components of a time series model. Probably the most common is the classical multiplicative model for annual, quarterly and monthly data. To demonstrate the classical multiplicative time series model, consider the data on the actual gross income of Wm. Wrigley Jr. Company for the period from 1982 to 2001 (Fig. 1).

Fig. 1. Actual gross income of Wm. Wrigley Jr. Company (million dollars, current prices) for the period from 1982 to 2001

As you can see, over the 20 years the actual gross income of the company has tended to increase. This long-term tendency is called the trend. The trend is not the only component of a time series. In addition to it, the data contain cyclical and irregular components. The cyclical component describes the up-and-down fluctuation of the data, which often correlates with business cycles. Its length varies from 2 to 10 years. The intensity, or amplitude, of the cyclical component is not constant either. In some years the data may be above the value predicted by the trend (that is, near the peak of the cycle), and in other years below it (that is, at the bottom of the cycle). Any observed data that do not lie on the trend curve and do not follow the cyclical pattern are called irregular, or random, components. If the data are recorded daily or quarterly, an additional component arises, called the seasonal component. All the components of time series typical of economic applications are shown in Fig. 2.

Fig. 2. Factors affecting time series

The classical multiplicative time series model states that any observed value is the product of the listed components. If the data are annual, the observation $Y_i$ corresponding to the i-th year is expressed by the equation:

(1) $Y_i = T_i \times C_i \times I_i$

where $T_i$ is the value of the trend, $C_i$ is the value of the cyclical component in the i-th year, and $I_i$ is the value of the random component in the i-th year.

If the data are measured monthly or quarterly, the observation $Y_i$ corresponding to the i-th period is expressed by the equation:

(2) $Y_i = T_i \times S_i \times C_i \times I_i$

where $T_i$ is the value of the trend, $S_i$ is the value of the seasonal component in the i-th period, $C_i$ is the value of the cyclical component in the i-th period, and $I_i$ is the value of the random component in the i-th period.

At the first stage of time series analysis, the data are plotted and their dependence on time is examined. First, it is necessary to find out whether there is a long-term increase or decrease in the data (i.e., a trend) or whether the time series fluctuates around a horizontal line. If there is no trend, the moving average method or exponential smoothing can be used to smooth the data.

Smoothing annual time series

In the scenario we mentioned Cabot Corporation. Headquartered in Boston, Massachusetts, it specializes in the production and sale of chemicals, building materials, fine chemical products, semiconductors and liquefied natural gas. The company has 39 factories in 23 countries. Its market value is about $1.87 billion. Its shares are listed on the New York Stock Exchange under the ticker CBT. The company's revenues for the period in question are shown in Fig. 3.

Fig. 3. Cabot Corporation revenues in 1982-2001 (billion dollars)

As you can see, the long-term tendency for income to increase is obscured by a large number of fluctuations. Thus, visual analysis of the chart does not allow us to conclude that the data have a trend. In such situations the moving average method or exponential smoothing can be applied.

Moving averages. The moving average method is quite subjective and depends on the length of the period L chosen for calculating the averages. To eliminate cyclical fluctuations, the length of the period must be an integer multiple of the average cycle length. Moving averages for a chosen period of length L form a sequence of averages calculated over consecutive sequences of L values. Moving averages are denoted MA(L).

Suppose we want to calculate five-year moving averages from data measured over n = 11 years. Since L = 5, the five-year moving averages form a sequence of averages calculated over five consecutive values of the time series. The first five-year moving average is calculated by summing the data for the first five years and dividing by five:

$MA(5) = \frac{Y_1 + Y_2 + Y_3 + Y_4 + Y_5}{5}$

The second five-year moving average is calculated by summing the data for years 2 through 6 and dividing by five:

$MA(5) = \frac{Y_2 + Y_3 + Y_4 + Y_5 + Y_6}{5}$

This process continues until the moving average for the last five years has been calculated. When working with annual data, the number L (the length of the period chosen for calculating the moving averages) should be odd. In this case it is impossible to calculate the moving average for the first (L - 1)/2 and the last (L - 1)/2 years. Consequently, with five-year moving averages no values can be calculated for the first two and the last two years. The year to which a moving average is assigned must be in the middle of the period of length L. If n = 11 and L = 5, the first moving average corresponds to the third year, the second to the fourth, and the last to the ninth. Fig. 4 shows the 3- and 7-year moving averages calculated for Cabot Corporation revenues for the period from 1982 to 2001.

Fig. 4. The 3- and 7-year moving averages calculated for Cabot Corporation revenues

Note that the calculation of the three-year moving averages yields no values for the first and last years. Similarly, the seven-year moving averages have no values for the first three and last three years. In addition, the seven-year moving averages smooth the time series much more strongly than the three-year ones, because a seven-year moving average spans a longer period. Unfortunately, the longer the period, the fewer moving averages can be calculated and plotted. Consequently, periods longer than seven years are undesirable for calculating moving averages, since too many points drop out at the beginning and the end of the chart, which distorts the shape of the time series.
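The moving average calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample series are assumptions (the Cabot Corporation figures are not reproduced here), and positions without a full window are marked with None rather than dropped.

```python
def centered_moving_average(values, length):
    """Return MA(L) aligned with the middle year of each window.

    Positions for which no full window exists (the first and last
    (L - 1) / 2 years) are filled with None.
    """
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd integer")
    half = (length - 1) // 2
    result = [None] * len(values)
    for mid in range(half, len(values) - half):
        window = values[mid - half: mid + half + 1]
        result[mid] = sum(window) / length
    return result


# Hypothetical series of n = 11 annual observations.
series = [4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 5.0, 2.0, 3.5, 5.5, 6.5]
ma5 = centered_moving_average(series, 5)
print(ma5)
```

With n = 11 and L = 5 the first value appears at the third year and the last at the ninth, which matches the rule stated in the text.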

Exponential smoothing. Besides moving averages, exponential smoothing is used to identify long-term tendencies in the data. This method also makes it possible to produce short-term forecasts (one period ahead) when the presence of a long-term trend remains in doubt. In this respect exponential smoothing has a significant advantage over the moving average method.

The exponential smoothing method takes its name from the sequence of exponentially weighted moving averages. Each value in this sequence depends on all previous observed values. Another advantage of exponential smoothing over moving averages is that the latter discards some values. With exponential smoothing, the weights assigned to the observed values decrease over time, so that once the calculation is complete the most recent values receive the greatest weight and the oldest values the smallest. Despite the large number of calculations involved, exponential smoothing can be implemented in Excel.

The equation for smoothing a time series at an arbitrary period i contains three terms: the current observed value $Y_i$ of the time series, the previous exponentially smoothed value $E_{i-1}$, and the assigned weight W.

(3) $E_1 = Y_1$; $E_i = W Y_i + (1 - W) E_{i-1}$, i = 2, 3, 4, ...

where $E_i$ is the value of the exponentially smoothed series calculated for the i-th period, $E_{i-1}$ is the value of the exponentially smoothed series calculated for the (i - 1)-th period, $Y_i$ is the observed value of the time series in the i-th period, and W is the subjective weight, or smoothing coefficient (0 < W < 1).

The choice of the smoothing coefficient, or the weight assigned to the members of the series, is fundamentally important because it directly affects the result. Unfortunately, this choice is to some extent subjective. If the researcher simply wants to remove unwanted cyclical or random fluctuations from the time series, a small value of W (close to zero) should be chosen. On the other hand, if the time series is used for forecasting, a large weight W (close to one) should be chosen. In the first case the long-term tendencies of the time series become clearly visible. In the second case the accuracy of short-term forecasting increases (Fig. 5).

Fig. 5. Exponentially smoothed time series (W = 0.50 and W = 0.25) for Cabot Corporation income data from 1982 to 2001; for the calculation formulas, see the Excel file

The exponentially smoothed value obtained for the i-th time interval can be used as an estimate of the predicted value in the (i+1)-th interval:

$\hat{Y}_{i+1} = E_i$

To forecast Cabot Corporation revenues in 2002 from the exponentially smoothed time series with weight W = 0.25, you can use the smoothed value calculated for 2001. Fig. 5 shows that this value is $1651.0 million. When the company's income data for 2002 become available, equation (3) can be applied to forecast the income level in 2003 using the smoothed value for 2002:

$\hat{Y}_{2003} = E_{2002} = 0.25\,Y_{2002} + 0.75\,E_{2001}$
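Equation (3) and the one-step forecast can be sketched as follows. The function name and the sample observations are assumptions made for illustration; they are not the Cabot Corporation data.

```python
def exponential_smoothing(values, weight):
    """E_1 = Y_1; E_i = W*Y_i + (1 - W)*E_(i-1) for i >= 2."""
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(weight * y + (1.0 - weight) * smoothed[-1])
    return smoothed


# Hypothetical observations (not the Cabot Corporation data).
series = [100.0, 120.0, 80.0, 100.0]
smoothed = exponential_smoothing(series, weight=0.25)
forecast_next = smoothed[-1]  # forecast for period i + 1 equals E_i
print(smoothed, forecast_next)
```

Running the function twice with different weights (for example 0.25 and 0.50) gives the two smoothed series that would be compared on a chart such as Fig. 5.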

The Excel Analysis ToolPak can build an exponential smoothing chart in one click. Go to Data > Data Analysis and select the Exponential Smoothing option (Fig. 6). In the Exponential Smoothing window that opens, set the parameters. Unfortunately, the procedure builds only one smoothed series at a time, so if you want to experiment with the parameter W, repeat the procedure.

Fig. 6. Building an exponential smoothing chart using the Analysis ToolPak

Calculating trends using the least squares method and forecasting

Of the components of a time series, the trend is studied most often. It is the trend that makes short-term and long-term forecasts possible. To identify the long-term tendency of a time series, a chart is usually plotted with the observed data (values of the dependent variable) on the vertical axis and the time intervals (values of the independent variable) on the horizontal axis. In this section we describe the procedure for identifying linear, quadratic and exponential trends using the least squares method.

The linear trend model is the simplest model used for forecasting: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. The linear trend equation:

$\hat{Y}_i = b_0 + b_1 X_i$
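A least squares fit of the linear trend can be sketched in plain Python. The function name and the data are hypothetical, and the coding of time as X = 0, 1, 2, ... starting from the first year is an assumption made for this example.

```python
def fit_linear_trend(values):
    """Least-squares estimates (b0, b1) for Y = b0 + b1 * X, X = 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical annual data; X = 0 corresponds to the first year.
series = [3.0, 5.0, 7.5, 9.0, 11.0]
b0, b1 = fit_linear_trend(series)
forecast = b0 + b1 * len(series)  # one period beyond the data
print(b0, b1, forecast)
```

Here b1 is the estimated average change per period and b0 is the trend value at X = 0.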

At a given significance level α, the null hypothesis is rejected if the test t-statistic is greater than the upper or less than the lower critical value of the t-distribution. In other words, the decision rule is as follows: if $t > t_U$ or $t < t_L$, the null hypothesis H0 is rejected; otherwise the null hypothesis is not rejected (Fig. 14).

Fig. 14. Rejection regions for the two-tailed test of significance of the highest-order autoregressive parameter $A_p$

If the null hypothesis ($A_p = 0$) is not rejected, the selected model contains too many parameters. The test then allows you to discard the highest-order term of the model and estimate an autoregressive model of order p - 1. This procedure should be continued until the null hypothesis H0 is rejected.

  1. Choose the order p of the autoregressive model to be estimated, bearing in mind that the t-test has n - 2p - 1 degrees of freedom.
  2. Form a sequence of p lagged variables so that the first variable lags by one time interval, the second by two, and so on. The last variable must lag by p time intervals (see Fig. 15).
  3. Use the Excel Analysis ToolPak to compute a regression model containing all p lagged values of the time series.
  4. Assess the significance of the highest-order parameter $A_p$: a) if the null hypothesis is rejected, all p parameters can be included in the autoregressive model; b) if the null hypothesis is not rejected, discard the p-th variable and repeat steps 3 and 4 for a new model with p - 1 parameters. The significance test for the new model is based on the t-test with the number of degrees of freedom determined by the new number of parameters.
  5. Repeat steps 3 and 4 until the highest-order term of the autoregressive model becomes statistically significant.
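Steps 1, 2 and 4 of this procedure can be sketched in Python. The helper names and the sample series are hypothetical, and the regression itself (step 3) is left to whatever tool is used; the sketch only builds the lagged table, counts the degrees of freedom and applies the two-tailed decision rule to a given estimate and standard error.

```python
def lagged_table(values, order):
    """Rows (Y_i, Y_(i-1), ..., Y_(i-order)) for every i that has all lags."""
    return [
        tuple(values[i - lag] for lag in range(order + 1))
        for i in range(order, len(values))
    ]


def degrees_of_freedom(n, order):
    """n - 2p - 1, as stated in step 1."""
    return n - 2 * order - 1


def highest_order_is_significant(estimate, std_error, t_critical):
    """Two-tailed t-test of H0: A_p = 0 against H1: A_p != 0."""
    t_stat = estimate / std_error
    return abs(t_stat) > t_critical, t_stat


# Hypothetical series of six observations, lagged up to order 2.
rows = lagged_table([10, 12, 11, 14, 15, 17], order=2)
print(rows)                       # each row: (Y_i, Y_(i-1), Y_(i-2))
print(degrees_of_freedom(20, 3))  # n = 20 observations, p = 3
```

Each increase in the order removes one more row from the table, which is why higher-order models are estimated from fewer observations.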

To demonstrate autoregressive modeling, let us return to the time series of the real income of Wm. Wrigley Jr. Company. Fig. 15 shows the data needed to build autoregressive models of the first, second and third order. To build the third-order model, all the columns of this table are needed. For the second-order model the last column is ignored. For the first-order model the last two columns are ignored. Thus, when building autoregressive models of the first, second and third order, one, two and three of the 20 observations, respectively, are excluded.

The choice of the most accurate autoregressive model begins with the third-order model. For the Analysis ToolPak to work correctly, specify the range B5:B21 as the input Y range and C5:E21 as the input X range. The results of the analysis are shown in Fig. 16.

Let us check the significance of the highest-order parameter $A_3$. Its estimate $a_3$ is -0.006 (cell C20 in Fig. 16), and the standard error is 0.326 (cell D20). To test the hypotheses H0: $A_3 = 0$ and H1: $A_3 \neq 0$, we calculate the t-statistic:

$t = \frac{a_3 - A_3}{S_{a_3}}$, where $S_{a_3}$ is the standard error of the estimate $a_3$.

At the significance level α = 0.05, the critical values of the two-tailed t-test with n - 2p - 1 = 20 - 2×3 - 1 = 13 degrees of freedom are: $t_L$ = T.INV(0.025, 13) = -2.160; $t_U$ = T.INV(0.975, 13) = +2.160. Since -2.160 < t = -0.019 < +2.160 and p = 0.985 > α = 0.05, the null hypothesis H0 cannot be rejected. Thus, the third-order parameter is not statistically significant in the autoregressive model and must be removed.

We repeat the analysis for the second-order autoregressive model (Fig. 17). The estimate of the highest-order parameter is $a_2$ = -0.2205, and its standard error is 0.276. To test the hypotheses H0: $A_2 = 0$ and H1: $A_2 \neq 0$, we calculate the t-statistic:

$t = \frac{a_2 - A_2}{S_{a_2}}$, where $S_{a_2}$ is the standard error of the estimate $a_2$.

At the significance level α = 0.05, the critical values of the two-tailed t-test with n - 2p - 1 = 20 - 2×2 - 1 = 15 degrees of freedom are: $t_L$ = T.INV(0.025, 15) = -2.131; $t_U$ = T.INV(0.975, 15) = +2.131. Since -2.131 < t = -0.744 < +2.131 and p = 0.469 > α = 0.05, the null hypothesis H0 cannot be rejected. Thus, the second-order parameter is not statistically significant and should be removed from the model.

We repeat the analysis for the first-order autoregressive model (Fig. 18). The estimate of the highest-order parameter is $a_1$ = 1.024, and its standard error is 0.039. To test the hypotheses H0: $A_1 = 0$ and H1: $A_1 \neq 0$, we calculate the t-statistic:

$t = \frac{a_1 - A_1}{S_{a_1}}$, where $S_{a_1}$ is the standard error of the estimate $a_1$.

At the significance level α = 0.05, the critical values of the two-tailed t-test with n - 2p - 1 = 20 - 2×1 - 1 = 17 degrees of freedom are: $t_L$ = T.INV(0.025, 17) = -2.110; $t_U$ = T.INV(0.975, 17) = +2.110. Since t = 26.393 > +2.110 and p = 0.000 < α = 0.05, the null hypothesis H0 should be rejected. Thus, the first-order parameter is statistically significant and cannot be removed from the model. So the first-order autoregressive model approximates the original data better than the others. Using the estimates $a_0$ = 18.261 and $a_1$ = 1.024 and the value of the time series for the last year, $Y_{20}$ = 1,371.88, we can forecast the real income of Wm. Wrigley Jr. Company in 2002:

$\hat{Y}_{2002} = a_0 + a_1 Y_{20} = 18.261 + 1.024 \times 1371.88 \approx 1423.1$ (computed from the rounded estimates)

Selection of an adequate prediction model

The above describes six methods for forecasting the values of a time series: the linear, quadratic and exponential trend models and the autoregressive models of the first, second and third order. Is there an optimal model? Which of the six models should be used to forecast the values of a time series? Four principles for selecting an adequate forecasting model are listed below. These principles are based on estimates of model accuracy and assume that the values of a time series can be predicted by studying its previous values.

Principles of selecting a forecasting model:

  • Perform residual analysis.
  • Estimate the size of the residual error using squared differences.
  • Estimate the size of the residual error using absolute differences.
  • Follow the principle of parsimony.

Residual analysis. Recall that a residual is the difference between the predicted and the observed value. Having built a model for a time series, you should calculate the residuals for each of the n intervals. As shown in Fig. 19, Panel A, if the model is adequate, the residuals represent the random component of the time series and are therefore distributed irregularly. On the other hand, as shown in the remaining panels, if the model is not adequate, the residuals may show a systematic pattern because the model fails to account for the trend (Panel B), the cyclical component (Panel C), or the seasonal component (Panel D).

Fig. 19. Residual analysis

Measuring the absolute and root-mean-square residual errors. If residual analysis does not single out one adequate model, other methods based on estimating the residual error can be used. Unfortunately, statisticians have not reached a consensus on the best estimate of the residual error of forecasting models. Following the principle of least squares, you can first carry out a regression analysis and calculate the standard error of the estimate $S_{YX}$. For a specific model, this quantity is based on the sum of squared differences between the actual and predicted values of the time series. If the model fits the past values of the time series perfectly, the standard error of the estimate is zero. If, on the other hand, the model fits the past values poorly, the standard error of the estimate is large. Thus, when comparing the adequacy of several models, you can choose the model with the minimum standard error of the estimate $S_{YX}$.

The main drawback of this approach is that it exaggerates errors in predicting individual values. In other words, any large difference between $Y_i$ and $\hat{Y}_i$ is squared when the sum of squared errors SSE is calculated, i.e., it is magnified. For this reason, many statisticians prefer to assess the adequacy of a forecasting model with the mean absolute deviation (MAD):

$MAD = \frac{\sum_{i=1}^{n} |Y_i - \hat{Y}_i|}{n}$

For a specific model, MAD is the average of the absolute differences between the actual and predicted values of the time series. If the model fits the past values of the time series perfectly, the mean absolute deviation is zero. If, on the other hand, the model fits the values of the time series poorly, the mean absolute deviation is large. Thus, when comparing the adequacy of several models, you can choose the model with the minimum mean absolute deviation.
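The difference between the two error measures can be seen in a small Python sketch. The function names and the data are hypothetical, and the denominator n - p - 1 used for the standard error of the estimate (p explanatory variables) is an assumption, since the text does not state the formula.

```python
import math


def standard_error_of_estimate(actual, predicted, n_explanatory):
    """S_YX = sqrt(SSE / (n - p - 1)) for p explanatory variables (assumed form)."""
    sse = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    return math.sqrt(sse / (len(actual) - n_explanatory - 1))


def mean_absolute_deviation(actual, predicted):
    """MAD = sum(|Y_i - Yhat_i|) / n."""
    return sum(abs(y - f) for y, f in zip(actual, predicted)) / len(actual)


# Hypothetical observed values and forecasts from two competing models.
actual = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
model_a = [9.0, 13.0, 15.0, 20.0, 23.0, 30.0]   # several small misses
model_b = [10.0, 12.0, 15.0, 19.0, 24.0, 36.0]  # one large miss

for name, predicted in (("A", model_a), ("B", model_b)):
    s_yx = standard_error_of_estimate(actual, predicted, n_explanatory=1)
    mad = mean_absolute_deviation(actual, predicted)
    print(name, round(s_yx, 3), round(mad, 3))
```

In this made-up example the single large miss of model B inflates its standard error much more than its MAD, which is the effect of squaring described above.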

Principle of parsimony. If the analysis of the standard errors of the estimate and the mean absolute deviations does not single out the optimal model, a fourth method can be used, based on the principle of parsimony. This principle states that of several equally good models, the simplest should be chosen.

Among the six forecasting models discussed in this chapter, the simplest are the linear and quadratic regression models and the first-order autoregressive model. The remaining models are considerably more complex.

Comparison of four forecasting methods. To illustrate the process of choosing the optimal model, let us return to the time series of the real income of Wm. Wrigley Jr. Company. We compare four models: linear, quadratic, exponential, and first-order autoregressive. (The second- and third-order autoregressive models improve the forecasting accuracy for this time series only slightly, so they need not be considered.) Fig. 20 shows the residual plots obtained when analyzing the four forecasting methods with the Excel Analysis ToolPak. Conclusions based on these plots should be drawn with caution, because the time series contains only 20 points. For the construction methods, see the corresponding sheet of the Excel file.

Fig. 20. Residual plots obtained when analyzing four forecasting methods with the Excel Analysis ToolPak

No model except the first-order autoregressive model accounts for the cyclical component. It is this model that approximates the observations better than the others and shows the least systematic structure. Thus, residual analysis of all four methods shows that the first-order autoregressive model is the best, while the linear, quadratic and exponential models are less accurate. To verify this, let us compare the residual errors of these methods (Fig. 21). The calculation method can be found in the Excel file. Fig. 21 shows the actual values $Y_i$ (the Real income column), the predicted values $\hat{Y}_i$, and the residuals $e_i$ for each of the four models. In addition, the values of $S_{YX}$ and MAD are shown. For all four models the values of $S_{YX}$ and MAD are roughly the same. The exponential model is relatively worse, and the linear and quadratic models are more accurate than it. As expected, the first-order autoregressive model has the smallest values of $S_{YX}$ and MAD.

Fig. 21. Comparison of four forecasting methods using the $S_{YX}$ and MAD indicators

Having chosen a specific forecasting model, you must carefully monitor further changes in the time series. After all, the model is built to predict the future values of the time series correctly. Unfortunately, such forecasting models take poor account of changes in the structure of the time series. It is essential to compare not only the residual error but also the accuracy of the forecasts of future values obtained with other models. Once a new value $Y_i$ has been measured in the observed time interval, it must immediately be compared with the predicted value. If the difference is too large, the forecasting model should be revised.

Forecasting time series based on seasonal data

So far we have studied time series consisting of annual data. However, many time series consist of values measured quarterly, monthly, weekly, daily or even hourly. As shown in Fig. 2, if the data are measured monthly or quarterly, the seasonal component must be taken into account. In this section we consider methods for forecasting the values of such time series.

The scenario described at the beginning of the chapter mentioned Wal-Mart Stores, Inc. The company's market capitalization is 229 billion dollars. Its shares are listed on the New York Stock Exchange under the ticker WMT. The company's fiscal year ends on January 31, so the fourth quarter of fiscal 2002 includes November and December 2001 as well as January 2002. The time series of the company's quarterly income is shown in Fig. 22.

Fig. 22. Quarterly revenues of Wal-Mart Stores, Inc. (million dollars)

For quarterly series such as this one, the classical multiplicative model contains a seasonal component in addition to the trend, cyclical and random components: $Y_i = T_i \times S_i \times C_i \times I_i$

Prediction of monthly and time sx rows using the least squares method.The regression model comprising a seasonal component is based on a combined approach. To calculate the trend, the least squares method described above are used, and for accounting for a seasonal component - a category variable (for details, see section. Regression models with fictitious variable and interaction effects). For approximation of temporary rows, the exponential model is used to approximate seasonal components. In the model, approximating a quarterly time series, for the accounting of four quarters we needed three fictitious variables. Q 1., Q 2. and Q 3.and in the model for a monthly time series of 12 months are represented by 11 fictitious variables. Since the variable log is used in these models as a response. Y I., but not Y I., To calculate these regression coefficients, you must return the conversion.

To illustrate the process of building a model, approximating a quarterly time series, return to the income of Wal-Mart. The parameters of the exponential model obtained by Package Analysis Excel, shown in Fig. 23.

Fig. 23. Regression analysis of the quarterly revenues of Wal-Mart Stores, Inc.

It can be seen that the exponential model approximates the source data quite well. The coefficient of determination $r^2$ is 99.4% (cell J5), the adjusted coefficient of determination is 99.3% (cell J6), the F statistic is 1,333.51 (cell M12), and the p-value is 0.0000. At the significance level α = 0.05, each regression coefficient in the classical multiplicative time series model is statistically significant. Taking antilogarithms of the coefficients, we obtain the following parameters:

The coefficients are interpreted as follows.

Using the regression coefficients $b_i$, one can predict the revenue received by the company in a particular quarter. For example, let us predict the company's revenue for the fourth quarter of 2002 ($X_i = 35$):

$\log \hat{Y}_i = b_0 + b_1 X_i = 4.265 + 0.016 \times 35 = 4.825$

$\hat{Y}_i = 10^{4.825} = 66{,}834$

Thus, according to the forecast, in the fourth quarter of 2002 the company should have received revenue of approximately 67 billion dollars (the forecast itself is stated to the nearest million). To extend the forecast beyond the time series, for example to the first quarter of 2003 ($X_i = 36$, $Q_1 = 1$), the following calculations are performed:

$\log \hat{Y}_i = b_0 + b_1 X_i + b_2 Q_1 = 4.265 + 0.016 \times 36 - 0.093 \times 1 = 4.748$

$\hat{Y}_i = 10^{4.748} = 55{,}976$
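The two forecasts above can be reproduced with a short script. This is only a sketch: it uses the three rounded coefficients quoted in the text (base-10 logarithms), and the function name `forecast_revenue` is a hypothetical helper. The coefficients for $Q_2$ and $Q_3$ are not given in the text, so they are left out.

```python
# Rounded coefficients quoted in the text (log10 scale).
B0 = 4.265       # intercept b0
B1 = 0.016       # coefficient b1 of the coded quarter number X
B2_Q1 = -0.093   # coefficient b2 of the first-quarter dummy Q1


def forecast_revenue(x, q1=0):
    """Forecast revenue (millions of dollars) for coded quarter x.

    q1 is the first-quarter dummy (1 for a first quarter, else 0).
    Only the Q1 dummy is modelled here because the text quotes
    no coefficients for Q2 and Q3.
    """
    log_y = B0 + B1 * x + B2_Q1 * q1
    return 10 ** log_y  # undo the base-10 logarithm


q4_2002 = forecast_revenue(35)         # fourth quarter: all dummies are 0
q1_2003 = forecast_revenue(36, q1=1)   # first quarter: Q1 = 1
print(f"Q4 2002: {q4_2002:,.0f}  Q1 2003: {q1_2003:,.0f}")
```

Because the coefficients are rounded, the results agree with the figures in the text only to within rounding.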

Indexes

Indices are used as indicators that reflect changes in the economic situation or business activity. There are many kinds of indices, in particular price indices, quantity indices, value indices, and sociological indices. In this section, we consider only price indices. An index is the value of an economic indicator (or group of indicators) at a specific point in time, expressed as a percentage of its value at a base point in time.

Price index. A simple price index reflects the percentage change in the price of a good (or group of goods) over a given period of time compared with the price of that good (or group of goods) at a specific point in the past. When calculating a price index, the first step is to choose a base period, the time interval in the past with which comparisons will be made. Periods of economic stability are preferable as a base to periods of economic boom or recession. In addition, the base period should not be too remote in time, so that the comparison is not unduly affected by changes in technology and consumer habits. The price index is calculated by the formula:

$I_i = \frac{P_i}{P_{base}} \times 100$

where $I_i$ is the price index in year $i$, $P_i$ is the price in year $i$, and $P_{base}$ is the price in the base year.

The price index thus expresses the percentage change in the price of a good (or group of goods) in a given period relative to its price at the base point in time. As an example, consider the price index for unleaded gasoline in the United States from 1980 to 2002 (Fig. 24). For example:

Fig. 24. Price per gallon of unleaded gasoline and the simple price index in the United States from 1980 to 2002 (base years 1980 and 1995)

So, in 2002 the price of unleaded gasoline in the United States was 4.8% higher than in 1980. Analysis of Fig. 24 shows that the price index in 1981 and 1982 was higher than in 1980 and then did not exceed the base level until 2000. Since 1980, the year chosen as the base period, is rather remote, it probably makes sense to choose a more recent year, for example 1995. The formula for recalculating the index relative to a new base period is:

$I_{new} = \frac{I_{old}}{I_{new\ base}} \times 100$ (10)

where $I_{new}$ is the new price index, $I_{old}$ is the old price index, and $I_{new\ base}$ is the value of the price index in the new base year calculated on the old base.

Suppose that 1995 was chosen as a new base. Using formula (10), we get a new price index for 2002:

So, in 2002, unleaded gasoline in the United States cost 13.9% more than in 1995.
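The simple index and the change of base can be sketched as follows. The function names are hypothetical. The 2002 index on the 1980 base (104.8) follows from the text, but the 1995 index on the 1980 base is not quoted, so the value 92.0 used here is an assumed illustrative figure, chosen only because it is roughly consistent with the 13.9% result.

```python
def simple_price_index(price, base_price):
    """Price in a given year as a percentage of the base-year price."""
    return price / base_price * 100.0


def rebase_index(old_index, old_index_in_new_base_year):
    """Re-express an index relative to a new base year."""
    return old_index / old_index_in_new_base_year * 100.0


index_2002_base_1980 = 104.8   # from the text: 4.8% above the 1980 level
index_1995_base_1980 = 92.0    # ASSUMED illustrative value, not from the source

index_2002_base_1995 = rebase_index(index_2002_base_1980, index_1995_base_1980)
print(f"2002 index with 1995 as base: {index_2002_base_1995:.1f}")
```

By construction, the index of the new base year itself becomes 100 after rebasing.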

Unweighted aggregate price indices. Although the price index of an individual good is of undoubted interest, a more important index is one that makes it possible to assess the cost and standard of living of a large number of consumers. The unweighted aggregate price index defined by formula (11) assigns the same weight to each individual good. An aggregate price index reflects the percentage change in the price of a group of goods (often called a market basket) in a given period relative to the price of that group of goods at the base point in time.

$I_U^{(t)} = \frac{\sum_{i=1}^{n} P_i^{(t)}}{\sum_{i=1}^{n} P_i^{(0)}} \times 100$ (11)

where $t$ is the time period (0, 1, 2, ...), $i$ is the number of the good (1, 2, ..., $n$), $n$ is the number of goods in the group under consideration, $\sum P_i^{(t)}$ is the sum of the prices of the $n$ goods in period $t$, $\sum P_i^{(0)}$ is the sum of the prices of the $n$ goods in period zero, and $I_U^{(t)}$ is the value of the unweighted aggregate index in period $t$.

Fig. 25 presents average prices for three types of fruit for the period from 1980 to 1999. To calculate the unweighted aggregate price index for different years, formula (11) is applied with 1980 as the base year.

So, in 1999 the total price of a pound of apples, a pound of bananas, and a pound of oranges exceeded the total price of these fruits in 1980 by 59.4%.

Fig. 25. Prices (in dollars) of three types of fruit and the unweighted aggregate price index

The unweighted aggregate price index expresses the change in prices for the entire group of goods over time. Although this index is easy to calculate, it has two obvious drawbacks. First, all goods are treated as equally important, so expensive goods have an excessive effect on the index. Second, not all goods are consumed equally intensively, so price changes for rarely consumed goods influence the unweighted index too strongly.

Weighted aggregate price indices. Because of the shortcomings of unweighted price indices, weighted price indices are preferred: they take into account differences in the prices and consumption levels of the goods that make up the market basket. A weighted aggregate price index accounts for the consumption levels of the goods in the basket by assigning a weight to each good. There are two types of weighted aggregate price indices. The Laspeyres price index, defined by formula (12), uses the consumption levels of the base year.

$I_L^{(t)} = \frac{\sum_{i=1}^{n} P_i^{(t)} Q_i^{(0)}}{\sum_{i=1}^{n} P_i^{(0)} Q_i^{(0)}} \times 100$ (12)

where $t$ is the time period (0, 1, 2, ...), $i$ is the number of the good (1, 2, ..., $n$), $n$ is the number of goods in the group under consideration, $Q_i^{(0)}$ is the number of units of good $i$ in period zero, and $I_L^{(t)}$ is the value of the Laspeyres index in period $t$.

The calculation of the Laspeyres index is shown in Fig. 26; 1980 is used as the base year.

Fig. 26. Prices (in dollars), quantities (consumption in pounds per capita) of three types of fruit, and the Laspeyres index

So, the Laspeyres index in 1999 is 154.2. This indicates that in 1999 these three types of fruit were 54.2% more expensive than in 1980. Note that this index is lower than the unweighted index of 159.4, because the price of oranges, the fruit consumed least, rose more than the prices of apples and bananas. In other words, since the prices of the most heavily consumed fruits rose less than the price of oranges, the Laspeyres index is lower than the unweighted aggregate index.

The Paasche price index uses the consumption levels of the current period rather than the base period. The Paasche index therefore reflects more accurately the full cost of consumption at the given moment. However, this index has two significant drawbacks. First, current consumption levels are usually difficult to determine. For this reason, many popular indices use the Laspeyres index rather than the Paasche index. Second, if the price of a particular good in the market basket rises sharply, buyers reduce their consumption of it out of necessity, not because their tastes have changed. The Paasche index is calculated by the formula:

$I_P^{(t)} = \frac{\sum_{i=1}^{n} P_i^{(t)} Q_i^{(t)}}{\sum_{i=1}^{n} P_i^{(0)} Q_i^{(t)}} \times 100$

where $t$ is the time period (0, 1, 2, ...), $i$ is the number of the good (1, 2, ..., $n$), $n$ is the number of goods in the group under consideration, $Q_i^{(t)}$ is the number of units of good $i$ in period $t$, and $I_P^{(t)}$ is the value of the Paasche index in period $t$.

The calculation of the Paasche index is shown in Fig. 27; 1980 is used as the base year.

Fig. 27. Prices (in dollars), quantities (consumption in pounds per capita) of three types of fruit, and the Paasche index

So, the Paasche index in 1999 is equal to 147.0. This indicates that in 1999 these three types of fruit were 47.0% more expensive than in 1980.
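The three aggregate indices discussed in this section differ only in the weights applied to the prices. The following sketch makes the difference explicit. The fruit prices and quantities from Figs. 25-27 are not reproduced in the text, so the numbers below are hypothetical, and the function names are assumptions made for illustration.

```python
def unweighted_index(prices_t, prices_0):
    """Unweighted aggregate index: ratio of the sums of prices, in percent."""
    return sum(prices_t) / sum(prices_0) * 100.0


def laspeyres_index(prices_t, prices_0, quantities_0):
    """Laspeyres index: prices weighted by base-period quantities."""
    current_cost = sum(p * q for p, q in zip(prices_t, quantities_0))
    base_cost = sum(p * q for p, q in zip(prices_0, quantities_0))
    return current_cost / base_cost * 100.0


def paasche_index(prices_t, prices_0, quantities_t):
    """Paasche index: prices weighted by current-period quantities."""
    current_cost = sum(p * q for p, q in zip(prices_t, quantities_t))
    base_cost = sum(p * q for p, q in zip(prices_0, quantities_t))
    return current_cost / base_cost * 100.0


# HYPOTHETICAL basket of three goods (not the data of Figs. 25-27).
p0 = [1.0, 2.0, 4.0]   # base-period prices
pt = [2.0, 3.0, 5.0]   # current-period prices
q0 = [10, 5, 2]        # base-period quantities
qt = [8, 6, 3]         # current-period quantities

print(f"unweighted: {unweighted_index(pt, p0):.1f}")
print(f"Laspeyres:  {laspeyres_index(pt, p0, q0):.1f}")
print(f"Paasche:    {paasche_index(pt, p0, qt):.1f}")
```

In this made-up basket the cheapest good doubles in price and is consumed most, so the Laspeyres index comes out above the unweighted one; with the fruit data of the text the relation is reversed because the least consumed fruit rose most in price.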

Some popular price indices. Several price indices are used in business and economics. The most popular is the Consumer Price Index (CPI). Officially, this index is called CPI-U to emphasize that it is calculated for urban consumers, although it is usually called simply the CPI. The index is published monthly by the U.S. Bureau of Labor Statistics as the main tool for measuring the cost of living in the United States. The Consumer Price Index is an aggregate index weighted by the Laspeyres method. It is calculated from the prices of the 400 most widely consumed foods, items of clothing, transportation, medical services, and utilities. At present, the period 1982-1984 is used as the base for this index (Fig. 28). An important function of the CPI is its use as a deflator. The CPI is used to convert actual prices into real prices by multiplying each price by the factor 100/CPI. Calculations show that over the past 30 years the average annual inflation rate in the United States was 2.9%.
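The use of the CPI as a deflator amounts to a single multiplication. The sketch below shows it with hypothetical numbers; the function name `deflate` and the sample values are assumptions, not data from Fig. 28.

```python
def deflate(nominal_price, cpi):
    """Convert an actual (nominal) price into a real price.

    The price is multiplied by 100 / CPI, so the result is expressed
    in the prices of the base period of the index.
    """
    return nominal_price * 100.0 / cpi


# HYPOTHETICAL example: a price of 50 dollars in a year when the CPI is 200.
real_price = deflate(50.0, 200.0)
print(f"real price in base-period dollars: {real_price:.2f}")
```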

Fig. 28. Consumer Price Index over time; for the full data, see the Excel file

Another important price index published by the Bureau of Labor Statistics is the Producer Price Index (PPI). The PPI is a weighted aggregate index that uses the Laspeyres method to measure changes in the prices of goods sold by their producers. The PPI is a leading indicator for the CPI. In other words, an increase in the PPI tends to be followed by an increase in the CPI, and a decrease in the PPI by a decrease in the CPI. Financial indices, such as the Dow Jones Industrial Average (DJIA), the S&P 500, and the NASDAQ, are used to assess changes in the value of shares in the United States. Many indices make it possible to assess the profitability of international stock markets. These include the Nikkei index in Japan, the DAX 30 in Germany, and the SSE Composite in China.

Pitfalls in the analysis of time series

The value of a methodology that uses information about the past and present to predict the future was eloquently described more than two hundred years ago by the statesman Patrick Henry: "I have but one lamp by which my feet are guided, and that is the lamp of experience. I know of no way of judging of the future but by the past."

Time series analysis rests on the assumption that the factors that influenced business activity in the past and influence it in the present will continue to act in the future. If this is true, time series analysis is an effective tool for forecasting and management. However, critics of the classical methods based on time series analysis argue that these methods are too naive and primitive. In other words, a mathematical model that takes into account factors that acted in the past should not mechanically extrapolate trends into the future without regard to expert judgment, business experience, changes in technology, and people's habits and needs. In an attempt to remedy this, econometricians have in recent years developed complex computer models of economic activity that take the factors listed above into account.

Nevertheless, time series methods are an excellent forecasting tool (both short-term and long-term) if they are applied correctly, in combination with other forecasting methods and with due regard for expert judgment and experience.

Summary. In this note, time series analysis was used to develop models for forecasting the revenues of three companies: Wm. Wrigley Jr. Company, Cabot Corporation, and Wal-Mart. The components of a time series were described, as well as several approaches to forecasting annual time series: the moving average method, exponential smoothing, the linear, quadratic, and exponential models, and the autoregressive model. A regression model containing dummy variables for the seasonal component was considered. The use of the least squares method for forecasting monthly and quarterly time series was shown (Fig. 29).

Degrees of freedom are lost when the values of a time series are compared.

Once the type of trend has been established, the optimal values of the trend parameters must be calculated from the actual levels. The least squares method is usually used for this purpose. Its meaning has already been discussed in previous chapters of the textbook; in this case the optimization consists in minimizing the sum of squared deviations of the actual levels of the series from the fitted levels (from the trend). For each type of trend, the least squares method gives a system of normal equations, and solving it yields the trend parameters. We consider only three such systems: for a straight line, for a second-order parabola, and for an exponential curve. Methods for determining the parameters of other types of trend are covered in the specialized monographic literature.

For a linear trend, the normal equations of the least squares method are:

For an exponential trend, the normal equations of the least squares method have the following form:

Using the data of Table 9.1, we calculate all three of the listed trends for the time series of potato yields in order to compare them (see Table 9.5).

Table 9.5.

Calculation of trend parameters

According to formula (9.29), the parameters of the linear trend are a = 1894/11 = 172.2 c/ha and b = 486/110 = 4.418 c/ha. The linear trend equation has the form:

$\hat{y} = 172.2 + 4.418t$, where $t = 0$ in 1991. This means that the average actual and fitted level, referred to the middle of the period, i.e., to 1991, is 172 c per hectare, and the average annual increase is 4.418 c/ha per year.

The parameters of the parabolic trend according to (9.23) are b = 4.418, a = 177.75, and c = -0.5571. The parabolic trend equation has the form $\hat{y} = 177.75 + 4.418t - 0.5571t^2$, with $t = 0$ in 1991. This means that the absolute increase in yield slows down on average by 2 · 0.56 c/ha per year per year. The absolute increase itself is no longer a constant of the parabolic trend but an average value for the period. In the year taken as the origin, i.e., 1991, the trend passes through the point with ordinate 177.75 c/ha; the free term of the parabolic trend is not the average level for the period. The parameters of the exponential trend are calculated by formulas (9.32) and (9.33): ln a = 56.5658/11 = 5.1423, and taking the antilogarithm gives a = 171.1; ln k = 2.853/110 = 0.025936, and taking the antilogarithm gives k = 1.02628.

The exponential trend equation has the form: $\hat{y} = 171.1 \cdot 1.02628^{t}$.

This means that the average annual growth rate of the yield over the period was 102.63%. At the origin of the time count, the trend passes through the point with ordinate 171.1 c/ha.
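The arithmetic for the linear and exponential trends can be checked directly from the sums quoted above. This sketch assumes, as the text states, that time is counted from the middle of the 11-year period, so that the sum of $t$ is zero and the sum of $t^2$ is 110. The variable names are illustrative only.

```python
import math

n = 11               # number of levels (1986-1996)
sum_y = 1894.0       # sum of the yield levels, c/ha
sum_yt = 486.0       # sum of y * t, with t counted from the middle (1991)
sum_t2 = 110.0       # sum of t^2 for t = -5, ..., 5
sum_ln_y = 56.5658   # sum of ln y
sum_t_ln_y = 2.853   # sum of t * ln y

# Linear trend y = a + b*t: with centred t the normal equations decouple.
a_lin = sum_y / n
b_lin = sum_yt / sum_t2

# Exponential trend y = a * k**t, fitted on the logarithms of the levels.
a_exp = math.exp(sum_ln_y / n)
k_exp = math.exp(sum_t_ln_y / sum_t2)

print(f"linear:      a = {a_lin:.1f}, b = {b_lin:.3f}")
print(f"exponential: a = {a_exp:.1f}, k = {k_exp:.5f}")
```

The parabola is omitted here because the sums needed for its parameters are not quoted in the text.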

The levels calculated from the trend equations are recorded in the last three columns of Table 9.5. As these data show, the calculated levels for all three types of trend differ only slightly, since both the acceleration of the parabola and the growth rate of the exponential curve are small. The parabola differs in one significant respect: the growth of its levels stops from 1995 onward, whereas with a linear trend the levels continue to grow, and with an exponential trend their growth accelerates. For forecasting, therefore, the three trends are not equivalent: when the parabola is extrapolated to future years, its levels diverge sharply from those of the straight line and the exponential curve, as can be seen from Table 9.6. That table presents the computer printout of the solution for the same three trends produced by the Statgraphics program. The difference between their free terms and those given above is explained by the fact that the program numbers the years not from the middle but from the beginning, so that the free terms of the trends refer to 1986, for which t = 0. The exponential equation is left in logarithmic form on the printout. The forecast is made 5 years ahead, i.e., to 2001. When the origin of the time count is changed, the average absolute increase in the parabola equation, the parameter b, also changes, since as a result of the negative acceleration the increase decreases all the time and its maximum is at the beginning of the period. The only constant of the parabola is the acceleration.

In the "Data" line, the levels of the original series are given; "Forecast Summary" means summary data for the forecast. The following lines give the equations of the straight line, the parabola, and the exponential curve (in logarithmic form). The ME column is the mean discrepancy between the levels of the original series and the trend levels. For the straight line and the parabola, this discrepancy is always zero. The levels of the exponential curve are on average 0.48852 below the levels of the original series. An exact match is possible if the true trend is exponential; in this case there is no exact match, but the difference is small. The MSE column is the variance $s^2$, a measure of the fluctuation of the actual levels about the trend, as discussed in Section 9.7. The MAE column is the mean absolute deviation of the levels from the trend (see Section 5.8); the MAPE column is the relative linear deviation in percent. Here they serve as indicators of the suitability of the chosen type of trend. The parabola has the smallest variance and mean absolute deviation: over the period 1986-1996 it is closest to the actual levels. But the choice of the type of trend cannot be reduced to this criterion alone. In fact, the slowdown in growth is the result of a large negative deviation, i.e., the crop failure of 1996.

The second half of the table gives the forecast of the yield levels for the three types of trend for the years t = 12, 13, 14, 15, and 16 from the origin (1986). Up to the 16th year, the forecast levels of the exponential curve are only slightly higher than those of the straight line. The levels of the parabolic trend decrease, diverging more and more from the other trends.
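The divergence of the three extrapolations can be illustrated with the equations derived above. This is a sketch only: it uses the rounded parameters with $t = 0$ in 1991 rather than the Statgraphics printout (which counts from 1986), so the numbers will differ slightly from Table 9.6. The function names are hypothetical.

```python
def linear_trend(t):
    """Linear trend, c/ha, with t = 0 in 1991."""
    return 172.2 + 4.418 * t


def parabolic_trend(t):
    """Second-order parabola, c/ha, with t = 0 in 1991."""
    return 177.75 + 4.418 * t - 0.5571 * t ** 2


def exponential_trend(t):
    """Exponential trend, c/ha, with t = 0 in 1991."""
    return 171.1 * 1.02628 ** t


# The parabola stops growing at t = b / (2|c|), i.e. close to 1995.
vertex_t = 4.418 / (2 * 0.5571)
print(f"parabola peaks at t = {vertex_t:.2f} (year {1991 + vertex_t:.0f})")

# Forecast years 1997-2001 correspond to t = 6, ..., 10.
for year in range(1997, 2002):
    t = year - 1991
    print(year,
          f"{linear_trend(t):7.1f}",
          f"{parabolic_trend(t):7.1f}",
          f"{exponential_trend(t):7.1f}")
```

The printed table shows the pattern described in the text: the exponential forecast stays slightly above the straight line, while the parabola turns downward.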

As can be seen from Table 9.4, when the trend parameters are calculated, the levels of the original series enter with different weights: the values of $t_i$ and their squares. The effect of level fluctuations on the trend parameters therefore depends on the number of the year in which a good harvest or a crop failure occurs. If a sharp deviation falls on the year with number zero ($t_i = 0$), it has no effect on the trend parameters, but if it falls at the beginning or the end of the series, its effect is strong. Consequently, a single analytical alignment does not completely free the trend parameters from the influence of fluctuation, and with strong fluctuations they can be badly distorted, which is what happened to the parabola in our example. To further eliminate the distorting effect of fluctuations on the trend parameters, the method of multiple moving alignment should be applied.

In this technique the trend parameters are calculated not over the whole series at once but by a moving method: first for the first $m$ time periods or moments, then for the levels from the 2nd to the $(m + 1)$-th, from the 3rd to the $(m + 2)$-th, and so on. If the number of levels of the original series is $n$ and the length of each moving base used to calculate the parameters is $m$, then the number $L$ of such moving bases, and hence of individual parameter values determined from them, is:

$L = n + 1 - m$

As can be seen from the above, the method of multiple moving alignment can be applied only when the number of levels in the series is sufficiently large, as a rule 15 or more. Let us consider this technique using the data of Table 9.4, the dynamics of prices for non-fuel commodities of developing countries, which again gives the reader an opportunity to take part in a small piece of research. The same example is continued for forecasting methods in Section 9.10.

If the parameters for our series are calculated on 11-year bases (over 11 levels), then $L = 17 + 1 - 11 = 7$. The point of multiple moving alignment is that, as the base shifts, different levels with deviations from the trend of different sign and magnitude fall at its ends and in its middle. Therefore, as the base shifts, some distortions of the parameters are offset by others, and when the parameter values are subsequently averaged over all shifts of the base, the distortions of the trend parameters caused by fluctuations cancel each other out further.

Multiple moving alignment not only gives a more accurate and reliable estimate of the trend parameters but also makes it possible to check whether the type of trend equation has been chosen correctly. If the leading trend parameter, its constant, does not fluctuate randomly across the moving bases but changes its value systematically and substantially, then the type of trend has been chosen incorrectly: this parameter is not a constant.

As for the free term in multiple alignment, it is unnecessary, and in fact simply incorrect, to calculate its value as the average over all shifts of the base, because with this method the individual levels of the original series would enter the average with different weights, and the sum of the fitted levels would differ from the sum of the terms of the original series. The free term of the trend is the average level for the period, provided that time is counted from the middle of the period. When counting from the beginning, if the first level has $t_i = 1$, the free term is $a_0 = \bar{y} - b\,\frac{N - 1}{2}$. It is recommended to choose a moving base of at least 9-11 levels so that level fluctuations are sufficiently damped. If the original series is very long, the base can be as much as 0.7-0.8 of its length. To eliminate the effect of long-period (cyclic) fluctuations on the trend parameters, the number of shifts of the base should be equal to, or a multiple of, the length of the cycle. Then the beginning and the end of the base successively "run through" all phases of the cycle, and when the parameter is averaged over all shifts, its distortions due to cyclic fluctuations cancel each other out. Another way is to take the length of the moving base equal to the length of the cycle, so that the beginning and the end of the base always fall on the same phase of the cycle.

Since it has already been established from Table 9.4 that the trend is linear, we calculate the average annual absolute increase, i.e., the parameter $b$ of the linear trend equation, by the moving method on 11-year bases (see Table 9.7). The table also contains the calculation of the data needed for the subsequent study of fluctuation in Section 9.7. Let us dwell on the method of multiple alignment on moving bases and calculate the parameter $b$ for all bases:

Table 9.7.

Multiple moving alignment in a straight line



Trend equation: $\hat{y} = 104.53 - 1.433t$; $t = 0$ in 1987. So the price index decreased on average by 1.433 points per year. A single alignment over all 17 levels can distort this parameter, because the initial level contains a significant negative deviation and the final level a positive one. Indeed, a single alignment gives an average annual change in the index of only 0.953 points.
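The procedure of multiple moving alignment along a straight line can be sketched as follows. The data of Table 9.7 are not reproduced in the text, so the series below is hypothetical, and the function names are assumptions. The sketch fits an ordinary least squares slope on every moving base, averages the slopes, and takes the free term as the mean level of the whole series (time counted from the middle), as the text recommends.

```python
def ols_slope(levels):
    """Least squares slope of the levels against t = 0, 1, ..., len - 1."""
    n = len(levels)
    t_mean = (n - 1) / 2
    y_mean = sum(levels) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(levels))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def moving_slopes(levels, base_len):
    """Slopes for all moving bases; there are n + 1 - base_len of them."""
    count = len(levels) + 1 - base_len
    return [ols_slope(levels[s:s + base_len]) for s in range(count)]


def multiple_moving_alignment(levels, base_len):
    """Return (free term, averaged slope) with t = 0 at the middle of the series."""
    slopes = moving_slopes(levels, base_len)
    b = sum(slopes) / len(slopes)
    a = sum(levels) / len(levels)   # mean level, not an average over the bases
    return a, b


# HYPOTHETICAL series of 17 levels: a declining line with alternating noise.
series = [120.0 - 1.5 * t + (3.0 if t % 2 else -3.0) for t in range(17)]
a, b = multiple_moving_alignment(series, base_len=11)
print(f"number of bases: {len(moving_slopes(series, 11))}")
print(f"trend: y = {a:.2f} + ({b:.3f}) * t, t = 0 at the middle of the series")
```

With 17 levels and an 11-level base the sketch produces 7 slopes, matching the count given in the text.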




9.7. Methods of studying fluctuation and its indicators

Whereas in studying and measuring the trend, fluctuations of the levels played only the role of interference, "information noise" from which one had to abstract as far as possible, fluctuation itself now becomes the subject of statistical study. The importance of studying fluctuations of time series levels is obvious: fluctuations in crop yields, livestock productivity, and meat production are economically undesirable, since the need for the products of the agro-industrial complex is constant. These fluctuations should be reduced by applying advanced technology and other measures. Seasonal fluctuations in the production of winter and summer footwear, clothing, ice cream, umbrellas, and skates, on the contrary, are necessary and natural, since demand for these goods also fluctuates with the seasons, and uniform production would require extra costs for storing stocks. Regulation of a market economy, both by the state and by producers, consists largely in regulating fluctuations in economic processes.

The types of fluctuation of statistical indicators are very diverse, but three main ones can be distinguished: sawtooth or pendulum fluctuation, cyclic long-period fluctuation, and fluctuation randomly distributed in time. Their properties and the differences between them are clearly visible in a graphic representation (Fig. 9.2).

Sawtooth or pendulum fluctuation consists in alternating deviations of the levels from the trend in one direction and then the other. Such are the self-oscillations of a pendulum. Self-oscillations of this kind can be observed in the dynamics of crop yields at a low level of agricultural technology: a high yield in favorable weather removes more nutrients from the soil than are formed naturally in a year; the soil is depleted, which causes the next yield to fall below the trend; that yield removes fewer nutrients than are formed in a year, fertility increases, and so on.

Fig. 9.2. Types of fluctuation

Cyclic long-period fluctuation is characteristic, for example, of solar activity (10-11-year cycles) and hence of the processes on Earth connected with it: auroras, thunderstorm activity, the yields of certain crops in a number of regions, and some diseases of people and plants. This type is characterized by infrequent changes in the sign of the deviations from the trend and by a cumulative (accumulating) effect of deviations of one sign, which can weigh heavily on the economy. On the other hand, such fluctuations are well predicted.

Fluctuation randomly distributed in time is irregular and chaotic. It can arise from the superposition (interference) of several oscillations with cycles of different duration. But it can also arise from equally chaotic fluctuation of the main cause of the oscillations, for example the amount of precipitation over the summer or the mean monthly air temperature in different years.

To determine the type of fluctuation, one can use a graphic representation, M. Kendall's method of "turning points", and the calculation of autocorrelation coefficients of the deviations from the trend. These methods are discussed below.

The main indicators of the strength of level fluctuation are those already known from Chapter 5, which characterize the variation of the values of a characteristic in a spatial population. However, variation in space and fluctuation in time are fundamentally different. First of all, their main causes differ. Variation of a characteristic among units existing at the same time arises from differences in the conditions under which the units of the population exist. For example, the different potato yields of state farms in a region in 1990 are caused by differences in soil fertility, seed quality, and agricultural technology. But the sums of effective temperatures over the growing season and precipitation are not causes of spatial variation, since within a single year these factors hardly vary across the region. By contrast, the main causes of fluctuation in the region's potato yield over a number of years are precisely the fluctuations of meteorological factors, while soil quality hardly fluctuates at all. As for the general progress of agricultural technology, it is the cause of the trend, not of fluctuation.

The second fundamental difference is that the values of a varying characteristic in a spatial population can be regarded as largely independent of one another, whereas the levels of a time series are, as a rule, dependent: they are indicators of a developing process, each stage of which is connected with the preceding states.

Third, variation in a spatial population is measured by the deviations of the individual values of the characteristic from the mean, whereas the fluctuation of time series levels is measured not by their differences from the mean level (these differences include both the trend and the fluctuation) but by the deviations of the levels from the trend.

It is therefore better to use different terms: differences in a characteristic within a spatial population should be called only variation, not fluctuation. No one would call the differences in population between Moscow, St. Petersburg, Kiev, and Tashkent "fluctuations in the number of residents"! Deviations of time series levels from the trend will always be called fluctuation. Fluctuations always occur in time; there can be no fluctuation outside time, at a fixed moment.

The system of indicators of fluctuation is built on the qualitative content of the concept. The indicators of the strength of level fluctuation are: the amplitude of the deviations of the levels of individual periods or moments from the trend (in absolute value), the mean absolute deviation of the levels from the trend, and the standard deviation of the levels from the trend. The relative measures of fluctuation are the relative linear deviation from the trend and the coefficient of fluctuation, the analogue of the coefficient of variation.

A special feature of calculating mean deviations from the trend is the need to take into account the loss of degrees of freedom of fluctuation, equal to the number of parameters of the trend equation. For example, a straight line has two parameters, and, as is known from geometry, a straight line can be drawn through any two points. Hence, with only two levels, the trend line would pass exactly through those two levels, and there would be no deviations from the trend, although in fact both levels contained fluctuations and were not free from the influence of the factors of fluctuation. A second-order parabola passes exactly through any three points, and so on.

Taking into account the loss of degrees of freedom, the main absolute indicators of fluctuation are calculated by formulas (9.34) and (9.35):

mean linear deviation

$a(t) = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n - p}$ (9.34)

standard deviation

$s(t) = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p}}$ (9.35)

where $y_i$ is the actual level;

$\hat{y}_i$ is the fitted (trend) level;

$n$ is the number of levels;

$p$ is the number of trend parameters.

The symbol "t" in parentheses after an indicator means that it is an indicator not of ordinary spatial variation, as in Chapter 5, but of fluctuation over time.

The relative indicators of fluctuation are calculated by dividing the absolute indicators by the average level for the whole period under study. We calculate the indicators of fluctuation from the results of the analysis of the price index dynamics (see Table 9.7). The trend is taken from the results of multiple moving alignment, i.e., $\hat{y} = 104.53 - 1.433t$; $t = 0$ in 1987.

1. The amplitude of the deviations ranged from -14.0 in 1986 to +15.2 in 1984, i.e., 29.2 points.

2. The mean linear deviation is found by adding the absolute values $|u_i|$ (their sum is 132.3) and dividing by $(n - p)$, according to formula (9.34):

$\frac{132.3}{17 - 2} = 8.82$ points.

3. The standard deviation of the levels from the trend, by formula (9.35), is:

$s(t) = 9.45$ points.

The small excess of the root-mean-square deviation over the linear one indicates that none of the deviations stands out sharply in absolute value.

4. The coefficient of fluctuation: $v(t) = s(t) / \bar{y} = 9.45 / 104.53 = 0.0904$, or 9.04%. The fluctuation is moderate, not strong. For comparison, here are the indicators (without the calculation) for the fluctuation of potato yield about the linear trend, based on the data of Tables 9.1 and 9.5:

$s(t) = 14.38$ c/ha, $v(t) = 8.35\%$.
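The four steps above can be sketched in Python. This is only an illustration: the function name `fluctuation_measures` and the sample series are hypothetical (the data of Table 9.7 are not reproduced here), and the trend values are simply taken as given rather than fitted.

```python
import math

def fluctuation_measures(actual, trend, n_params):
    """Return amplitude, a(t), s(t) and v(t) for deviations from a given trend."""
    if len(actual) != len(trend):
        raise ValueError("actual and trend must have the same length")
    n = len(actual)
    dof = n - n_params  # degrees of freedom left after fitting the trend
    if dof <= 0:
        raise ValueError("need more levels than trend parameters")
    u = [y - y_hat for y, y_hat in zip(actual, trend)]  # deviations from trend
    amplitude = max(u) - min(u)
    a_t = sum(abs(d) for d in u) / dof            # mean linear deviation, (9.34)
    s_t = math.sqrt(sum(d * d for d in u) / dof)  # root-mean-square deviation, (9.35)
    v_t = s_t / (sum(actual) / n)                 # coefficient of fluctuation
    return amplitude, a_t, s_t, v_t

# Hypothetical example: a linear trend 100 + 2t with made-up deviations.
trend = [100 + 2 * t for t in range(-3, 4)]
deviations = [3, -4, 2, 0, -2, 4, -3]
actual = [y_hat + d for y_hat, d in zip(trend, deviations)]
amplitude, a_t, s_t, v_t = fluctuation_measures(actual, trend, n_params=2)
print(amplitude, round(a_t, 2), round(s_t, 2), round(100 * v_t, 2))
```

Dividing by `n - n_params` rather than by `n` is what distinguishes these measures from the ordinary mean deviation and standard deviation.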

To identify the type of fluctuation we use a technique proposed by M. Kendall. It consists in counting the so-called "turning points" in the series of deviations from the trend $u_i$, i.e. local extremes: deviations that are either greater in algebraic value or less than both neighbouring points. Turn to Fig. 9.2. With pendulum-like fluctuation, all deviations except the two end ones are turning points, so their number is $n - 2$. With long-period cycles, one minimum and one maximum fall on each cycle, and the total number of points is $2(n : l)$, where $l$ is the duration of the cycle. With fluctuation randomly distributed in time, as M. Kendall proved, the number of turning points is on average $\frac{2}{3}(n - 2)$. In our example, pendulum-like fluctuation would give 15 points, an 11-year cycle would give $2 \cdot (17 : 11) \approx 3$ points, and fluctuation randomly distributed in time would give on average $\frac{2}{3}(17 - 2) = 10$ points.

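The actual number of points, 6, lies outside the limits of two standard deviations of the number of turning points around this average; according to Kendall, the standard deviation in our case is $\sqrt{(16n - 29)/90} = \sqrt{(16 \cdot 17 - 29)/90} \approx 1.64$, so the lower limit is $10 - 2 \cdot 1.64 \approx 6.7$.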

The presence of 6 points, at 2 points per cycle, means that the series may contain about 3 cycles with a period of 5.5 to 6 years. A combination of such cyclical fluctuations with random ones is possible.
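The turning-point count can be sketched as follows. The function names are hypothetical, and the standard deviation uses the standard result for a random series, variance $(16n - 29)/90$, which is stated here from general knowledge of Kendall's test rather than taken from the text.

```python
import math

def count_turning_points(u):
    """Count local extremes: points strictly above or strictly below both neighbours."""
    count = 0
    for prev, cur, nxt in zip(u, u[1:], u[2:]):
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            count += 1
    return count

def random_series_benchmark(n):
    """Mean and standard deviation of the number of turning points in a random series."""
    mean = 2.0 * (n - 2) / 3.0
    std = math.sqrt((16 * n - 29) / 90.0)  # assumed standard formula, see lead-in
    return mean, std

# Pendulum-like series of 17 deviations: every interior point is a turning point.
pendulum = [5 * (-1) ** i for i in range(17)]
print(count_turning_points(pendulum))

mean, std = random_series_benchmark(17)
lower, upper = mean - 2 * std, mean + 2 * std
print(round(mean, 1), round(std, 2), round(lower, 1), round(upper, 1))
```

With 17 levels, an observed count of 6 falls below `lower`, which is the comparison made in the text.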

Another method of analysing the type of fluctuation and finding the cycle length is based on calculating the autocorrelation coefficients of deviations from the trend.

Autocorrelation is correlation between the levels of a series, or between deviations from the trend, taken with a shift in time: by 1 period (year), by 2, by 3, and so on. Hence one speaks of autocorrelation coefficients of different orders: first, second, etc. Consider first the first-order autocorrelation coefficient of deviations from the trend.

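One of the main formulas for calculating the autocorrelation coefficient of deviations from the trend has the form:

$$r_1 = \frac{\sum_{i=1}^{n-1} u_i u_{i+1}}{\frac{1}{2} u_1^2 + \sum_{i=2}^{n-1} u_i^2 + \frac{1}{2} u_n^2} \qquad (9.36)$$

where $u_i$ is the deviation of the $i$-th level from the trend and $n$ is the number of levels.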

As is easy to see from Table 9.7, the first and the last deviations in the series take part in only one product in the numerator, while all other deviations, from the second to the $(n - 1)$-th, take part in two. Therefore, in the denominator the squares of the first and last deviations should be taken with half weight, as in the chronological average. The coefficient is calculated from the data of Table 9.7.

Now turn to Fig. 9.2. With pendulum-like fluctuation, all products in the numerator are negative, and the first-order autocorrelation coefficient is close to -1. With long-period cycles, positive products of neighbouring deviations prevail, and the sign changes only twice per cycle. The longer the cycle, the greater the preponderance of positive products in the numerator, and the closer the first-order autocorrelation coefficient is to +1. With fluctuation randomly distributed in time, the signs of the deviations alternate chaotically, the number of positive products is close to the number of negative ones, and so the autocorrelation coefficient is close to zero. The value obtained indicates the presence of both random and cyclical fluctuation.

The autocorrelation coefficients of the following orders are: II = -0.577; III = -0.611; IV = -0.095; V = +0.376; VI = +0.404; VII = +0.044. Consequently, the opposite phase of the cycle is closest to 3 years (the largest negative coefficient occurs at a shift of 3 years), and coinciding phases are closest to 6 years (the largest positive coefficient occurs at a shift of 6 years), which gives the length of the fluctuation cycle. Even these coefficients, the largest in absolute value, are not close to one. This means that cyclical fluctuation is mixed with substantial random fluctuation. Thus, the detailed autocorrelation analysis gave, on the whole, the same results as the conclusions drawn from the first-order autocorrelation.
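The behaviour of the first-order coefficient described above can be checked with a small sketch. It assumes the half-weight form of the denominator described for formula (9.36); the function name and both sample series are hypothetical.

```python
def first_order_autocorrelation(u):
    """First-order autocorrelation of deviations, end squares taken with half weight."""
    if len(u) < 3:
        raise ValueError("need at least three deviations")
    numerator = sum(a * b for a, b in zip(u, u[1:]))
    denominator = (0.5 * u[0] ** 2
                   + sum(d * d for d in u[1:-1])
                   + 0.5 * u[-1] ** 2)
    if denominator == 0:
        raise ValueError("all deviations are zero")
    return numerator / denominator

# Pendulum-like fluctuation: every neighbouring product is negative.
pendulum = [5, -5, 5, -5, 5, -5, 5]
# One long cycle: the sign changes only once inside the series.
long_cycle = [1, 2, 3, 2, 1, -1, -2, -3, -2, -1]

print(first_order_autocorrelation(pendulum))
print(round(first_order_autocorrelation(long_cycle), 3))
```

The pendulum series gives exactly -1 and the long cycle gives a clearly positive value, matching the two limiting cases in the text.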

If the time series is long enough, one can pose and solve the problem of how the fluctuation indicators change over time. For this, the indicators are calculated for subperiods, each lasting at least 9-11 years, since otherwise the measurement of fluctuation is unreliable. In addition, the fluctuation indicators can be calculated by the moving method and then smoothed, i.e. a trend of the fluctuation indicators can be calculated. This is useful for judging the effectiveness of measures taken to reduce fluctuations in yield and other undesirable fluctuations, and for forecasting the expected magnitude of future deviations from the trend.

9.8. Measuring stability in dynamics

The concept of "stability" is used in very different senses. With regard to the statistical study of dynamics, we consider two aspects of this concept: 1) stability as the category opposite to fluctuation; 2) stability of the direction of change, i.e. stability of the trend.

In the first sense, the stability indicator, which can only be relative, must range from zero to one (100%). It is the difference between one and the relative indicator of fluctuation. The coefficient of fluctuation was 9.0%. Consequently, the stability coefficient is 100% - 9.0% = 91.0%. This indicator characterizes the closeness of the actual levels to the trend and does not depend at all on the nature of the latter. Weak fluctuation and high stability of levels in this sense can exist even with complete stagnation in development, when the trend is a horizontal straight line.

Stability in the second sense characterizes not the levels themselves but the process of their directed change. One can find out, for example, how stable the process of reducing the unit costs of resources per unit of output is, or whether the tendency for child mortality to fall is stable, and so on. From this point of view, complete stability of the directed change in the levels of a time series is a change in which each successive level is either higher than all preceding ones (stable growth) or lower than all preceding ones (stable decline). Any violation of a strictly ranked sequence of levels indicates incomplete stability of change.

The method of constructing an indicator of trend stability follows from the definition of the concept. As such an indicator one can use C. Spearman's rank correlation coefficient $r_x$:

$$r_x = 1 - \frac{6 \sum_{i=1}^{n} \Delta_i^2}{n^3 - n}$$

where $n$ is the number of levels;

$\Delta_i$ is the difference between the rank of a level and the number of its time period.

When the ranks of the levels, starting from the smallest, fully coincide with the numbers of the periods (moments) of time in chronological order, the rank correlation coefficient is +1. This value corresponds to complete stability of the growth of levels. When the ranks of the levels are completely opposite to the ranks of the years, Spearman's coefficient is -1, which means complete stability of the process of decline. With a chaotic alternation of the ranks of the levels, the coefficient is close to zero, which means that no trend is stable. The calculation of Spearman's rank correlation coefficient for the price index series (Table 9.7) is given in Table 9.8.

Table 9.8.

Calculation of Spearman's rank correlation coefficient

Rank of years, R_x | Rank of levels, R_y | R_x - R_y | (R_x - R_y)^2

Because there are three pairs of tied ranks, we apply formula (8.26):

The negative value of $r_x$ indicates a tendency for the levels to decline, and the stability of this tendency is below average.
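A minimal sketch of the rank-based stability indicator, assuming the usual Spearman formula $1 - 6\sum\Delta_i^2/(n^3 - n)$ for series without tied levels. The function name and sample series are hypothetical. Tied ranks, as in the price index example, need the corrected formula (8.26), which is not reproduced here, so the sketch rejects ties instead of guessing.

```python
def spearman_trend_stability(levels):
    """Spearman rank correlation between levels and their chronological order (no ties)."""
    n = len(levels)
    if n < 2:
        raise ValueError("need at least two levels")
    if len(set(levels)) != n:
        raise ValueError("tied levels require the corrected formula")
    # Rank 1 goes to the smallest level, rank n to the largest.
    order = sorted(range(n), key=lambda i: levels[i])
    rank_of_level = [0] * n
    for rank, idx in enumerate(order, start=1):
        rank_of_level[idx] = rank
    # Period numbers 1..n play the role of the ranks of the years.
    sum_sq = sum((period - rank_of_level[period - 1]) ** 2
                 for period in range(1, n + 1))
    return 1 - 6 * sum_sq / (n ** 3 - n)

print(spearman_trend_stability([1, 2, 3, 4, 5]))       # steady growth
print(spearman_trend_stability([5, 4, 3, 2, 1]))       # steady decline
print(spearman_trend_stability([10, 12, 11, 15, 14]))  # growth with two reversals
```

The first two calls reproduce the limiting cases +1 and -1 from the text; the third shows how violations of the ranking pull the coefficient below one.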

It should be borne in mind that even with 100% stability of the trend, the levels of a time series may fluctuate, so that their stability coefficient will be below 100%. Conversely, with weak fluctuation but an even weaker trend, a high coefficient of stability of levels is possible together with a trend stability coefficient close to zero. In general, the two indicators are, of course, directly related: most often, high stability of levels is observed together with high stability of the trend.

The stability of the development trend, or complex stability, in a time series can be characterized by the ratio between the average annual absolute change and the root-mean-square (or linear) deviation of the levels from the trend:

$$c = \frac{b}{s(t)}$$

If, as often happens, the distribution of the deviations of the levels from the trend is close to normal, then with a probability of 0.95 a downward deviation from the trend will not exceed 1.645 s(t) in magnitude. Therefore, if in a time series this ratio c > 1.64, then levels lower than the preceding one will occur less than 5 times in 100 periods, or once in 20, i.e. the stability of the trend will be high. For c = 1, violations of the ranking of levels will occur on average 16 times out of 100, and for c = 0.5 already 31 times out of 100, i.e. the stability of the trend will be low. One can also use the ratio of the average growth rate to the coefficient of fluctuation, which gives a stability indicator close to c. This indicator is more suitable for an exponential trend. More on indicators of the stability of nonlinear trends and on general problems of the stability of economic and social processes can be found in the literature recommended for this chapter.
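The frequencies quoted above (5, 16 and 31 times out of 100) correspond to the lower tail of the standard normal distribution at the point -c. The sketch below reproduces them under that reading of the text. Both function names are hypothetical, and taking the absolute value of the slope for a declining trend is an assumption of this sketch.

```python
import math

def stability_indicator(b, s_t):
    """Ratio of the average annual absolute change to s(t); abs() is an assumption."""
    if s_t <= 0:
        raise ValueError("s(t) must be positive")
    return abs(b) / s_t

def share_of_ranking_violations(c):
    """Lower normal tail at -c, read as the share of levels breaking the ranking."""
    return 0.5 * (1.0 + math.erf(-c / math.sqrt(2.0)))

for c in (1.645, 1.0, 0.5):
    print(c, round(100 * share_of_ranking_violations(c)))

# Values quoted in the text for the linear trend of potato yield.
c_potato = stability_indicator(4.418, 14.38)
print(round(c_potato, 3), round(100 * share_of_ranking_violations(c_potato)))
```

This follows the simplified normal-tail argument of the paragraph; it is not an exact probability that one level falls below the previous one.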
