Analytical smoothing of the time series. Trend equation

According to formula (9.29), the parameters of the linear trend are a = 1894/11 = 172.2 centners/ha and b = 486/110 = 4.418 centners/ha. The linear trend equation is:

ŷ = 172.2 + 4.418t, where t = 0 in 1991. This means that the average of the actual and the leveled levels, referring to the middle of the period, i.e. to 1991, equals 172.2 centners/ha.

The parameters of the parabolic trend, according to (9.23), are b = 4.418; a = 177.75; c = -0.5571. The parabolic trend equation is ŷ = 177.75 + 4.418t - 0.5571t², with t = 0 in 1991. This means that the absolute increase in yield slows down on average by 2 · 0.557 ≈ 1.11 centners/ha per year each year. The absolute growth itself is no longer a constant of the parabolic trend but the average value over the period. In the year taken as the origin, i.e. 1991, the trend passes through the point with ordinate 177.75 centners/ha; the free term of the parabolic trend is not the average level over the period. The parameters of the exponential trend are calculated by formulas (9.32) and (9.33): ln a = 56.5658/11 = 5.1423; taking the antilogarithm, a = 171.1; ln k = 2.853/110 = 0.025936; taking the antilogarithm, k = 1.02628.

The exponential trend equation is: ŷ = 171.1 · 1.02628^t.

This means that the average annual growth rate of yield over the period was 102.63%. At the point taken as the origin, the trend passes through the point with ordinate 171.1 centners/ha.

The levels calculated from the trend equations are recorded in the last three columns of Table 9.5. As can be seen from these data, the calculated values of the levels for all three types of trend do not differ much, since both the acceleration of the parabola and the growth rate of the exponential are small. The parabola does show a significant difference: the growth of its levels stops after 1995, whereas with a linear trend the levels continue to grow and with an exponential trend their growth even accelerates. Therefore, for forecasting these three trends are not equivalent: when the parabola is extrapolated to future years, its levels diverge sharply from those of the straight line and the exponential, as can be seen from Table 9.6. This table shows a printout of the PC solution for the same three trends obtained with the Statgraphics program. The difference between their free terms and those given above is explained by the fact that the program numbers the years not from the middle but from the beginning, so the free terms of the trends refer to 1986, for which t = 0. The exponential equation is left on the printout in logarithmic form. The forecast is made 5 years ahead, i.e. up to 2001. When the origin of coordinates (the time reference point) is changed, the average absolute increase in the parabola equation, the parameter b, changes as well, since as a result of the negative acceleration the growth constantly decreases and is greatest at the beginning of the period. Only the acceleration is a constant of the parabola.
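For readers who want to reproduce this kind of calculation outside a statistical package, the following Python sketch fits the same three trend forms (straight line, parabola, exponential) by least squares with time counted from the middle of the period. The yield figures in it are hypothetical placeholders, not the data of Table 9.5.

import numpy as np

# Hypothetical 11-level yield series (centners/ha); not the data of Table 9.5.
y = np.array([15.2, 16.0, 17.4, 16.8, 18.1, 17.5, 18.9, 19.3, 18.7, 19.8, 20.1])
n = len(y)
t = np.arange(n) - (n - 1) // 2          # ..., -2, -1, 0, 1, 2, ... so that sum(t) = 0

# Linear trend: a = sum(y)/n, b = sum(t*y)/sum(t^2), as in (9.29)
a_lin = y.sum() / n
b_lin = (t * y).sum() / (t ** 2).sum()

# Parabolic trend y = a + b*t + c*t^2, fitted by ordinary least squares
c_par, b_par, a_par = np.polyfit(t, y, deg=2)

# Exponential trend y = a * k^t, fitted through logarithms, as in (9.32)-(9.33)
ln_a = np.log(y).sum() / n
ln_k = (t * np.log(y)).sum() / (t ** 2).sum()
a_exp, k_exp = np.exp(ln_a), np.exp(ln_k)

print(f"linear:      y = {a_lin:.2f} + {b_lin:.3f} t")
print(f"parabolic:   y = {a_par:.2f} + {b_par:.3f} t + {c_par:.4f} t^2")
print(f"exponential: y = {a_exp:.2f} * {k_exp:.4f}^t")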


The "Data" line contains the levels of the original series; “Forecast summary” means summary data for forecasting purposes. In the next lines - equations of a straight line, parabola, exponent - in logarithmic form. The ME bar represents the average discrepancy between the levels of the original series and the levels of the trend (flattened). For a straight line and a parabola, this discrepancy is always zero. The exponential levels are on average 0.48852 lower than the levels of the original series. An exact match is possible if the true trend is exponential; in this case there is no coincidence, but the difference is small. The MAE column is the variance s 2 - a measure of the volatility of actual levels relative to the trend, as described in clause 9.7. Graph MAE - average linear deviation of levels from the trend in absolute value (see paragraph 5.8); graph MAPE - relative linear deviation in percent. Here they are given as indicators of the suitability of the selected trend type. The parabola has a smaller variance and a deviation modulus: it is for the period 1986 - 1996. closer to actual levels. But the choice of the type of trend cannot be reduced only to this criterion. In fact, the slowdown in growth is the result of a large negative deviation, i.e., a crop failure in 1996.

The second half of the table is a forecast of yield levels for the three types of trend for the years t = 12, 13, 14, 15 and 16 counted from the origin (1986). The levels projected by the exponential are higher than those of the straight line right up to year 16. The levels of the parabolic trend decrease, diverging more and more from the other trends.

As can be seen from Table 9.4, when the trend parameters are calculated the levels of the original series enter with different weights, namely the values t_i and their squares. Therefore, the influence of level fluctuations on the trend parameters depends on which year number a good or bad year falls on. If a sharp deviation falls on the year with number zero (t_i = 0), it has no effect at all on the trend parameters, whereas if it falls at the beginning or the end of the series, it has a strong effect. Consequently, a single analytical alignment does not completely free the trend parameters from the influence of fluctuations, and with strong fluctuations they can be greatly distorted, which in our example happened to the parabola. To further suppress the distorting influence of fluctuations on the trend parameters, the method of multiple sliding alignment should be applied.

This technique consists in calculating the trend parameters not at once over the entire series but in a sliding manner: first for the first T periods (or moments), then for the levels from the 2nd to the (T + 1)-th, from the 3rd to the (T + 2)-th, and so on. If the number of levels of the original series is n and the length of each sliding base for calculating the parameters is T, then the number of such sliding bases, i.e. of the individual parameter values determined from them, is:

L = n + 1 - T.

As follows from the above, the sliding multiple alignment technique can be applied only with a sufficiently large number of levels, usually 15 or more. Let us consider this technique using the data of Table 9.4, the dynamics of prices for non-fuel goods in developing countries, which once again gives the reader the opportunity to take part in a small piece of research. Using the same example, we will continue with the forecasting methodology in Section 9.10.

If we calculate the parameters of our series over 11-year bases (11 levels each), then L = 17 + 1 - 11 = 7. The point of multiple sliding alignment is that with successive shifts of the calculation base, different levels, with deviations from the trend of different sign and magnitude, fall at its ends and in its middle. Therefore, with some shifts of the base the parameters are overestimated and with others underestimated, and the subsequent averaging of the parameter values over all shifts of the calculation base produces a further mutual compensation of the distortions of the trend parameters caused by level fluctuations.

Multiple sliding alignment not only makes it possible to obtain a more accurate and reliable estimate of the trend parameters, but also to check that the type of trend equation has been chosen correctly. If it turns out that the leading parameter of the trend, its constant, does not fluctuate randomly when calculated on the moving bases but changes its value systematically and significantly, then the type of trend was chosen incorrectly: this parameter is not in fact a constant.

As for the free term, with multiple alignment there is no need, and indeed it is simply incorrect, to calculate its value as the average over all base shifts, because with such a method the individual levels of the original series would enter the calculation of the average with different weights, and the sum of the aligned levels would diverge from the sum of the terms of the original series. The free term of the trend is the average level for the period, provided the time is counted from the middle of the period. When time is counted from the beginning, so that the first level has t_i = 1, the free term is a0 = ȳ - b(n - 1)/2. It is recommended to choose the length of the moving base for calculating the trend parameters at no fewer than 9-11 levels, so that the level fluctuations are sufficiently damped. If the original series is very long, the base may be as much as 0.7-0.8 of its length. To eliminate the influence of long-period (cyclical) fluctuations on the trend parameters, the number of base shifts should be equal to, or a multiple of, the length of the cycle of fluctuations. Then the beginning and the end of the base will successively "run through" all phases of the cycle, and when the parameter is averaged over all shifts its distortions due to cyclical oscillations cancel out. Another way is to take the length of the sliding base equal to the cycle length, so that the beginning and the end of the base always fall on the same phase of the oscillation cycle.
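A minimal Python sketch of the sliding alignment just described is given below: it estimates the slope b of a linear trend on every base of length T and then averages the estimates. The price series in it is a hypothetical placeholder, not the data of Table 9.4.

import numpy as np

# Hypothetical series of 17 levels; not the data of Table 9.4.
y = np.array([100, 103, 101, 106, 110, 108, 113, 117, 115, 120, 124,
              122, 127, 131, 129, 134, 138], dtype=float)
T = 11                                   # length of the sliding base
L = len(y) + 1 - T                       # number of base shifts, here 17 + 1 - 11 = 7

t = np.arange(T) - (T - 1) // 2          # centred time within a base, sum(t) = 0
b_estimates = []
for start in range(L):
    window = y[start:start + T]
    b_estimates.append((t * window).sum() / (t ** 2).sum())   # slope on this base

print("slope b on each base:", np.round(b_estimates, 3))
print("averaged slope b:", round(float(np.mean(b_estimates)), 3))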

Since it has already been established from Table 9.4 that the trend is linear, we calculate the average annual absolute growth, i.e. the parameter b of the linear trend equation, in a sliding manner over 11-year bases (see Table 9.7). The table also provides the calculation of the data needed for the subsequent study of fluctuation in Section 9.7. Let us dwell in more detail on the technique of multiple alignment on sliding bases and calculate the parameter b on all bases:


Time series. The trend equation.

Growth curves describing the patterns of development of phenomena in time are the result of analytical alignment of time series. Aligning a series with the help of certain functions (i.e., fitting them to the data) in most cases turns out to be a convenient means of describing empirical data. This tool, subject to a number of conditions, can be used for forecasting. The leveling process consists of the following main stages:

Selection of the type of curve, the shape of which corresponds to the nature of the change in the time series;

Determination of numerical values ​​(estimation) of curve parameters;

A posteriori quality control of the selected trend.

In modern statistical software packages, all of the above stages are, as a rule, implemented within a single procedure.

Analytical smoothing using this or that function allows one to obtain equalized, or, as they are sometimes not quite rightly called, theoretical values ​​of the levels of the time series, that is, those levels that would be observed if the dynamics of the phenomenon completely coincided with the curve. The same function, with or without some adjustment, is used as a model for extrapolation (forecast).

The question of choosing the type of curve is the main one when aligning a series. All other things being equal, the error in solving this issue turns out to be more significant in its consequences (especially for forecasting) than the error associated with the statistical estimation of the parameters.

Since the form of the trend exists objectively, in identifying it one should proceed from the material nature of the phenomenon under study, investigating the internal causes of its development as well as the external conditions and factors influencing it. Only after a thorough substantive analysis can one proceed to the special techniques developed by statistics.

A very common technique for identifying the shape of a trend is a graphical display of a time series. But at the same time, the influence of the subjective factor is great, even when displaying aligned levels.

The most reliable methods for selecting a trend equation are based on the properties of the various curves used in analytical alignment. This approach makes it possible to link the type of trend to certain qualitative properties of the development of the phenomenon. In our view, in most cases the most suitable method is one based on comparing the characteristics of the changes in the increments of the time series under study with the corresponding characteristics of the growth curves. The curve chosen for alignment is the one whose law of change of the increments is closest to the pattern of change in the actual data.

Table 4 provides a list of the types of curves most commonly used in the analysis of economic series and indicates the corresponding "symptoms" by which it is possible to determine which type of curve is suitable for alignment.

When choosing the shape of the curve, one more circumstance must be borne in mind. Increasing the complexity of the curve can in a number of cases genuinely increase the accuracy of describing the trend in the past; however, since more complex curves contain more parameters and higher powers of the independent variable, their confidence intervals will in general be considerably wider than those of simpler curves for the same lead time.

Table 4

The nature of the change in indicators based
on average increments for different types of curves

Character of change of the averaged increments over time | Curve type
Approximately constant | Straight line
Changes linearly | Second-degree parabola
Changes linearly | Third-degree parabola
Approximately constant | Exponential
Changes linearly | Logarithmic parabola
Changes linearly | Modified exponential
Changes linearly | Gompertz curve
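The "symptom" check behind Table 4 can be mechanized. The Python sketch below computes first differences, second differences and chain growth rates for a series and lets one see which of them stays roughly constant; the series itself is hypothetical.

import numpy as np

# Hypothetical series used only to illustrate the check.
y = np.array([50.0, 54.0, 58.5, 63.0, 68.2, 73.5, 79.4, 85.8])

first_diff  = np.diff(y)            # roughly constant -> straight line
second_diff = np.diff(y, n=2)       # roughly constant -> second-degree parabola
growth_rate = y[1:] / y[:-1]        # roughly constant -> exponential

print("first differences :", np.round(first_diff, 2))
print("second differences:", np.round(second_diff, 2))
print("growth rates      :", np.round(growth_rate, 3))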

At present, when the use of special programs without much effort allows you to simultaneously construct several types of equations, formal statistical criteria are widely used to determine the best trend equation.

From what was said above we can apparently conclude that the choice of the shape of the curve for alignment is a problem that cannot be solved unambiguously; it comes down to obtaining a set of alternatives. The final choice cannot lie in the field of formal analysis alone, especially if the alignment is to be used not only to describe statistically the pattern of level behaviour in the past, but also to extrapolate the found pattern into the future. At the same time, various statistical methods for processing observational data can be of significant benefit: at the very least they make it possible to reject obviously unsuitable options and thereby considerably narrow the field of choice.

Consider the most commonly used types of trend equations:

1. Linear trend form:

ŷ_t = a + bt,

where ŷ_t is the level of the series obtained by alignment along a straight line;

a is the initial level of the trend;

b is the average absolute growth, the constant of the trend.

The linear form of the trend is characterized by equal first differences (absolute increments) and zero second differences, i.e. accelerations.

2. Parabolic (second-degree polynomial) trend form:

ŷ_t = a + bt + ct².

For this type of curve the second differences (accelerations) are constant, and the third differences are zero.

The parabolic form of the trend corresponds to accelerated or decelerated change of the levels of the series with constant acceleration. If c < 0 and b > 0, the quadratic parabola has a maximum; if c > 0 and b < 0, a minimum. To find the extremum, the first derivative of the parabola with respect to t is set equal to zero and the equation is solved for t, which gives t = -b/(2c).

3. Exponential trend form:

ŷ_t = a·k^t,

where a is the trend constant and k is the average rate (coefficient) of change of the level of the series.

When k > 1, this trend reflects a tendency of accelerating, ever faster growth of the levels of the series. When k < 1, it reflects a tendency of constant, ever more decelerating decline of the levels of the time series.

4. Hyperbolic trend form (type I):

ŷ_t = a + b/t.

This form of trend can describe the tendency of processes that are limited by a limiting value of the level.

5. Logarithmic trend form:

ŷ_t = a + b·ln t,

where a and b are the trend constants.

A logarithmic trend can be used to describe a tendency that manifests itself as a slowdown of the growth of the levels of the series when there is no limiting maximum value. For sufficiently large t the logarithmic curve becomes almost indistinguishable from a straight line.

6. Inverse logarithmic trend form: ŷ_t = 1/(a + b·ln t).

7. Multiplicative (power) trend form: ŷ_t = a·t^b.

8. Inverse (hyperbolic, type II) trend form: ŷ_t = 1/(a + bt).

9. Hyperbolic trend form of type III: ŷ_t = 1/(a + b/t).

10. Third-degree polynomial: ŷ_t = a + bt + ct² + dt³.

For all models (regression equations) that are nonlinear with respect to the original variables (and they make up the majority here), auxiliary transformations are required; they are presented in the table below.

Table 5

Models reducible to a linear trend

Model | Equation | Transformation
Multiplicative (power) | ŷ_t = a·t^b | ln ŷ = ln a + b·ln t
Hyperbolic type I | ŷ_t = a + b/t | t' = 1/t
Hyperbolic type II | ŷ_t = 1/(a + bt) | y' = 1/y
Hyperbolic type III | ŷ_t = 1/(a + b/t) | y' = 1/y, t' = 1/t
Logarithmic | ŷ_t = a + b·ln t | t' = ln t
Inverse logarithmic | ŷ_t = 1/(a + b·ln t) | y' = 1/y, t' = ln t

In the formulas listed in the table, as in all the formulas describing trend models, a and b are the coefficients of the equations.

However, when linearization by transformation of the variables is used in practice, it should be borne in mind that the parameter estimates obtained in this way by ordinary least squares (OLS) minimize the sum of squared deviations for the transformed rather than the original variables. Therefore, the estimates obtained through linearization of the dependencies may need to be refined.
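As an illustration of this point, the Python sketch below linearizes a power (multiplicative) trend by applying OLS to ln y against ln t and then exponentiates the intercept; the data are hypothetical, and the comments note that the estimates minimize squared deviations of the transformed variable, so they may need refinement.

import numpy as np

# Hypothetical series; the power trend y = a * t^b is fitted as ln y = ln a + b ln t.
t = np.arange(1, 13, dtype=float)
y = np.array([2.1, 3.0, 3.5, 4.1, 4.4, 4.9, 5.1, 5.6, 5.8, 6.1, 6.3, 6.7])

b, ln_a = np.polyfit(np.log(t), np.log(y), deg=1)   # OLS on the transformed variables
a = np.exp(ln_a)

# Back-transformed trend values; the OLS above minimized errors in ln y, not in y,
# so these estimates may need refinement (e.g. nonlinear least squares on y itself).
y_hat = a * t ** b
print(f"power trend: y = {a:.3f} * t^{b:.3f}")
print("fitted levels:", np.round(y_hat, 2))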

To carry out analytical smoothing of the time series in the STATISTICA system, we need to create several new auxiliary variables for the further work, and also to perform some auxiliary operations converting the nonlinear trend models into linear ones.

So we have to build a trend equation, which is essentially a regression equation in which "time" acts as the factor. First of all, we create a variable "T" containing the time points of the fourth period. Since the fourth period covers 12 years, the variable "T" consists of the natural numbers from 1 to 12, corresponding to the years of the period.

In addition, to work with some of the trend models we need a few more variables, whose content is clear from their designations. These are the variables obtained from the time variable: "T^2", "T^3", "1/T" and "ln T", and the variables obtained from the initial data for the fourth period: "1/Import4" and "ln Import4". The same table also has to be created for exports. All of this is best done on a new worksheet, copying the data for the fourth period there.

To do this, we will use the Workbook / Insert menu already known to us.

As a result, we get the following spreadsheets.

Fig. 38. Table with auxiliary variables for imports

Fig. 39. Table with auxiliary variables for exports
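The same auxiliary table can be prepared outside STATISTICA; the Python/pandas sketch below builds the variables T, T^2, T^3, 1/T, ln T, 1/Import4 and ln Import4 from a hypothetical 12-level import series.

import numpy as np
import pandas as pd

# Hypothetical levels of the fourth period (12 years).
import4 = [10.2, 11.1, 11.8, 12.5, 13.0, 13.9, 14.3, 15.1, 15.8, 16.2, 17.0, 17.6]

df = pd.DataFrame({"T": np.arange(1, 13), "Import4": import4})
df["T^2"] = df["T"] ** 2
df["T^3"] = df["T"] ** 3
df["1/T"] = 1 / df["T"]
df["ln T"] = np.log(df["T"])
df["1/Import4"] = 1 / df["Import4"]
df["ln Import4"] = np.log(df["Import4"])

print(df.head())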

For analytical alignment of the time series we will use the Multiple Regression module in the Statistics menu. Let us consider an example of plotting the graph and determining the numerical parameters of a trend expressed by a linear relationship.

Fig. 40. The Multiple Regression module in the Statistics menu

To select dependent and independent variables, use the Variables button.

In the window that opens, in the left field we select the dependent variable Y_t (in our case Import4, the data for the fourth period). The numbers of the selected dependent variables are displayed at the bottom in the Dependent var. (or list for batch) field. Similarly, in the right field we select the independent variables (in our case just one, the time "T"). The numbers of the selected independent variables are shown at the bottom in the Independent variable list field.

After the selection of variables is completed, click OK. The system displays a window with the summary results of the trend parameter calculation (they are discussed in more detail below) and the choice of directions for further detailed analysis. Note that values highlighted in red indicate statistically significant results.

Fig. 41. The Advanced tab

There are several buttons on the tab that allow you to obtain detailed information on the direction of analysis of interest. Clicking the corresponding button produces two tables with the results of the regression analysis: the first presents the estimated parameters of the regression equation, the second the main indicators of the equation.

Fig. 42. Key indicators of the equation for import data for the fourth period (linear trend)

Here N is the number of observations of the dependent variable. The upper field contains the indicators R, R², Adjusted R², F, p and Std. Error of Estimate, meaning, respectively, the theoretical correlation ratio, the coefficient of determination, the adjusted coefficient of determination, the calculated value of Fisher's F-test (the numbers of degrees of freedom are given in parentheses), the significance level and the standard error of the equation (the same indicators can be seen in the second table). In the table itself we are interested in the column B, which contains the coefficients of the equation, and the columns t and p-level, which give the calculated value of the t-test and the calculated significance level needed to assess the significance of the parameters of the equation. Here the system helps the user: when a procedure involves a significance test, STATISTICA highlights significant elements in red (i.e. those for which the null hypothesis that the parameter equals zero is rejected). In our case |t_fact| > t_table for both parameters, so both are significant.

Fig. 43. Parameters of the regression equation for import data for the fourth period (linear trend)

To assess the statistical significance of the equation as a whole, on the Advanced tab, use the ANOVA (Goodness Of Fit) button, which allows you to get the ANOVA table and the Fisher's F-test value.

Fig. 44. The ANOVA table

Sums of Squares are the sums of squared deviations: in the Regression row, the sum of the squared deviations of the theoretical values of the variable (obtained from the regression equation) from its mean; this sum of squares is used to calculate the factor (explained) variance of the dependent variable. In the Residual row, the sum of the squared deviations of the theoretical values from the actual values of the variable (used to calculate the residual, unexplained variance); in the Total row, the sum of the squared deviations of the actual values of the variable from the mean (used to calculate the total variance). The df column gives the numbers of degrees of freedom; Mean Squares denotes the variances: the factor variance in the Regression row and the residual variance in the Residual row. F is Fisher's test, used to assess the overall significance of the equation and of the coefficient of determination; p-level is the significance level.

The parameters of the trend equation in STATISTICA, as in most other programs, are calculated using the method of least squares (OLS).

The method allows one to obtain the values ​​of the parameters at which the sum of the squares of the deviations of the actual levels from the smoothed ones, i.e., obtained as a result of analytical alignment, is minimized.

The mathematical apparatus of the least squares method is described in most works on mathematical statistics, so there is no need to dwell on it in detail. Let us recall only a few points. Thus, to find the parameters of the linear trend (2.10) it is necessary to solve the system of equations:

Σy = na + bΣt,
Σty = aΣt + bΣt².

This system of equations is simplified if the values of t are chosen so that their sum equals zero, that is, if the origin of the time reference is moved to the middle of the period under consideration. Obviously, moving the origin makes sense only when the time series is processed by hand.

If Σt = 0, then a = Σy/n and b = Σty/Σt².

In general form, the system of normal equations for finding the parameters of a polynomial of degree p can be written as

Σy·t^k = a0·Σt^k + a1·Σt^(k+1) + … + a_p·Σt^(k+p),   k = 0, 1, …, p.

When a time series is smoothed with an exponential curve (which is often used in economic research), the least squares method should be applied to the logarithms of the original data to determine the parameters.

After moving the time origin to the middle of the series (so that Σt = 0), we obtain:

ln a = Σln y / n,   ln k = Σ(t·ln y) / Σt²,

hence:

a = exp(Σln y / n),   k = exp(Σ(t·ln y) / Σt²).

If more complex changes in the levels of the time series are observed and the alignment is carried out with a more complex exponential function, the parameters are determined by solving the corresponding system of normal equations for the logarithms of the levels.

In the practice of studying socio-economic phenomena, time series whose characteristics fully correspond to those of the reference mathematical functions are extremely rare. This is due to the large number of factors of different nature that affect the levels of the series and the tendency of their change.

In practice, one most often fits a whole set of functions describing the trend and then chooses the best one on the basis of one or another formal criterion.

Fig. 45. The Residuals/Assumptions/Prediction tab

Here we use the Perform Residual Analysis button, which opens the residual analysis module. Residuals here are the deviations of the initial values of the time series from the values predicted by the chosen trend equation. Go straight to the Advanced tab.

Fig. 46. The Advanced tab in Perform Residual Analysis

We use the Summary: Residuals & Predicted button, which produces the table of the same name containing the initial values of the series (Observed Value), the values predicted by the selected trend model (Predicted Value), the deviations of the predicted values from the original ones (Residual Value), as well as various special indicators and standardized values. The table also shows the maximum, minimum, mean and median values for each column.

Rice. 47. Table containing indicators and special values ​​for a linear trend

In this table, of greatest interest to us is the Residual Value column, the values ​​of which are further used to characterize the quality of trend selection, as well as the Predicted Value column, which contains the predicted values ​​of the time series in accordance with the selected trend model (in our case, linear).

Next, let's build a graph of the initial time series together with the predicted values ​​for the fourth period calculated in accordance with the linear trend equation. The best way to do this is to copy the values ​​from the Predicted Value column into the table in which the trend variables were created.

Fig. 48. The third period of the time series of imports (billions of dollars) and the linear trend

So, we have obtained all the necessary results of calculating the parameters of the trend expressed by the linear model for the fourth period of the original time series, and have also plotted the graph of this series together with the trend line. The other trend models are presented below.

It should be noted that as a result of the linearization of the power and exponential functions, STATISTICA returns the values of the linearized function, i.e. ln ŷ; therefore, for further use, including the construction of graphs, they must be converted back by the elementary transformation ŷ = exp(ln ŷ). For the hyperbolic functions linearized through 1/y, as well as for the inverse logarithmic function, the back-transformation ŷ = 1/ŷ′ is required.

To do this, it is also advisable to create additional variables and get them using formulas based on existing variables.

Thus, when solving the problem with the Multiple Regression procedure, you need to select as variables the natural logarithms of the original series and of the time variable.

Fig. 49. Key indicators of the equation for import data for the third period (power model)

Fig. 50. Parameters of the regression equation for import data for the third period (power model)

Fig. 51. ANOVA table

Fig. 52. Table containing indicators and special values for the power model

Then, as in the case of the linear trend, we copy the values from the Predicted Value column to the table; but first we create another variable in which the predicted values of the power function are obtained by the back-transformation described above.

Fig. 53. Creating an additional variable

Fig. 54. Table with all variables

Fig. 55. The third period of the time series of imports (billions of dollars) and the power model

Fig. 56. Key Equation Indicators for Third Period Import Data (Exponential Model)

Fig. 57. The third period of the time series of imports (billions of dollars) and the exponential model

Fig. 58. Key Equation Indicators for Third Period Import Data (Reverse Model)

Fig. 59. The third period of the time series of imports (billions of dollars) and the inverse model

Fig. 60. Key indicators of the equation for import data for the third period (second-degree polynomial)

Fig. 61. The third period of the time series of imports (billions of dollars) and the second-degree polynomial

Fig. 62. Key indicators of the equation for import data for the third period (third-degree polynomial)

Fig. 63. The third period of the time series of imports (billions of dollars) and the third-degree polynomial


Fig. 64. Key indicators of the equation for import data for the third period (hyperbola of type I)

Fig. 65. The third period of the time series of imports (billions of dollars) and the hyperbola of type I


Fig. 66. Key indicators of the equation for import data for the third period (hyperbola of type III)

Fig. 67. The third period of the time series of imports and the hyperbola of type III


Fig. 68. Key indicators of the equation for import data for the third period (logarithmic model)

Fig. 69. The third period of the time series of imports (billions of dollars) and the logarithmic model


Fig. 70. Key indicators of the equation for import data for the third period (inverse logarithmic model)

Fig. 71. The third period of the time series of imports (billions of dollars) and the inverse logarithmic model


Next, we build a table with auxiliary variables for fitting the trends for exports.

Fig. 72. Table with auxiliary variables

Let's do the same operations as for the fourth import period.

Fig. 73. Key indicators of the equation for export data for the third period (linear model)

Fig. 74. The third period of the time series of exports (billions of dollars) and the linear model

Fig. 75. Key indicators of the equation for export data for the third period (power trend model)

Fig. 76. The third period of the time series of exports and the power model


Fig. 77. Key indicators of the equation for export data for the third period (exponential trend model)

Fig. 78. The third period of the time series of exports (billions of dollars) and the exponential model


Fig. 79. Key indicators of the equation for export data for the third period (inverse trend model)

Fig. 80. The third period of the time series of exports (billions of dollars) and the inverse model


Fig. 81. Key indicators of the equation for export data for the third period (second-degree polynomial)

Fig. 82. The third period of the time series of exports (billions of dollars) and the second-degree polynomial


Fig. 83. Key indicators of the equation for export data for the third period (third-degree polynomial)

Fig. 84. The third period of the time series of exports (billions of dollars) and the third-degree polynomial


Fig. 85. Key indicators of the equation for export data for the third period (hyperbola of type I)

Fig. 86. The third period of the time series of exports and the hyperbola of type I


Fig. 87. Key indicators of the equation for export data for the third period (hyperbola of type III)

Fig. 88. The third period of the time series of exports (billions of dollars) and the hyperbola of type III


Fig. 89. Key indicators of the equation for export data for the third period (logarithmic model)

Fig. 90. The third period of the time series of exports (billions of dollars) and the logarithmic model


Fig. 91. Key indicators of the equation for export data for the third period (inverse logarithmic model)

Fig. 92. The third period of the time series of exports (billions of dollars) and the inverse logarithmic model


Choosing the best trend

As already noted, the problem of choosing the shape of the curve is one of the main problems encountered when aligning a time series. Its solution largely determines the results of trend extrapolation. Most specialized programs offer the following criteria for selecting the best trend equation:

The minimum value of the mean square error of the trend:

σ̂ = √( Σ(y_i − ŷ_i)² / (n − p) ),

where y_i are the actual levels of the time series;

ŷ_i are the series levels determined by the trend equation;

n is the number of levels in the series;

p is the number of parameters in the trend equation.

The minimum value of the residual variance;

The minimum value of the average approximation error;

The minimum value of the average absolute error;

The maximum value of the coefficient of determination;

The maximum value of Fisher's F-test:

F = (R² / (1 − R²)) · ((n − k − 1) / k),

where k is the number of degrees of freedom of the factor variance, equal to the number of independent variables (factor attributes) in the equation;

n − k − 1 is the number of degrees of freedom of the residual variance.

The application of a formal criterion for choosing the shape of the curve is likely to give practically suitable results if the selection is carried out in two stages. At the first stage, dependencies are selected that are suitable from the point of view of a meaningful approach to the problem, as a result of which the range of potentially acceptable functions is limited. At the second stage, for these functions, the values ​​of the criterion are calculated and the one from the curves is selected, which corresponds to its minimum value.

In this tutorial, a formal method is used to identify a trend, which is based on the use of a numerical criterion. The maximum coefficient of determination is considered as such a criterion:

R² = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)².

The interpretation of the designations and the formulas of these indicators were given in the previous sections. The coefficient of determination shows what proportion of the total variance of the resulting attribute is explained by the variation of the factor attribute. In STATISTICA tables it is denoted R².
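A sketch of this criterion in Python is given below: R² = 1 − SS_resid/SS_total is computed for several candidate polynomial trends fitted to one (hypothetical) series, and the model with the largest value would be preferred.

import numpy as np

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical series and three candidate trends.
y = np.array([10.2, 11.1, 11.8, 12.5, 13.0, 13.9, 14.3, 15.1, 15.8, 16.2, 17.0, 17.6])
t = np.arange(1, len(y) + 1)

fits = {
    "linear":   np.polyval(np.polyfit(t, y, 1), t),
    "parabola": np.polyval(np.polyfit(t, y, 2), t),
    "cubic":    np.polyval(np.polyfit(t, y, 3), t),
}
for name, y_hat in fits.items():
    print(f"{name:8s} R^2 = {r_squared(y, y_hat):.4f}")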

The following table will present the trend model equations and determination coefficients for import data.

Table 6

Trend model equations and coefficients of determination for imports

Comparing the values of the coefficients of determination for the different types of curves, we can conclude that for the third period under study the best form of trend, both for imports and for exports, is the third-degree polynomial.

Next, the selected trend model must be analyzed for its adequacy to the real tendencies of the time series under study by assessing the reliability of the obtained trend equations with Fisher's F-test. In this case the calculated value of the F-test is 16.573 for imports and 13.098 for exports, while the tabulated value at the accepted significance level is 3.07. Consequently, this trend model is recognized as adequately reflecting the real tendency of the phenomenon under study.

The three previous notes describe regression models that predict the response from the values ​​of explanatory variables. In this post, we show you how to use these models and other statistical techniques to analyze data collected over successive time intervals. Based on the specifics of each company mentioned in the scenario, we will consider three alternative approaches to time series analysis.

The material will be illustrated with a cross-cutting example: forecasting the income of three companies. Imagine that you are an analyst at a large financial company. To assess the investment prospects of your clients, you need to predict the earnings of three companies. To do this, you have collected data on the three companies of interest: Eastman Kodak, Cabot Corporation, and Wal-Mart. Since the companies differ in the type of their business activity, each time series has its own unique characteristics, so different models have to be applied for forecasting. How do you choose the best forecasting model for each company? How do you evaluate investment prospects based on the forecast results?

The discussion begins with an analysis of the annual data. Two methods of smoothing such data are demonstrated: moving average and exponential smoothing. It then demonstrates the procedure for calculating the trend using the least squares method and more sophisticated forecasting methods. Finally, these models are extended to time series based on monthly or quarterly data.


Forecasting in business

As economic conditions change over time, managers must anticipate the impact these changes will have on their company. One of the methods to ensure accurate planning is forecasting. Despite the large number of developed methods, they all pursue the same goal - to predict events that will occur in the future in order to take them into account when developing plans and strategies for the company's development.

Modern society is constantly in need of forecasting. For example, in order to develop sound policy, members of the government must predict the levels of unemployment, inflation, industrial production, and personal and corporate income tax receipts. To determine equipment and staffing needs, airline directors must correctly predict the volume of air travel. In order to create enough dormitory places, college or university administrators want to know how many students will enroll in their institution next year.

There are two generally accepted approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are especially important when quantitative data are not available to the researcher. Typically, these methods are highly subjective. If statistics are available about the history of the object of study, quantitative forecasting methods should be used. These methods allow predicting the state of an object in the future based on data about its past. Quantitative forecasting methods fall into two categories: time series analysis and causal analysis methods.

A time series is a set of numerical data obtained over consecutive periods of time. Time series analysis allows you to predict the value of a numerical variable based on its past and present values. For example, daily stock prices on the New York Stock Exchange form a time series. Other examples of time series are the monthly CPI, quarterly gross domestic product, and a company's annual sales revenue.

Methods for the analysis of causal relationships allow you to determine which factors affect the values of the predicted variable. These include multiple regression analysis with lagged variables, econometric modeling, analysis of leading indicators, and methods for analyzing diffusion indices and other economic indicators. Here we will discuss only forecasting methods based on time series analysis.

Components of the classical multiplicative time series model

The main assumption underlying the analysis of time series is the following: the factors influencing the studied object in the present and the past will influence it in the future. Thus, the main goals of time series analysis are to identify and highlight factors that are important for forecasting. To achieve this goal, many mathematical models have been developed designed to investigate the fluctuations of the components included in the time series model. Probably the most common is the classic multiplicative model for annual, quarterly, and monthly data. To demonstrate the classic multiplicative time series model, consider the actual earnings data of Wm.Wrigley Jr. Company for the period from 1982 to 2001 (Fig. 1).

Fig. 1. Graph of the actual gross income of Wm. Wrigley Jr. Company (millions of US dollars at current prices) for the period from 1982 to 2001

As can be seen, over these 20 years the company's actual gross income showed an upward tendency. This long-term tendency is called a trend. The trend is not the only component of the time series; the data also contain cyclical and irregular components. The cyclical component describes up-and-down fluctuations in the data, often correlated with business cycles. Its length varies from 2 to 10 years. The intensity, or amplitude, of the cyclical component is also not constant. In some years the data may lie above the value predicted by the trend (i.e. be near the peak of the cycle), and in other years below it (i.e. be at the bottom of the cycle). Any observed data that do not lie on the trend curve and do not obey the cyclical component are called irregular, or random, components. If the data are recorded monthly or quarterly, an additional component arises, called the seasonal component. All the components of time series typical of economic applications are shown in Fig. 2.

Fig. 2. Factors influencing the time series

The classical multiplicative time series model states that any observed value is the product of the listed components. If the data are annual, the observation Y_i corresponding to the i-th year is expressed by the equation:

(1) Y_i = T_i · C_i · I_i

where T_i is the trend value, C_i is the value of the cyclical component in the i-th year, and I_i is the value of the irregular (random) component in the i-th year.

If the data are measured monthly or quarterly, the observation Y_i corresponding to the i-th period is expressed by the equation:

(2) Y_i = T_i · S_i · C_i · I_i

where T_i is the trend value, S_i is the value of the seasonal component in the i-th period, C_i is the value of the cyclical component in the i-th period, and I_i is the value of the random component in the i-th period.
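A small Python sketch of how model (1) is used in practice: dividing the observed annual levels by a fitted trend leaves the combined cyclical-irregular component C_i · I_i. The revenue series here is synthetic, generated only for the illustration.

import numpy as np

# Synthetic 20-year revenue series with growth plus a wave-like disturbance.
t = np.arange(20)
y = 100 * 1.06 ** t * (1 + 0.05 * np.sin(t))

# Exponential trend T_i fitted through logarithms.
b, ln_a = np.polyfit(t, np.log(y), 1)
trend = np.exp(ln_a + b * t)

cycle_irregular = y / trend              # C_i * I_i under the multiplicative model
print(np.round(cycle_irregular, 3))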

At the first stage of time series analysis, the data is plotted and their dependence on time is revealed. First, you need to find out if there is a long-term increase or decrease in the data (i.e. trend), or the time series fluctuates around a horizontal line. If there is no trend, then moving average or exponential smoothing can be used to smooth the data.

Smoothing annual time series

We mentioned Cabot Corporation in the scenario. Headquartered in Boston, Massachusetts, it specializes in the manufacture and sale of chemicals, building materials, fine chemical products, semiconductors and liquefied natural gas. The company has 39 factories in 23 countries and a market value of approximately $1.87 billion. Its shares are listed on the New York Stock Exchange under the ticker CBT. The company's revenues for the period in question are shown in Fig. 3.

Fig. 3. Revenues of Cabot Corporation in 1982-2001 (billions of dollars)

As you can see, the long-term upward tendency in revenue is obscured by a large number of fluctuations. Thus, a visual analysis of the chart does not allow us to assert that the data contain a trend. In such situations, moving average or exponential smoothing techniques can be used.

Moving averages. The moving average method is rather subjective and depends on the length L of the period chosen for calculating the averages. In order to exclude cyclical fluctuations, the length of the period must be an integer multiple of the average cycle length. The moving averages for a chosen period of length L form a sequence of means, each calculated over L consecutive values of the series. Moving averages are denoted MA(L).

Suppose we want to compute five-year moving averages from data measured over n = 11 years. Since L = 5, the five-year moving averages form a sequence of averages calculated from five consecutive values of the time series. The first five-year moving average is calculated by summing the data for the first five years and dividing by five:

MA(5) = (Y_1 + Y_2 + Y_3 + Y_4 + Y_5) / 5.

The second five-year moving average is calculated by summing the data for years 2 through 6 and dividing by five:

MA(5) = (Y_2 + Y_3 + Y_4 + Y_5 + Y_6) / 5.

This process continues until the moving average for the last five years has been calculated. When working with annual data, the number L (the length of the period chosen for calculating the moving averages) should be odd. In this case it is impossible to calculate moving averages for the first (L − 1)/2 and the last (L − 1)/2 years. Therefore, when working with five-year moving averages, calculations cannot be performed for the first two and the last two years. The year for which a moving average is calculated must lie in the middle of a period of length L. If n = 11 and L = 5, the first moving average corresponds to the third year, the second to the fourth, and the last to the ninth. Fig. 4 shows graphs of the 3- and 7-year moving averages computed for Cabot Corporation revenues from 1982 to 2001.
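The same computation in Python, for a hypothetical 11-year revenue series: each MA(5) value is the mean of five consecutive years and is centred on the middle year of its window, so two values are lost at each end.

import numpy as np

y = np.array([0.9, 1.1, 1.0, 1.3, 1.2, 1.5, 1.4, 1.6, 1.8, 1.7, 2.0])  # n = 11 years
L = 5

ma = np.convolve(y, np.ones(L) / L, mode="valid")   # n - L + 1 = 7 centred averages
print("MA(5):", np.round(ma, 3))
print("the first average corresponds to year", (L - 1) // 2 + 1)       # year 3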

Fig. 4. Graphs of the 3- and 7-year moving averages calculated for Cabot Corporation revenues

Note that the observed values for the first and last years are ignored when calculating the 3-year moving averages. Similarly, when calculating the seven-year moving averages, there are no results for the first and last three years. In addition, the 7-year moving averages smooth the time series much more than the 3-year ones, because the 7-year moving average has a longer period. Unfortunately, the longer the period, the smaller the number of moving averages that can be calculated and presented on the chart. Therefore it is undesirable to choose a period of more than seven years for calculating moving averages, since too many points would be lost at the beginning and end of the chart, distorting the shape of the time series.

Exponential smoothing. In addition to moving averages, exponential smoothing is used to identify long-term tendencies characterizing changes in the data. This method also allows short-term forecasts (one period ahead) to be made when the existence of long-term tendencies is questionable, which gives it a significant advantage over the moving average method.

The exponential smoothing method takes its name from the sequence of exponentially weighted moving averages it produces. Each value in this sequence depends on all previous observed values. Another advantage of exponential smoothing over the moving average method is that the latter discards some values. With exponential smoothing, the weights assigned to the observed values decrease with their age, so that the most recent observations receive the largest weight and the oldest ones the smallest. Despite the large amount of computation, Excel makes it easy to apply exponential smoothing.

The equation that smooths a time series in an arbitrary period i contains three terms: the current observed value Y_i of the time series, the previous exponentially smoothed value E_(i−1), and the assigned weight W.

(3) E_1 = Y_1;   E_i = W·Y_i + (1 − W)·E_(i−1),   i = 2, 3, 4, …

where E_i is the value of the exponentially smoothed series calculated for the i-th period, E_(i−1) is the value of the exponentially smoothed series calculated for the (i − 1)-th period, Y_i is the observed value of the time series in the i-th period, and W is the subjective weight, or smoothing constant (0 < W < 1).

The choice of the smoothing constant, i.e. the weight assigned to the terms of the series, is crucial because it directly affects the result. Unfortunately, this choice is somewhat subjective. If the researcher merely wants to remove unwanted cyclical or random fluctuations from the time series, small values of W (close to zero) should be chosen. If, on the other hand, the time series is used for forecasting, a large weight W (close to one) should be chosen. In the first case the long-term tendencies of the time series show up clearly; in the second the accuracy of short-term forecasting increases (Fig. 5).
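Equation (3) is easy to implement directly; the Python sketch below smooths a hypothetical revenue series with two values of W and prints the last smoothed value, which also serves as the one-step-ahead forecast.

def exponential_smoothing(y, w):
    # E_1 = Y_1; E_i = w*Y_i + (1 - w)*E_(i-1)
    e = [y[0]]
    for value in y[1:]:
        e.append(w * value + (1 - w) * e[-1])
    return e

revenue = [0.9, 1.1, 1.0, 1.3, 1.2, 1.5, 1.4, 1.6, 1.8, 1.7, 2.0]   # hypothetical data
for w in (0.25, 0.50):
    smoothed = exponential_smoothing(revenue, w)
    print(f"W = {w}: last smoothed value (one-step-ahead forecast) = {smoothed[-1]:.3f}")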

Fig. 5. Plots of the exponentially smoothed time series (W = 0.50 and W = 0.25) for Cabot Corporation revenue data from 1982 to 2001; for the calculation formulas see the Excel file

The exponentially smoothed value obtained for the i-th time interval can be used as an estimate of the predicted value for the (i + 1)-th interval:

Ŷ_(i+1) = E_i.

To predict the revenues of Cabot Corporation in 2002 from the exponentially smoothed time series with weight W = 0.25, the smoothed value calculated for 2001 can be used. As Fig. 5 shows, this value equals $1,651.0 million. When the data on the company's revenues in 2002 become available, equation (3) can be applied to predict the level of revenues in 2003 from the smoothed value of revenues for 2002:

E_2002 = W·Y_2002 + (1 − W)·E_2001,   and   Ŷ_2003 = E_2002.

The Excel Analysis ToolPak can perform exponential smoothing in one click. Go to the menu Data → Data Analysis and select the Exponential Smoothing option (Fig. 6). In the Exponential Smoothing window that opens, set the parameters. Unfortunately, the procedure builds only one smoothed series at a time, so if you want to experiment with the parameter W, repeat the procedure.

Fig. 6. Performing exponential smoothing with the Analysis ToolPak

Least Squares Trending and Forecasting

Among the components of the time series, the trend is most often studied. It is the trend that allows you to make short-term and long-term forecasts. To identify a long-term trend in a time series, a graph is usually drawn on which the observed data (values ​​of the dependent variable) are plotted on the vertical axis, and time intervals (values ​​of the independent variable) are plotted on the horizontal axis. In this section, we describe the procedure for detecting linear, quadratic and exponential trends using the least squares method.

The linear trend model is the simplest model used for forecasting: Y_i = β0 + β1·X_i + ε_i. The linear trend equation is:

Ŷ_i = b0 + b1·X_i.

For a given significance level α, the null hypothesis is rejected if the test t-statistic is greater than the upper or less than the lower critical value of the t-distribution. In other words, the decision rule is: if t > t_U or t < t_L, the null hypothesis H0 is rejected; otherwise it is not rejected (Fig. 14).

Fig. 14. Regions of rejection of the hypothesis for the two-sided test of significance of the highest-order autoregressive parameter A_p

If the null hypothesis (A_p = 0) is not rejected, the selected model contains too many parameters. The test then allows the highest-order term to be discarded and an autoregressive model of order p − 1 to be estimated. This procedure is continued until the null hypothesis H0 is rejected.

  1. Choose the order p of the autoregressive model to be estimated, bearing in mind that the t-test of significance has n − 2p − 1 degrees of freedom.
  2. Form a sequence of p "lagged" variables, so that the first variable lags by one time interval, the second by two, and so on; the last variable lags by p time intervals (see Fig. 15).
  3. Apply the Excel Analysis ToolPak to compute a regression model containing all p lagged values of the time series.
  4. Assess the significance of the highest-order parameter A_p: (a) if the null hypothesis is rejected, all p parameters are retained in the model; (b) if the null hypothesis is not rejected, discard the p-th variable and repeat steps 3 and 4 for a new model that includes p − 1 parameters. The significance test for the new model is again based on the t-test, with the number of degrees of freedom determined by the new number of parameters.
  5. Repeat steps 3 and 4 until the highest-order term of the autoregressive model is statistically significant.

To demonstrate autoregressive modeling, let us return to the analysis of the time series of the real earnings of Wm. Wrigley Jr. Company. Fig. 15 shows the data required to build autoregressive models of the first, second and third order. All columns of this table are needed to build the third-order model; for the second-order model the last column is ignored, and for the first-order model the last two columns are ignored. Thus, when constructing the first-, second- and third-order autoregressive models, one, two and three of the 20 observations, respectively, are lost.
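The same fitting can be done without Excel; the Python sketch below builds the lagged column for a first-order model, estimates Y_i = a0 + a1·Y_(i−1) by OLS and computes the t-statistic of the highest-order parameter, as in the procedure above. The income series is hypothetical.

import numpy as np

# Hypothetical 20-value series standing in for the real income data.
y = np.array([100, 104, 107, 112, 118, 121, 127, 133, 138, 146,
              151, 158, 166, 172, 181, 188, 197, 205, 214, 224], dtype=float)

p = 1
y_dep = y[p:]                      # Y_i
y_lag = y[:-p]                     # Y_(i-1)

X = np.column_stack([np.ones_like(y_lag), y_lag])      # intercept and lagged value
a0, a1 = np.linalg.lstsq(X, y_dep, rcond=None)[0]      # OLS estimates

resid = y_dep - (a0 + a1 * y_lag)
df = len(y_dep) - 2                                    # n - 2p - 1 = 17 for n = 20, p = 1
s2 = resid @ resid / df                                # residual variance
cov = s2 * np.linalg.inv(X.T @ X)
t_stat = a1 / np.sqrt(cov[1, 1])                       # t for the highest-order parameter

print(f"AR(1): Y_hat = {a0:.3f} + {a1:.3f} * Y_(i-1),  t = {t_stat:.2f}")
print("one-step forecast:", round(a0 + a1 * y[-1], 2))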

The choice of the most accurate autoregressive model starts with the third-order model. For the Analysis ToolPak to work correctly, specify the range B5:B21 as the Y input interval and C5:E21 as the X input interval. The results of the analysis are shown in Fig. 16.

Let us check the significance of the highest-order parameter A3. Its estimate a3 is −0.006 (cell C20 in Fig. 16), and its standard error is 0.326 (cell D20). To test the hypotheses H0: A3 = 0 and H1: A3 ≠ 0, we calculate the t-statistic:

t = a3 / S_a3 = −0.006 / 0.326 = −0.019.

At the significance level α = 0.05, the critical values of the two-sided t-test with n − 2p − 1 = 20 − 2·3 − 1 = 13 degrees of freedom are t_L = STUDENT.OBR(0.025; 13) = −2.160 and t_U = STUDENT.OBR(0.975; 13) = +2.160. Since −2.160 < t = −0.019 < +2.160 and p = 0.985 > α = 0.05, the null hypothesis H0 cannot be rejected. Thus, the third-order parameter is not statistically significant in the autoregressive model and should be removed.

Let us repeat the analysis for the second-order autoregressive model (Fig. 17). The estimate of the highest-order parameter is a2 = −0.205, and its standard error is 0.276. To test the hypotheses H0: A2 = 0 and H1: A2 ≠ 0, we calculate the t-statistic:

t = a2 / S_a2 = −0.205 / 0.276 = −0.744.

At the significance level α = 0.05, the critical values of the two-sided t-test with n − 2p − 1 = 20 − 2·2 − 1 = 15 degrees of freedom are t_L = STUDENT.OBR(0.025; 15) = −2.131 and t_U = STUDENT.OBR(0.975; 15) = +2.131. Since −2.131 < t = −0.744 < +2.131 and p = 0.469 > α = 0.05, the null hypothesis H0 cannot be rejected. Thus, the second-order parameter is not statistically significant and should be removed from the model.

Let us repeat the analysis for the first-order autoregressive model (Fig. 18). The estimate of the highest-order parameter is a1 = 1.024, and its standard error is 0.039. To test the hypotheses H0: A1 = 0 and H1: A1 ≠ 0, we calculate the t-statistic:

t = a1 / S_a1 = 26.393.

At the significance level α = 0.05, the critical values of the two-sided t-test with n − 2p − 1 = 20 − 2·1 − 1 = 17 degrees of freedom are t_L = STUDENT.OBR(0.025; 17) = −2.110 and t_U = STUDENT.OBR(0.975; 17) = +2.110. Since t = 26.393 > t_U = +2.110 and p = 0.000 < α = 0.05, the null hypothesis H0 should be rejected. Thus, the first-order parameter is statistically significant and cannot be removed from the model, and the first-order autoregressive model is the best approximation of the original data. Using the estimates a0 = 18.261 and a1 = 1.024 and the value of the time series for the last year, Y20 = 1,371.88, we can predict the real income of Wm. Wrigley Jr. Company in 2002:

Ŷ21 = a0 + a1·Y20 = 18.261 + 1.024 · 1,371.88 ≈ 1,423.1 million dollars.

Choosing an adequate forecasting model

Six methods for predicting time series values ​​have been described above: linear, quadratic, and exponential trend models, and first, second, and third order autoregressive models. Is there an optimal model? Which of the six described models should be used to predict the value of the time series? Listed below are four principles that should guide the selection of an adequate forecasting model. These principles are based on estimates of the accuracy of the models. It is assumed that the values ​​of the time series can be predicted by studying its previous values.

Principles for choosing models for forecasting:

  • Perform residue analysis.
  • Estimate the residual error using the squared differences.
  • Estimate the residual error using absolute differences.
  • Be guided by the principle of economy.

Residual analysis. Recall that a residual is the difference between the predicted value and the observed value. Having built a model for the time series, one should calculate the residuals for each of the n intervals. As shown in panel A of Fig. 19, if the model is adequate, the residuals represent the random component of the time series and are therefore irregularly distributed. On the other hand, as shown in the remaining panels, if the model is not adequate, the residuals may show a systematic pattern reflecting a failure to account for the trend (panel B), the cyclical component (panel C) or the seasonal component (panel D).

Fig. 19. Analysis of residuals

Measuring the absolute and root-mean-square residual errors. If the analysis of residuals does not single out one adequate model, other methods based on estimating the magnitude of the residual error can be used. Unfortunately, statisticians have not reached a consensus on the best measure of the residual errors of forecasting models. Following the principle of least squares, one can first perform a regression analysis and calculate the standard error of the estimate S_YX. When a specific model is analyzed, this value is based on the sum of the squared differences between the actual and predicted values of the time series. If the model fits the time series values at past points in time perfectly, the standard error of the estimate is zero. If the model approximates them poorly, the standard error of the estimate is large. Thus, by analyzing the adequacy of several models, one can choose the model with the minimum standard error of the estimate S_YX.

The main disadvantage of this approach is that it exaggerates the errors in predicting individual values: any large difference between Y_i and Ŷ_i is squared when the sum of squared errors SSE is calculated, and is therefore amplified. For this reason, many statisticians prefer the mean absolute deviation (MAD) for assessing the adequacy of a forecasting model:
MAD = Σ|Y_i − Ŷ_i| / n
When analyzing specific models, the MAD value is the average value of the absolute values ​​of the differences between the actual and predicted values ​​of the time series. If the model perfectly fits the time series values ​​at previous points in time, the mean absolute deviation is zero. On the other hand, if the model poorly approximates such time series values, the mean absolute deviation is large. Thus, by analyzing the adequacy of several models, it is possible to select a model that has the minimum mean absolute deviation.
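
The two residual-error measures described above are easy to compute directly. The sketch below compares two candidate fits by S_YX (based on the sum of squared residuals) and MAD; the series and fitted values are hypothetical placeholders, and the number of estimated parameters is passed explicitly.

```python
import numpy as np

def s_yx(actual, predicted, n_params):
    """Standard error of the estimate: sqrt(SSE / (n - n_params))."""
    resid = np.asarray(actual) - np.asarray(predicted)
    return np.sqrt(np.sum(resid ** 2) / (len(resid) - n_params))

def mad(actual, predicted):
    """Mean absolute deviation of the residuals."""
    resid = np.asarray(actual) - np.asarray(predicted)
    return np.mean(np.abs(resid))

# Hypothetical series and two candidate fits (placeholders, not the Wrigley data)
y = np.array([10.0, 12.1, 13.9, 16.2, 18.0, 19.8])
fit_linear = np.array([10.1, 12.0, 14.0, 16.0, 18.1, 19.9])
fit_quadratic = np.array([10.3, 11.8, 14.2, 15.9, 18.0, 19.7])

for name, fit, k in [("linear", fit_linear, 2), ("quadratic", fit_quadratic, 3)]:
    print(name, s_yx(y, fit, k), mad(y, fit))
```
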

The principle of economy. If the analysis of the standard errors of the estimates and the mean absolute deviations does not allow determining the optimal model, you can use the fourth method, based on the principle of economy. This principle states that the simplest should be chosen from among several equal models.

Of the six forecasting models discussed in this chapter, the simplest are the linear and quadratic regression models, as well as the first-order autoregressive model. The rest of the models are much more complex.

Comparison of four forecasting methods. To illustrate the process of choosing the optimal model, let's return to the time series of the real income of the Wm. Wrigley Jr. Company. Let's compare four models: the linear, quadratic, exponential, and first-order autoregressive models. (Second- and third-order autoregressive models improve the forecasting accuracy of this time series only slightly, so they can be ignored.) Fig. 20 shows the residual plots obtained by analyzing the four forecasting methods with the Excel Analysis ToolPak. Be careful when drawing conclusions from these plots, as the time series contains only 20 points. For the construction details, see the corresponding sheet of the Excel file.

Fig. 20. Residual plots for the four forecasting methods, constructed with the Excel Analysis ToolPak

Only the first-order autoregressive model captures the cyclical component. It approximates the observations better than the others and shows the least systematic structure in its residuals. So the residual analysis of all four methods indicates that the first-order autoregressive model is the best, while the linear, quadratic and exponential models are less accurate. To confirm this, let's compare the residual errors of these methods (Fig. 21). The calculation procedure can be examined in the Excel file. Fig. 21 shows the actual values Y_i (column Real income), the predicted values Ŷ_i and the residuals e_i for each of the four models, together with the values of S_YX and MAD. For all four models the values of S_YX and MAD are roughly the same. The exponential model is somewhat worse, while the linear and quadratic models are somewhat more accurate. As expected, the first-order autoregressive model has the smallest S_YX and MAD.

Fig. 21. Comparison of four forecasting methods using the indicators S_YX and MAD

Having chosen a specific forecasting model, it is necessary to closely monitor further changes in the time series. Among other things, such a model is created in order to correctly predict the values ​​of the time series in the future. Unfortunately, such forecasting models poorly account for changes in the structure of the time series. It is absolutely necessary to compare not only the residual error, but also the accuracy of forecasting future values ​​of the time series obtained with the help of other models. Having measured the new value Yi in the observed time interval, it must be immediately compared with the predicted value. If the difference is too large, the forecasting model should be revised.

Forecasting time series based on seasonal data

So far, we have studied a time series composed of annual data. However, many time series are composed of quantities measured quarterly, monthly, weekly, daily, and even hourly. As shown in fig. 2, if the data are measured monthly or quarterly, the seasonal component should be considered. In this section, we will look at methods to predict the values ​​of such time series.

The scenario at the beginning of the chapter referred to Wal-Mart Stores, Inc. The company has a market capitalization of $229 billion. Its shares are traded on the New York Stock Exchange under the ticker symbol WMT. The company's fiscal year ends on January 31, so the fourth quarter of fiscal 2002 includes November and December 2001 and January 2002. The time series of the company's quarterly revenues is shown in Fig. 22.

Fig. 22. Quarterly revenues of Wal-Mart Stores, Inc. (USD million)

For a quarterly series such as this, the classical multiplicative model contains a seasonal component in addition to the trend, cyclical and random components: Y_i = T_i · S_i · C_i · I_i

Forecasting monthly and quarterly time series using the least squares method. The regression model with a seasonal component is based on a combined approach. The trend is fitted by the least squares method described earlier, and the seasonal component is captured by dummy (categorical) variables (for more details, see the section Dummy Regression Models and Interaction Effects). An exponential model is used to approximate a time series with seasonal components. In the model for a quarterly time series, three dummy variables Q1, Q2 and Q3 are needed to represent the four quarters; in the model for a monthly series, the 12 months are represented by 11 dummy variables. Since these models use the variable log Y_i rather than Y_i, the inverse transformation must be applied to obtain the regression coefficients in the original scale.

To illustrate the process of building a model for a quarterly time series, let's return to Wal-Mart's revenues. The parameters of the exponential model obtained with the Excel Analysis ToolPak are shown in Fig. 23.

Fig. 23. Regression analysis of the quarterly revenue of Wal-Mart Stores, Inc.

It can be seen that the exponential model fits the original data quite well. The coefficient of multiple determination r² is 99.4% (cell J5), the adjusted coefficient of determination is 99.3% (cell J6), the F-statistic is 1,333.51 (cell M12), and the p-value is 0.0000. At the significance level α = 0.05, each regression coefficient in the classical multiplicative time series model is statistically significant. Applying potentiation to them, we obtain the following parameters:

The regression coefficients are interpreted as follows.

Using the regression coefficients b_i, one can predict the revenue received by the company in a specific quarter. For example, let's predict the company's revenue for the fourth quarter of fiscal 2002 (X_i = 35):

log Ŷ_i = b0 + b1·X_i = 4.265 + 0.016 × 35 = 4.825

Ŷ_i = 10^4.825 = 66,834

Thus, according to the forecast, in the fourth quarter of fiscal 2002 the company should have received revenue of about $67 billion (a forecast to the nearest million would hardly be warranted). To extend the forecast to a period outside the observed time series, for example the first quarter of fiscal 2003 (X_i = 36, Q1 = 1), the following calculations are needed:

log Ŷ_i = b0 + b1·X_i + b2·Q1 = 4.265 + 0.016 × 36 – 0.093 × 1 = 4.748

Ŷ_i = 10^4.748 = 55,976
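
The same quarterly model can be sketched outside Excel. The example below fits log10 of revenue on a time index and three quarterly dummies (Q4 as the baseline) by least squares; the revenue figures are hypothetical placeholders, not Wal-Mart data, so the coefficients will differ from the 4.265, 0.016 and –0.093 quoted above.

```python
import numpy as np

# Hypothetical quarterly revenues (USD million); the text uses 35 quarters of Wal-Mart data
revenue = np.array([20000, 21500, 22000, 30000,
                    23000, 24500, 25000, 34000,
                    26500, 28000, 28500, 39000], dtype=float)

n = len(revenue)
x = np.arange(1, n + 1)                  # time index X_i
quarter = (np.arange(n) % 4) + 1         # 1..4, assuming the series starts in Q1

# Design matrix: intercept, trend, and dummies for Q1, Q2, Q3 (Q4 is the baseline)
X = np.column_stack([
    np.ones(n),
    x,
    (quarter == 1).astype(float),
    (quarter == 2).astype(float),
    (quarter == 3).astype(float),
])
coef, *_ = np.linalg.lstsq(X, np.log10(revenue), rcond=None)
b0, b1, b2, b3, b4 = coef

# Forecast the next quarter (here X = n + 1, which falls in Q1)
log_forecast = b0 + b1 * (n + 1) + b2 * 1.0
forecast = 10 ** log_forecast
print(coef, forecast)
```
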

Indices

Indices are used as indicators that respond to changes in the economic situation or business activity. There are many types of indices: price indices, quantity indices, value indices, and sociological indices. In this section we consider only price indices. An index is the value of some economic indicator (or group of indicators) at a specific point in time, expressed as a percentage of its value at a base point in time.

Price index. A simple price index reflects the percentage change in the price of a product (or group of products) over a given period of time compared with its price at a particular point in the past. When calculating a price index, the first step is to select the base time interval, i.e. the period in the past with which comparisons will be made. When choosing the base period for a specific index, periods of economic stability are preferable to periods of expansion or recession. In addition, the base period should not be too distant in time, so that the comparisons are not too strongly affected by changes in technology and consumer habits. The price index is calculated by the formula:
I_i = (P_i / P_base) × 100,
where I_i is the price index in year i, P_i is the price in year i, and P_base is the price in the base year.

A price index shows the percentage change in the price of a product (or group of products) in a given period relative to its price at the base point in time. As an example, consider the US price index for unleaded gasoline between 1980 and 2002 (Fig. 24).

Fig. 24. Price of a gallon of unleaded gasoline and the simple price index in the USA, 1980-2002 (base years 1980 and 1995)

So, in 2002 the price of unleaded gasoline in the United States was 4.8% higher than in 1980. Analysis of Fig. 24 shows that the price index in 1981 and 1982 was higher than in 1980, and then did not exceed the base level until 2000. Since 1980 is a rather distant base period, it makes sense to choose a later base year, for example 1995. The formula for recalculating the index relative to a new base period is:
I_new = (I_old / I_new base) × 100,   (10)
where I_new is the new price index, I_old is the old price index, and I_new base is the value of the old price index in the new base year.

Let's assume that 1995 is selected as the new base. Using formula (10), we obtain a new price index for 2002:

So, in 2002, unleaded gasoline in the United States cost 13.9% more than in 1995.
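
A small sketch of the simple index and of rebasing (formula 10). The gasoline prices used here are hypothetical, chosen only so that the resulting indices roughly reproduce the 104.8 and 113.9 quoted above.

```python
def simple_index(price, base_price):
    """Simple price index relative to the base-period price, in percent."""
    return price / base_price * 100

def rebase(old_index, old_index_at_new_base):
    """Re-express an index relative to a new base period (formula 10)."""
    return old_index / old_index_at_new_base * 100

# Hypothetical prices per gallon (not the actual published figures)
p_1980, p_1995, p_2002 = 1.25, 1.15, 1.31

i_2002_base1980 = simple_index(p_2002, p_1980)              # index for 2002, base 1980
i_1995_base1980 = simple_index(p_1995, p_1980)              # index for 1995, base 1980
i_2002_base1995 = rebase(i_2002_base1980, i_1995_base1980)  # same year, rebased to 1995

print(i_2002_base1980, i_2002_base1995)
```
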

Unweighted composite price indices. Although the price index for an individual product is of undoubted interest, the price index for a group of goods is more important, since it allows one to assess the cost and standard of living of a large number of consumers. The unweighted composite price index defined by formula (11) assigns equal weight to each individual commodity. A composite price index reflects the percentage change in the price of a group of goods (often called a basket of goods) in a given period relative to the price of that group at the base point in time.
I_U^(t) = (ΣP_i^(t) / ΣP_i^(0)) × 100,   (11)
where i is the item number (1, 2, ..., n), n is the number of goods in the group under consideration, ΣP_i^(t) is the sum of the prices of the n goods at time t, ΣP_i^(0) is the sum of their prices in the base period, and I_U^(t) is the value of the unweighted composite index at time t.

Fig. 25 shows the average prices of three types of fruit for the period from 1980 to 1999. To calculate the unweighted composite price index in different years, formula (11) is applied with 1980 as the base year.

So, in 1999 the total price of a pound of apples, a pound of bananas and a pound of oranges was 59.4% higher than the total price of these fruits in 1980.

Fig. 25. Prices (in dollars) of three types of fruit and the unweighted composite price index

The unweighted composite price index expresses the change in prices for an entire group of goods over time. Although this index is easy to calculate, it has two distinct disadvantages. First, all goods are treated as equally important, so expensive goods acquire an undue influence on the index. Second, not all goods are consumed in equal amounts, so price changes for rarely consumed goods affect the unweighted index too strongly.

Weighted composite price indices. Because of the shortcomings of unweighted price indices, weighted price indices, which take into account differences in both the prices and the consumption levels of the goods forming the consumer basket, are preferable. There are two types of weighted composite price indices. The Laspeyres price index, defined by formula (12), uses the consumption levels of the base year. A weighted composite price index accounts for the consumption levels of the goods in the consumer basket by assigning a weight to each product.
I_L^(t) = (ΣP_i^(t)·Q_i^(0) / ΣP_i^(0)·Q_i^(0)) × 100,   (12)
where t is the time period (0, 1, 2, ...), i is the item number (1, 2, ..., n), n is the number of goods in the group, P_i^(t) is the price of item i at time t, Q_i^(0) is the quantity of item i consumed in the base period, and I_L^(t) is the value of the Laspeyres index at time t.

The calculation of the Laspeyres index is shown in Fig. 26; 1980 is used as the base year.

Fig. 26. Prices (in dollars), quantities (consumption in pounds per capita) of three types of fruit and the Laspeyres index

So, the Laspeyres index for 1999 is 154.2. This indicates that in 1999 these three types of fruit were on average 54.2% more expensive than in 1980. Note that this index is lower than the unweighted index of 159.4, because the price of oranges, the fruit consumed least, rose more than the prices of apples and bananas. In other words, since the prices of the most consumed fruits rose less than the price of oranges, the Laspeyres index is lower than the unweighted composite index.

The Paasche price index uses the consumption levels of the current rather than the base period. Consequently, the Paasche index more accurately reflects the total cost of consumption at the current moment. However, this index has two significant drawbacks. First, current consumption levels are usually difficult to determine; for this reason many popular indices use the Laspeyres rather than the Paasche approach. Second, if the price of a particular product in the consumer basket rises sharply, buyers reduce its consumption out of necessity rather than because of changed tastes. The Paasche index is calculated by the formula:
I_P^(t) = (ΣP_i^(t)·Q_i^(t) / ΣP_i^(0)·Q_i^(t)) × 100,
where t is the time period (0, 1, 2, ...), i is the item number (1, 2, ..., n), n is the number of goods in the group, P_i^(t) is the price of item i at time t, Q_i^(t) is the quantity of item i consumed at time t, and I_P^(t) is the value of the Paasche index at time t.

Calculations of the Paasche index are shown in Fig. 27; 1980 is used as the base year.

Fig. 27. Prices (in dollars), quantities (consumption in pounds per capita) of three types of fruit and the Paasche index

So, the Paasche index in 1999 is 147.0. This indicates that in 1999 these three types of fruit were 47.0% more expensive than in 1980.
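
Both weighted indices reduce to one-line computations once the prices and quantities are arranged as arrays. In the sketch below the fruit prices and per-capita quantities are hypothetical placeholders, so the resulting values will not match the 154.2 and 147.0 of Figs. 26-27 exactly.

```python
import numpy as np

def laspeyres(p_t, p_0, q_0):
    """Laspeyres index: current prices weighted by base-period quantities (formula 12)."""
    return np.dot(p_t, q_0) / np.dot(p_0, q_0) * 100

def paasche(p_t, p_0, q_t):
    """Paasche index: current prices weighted by current-period quantities."""
    return np.dot(p_t, q_t) / np.dot(p_0, q_t) * 100

# Hypothetical prices (USD/lb) and per-capita consumption (lb) for apples, bananas, oranges
p_1980 = np.array([0.50, 0.30, 0.40])
p_1999 = np.array([0.80, 0.45, 0.70])
q_1980 = np.array([18.0, 20.0, 14.0])
q_1999 = np.array([17.0, 27.0, 12.0])

print(laspeyres(p_1999, p_1980, q_1980))   # base-year weights
print(paasche(p_1999, p_1980, q_1999))     # current-year weights
```
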

Some popular price indices. Several price indices are widely used in business and economics. The most popular is the Consumer Price Index (CPI). Officially this index is called the CPI-U to emphasize that it is computed for urban consumers, although it is usually called simply the CPI. It is published monthly by the U.S. Bureau of Labor Statistics as the primary tool for measuring the cost of living in the United States. The Consumer Price Index is a composite index weighted by the Laspeyres method. It is calculated from the prices of 400 of the most commonly consumed goods and services, including food, clothing, transportation, medical care and utilities. At present the period 1982-1984 is used as the base (Fig. 28). An important function of the CPI is its use as a deflator: nominal prices are converted to real prices by multiplying each price by the factor 100/CPI. Calculations show that over the past 30 years the average annual inflation rate in the United States was 2.9%.

Fig. 28. Dynamics of the Consumer Price Index; see the Excel file for full details

Another important price index published by the Bureau of Labor Statistics is the Producer Price Index (PPI). The PPI is a weighted composite index that uses the Laspeyres method to estimate changes in the prices of goods sold by their producers. The PPI is a leading indicator for the CPI: an increase in the PPI tends to be followed by an increase in the CPI, and a decrease in the PPI by a decrease in the CPI. Financial indices such as the Dow Jones Industrial Average (DJIA), the S&P 500 and the NASDAQ are used to measure changes in the value of US stocks. Many indices measure the performance of international stock markets, including the Nikkei in Japan, the DAX 30 in Germany and the SSE Composite in China.

Pitfalls of time series analysis

The value of a methodology that uses information about the past and the present to predict the future was eloquently described more than two hundred years ago by the statesman Patrick Henry: "I have but one lamp by which my feet are guided, and that is the lamp of experience. I know of no way of judging of the future but by the past."

Time series analysis is based on the assumption that the factors that influenced business activity in the past and influence it in the present will continue to operate in the future. If this is true, time series analysis is an effective tool for forecasting and management. However, critics of classical time series methods argue that these methods are too naive and primitive: a mathematical model built on factors that operated in the past should not mechanically extrapolate trends into the future without taking into account expert judgment, business experience, changes in technology, and the habits and needs of people. In an attempt to remedy this, econometricians have in recent years developed sophisticated computer models of economic activity that take such factors into account.

However, time series analysis methods are an excellent forecasting tool (both short-term and long-term) when applied correctly, in combination with other forecasting methods, and with expert judgment and experience in mind.

Summary. In this note, time series analysis was used to develop models for forecasting the income of three companies: Wm. Wrigley Jr. Company, Cabot Corporation, and Wal-Mart. The components of a time series were described, along with several approaches to forecasting annual time series: the moving average method, exponential smoothing, linear, quadratic and exponential trend models, and autoregressive models. A regression model with dummy variables for the seasonal component was considered, and the application of the least squares method to forecasting monthly and quarterly time series was shown (Fig. 29).


When the trend type has been chosen, the optimal values of the trend parameters must be calculated from the actual levels. For this, the method of least squares (OLS) is usually used. Its meaning has already been discussed in earlier chapters of the textbook; here, the optimization consists in minimizing the sum of squared deviations of the actual levels of the series from the leveled levels (from the trend). For each type of trend, OLS yields a system of normal equations whose solution gives the trend parameters. We consider only three such systems: for a straight line, for a second-order parabola, and for an exponential. Methods for determining the parameters of other types of trend are covered in the specialized monographic literature.

For a linear trend, the normal least squares equations have the form:
Σy = n·a + b·Σt;  Σyt = a·Σt + b·Σt². With time counted from the middle of the period, Σt = 0, and the system reduces to a = Σy / n, b = Σyt / Σt² (formula 9.29).
The normal least squares equations for the exponential trend (in logarithmic form) look like this:
Σln y = n·ln a + ln k·Σt;  Σ(t·ln y) = ln a·Σt + ln k·Σt². With Σt = 0 these give formulas (9.32) and (9.33): ln a = Σln y / n, ln k = Σ(t·ln y) / Σt².
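
A short sketch of how these systems are solved when time is counted from the middle of the period (so that Σt = 0). The yield series here is hypothetical; with the real data of Table 9.1 the same code would reproduce the results of formulas (9.29), (9.32) and (9.33).

```python
import numpy as np

# Hypothetical 11-level yield series (placeholder for the data of Table 9.1)
y = np.array([150., 160., 175., 168., 180., 172., 190., 185., 200., 195., 210.])
n = len(y)
t = np.arange(n) - (n - 1) / 2           # time counted from the middle: sum(t) = 0

# Linear trend y = a + b*t (formula 9.29): a = sum(y)/n, b = sum(y*t)/sum(t^2)
a = y.sum() / n
b = (y * t).sum() / (t ** 2).sum()

# Exponential trend y = a * k**t (formulas 9.32-9.33), fitted in logarithms
ln_y = np.log(y)
ln_a = ln_y.sum() / n
ln_k = (ln_y * t).sum() / (t ** 2).sum()
a_exp, k_exp = np.exp(ln_a), np.exp(ln_k)

print(a, b, a_exp, k_exp)
```
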
Using the data of Table 9.1, let us calculate all three trends for the dynamic series of potato yield in order to compare them (see Table 9.5).

Table 9.5

Calculation of trend parameters

According to formula (9.29), the parameters of the linear trend are a = 1894/11 = 172.2 centners / ha; b = 486/110 = 4.418 c / ha. The linear trend equation is:

ŷ = 172.2 + 4.418t, where t = 0 in 1991. This means that the average actual and leveled level, referred to the middle of the period, i.e. to 1991, is 172 c/ha, and the average annual growth is 4.418 c/ha per year.

The parabolic trend parameters according to (9.23) are b = 4.418; a = 177.75; c = −0.5571. The parabolic trend equation is ŷ = 177.75 + 4.418t − 0.5571t²; t = 0 in 1991. This means that the absolute increase in yield slows down on average by 2 × 0.56 c/ha per year per year. The absolute growth itself is no longer a constant of the parabolic trend but the average value over the period. In the year taken as the origin, i.e. 1991, the trend passes through the point with ordinate 177.75 c/ha; the free term of the parabolic trend is not the average level over the period. The exponential trend parameters are calculated by formulas (9.32) and (9.33): ln a = 56.5658 / 11 = 5.1423; potentiating, we obtain a = 171.1; ln k = 2.853 / 110 = 0.025936; potentiating, we obtain k = 1.02628.

The exponential trend equation is: ŷ = 171.1 · 1.02628^t.

This means that the average annual growth rate of the yield over the period was 102.63%. At the point taken as the origin, the trend passes through the point with ordinate 171.1 c/ha.

The levels calculated from the trend equations are recorded in the last three columns of Table 9.5. As can be seen from these data, the calculated levels for all three types of trend differ little, since both the acceleration of the parabola and the growth rate of the exponential are small. The parabola has one significant difference: the growth of its levels stops after 1995, while with the linear trend the levels continue to grow and with the exponential trend their growth even accelerates. Therefore, for forecasting, these three trends are not equivalent: when the parabola is extrapolated into future years, its levels diverge sharply from those of the straight line and the exponential, as can be seen from Table 9.6. This table presents the printout of the solution for the same three trends obtained on a PC with the Statgraphics program. The difference between their free terms and those given above is explained by the fact that the program numbers the years not from the middle but from the beginning, so the free terms of the trends refer to 1986, for which t = 0. The exponential equation in the printout is left in logarithmic form. The forecast is made 5 years ahead, i.e. up to 2001. When the origin of coordinates (the time reference) of the parabola equation changes, the average absolute increase, the parameter b, changes as well, since as a result of the negative acceleration the growth constantly decreases and reaches its maximum at the beginning of the period. Only the acceleration is a constant of the parabola.

The "Data" line contains the levels of the original series; “Forecast summary” means summary data for forecasting purposes. In the next lines - equations of a straight line, parabola, exponent - in logarithmic form. The ME bar represents the average discrepancy between the levels of the original series and the levels of the trend (flattened). For a straight line and a parabola, this discrepancy is always zero. The exponential levels are on average 0.48852 lower than the levels of the original series. An exact match is possible if the true trend is exponential; in this case there is no coincidence, but the difference is small. The MAE column is the variance s 2 - a measure of the volatility of actual levels relative to the trend, as described in clause 9.7. Graph MAE - average linear deviation of levels from the trend in absolute value (see paragraph 5.8); graph MAPE - relative linear deviation in percent. Here they are given as indicators of the suitability of the selected trend type. The parabola has a smaller variance and a deviation modulus: it is for the period 1986 - 1996. closer to actual levels. But the choice of the type of trend cannot be reduced only to this criterion. In fact, the slowdown in growth is the result of a large negative deviation, i.e., a crop failure in 1996.

The second half of the table contains the forecast of yield levels for the three types of trend for the years t = 12, 13, 14, 15 and 16 from the origin (1986). Up to year 16 the levels forecast by the exponential are higher than those of the straight line. The levels of the parabolic trend decrease, diverging more and more from the other trends.

As can be seen from Table 9.4, when calculating the trend parameters the levels of the original series enter with different weights, namely the values t_i and their squares. Therefore, the influence of level fluctuations on the trend parameters depends on which year number a favourable or unfavourable year happens to fall on. If a sharp deviation falls on the year with number zero (t_i = 0), it has no effect on the trend parameters, while if it falls at the beginning or end of the series, it has a strong effect. Consequently, a single analytical alignment does not completely free the trend parameters from the influence of fluctuations, and with strong fluctuations they can be greatly distorted, which in our example happened with the parabola. To further eliminate the distorting influence of fluctuations on the trend parameters, the method of multiple sliding alignment should be applied.

This technique consists in calculating the trend parameters not over the entire series at once, but in a sliding manner: first for the first T periods (or moments) of time, then for the period from the 2nd to the (T + 1)th level, from the 3rd to the (T + 2)th level, and so on. If the number of source levels of the series is n and the length of each sliding base for calculating the parameters is T, then the number of such sliding bases L (and of the individual parameter values determined from them) is:

L = n + 1 - T.

The sliding multiple alignment technique is applicable, as can be seen from the above, only with a sufficiently large number of levels in the series, as a rule 15 or more. Let us consider this technique using the data of Table 9.4 (the dynamics of prices for non-fuel goods in developing countries), which again allows the reader to take part in a small scientific study. Using the same example, we continue with the forecasting methodology in Section 9.10.

If we calculate the parameters for our series over 11-year bases (11 levels), then L = 17 + 1 − 11 = 7. The point of multiple sliding alignment is that with successive shifts of the calculation base, different levels, with deviations from the trend of different sign and magnitude, appear at the ends and in the middle of the base. Therefore, with some shifts of the base the parameters are overestimated, with others underestimated, and with the subsequent averaging of the parameter values over all shifts of the calculation base, the distortions of the trend parameters caused by level fluctuations are further mutually compensated.

Multiple sliding alignment not only makes it possible to obtain a more accurate and reliable estimate of the trend parameters, but also provides a check on the correctness of the chosen type of trend equation. If it turns out that the leading parameter of the trend (its constant) does not fluctuate randomly when calculated on sliding bases, but changes its value systematically and significantly, then the trend type was chosen incorrectly: this parameter is not in fact a constant.

As for the free term, with multiple alignment there is no need, and it is in fact simply incorrect, to calculate its value as the average over all shifts of the base, because in that case individual levels of the original series would enter the calculation of the average with different weights, and the sum of the leveled levels would diverge from the sum of the levels of the original series. The free term of the trend is the average level for the period, provided that time is counted from the middle of the period. When counting from the beginning, with the first level at t_i = 1, the free term is a_0 = ȳ − b(n − 1)/2. It is recommended to choose the length of the sliding base for calculating the trend parameters at not less than 9-11 levels, in order to dampen the level fluctuations sufficiently. If the original series is very long, the base can be up to 0.7-0.8 of its length. To eliminate the influence of long-period (cyclical) fluctuations on the trend parameters, the number of base shifts should be equal to, or a multiple of, the cycle length. Then the beginning and the end of the base successively "run through" all phases of the cycle, and when the parameter is averaged over all shifts, its distortions from cyclical fluctuations cancel out. Another way is to take the length of the sliding base equal to the cycle length, so that the beginning and the end of the base always fall on the same phase of the cycle.

Since it has already been established from Table 9.4 that the trend is linear, we calculate the average annual absolute change, i.e. the parameter b of the linear trend equation, in a sliding manner over 11-year bases (see Table 9.7). The table also provides the calculation of the data needed for the subsequent study of oscillation in Section 9.7. Let us dwell in more detail on the technique of multiple alignment on sliding bases. We calculate the parameter b on all bases:

Table 9.7

Multiple sliding alignment along a straight line



Trend equation: ŷ = 104.53 − 1.433t; t = 0 in 1987. So, the price index decreased on average by 1.433 points per year. A single alignment over all 17 levels would distort this parameter, because the initial level contains a large negative deviation and the final level a positive one. Indeed, a single alignment gives an average annual change of the index of only 0.953 points.
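
The sliding-alignment calculation is easy to automate. The sketch below computes the slope b of a linear trend on every 11-year base and averages the results; the 17-level index series is a hypothetical placeholder, not the data of Table 9.7, so the averaged slope will only be of the same order as −1.433.

```python
import numpy as np

def sliding_slope(y, base=11):
    """Average the linear-trend slope b over all sliding bases of a given length."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    slopes = []
    for start in range(n - base + 1):            # L = n + 1 - base shifts
        seg = y[start:start + base]
        t = np.arange(base) - (base - 1) / 2     # time counted from the middle of the base
        slopes.append((seg * t).sum() / (t ** 2).sum())
    return np.mean(slopes), slopes

# Hypothetical 17-level price-index series (placeholder for Table 9.7)
index = [108, 96, 111, 104, 119, 102, 99, 104, 96, 93, 95, 90, 94, 88, 85, 89, 92]
avg_b, all_b = sliding_slope(index, base=11)
print(avg_b, all_b)
```
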




9.7. Methodology and indicators for studying fluctuations

If, in studying and measuring the tendency of the dynamics, level fluctuations played only the role of noise, "information noise" from which one had to abstract as far as possible, then from here on the fluctuation itself becomes the subject of statistical research. The importance of studying fluctuations in the levels of a dynamic series is obvious: fluctuations in yields, livestock productivity and meat production are economically undesirable, since the need for agricultural products is constant; these fluctuations should be reduced by applying progressive technology and other measures. On the contrary, seasonal fluctuations in the production of winter and summer shoes, clothing, ice cream, umbrellas and skates are necessary and natural, since the demand for these goods also fluctuates seasonally, and uniform production would require extra costs for storing stocks. Regulation of a market economy, both by the state and by producers, consists to a large extent in regulating fluctuations of economic processes.

The types of fluctuations of statistical indicators are very diverse, but still three main ones can be distinguished: sawtooth or pendulum fluctuations, cyclical long-period and randomly distributed fluctuations in time. Their properties and differences from each other are clearly visible in the graphic image in Fig. 9.2.

Sawtooth or pendulum oscillation consists in alternating deviations of levels from the trend in one direction or the other. Such are the self-oscillations of a pendulum. Such self-oscillations can be observed in the dynamics of yield at a low level of agricultural technology: a high yield under favorable weather conditions takes out more nutrients from the soil than they are formed naturally in a year; the soil is depleted, which causes the next yield to fall below the trend, it takes out less nutrients than is formed in a year, fertility increases, etc.

Fig. 9.2. Types of oscillation

Cyclical long-period oscillation is characteristic, for example, of solar activity (10-11-year cycles) and hence of processes on Earth associated with it: auroras, thunderstorm activity, the yields of certain crops in a number of regions, some diseases of people and plants. This type is characterized by rare changes in the sign of the deviations from the trend and by a cumulative (accumulating) effect of deviations of one sign, which can have a severe impact on the economy. On the other hand, such fluctuations are well predicted.

Oscillation randomly distributed in time is irregular and chaotic. It can arise when many oscillations with cycles of different lengths are superimposed (interfere). It can also arise from equally chaotic fluctuations of the main factor causing the oscillation, for example the amount of precipitation over the summer or the average monthly air temperature in different years.

To determine the type of oscillation, a graphical image, M. Kendall's "turning points" method and the calculation of the autocorrelation coefficients of deviations from the trend are used. These methods are discussed below.

The main indicators characterizing the strength of level fluctuations are those already known from Chapter 5 for characterizing the variation of a feature in a spatial population. However, variation in space and oscillation in time differ fundamentally. First of all, their main causes differ. The variation of a feature among simultaneously existing units arises from differences in the conditions of existence of the units of the population. For example, the different potato yields of the state farms of a region in 1990 are caused by differences in soil fertility, seed quality and agricultural technology. But the sums of effective temperatures during the growing season and the amounts of precipitation are not causes of the spatial variation, since in the same year these factors hardly vary over the territory of the region. On the contrary, the main causes of fluctuations in potato yield in the region over a number of years are precisely the fluctuations of meteorological factors, while soil quality hardly fluctuates at all. As for the general progress of agricultural technology, it is the cause of the trend, not of the oscillation.

The second fundamental difference is that the values ​​of a varying feature in a spatial set can be considered largely independent of each other, on the contrary, the levels of a dynamic series are, as a rule, dependent: these are indicators of a developing process, each stage of which is associated with previous states.

Third, the variation in the spatial population is measured by the deviations of the individual values ​​of the trait from the mean, and the variability of the levels of the time series is measured not by their differences from the average level (these differences include both trend and fluctuations), but by the deviations of the levels from the trend.

Therefore, it is better to use different terms: the differences in a feature in the spatial aggregate should be called only variation, but not fluctuations: no one will call the differences in the population size of Moscow, St. Petersburg, Kiev and Tashkent “fluctuations in the number of inhabitants”! The deviations of the levels of the time series from the trend will always be called oscillation. Oscillations always occur in time; oscillations cannot exist outside of time, at a fixed moment.

A system of indicators of oscillation is built on the basis of the qualitative content of the concept. The indicators of the strength of level fluctuations are: the amplitude of the deviations of the levels of individual periods or moments from the trend, the average absolute deviation of the levels from the trend (in modulus), and the standard deviation of the levels from the trend. The relative measures of variability are the relative linear deviation from the trend and the coefficient of oscillation, an analogue of the coefficient of variation.

A feature of the methodology for calculating average deviations from the trend is the need to take into account the loss of the degrees of freedom of oscillations by an amount equal to the number of parameters of the trend equation. For example, a straight line has two parameters, and as you know from geometry, you can draw a straight line through any two points. So, having only two levels, we will draw a trend line exactly through these two levels, and there will be no deviations of the levels from the trend, although in fact these two levels included fluctuations, were not free from the influence of fluctuation factors. A parabola of the second order will pass exactly through any three points, etc.

Taking into account the loss of degrees of freedom, the main absolute indicators of oscillation are calculated by the formulas (9.34) and (9.35):

mean linear deviation

ā(t) = Σ|y_i − ŷ_i| / (n − p);   (9.34)

standard deviation

s(t) = √[ Σ(y_i − ŷ_i)² / (n − p) ],   (9.35)

where y_i is the actual level;

ŷ_i is the leveled (trend) level;

n is the number of levels;

p is the number of trend parameters.

The time sign "(t)" in parentheses after an indicator means that it is not a measure of ordinary spatial variation, as in Chapter 5, but a measure of variability over time.

The relative indicators of oscillation are calculated by dividing the absolute indicators by the average level for the entire period under study. We calculate the oscillation indicators from the results of the analysis of the dynamics of the price index (see Table 9.7). We accept the trend obtained by multiple sliding alignment, i.e. ŷ = 104.53 − 1.433t; t = 0 in 1987.

1. The amplitude of fluctuations ranged from −14.0 in 1986 to +15.2 in 1984, i.e. 29.2 points.

2. The mean linear deviation in absolute value is found by summing the moduli |u_i| (their sum is 132.3) and dividing by (n − p), according to formula (9.34):

ā(t) = 132.3 / (17 − 2) = 8.82 points.

3. The standard deviation of the levels from the trend according to the formula (9.35) was:

s(t) = 9.45 points.

The slight excess of the standard deviation over the mean linear deviation indicates the absence of deviations that stand out sharply in absolute value.

4. The coefficient of oscillation: v(t) = s(t) / ȳ × 100 = 9.45 / 104.53 × 100 = 9.04%. The fluctuation is moderate, not strong. For comparison, we give (without calculation) the indicators of fluctuation of potato yield from the data of Tables 9.1 and 9.5 (deviations from the linear trend):

s(t) = 14.38 centners per hectare, v(t) = 8,35%.
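
The oscillation indicators of formulas (9.34) and (9.35) can be computed in a few lines. In the sketch below the trend is the one found above (ŷ = 104.53 − 1.433t), but the deviations from it are hypothetical, so the resulting values are illustrative rather than the 8.82, 9.45 and 9.04% of the text.

```python
import numpy as np

def oscillation_indicators(y, trend, n_params):
    """Absolute and relative measures of oscillation around a trend
    (formulas 9.34 and 9.35, with n - p degrees of freedom)."""
    y, trend = np.asarray(y, float), np.asarray(trend, float)
    u = y - trend                              # deviations from the trend
    dof = len(y) - n_params
    a_t = np.abs(u).sum() / dof                # mean linear deviation
    s_t = np.sqrt((u ** 2).sum() / dof)        # standard deviation from the trend
    v_t = s_t / y.mean() * 100                 # coefficient of oscillation, %
    amplitude = u.max() - u.min()
    return amplitude, a_t, s_t, v_t

# Linear trend from the text (p = 2 parameters) and hypothetical deviations from it
t = np.arange(17) - 8                          # t = 0 in the middle year
trend = 104.53 - 1.433 * t
u = np.array([-14.0, 5.2, -3.1, 8.4, 15.2, -6.3, 2.1, -9.8, 4.4,
              -1.2, 7.5, -5.6, 3.3, -8.1, 6.2, -2.4, -1.8])
y = trend + u

print(oscillation_indicators(y, trend, n_params=2))
```
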

To identify the type of oscillation, we use the technique proposed by M. Kendall. It consists in counting the so-called "turning points" in the series of deviations from the trend u_i, i.e. local extrema. A deviation that is either greater or smaller in algebraic value than both of its neighbours is marked as a turning point. Let us turn to Fig. 9.2. With pendulum oscillation, all deviations except the two end ones are "turning", so their number is n − 2. With long-period cycles there is one minimum and one maximum per cycle, and the total number of turning points is 2(n : l), where l is the length of the cycle. With oscillation randomly distributed in time, as M. Kendall showed, the number of turning points is on average (2/3)(n − 2). In our example, with pendulum oscillation there would be 15 turning points, with oscillation tied to an 11-year cycle about 2 × (17 : 11) ≈ 3 points, and with oscillation randomly distributed in time on average (2/3)(17 − 2) = 10 points.

The actual number of turning points, 6, lies outside twice the standard deviation of the number of turning points, which according to Kendall equals √[(16n − 29)/90]; in our case this is about 1.64, so the lower bound is 10 − 2 × 1.64 ≈ 6.7.

The presence of 6 turning points, at 2 points per cycle, means that there may be about 3 cycles in the series, with a cycle length of 5.5-6 years. A combination of such cyclical fluctuations with random ones is possible.
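
Kendall's turning-point count is a simple loop over the deviations. The sketch below counts local extrema and compares the result with the expectation (2/3)(n − 2) for a random series; the deviations are hypothetical, and the variance formula (16n − 29)/90 used for the standard deviation is the standard result for this test.

```python
import numpy as np

def turning_points(u):
    """Count local extrema in a series of deviations from the trend (Kendall's test)."""
    u = np.asarray(u, float)
    count = 0
    for i in range(1, len(u) - 1):
        if (u[i] > u[i - 1] and u[i] > u[i + 1]) or (u[i] < u[i - 1] and u[i] < u[i + 1]):
            count += 1
    return count

def expected_random(n):
    """Expected number of turning points and its standard deviation for a random series."""
    mean = 2.0 / 3.0 * (n - 2)
    std = np.sqrt((16 * n - 29) / 90.0)   # standard variance result for the turning-points test
    return mean, std

u = [-14.0, 5.2, -3.1, 8.4, 15.2, -6.3, 2.1, -9.8, 4.4,
     -1.2, 7.5, -5.6, 3.3, -8.1, 6.2, -2.4, -1.8]        # hypothetical deviations
print(turning_points(u), expected_random(len(u)))
```
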

Another method of analysing the type of oscillation and finding the cycle length is based on calculating the autocorrelation coefficients of deviations from the trend.

Autocorrelation is the correlation between the levels of a series or deviations from the trend taken with a shift in time: by 1 period (year), by 2, by 3, etc., therefore, we talk about the autocorrelation coefficients of different orders: first, second, etc. Let us first consider the autocorrelation coefficient of deviations from the first-order trend.

One of the main formulas for calculating the autocorrelation coefficient of deviations from the trend is as follows:

r_a = Σ(u_i · u_(i+1)) / [0.5·u_1² + u_2² + ... + u_(n−1)² + 0.5·u_n²]   (9.36)

As is easy to see from Table 9.7, the first and last deviations of the series enter only one product in the numerator, while all the other deviations, from the second to the (n − 1)th, enter two. Therefore, in the denominator the squares of the first and last deviations are taken with half weight, as in a chronological average. From the data of Table 9.7 we have:

Now let us turn to Fig. 9.2. With pendulum oscillation, all the products in the numerator are negative, and the first-order autocorrelation coefficient is close to −1. With long-period cycles, positive products of neighbouring deviations predominate, and the sign changes only twice per cycle; the longer the cycle, the greater the preponderance of positive products in the numerator, and the closer the first-order autocorrelation coefficient is to +1. With oscillation randomly distributed in time, the signs of the deviations alternate chaotically, the number of positive products is close to the number of negative ones, and the autocorrelation coefficient is therefore close to zero. The value obtained indicates the presence of both randomly distributed and cyclical oscillation. The autocorrelation coefficients of the following orders are: II = −0.577; III = −0.611; IV = −0.095; V = +0.376; VI = +0.404; VII = +0.044. Consequently, the antiphase of the cycle is closest to 3 years (the largest negative coefficient at a shift of 3 years), and the coinciding phases are closest to 6 years, which gives the length of the oscillation cycle. These coefficients, the largest in absolute value, are not close to one, which means that substantial random fluctuations are mixed with the cyclical ones. Thus, the detailed autocorrelation analysis gives essentially the same results as the conclusions based on the first-order autocorrelation.
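
The autocorrelation coefficients of deviations from the trend can be computed directly from formula (9.36). In the sketch below the half-weighting of the end terms follows the first-order formula and is applied to higher lags as an approximation; the deviations are hypothetical, so the coefficients will not reproduce the −0.577, −0.611, ... quoted above.

```python
import numpy as np

def autocorr_deviations(u, lag=1):
    """Autocorrelation of deviations from the trend with the end squares at half weight
    (formula 9.36 for lag 1; higher lags are treated analogously as an approximation)."""
    u = np.asarray(u, float)
    num = np.sum(u[:-lag] * u[lag:])
    weights = np.ones(len(u))
    weights[0] = weights[-1] = 0.5
    den = np.sum(weights * u ** 2)
    return num / den

u = [-14.0, 5.2, -3.1, 8.4, 15.2, -6.3, 2.1, -9.8, 4.4,
     -1.2, 7.5, -5.6, 3.3, -8.1, 6.2, -2.4, -1.8]        # hypothetical deviations
for k in range(1, 8):
    print(k, round(autocorr_deviations(u, lag=k), 3))
```
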

If the dynamic series is long enough, the problem of how the oscillation changes over time can be posed and solved. To do this, the oscillation indicators are calculated for sub-periods of at least 9-11 years each; otherwise the measurements of the oscillation are unreliable. In addition, the oscillation indicators can be calculated in a sliding manner and then smoothed, that is, a trend of the oscillation indicators can be computed. This is useful for drawing conclusions about the effectiveness of measures taken to reduce fluctuations in yields and other undesirable fluctuations, as well as for predicting the magnitude of the fluctuations expected around the trend in the future.

9.8. Measuring stability in dynamics

The concept of stability is used in very different senses. In relation to the statistical study of dynamics, we consider two aspects of it: 1) stability as the category opposite to oscillation; 2) stability of the direction of change, that is, stability of the trend.

In the first sense, the indicator of stability, which can only be relative, should vary from zero to one (100%). This is the difference between one and the relative indicator of fluctuations. The oscillation coefficient was 9.0%. Therefore, the stability factor is 100% - 9.0% = 91.0%. This indicator characterizes the proximity of the actual levels to the trend and does not depend at all on the nature of the latter. Weak volatility and high stability of levels in this sense can exist even with complete stagnation in development, when the trend is expressed by a horizontal straight line.

Stability in the second sense characterizes not the levels themselves but the process of their directed change. One can find out, for example, how stable the process of reducing the unit costs of resources per unit of output is, or whether the trend of decreasing infant mortality is stable. Change is completely stable if each level is either higher than all the preceding ones (steady growth) or lower than all the preceding ones (steady decline). Any violation of the strictly ranked sequence of levels indicates incomplete stability of the change.

The method of constructing an indicator of trend stability follows from the definition of the concept. As such an indicator one can use Spearman's rank correlation coefficient r_s:
r_s = 1 − 6·ΣΔ_i² / (n·(n² − 1)),
where n is the number of levels and Δ_i is the difference between the rank of a level and the number of its time period.

With complete coincidence of the ranks of the levels, starting from the smallest, with the numbers of the periods (moments) of time in their chronological order, the rank correlation coefficient is +1; this corresponds to complete stability of growth of the levels. With complete opposition of the level ranks to the ranks of the years, the Spearman coefficient is −1, which means a completely stable process of decline of the levels. With a chaotic alternation of level ranks the coefficient is close to zero, which means instability of any tendency. Let us calculate the Spearman correlation coefficient for the data on the dynamics of the price index (Table 9.7) in Table 9.8.

Table 9.8

Calculation of Spearman's rank correlation coefficient

Columns of Table 9.8: year rank R_x; level rank R_y; difference R_x − R_y; squared difference (R_x − R_y)².

Because of the presence of three pairs of tied ("related") ranks, we apply formula (8.26):

The negative value of r_s indicates the presence of a downward trend in the levels, and the stability of this trend is below average.
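
The same stability measure is available in scipy, which also handles tied ("related") ranks automatically. The levels below are hypothetical, so the coefficient will differ from the value obtained in Table 9.8.

```python
import numpy as np
from scipy import stats

# Hypothetical price-index levels over 17 years (placeholder for Table 9.7)
levels = np.array([108, 96, 111, 104, 119, 102, 99, 104, 96, 93, 95, 90, 94, 88, 85, 89, 92])
years = np.arange(1, len(levels) + 1)

# Spearman rank correlation between the level ranks and the year numbers
r_s, p_value = stats.spearmanr(years, levels)
print(r_s, p_value)   # negative r_s indicates a downward trend; |r_s| measures its stability
```
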

It should be borne in mind that even with 100% stability of the trend there may be fluctuations of the levels in the series, and their stability coefficient will then be below 100%. With weak fluctuations but an even weaker trend, on the contrary, a high level-stability coefficient is possible together with a trend-stability coefficient close to zero. In general the two indicators are, of course, directly related: most often, greater stability of the levels is observed together with greater stability of the trend.

The stability of the development trend, or complex stability, in dynamics can be characterized by the ratio between the average annual absolute change and the root-mean-square (or linear) deviation of levels from the trend:
c = b / s(t),  where b is the average annual absolute change.
If, as often happens, the distribution of the deviations of the series levels from the trend is close to normal, then with a probability of 0.95 a deviation against the direction of the trend will not exceed 1.645·s(t) in size. Therefore, if in the series of dynamics

c > 1.64, then levels lower than the preceding ones will occur on average fewer than 5 times in 100 periods, or 1 time in 20, i.e. the stability of the trend is high. At c = 1, violations of the ranking of the levels will occur on average 16 times out of 100, and at c = 0.5 already 31 times out of 100, i.e. the stability of the trend is low. One can also use the ratio of the average growth rate to the coefficient of oscillation, which gives an indicator close to c. This indicator is more suitable for an exponential trend. Indicators of the stability of nonlinear trends and general problems of the stability of economic and social processes are treated in more detail in the literature recommended for this chapter.
