Autocorrelation function and additive time series model

Introduction

1. The essence and reasons for autocorrelation

2. Autocorrelation detection

3. Consequences of autocorrelation

4. Methods of elimination

4.1 Detection of autocorrelation based on the Durbin-Watson statistic

Conclusion

List of used literature

Introduction

Models built from data characterizing one object over a number of successive moments (periods) are called time series models. A time series is a set of values of an indicator for several consecutive moments or periods of time. Applying the traditional methods of correlation and regression analysis to study the causal relationships of variables presented as time series can lead to a number of serious problems, arising both at the stage of construction and at the stage of analysis of econometric models. First of all, these problems are associated with the specifics of time series as a data source in econometric modeling.

It is assumed that, in general, each level of a time series contains three main components: a trend (T), cyclical or seasonal fluctuations (S), and a random component (E). If the time series contain seasonal or cyclical fluctuations, then before studying the relationship further it is necessary to eliminate the seasonal or cyclical component from the levels of each series: its presence will lead to an overestimation of the true strength of the connection between the series being studied when both series contain cyclical fluctuations of the same frequency, or to an underestimation when only one of the series contains seasonal or cyclical fluctuations, or when the frequencies of the fluctuations differ. The seasonal component can be eliminated from the levels of a time series using the methodology for constructing additive and multiplicative models.

If the time series under consideration has a trend, the correlation coefficient will be high in absolute value; this is then the result of x and y both depending on time. To obtain correlation coefficients characterizing the causal relationship between the series being studied, one should eliminate the so-called spurious correlation caused by the presence of a trend in each series. The influence of the time factor is expressed in the correlation dependence between the values of the residuals ε_t for the current and previous moments of time, which is called "autocorrelation in the residuals."

1. The essence and reasons for autocorrelation

Autocorrelation is the relationship between successive elements of a time or spatial data series. In econometric studies, situations often arise in which the variance of the residuals is constant but a nonzero covariance between them is observed. This phenomenon is called autocorrelation of the residuals.

Autocorrelation of residuals is most often observed when an econometric model is built on the basis of a time series. If there is a correlation between successive values ​​of some independent variable, then there will be a correlation between successive residual values. Autocorrelation can also result from erroneous specification of the econometric model. In addition, the presence of autocorrelation of residuals may mean that a new explanatory variable needs to be introduced into the model.

Autocorrelation in the residuals violates one of the basic assumptions of OLS: that the residuals obtained from the regression equation are random. One possible way to address this problem is to estimate the model parameters by generalized least squares.

Among the main reasons for the appearance of autocorrelation, one can single out specification errors, inertia in changes of economic indicators, the cobweb effect, and data smoothing.

Specification errors. Failure to include an important explanatory variable in the model, or a wrong choice of the form of the dependence, usually leads to systematic deviations of the observation points from the regression line, which can give rise to autocorrelation.

Inertia. Many economic indicators (for example, inflation, unemployment, GNP, etc.) have a certain cyclical nature associated with the waveform of business activity. Indeed, an economic recovery leads to an increase in employment, a decrease in inflation, an increase in GNP, etc. This growth continues until changes in market conditions and a number of economic characteristics lead to a slowdown in growth, then a halt and reversal of the indicators under consideration. In any case, this transformation does not occur instantly, but has a certain inertia.

Cobweb effect. In many industrial and other areas, economic indicators respond to changes in economic conditions with a delay (time lag). For example, the supply of agricultural products reacts to price changes with a lag equal to the crop ripening period. A high price for agricultural products in the past year will most likely cause overproduction this year, and consequently the price will fall, and so on.

Data smoothing. Often, data for a certain long time period is obtained by averaging data over its constituent subintervals. This can lead to a certain smoothing of fluctuations that were present within the period under consideration, which in turn can cause autocorrelation.

2. Detecting autocorrelation

Since the values of the parameters of the regression equation are unknown, the true values of the deviations ε_t, t = 1, 2, …, T, are also unknown. Therefore, conclusions about their independence are drawn on the basis of the estimates e_t, t = 1, 2, …, T, obtained from the empirical regression equation. Let us consider possible methods for detecting autocorrelation.

2.1 Graphical method

There are several ways of detecting autocorrelation graphically. One of them plots the deviations e_t against the moments t of their observation (their serial numbers); such plots are called sequence (time) charts and are shown in Fig. 2.1. In this case, the time (moment) of obtaining the statistical data or the ordinal number of the observation is plotted along the abscissa, and the deviations (or estimates of the deviations) along the ordinate.

Figure 2.1.

It is natural to assume that in Fig. 2.1 (a-d) there are certain patterns in the behavior of the deviations, i.e. autocorrelation takes place. The absence of any pattern in Fig. 2.1 (e) most likely indicates the absence of autocorrelation.

For example, in Fig. 2.1.b deviations are initially mostly negative, then positive, then negative again. This indicates the presence of a certain relationship between the deviations.

2.2 Runs method

This method is quite simple: the signs of the deviations e_t, t = 1, 2, …, T, are written out sequentially. For example,

(-----)(+++++++)(---)(++++)(-),

i.e. 5 "-", 7 "+", 3 "-", 4 "+", 1 "-" over 20 observations.

A run is defined as a continuous sequence of identical signs. The number of signs in a run is called the run length.

A visible pattern in the distribution of signs indicates a non-random relationship between the deviations. If there are too few runs compared to the number of observations n, positive autocorrelation is quite likely. If there are too many runs, negative autocorrelation is likely.
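The run counting described above can be sketched in code; a minimal illustration (the function name count_runs is invented here):

```python
# Counting runs in the sign sequence of the deviations (the function
# name count_runs is illustrative, not from the text).
def count_runs(residuals):
    signs = ['+' if e > 0 else '-' for e in residuals]
    runs = 1
    for prev, cur in zip(signs, signs[1:]):
        if cur != prev:
            runs += 1
    return runs

# The example from the text: 5 "-", 7 "+", 3 "-", 4 "+", 1 "-"
e = [-1.0] * 5 + [1.0] * 7 + [-1.0] * 3 + [1.0] * 4 + [-1.0]
print(count_runs(e))  # 5 runs over 20 observations
```

The run count is then compared with the range expected for independent residuals.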

2.3 Durbin-Watson criterion

The most famous criterion for detecting first-order autocorrelation is the Durbin-Watson test and the calculation of the quantity

d = Σ_{t=2}^{T} (e_t − e_{t−1})² / Σ_{t=1}^{T} e_t²   (2.3.1)

According to (2.3.1), the quantity d is the ratio of the sum of squared differences of successive residuals to the residual sum of squares of the regression model. The value of the Durbin-Watson statistic is reported along with the coefficient of determination and the values of the t- and F-statistics.
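The computation in (2.3.1) can be sketched directly; a minimal illustration on made-up residuals:

```python
# Sketch of the Durbin-Watson statistic from formula (2.3.1): the sum of
# squared differences of successive residuals divided by the residual
# sum of squares.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

print(durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0, -1.0]))  # below 2: positive autocorrelation
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))             # above 2: negative autocorrelation
```

Residuals that keep their sign push d below 2, while alternating residuals push it above 2, matching the interpretation given later in the text.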


If there is a trend and cyclical fluctuations in a time series, the values ​​of each subsequent level of the series depend on the previous ones. The correlation dependence between successive levels of the time series is called autocorrelation of the levels of the series.

Quantitatively, it can be measured using a linear correlation coefficient between the levels of the original time series and the levels of this series, shifted by several steps in time.

The formula for calculating the autocorrelation coefficient is:

r_1 = Σ_{t=2}^{n} (y_t − ȳ1)(y_{t−1} − ȳ2) / √[ Σ_{t=2}^{n} (y_t − ȳ1)² · Σ_{t=2}^{n} (y_{t−1} − ȳ2)² ]   (4.1)

This value is called the first-order autocorrelation coefficient of the levels of the series, since it measures the relationship between adjacent levels of the series y_t and y_{t−1}.

Similarly, one can determine the autocorrelation coefficients of the second and higher orders. Thus, the second-order autocorrelation coefficient characterizes the tightness of the connection between the levels y_t and y_{t−2} and is determined by the formula:

r_2 = Σ_{t=3}^{n} (y_t − ȳ3)(y_{t−2} − ȳ4) / √[ Σ_{t=3}^{n} (y_t − ȳ3)² · Σ_{t=3}^{n} (y_{t−2} − ȳ4)² ]   (4.2)

If the first-order autocorrelation coefficient turns out to be the highest, the series under study contains only a trend. If the autocorrelation coefficient of order τ turns out to be the highest, the series contains cyclical fluctuations with a period of τ points in time. If none of the autocorrelation coefficients is significant, one of two assumptions about the structure of the series can be made: either the series contains no trend and no cyclical fluctuations, or it contains a strong non-linear trend, which requires additional analysis to identify.

Autocorrelation coefficient properties.

1. It is constructed by analogy with the linear correlation coefficient and thus characterizes the tightness of only the linear relationship between the current and previous levels of the series. Therefore, the autocorrelation coefficient can only be used to judge the presence of a linear (or close to linear) trend.

2. By the sign of the autocorrelation coefficient it is impossible to draw a conclusion about an increasing or decreasing trend in the levels of the series. Most time series of economic data contain a positive autocorrelation of levels, however, they may have a decreasing trend.

The number of periods by which the series is shifted when calculating the autocorrelation coefficient is called the lag.

The sequence of autocorrelation coefficients of the levels of the first, second, and subsequent orders is called the autocorrelation function of the time series. The graph of its values against the lag value (the order of the autocorrelation coefficient) is called the correlogram.

In a significant part of the time series, there is a relationship between levels, especially closely spaced ones, i.e. the values ​​of each subsequent level of the series depend on the previous ones. The correlation dependence between successive levels of the time series is called autocorrelation of the levels of the series. It can be quantitatively measured using the correlation coefficient between the levels of the original time series and the levels of this series, shifted by several steps in time. The number of levels for which the autocorrelation coefficient is calculated is called lag.

Let us determine the correlation coefficient between the series y_t and y_{t−1}, i.e. the first-order autocorrelation coefficient:

r_1 = Σ_{t=2}^{n} (y_t − ȳ1)(y_{t−1} − ȳ2) / √[ Σ_{t=2}^{n} (y_t − ȳ1)² · Σ_{t=2}^{n} (y_{t−1} − ȳ2)² ],   (9.14)

where ȳ1 = (1/(n−1)) Σ_{t=2}^{n} y_t and ȳ2 = (1/(n−1)) Σ_{t=2}^{n} y_{t−1}.

Note that the first-order autocorrelation coefficient is calculated from (n − 1), not n, pairs of observations.

We now define the second-order autocorrelation coefficient, i.e. the correlation coefficient between the series y_t and y_{t−2}:

r_2 = Σ_{t=3}^{n} (y_t − ȳ3)(y_{t−2} − ȳ4) / √[ Σ_{t=3}^{n} (y_t − ȳ3)² · Σ_{t=3}^{n} (y_{t−2} − ȳ4)² ],   (9.15)

where ȳ3 = (1/(n−2)) Σ_{t=3}^{n} y_t and ȳ4 = (1/(n−2)) Σ_{t=3}^{n} y_{t−2}.

Note that the second-order autocorrelation coefficient is calculated from (n − 2) pairs of observations.

It should be borne in mind that with an increase in the lag, the number of pairs of values ​​for which the autocorrelation coefficient is calculated decreases. Therefore, some authors consider it expedient to use the rule to ensure the statistical reliability of the autocorrelation coefficients - the maximum order of the autocorrelation coefficient should not exceed n/4.

We note two important properties of the autocorrelation coefficient:

First, it is constructed by analogy with the usual correlation coefficient and thus characterizes the tightness of only the linear relationship between the current and previous levels of the series. Therefore, the autocorrelation coefficients can be used to judge the presence of a linear (or close to linear) trend. For some time series that have a strong non-linear trend (for example, a parabola or exponential), the autocorrelation coefficients of the levels can approach zero.

Secondly, according to the sign of the autocorrelation coefficient, it is impossible to draw a conclusion about an increasing or decreasing trend in the levels of the series. Most time series of economic data contain a positive autocorrelation of levels, however, they may have a decreasing trend.

For a long time series, a sequence of autocorrelation coefficients can be determined by successively increasing the lag: r_1, r_2, r_3, … The sequence of autocorrelation coefficients is called the autocorrelation function of the time series. The graph of the autocorrelation coefficients against the lag value (the order of the coefficient) is called the correlogram.

Analysis of the autocorrelation function and correlogram allows you to clarify the structure of the time series, to reveal the presence or absence of a trend or periodic fluctuations in it. If the time series is characterized by a clearly expressed linear trend, then for it the autocorrelation coefficient of the 1st order approaches 1. If the time series contains periodic fluctuations, then the autocorrelation function will also contain periodic fluctuations. If the time series does not contain periodic oscillations, then the correlogram is a damped function, i.e. high-order autocorrelation coefficients approach zero.
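The computation of the autocorrelation function just described can be sketched as follows (plain Python; each lag-k coefficient is computed on the (n − k) overlapping pairs):

```python
import math

# Sketch of the autocorrelation function: the lag-k coefficient is the
# correlation between the series and itself shifted by k steps.
def autocorr(y, k):
    a, b = y[k:], y[:-k]                      # y_t and y_{t-k}
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den

def acf(y, max_lag):
    return [autocorr(y, k) for k in range(1, max_lag + 1)]

# A purely periodic series with period 4: the correlogram peaks at lag 4
y = [0, 1, 0, -1] * 5
print(acf(y, 4))  # the lag-4 value is exactly 1.0
```

As the text states, a periodic series produces periodic bursts in its autocorrelation function at multiples of the period.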



Correlogram analysis is sometimes quite a daunting task. Therefore, we will briefly dwell on the typical behavior of correlograms for some classes of time series. First, let's consider the behavior of the correlogram for some non-stationary time series. On the graphs, in addition to the values ​​of the function itself, they usually indicate the confidence limits of this function.

For time series containing a trend, the correlogram does not tend to zero as the lag increases. Its characteristic behavior is depicted in Figure 9.1.

Fig. 9.1. Grain yield in Russia from 1945 to 1989, in centners per hectare: a) the initial time series; b) its correlogram.

For time series with seasonal fluctuations, the correlogram will also contain periodic bursts corresponding to the seasonal period, which makes it possible to estimate the period of seasonality. The typical behavior of such a correlogram is shown in Figure 9.2.

Fig. 9.2. Monthly champagne sales over 7 consecutive years on a logarithmic scale (after removing the linear trend): a) the transformed original time series; b) its correlogram.



Example 9.1. There are quarterly conditional data on the volumes of electricity consumption by residents of the region.

Table 9.7

t:    1    2    3    4    5    6    7     8    9   10   11    12   13   14   15    16
y_t: 6.0  4.4  5.0  9.0  7.2  4.8  6.0  10.0  8.0  5.6  6.4  11.0  9.0  6.6  7.0  10.8

Construct an autocorrelation function of the time series.

Solution. To calculate the autocorrelation coefficients of the initial time series, we will compose a table (Table 9.8):

Table 9.8

 t    y_t   y_{t-1}  y_{t-2}  y_{t-3}  y_{t-4}  y_{t-5}  y_{t-6}
 1    6.0     –        –        –        –        –        –
 2    4.4    6.0       –        –        –        –        –
 3    5.0    4.4      6.0       –        –        –        –
 4    9.0    5.0      4.4      6.0       –        –        –
 5    7.2    9.0      5.0      4.4      6.0       –        –
 6    4.8    7.2      9.0      5.0      4.4      6.0       –
 7    6.0    4.8      7.2      9.0      5.0      4.4      6.0
 8   10.0    6.0      4.8      7.2      9.0      5.0      4.4
 9    8.0   10.0      6.0      4.8      7.2      9.0      5.0
10    5.6    8.0     10.0      6.0      4.8      7.2      9.0
11    6.4    5.6      8.0     10.0      6.0      4.8      7.2
12   11.0    6.4      5.6      8.0     10.0      6.0      4.8
13    9.0   11.0      6.4      5.6      8.0     10.0      6.0
14    6.6    9.0     11.0      6.4      5.6      8.0     10.0
15    7.0    6.6      9.0     11.0      6.4      5.6      8.0
16   10.8    7.0      6.6      9.0     11.0      6.4      5.6

Let us determine the correlation coefficient between the series y_t and y_{t−1}, i.e. the first-order autocorrelation coefficient. Note that it is calculated for 15, not 16, pairs of observations. Let us compose a table for calculating the first-order autocorrelation coefficient (Table 9.9):

Table 9.9

 t     y_t    y_{t-1}  y_t−ȳ1  y_{t-1}−ȳ2  (y_t−ȳ1)(y_{t-1}−ȳ2)  (y_t−ȳ1)²  (y_{t-1}−ȳ2)²
 1     6.0      –         –         –               –                 –           –
 2     4.4     6.0     −2.987    −1.067           3.186             8.920       1.138
 3     5.0     4.4     −2.387    −2.667           6.364             5.696       7.111
 4     9.0     5.0      1.613    −2.067          −3.334             2.603       4.271
 5     7.2     9.0     −0.187     1.933          −0.361             0.035       3.738
 6     4.8     7.2     −2.587     0.133          −0.345             6.691       0.018
 7     6.0     4.8     −1.387    −2.267           3.143             1.923       5.138
 8    10.0     6.0      2.613    −1.067          −2.788             6.830       1.138
 9     8.0    10.0      0.613     2.933           1.799             0.376       8.604
10     5.6     8.0     −1.787     0.933          −1.668             3.192       0.871
11     6.4     5.6     −0.987    −1.467           1.447             0.974       2.151
12    11.0     6.4      3.613    −0.667          −2.409            13.056       0.444
13     9.0    11.0      1.613     3.933           6.346             2.603      15.471
14     6.6     9.0     −0.787     1.933          −1.521             0.619       3.738
15     7.0     6.6     −0.387    −0.467           0.180             0.150       0.218
16    10.8     7.0      3.413    −0.067          −0.228            11.651       0.004
Total 110.8   106.0       –         –             9.813            65.317      54.053

According to the table, we find

ȳ1 = 110.8 / 15 ≈ 7.387,  ȳ2 = 106.0 / 15 ≈ 7.067.

Using formula (9.14), we find

r_1 = 9.813 / √(65.317 · 54.053) ≈ 0.165.
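This lag-1 calculation can be checked numerically from the 16 quarterly levels of Example 9.1; a minimal sketch:

```python
import math

# Verifying the lag-1 calculation of Example 9.1 directly from the 16
# quarterly levels (the same data as in Table 9.8).
y = [6.0, 4.4, 5.0, 9.0, 7.2, 4.8, 6.0, 10.0,
     8.0, 5.6, 6.4, 11.0, 9.0, 6.6, 7.0, 10.8]

a, b = y[1:], y[:-1]               # y_t (t = 2..16) and y_{t-1}
m1, m2 = sum(a) / 15, sum(b) / 15  # the two means used in Table 9.9
num = sum((p - m1) * (q - m2) for p, q in zip(a, b))
den = math.sqrt(sum((p - m1) ** 2 for p in a) * sum((q - m2) ** 2 for q in b))
r1 = num / den
print(round(r1, 3))  # 0.165
```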

Let us now determine the second-order autocorrelation coefficient, the correlation coefficient between the series y t and y t-2. Note that the calculation of the second-order autocorrelation coefficient will already be performed for 14 pairs of observations. Let's compose a table for calculating the 2nd order autocorrelation coefficient (table 9.10):

Table 9.10

 t     y_t    y_{t-2}  y_t−ȳ3  y_{t-2}−ȳ4  (y_t−ȳ3)(y_{t-2}−ȳ4)  (y_t−ȳ3)²  (y_{t-2}−ȳ4)²
 1     6.0      –         –         –               –                 –           –
 2     4.4      –         –         –               –                 –           –
 3     5.0     6.0     −2.600    −1.071           2.786             6.760       1.148
 4     9.0     4.4      1.400    −2.671          −3.740             1.960       7.137
 5     7.2     5.0     −0.400    −2.071           0.829             0.160       4.291
 6     4.8     9.0     −2.800     1.929          −5.400             7.840       3.719
 7     6.0     7.2     −1.600     0.129          −0.206             2.560       0.017
 8    10.0     4.8      2.400    −2.271          −5.451             5.760       5.159
 9     8.0     6.0      0.400    −1.071          −0.429             0.160       1.148
10     5.6    10.0     −2.000     2.929          −5.857             4.000       8.577
11     6.4     8.0     −1.200     0.929          −1.114             1.440       0.862
12    11.0     5.6      3.400    −1.471          −5.003            11.560       2.165
13     9.0     6.4      1.400    −0.671          −0.940             1.960       0.451
14     6.6    11.0     −1.000     3.929          −3.929             1.000      15.434
15     7.0     9.0     −0.600     1.929          −1.157             0.360       3.719
16    10.8     6.6      3.200    −0.471          −1.509            10.240       0.222
Total 106.4    99.0       –         –           −31.120            55.760      54.049

According to the table, we find

ȳ3 = 106.4 / 14 = 7.6,  ȳ4 = 99.0 / 14 ≈ 7.071.

Using formula (9.15), we find

r_2 = −31.120 / √(55.760 · 54.049) ≈ −0.567.

Similarly, we calculate the autocorrelation coefficients of the third and higher orders. (Note that in Excel the correlation coefficients can be calculated using the CORREL function.) As a result, we obtain the autocorrelation function of the original time series. Its values and the correlogram are given in Table 9.11.

Table 9.11

Lag   Autocorrelation coefficient
 1         0.165
 2        −0.567
 3         0.114
 4         0.983
 5         0.119
 6        −0.722

Analysis of the values of the autocorrelation function allows us to conclude that the time series under study contains, first, a linear trend and, second, seasonal fluctuations with a period of four quarters. This conclusion is confirmed by graphical analysis of the structure of the series (see Fig. 9.1).
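The whole autocorrelation function of Example 9.1 can be recomputed directly from the raw series, the way Excel's CORREL function would on the shifted columns; a sketch:

```python
import math

# Recomputing the autocorrelation function of Example 9.1 from the raw
# series: lag-k coefficient = correlation of y[k:] with y[:-k].
y = [6.0, 4.4, 5.0, 9.0, 7.2, 4.8, 6.0, 10.0,
     8.0, 5.6, 6.4, 11.0, 9.0, 6.6, 7.0, 10.8]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den

acf = {k: round(corr(y[k:], y[:-k]), 3) for k in range(1, 5)}
print(acf)  # the lag-4 coefficient (0.983) dominates: quarterly seasonality
```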

After the calculations, it is necessary to determine at which lag the coefficient is maximal (as a rule, this is the first lag) and to assess its significance. The premise of this test is that sample data are subject to representativeness error. The statistical hypothesis tested is that the population autocorrelation coefficient is zero (so the obtained sample autocorrelation coefficient is merely a manifestation of random representativeness error). The alternative hypothesis is that the population autocorrelation coefficient is nonzero (so the sample value can be considered an estimate of the unknown population coefficient). The hypotheses are verified by calculating Student's t-statistic and comparing the calculated value with the tabulated one.

t = r / σ_r, where r is the autocorrelation coefficient and σ_r is the standard error of the autocorrelation coefficient.

The standard error can be calculated as

σ_r = √((1 − r²) / (n − 2)),

where n is the number of levels in the series.

The tabulated value of Student's t-test at a significance level of 0.05 with 12 degrees of freedom is 2.17.

The calculated value of the criterion exceeds the theoretical one (16.69 versus 2.17), therefore, the autocorrelation coefficient at the first lag is considered significant.
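A sketch of this significance check; the values of r and n below are illustrative, not those of the example:

```python
import math

# Significance check for an autocorrelation coefficient:
# t = r / sigma_r, with sigma_r = sqrt((1 - r^2) / (n - 2)).
# The r and n values used below are illustrative.
def t_statistic(r, n):
    sigma_r = math.sqrt((1 - r ** 2) / (n - 2))
    return r / sigma_r

t = t_statistic(0.98, 14)
print(t > 2.17)  # True: exceeds the tabulated value, so r is significant
```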

The presence of high autocorrelation, combined with the significance of the coefficient, makes it possible to consider a regression model of the form

y_t = a + b·y_{t−1} + ε_t

(one type of regression model). Such a model is called autoregressive and allows the problems of extrapolation and forecasting to be solved.
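Fitting such a first-order autoregressive model amounts to an ordinary least-squares regression on the lagged pairs; a minimal sketch on synthetic data:

```python
# Fitting a first-order autoregressive model y_t = a + b*y_{t-1} by
# ordinary least squares on the lagged pairs (synthetic data below).
def fit_ar1(y):
    x, z = y[:-1], y[1:]                  # y_{t-1} and y_t
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    b = (sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
         / sum((xi - mx) ** 2 for xi in x))
    a = mz - b * mx
    return a, b

# Data generated exactly by y_t = 1 + 0.5*y_{t-1}: the fit recovers a and b
y = [10.0]
for _ in range(10):
    y.append(1 + 0.5 * y[-1])
a, b = fit_ar1(y)
print(round(a, 3), round(b, 3))  # 1.0 0.5
```

The fitted equation can then be iterated forward to produce a forecast.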

Practice shows that autocorrelation is often preserved in deviations from the trend. Before proceeding to calculate the correlation coefficient for the residuals, it is necessary to check them for autocorrelation. The tested statistical hypothesis H0 is formulated as follows:

H0: there is no autocorrelation in the analyzed time series.

The most common statistical criterion for assessing autocorrelation in deviations from the trend is the Durbin-Watson test; its statistic is determined by the formula

d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²,

where e_t are the random deviations from the trend.

The criterion value varies in the range from 0 to 4. For 0 < d < 2 the autocorrelation is positive; for 2 < d < 4 it is negative.

A value of the criterion close to 2 indicates the absence of autocorrelation or its insignificance. The estimates obtained by the d criterion are interval estimates. Tables of the distribution of the Durbin-Watson statistic have been compiled for different significance levels, taking into account the number of observations in the time series and the number of variables in the trend equation.

From the table, in each case, the lower (d_L) and upper (d_U) boundaries of the criterion are found. Comparison of the calculated value with the tabulated boundaries is interpreted as follows:

1. d > d_U: H0 is accepted;

2. d < d_L: H0 is rejected;

3. d_L ≤ d ≤ d_U: further research is needed (for example, over a longer time series).
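This decision rule can be sketched directly; the bound values below are purely illustrative, since real d_L and d_U come from published Durbin-Watson tables for the given number of observations, regressors, and significance level:

```python
# The Durbin-Watson decision rule from the text. The d_lower (d_L) and
# d_upper (d_U) values passed below are purely illustrative; real bounds
# are read from published Durbin-Watson tables.
def dw_decision(d, d_lower, d_upper):
    if d > d_upper:
        return "H0 accepted: no autocorrelation"
    if d < d_lower:
        return "H0 rejected: autocorrelation present"
    return "inconclusive: further research needed"

print(dw_decision(1.95, d_lower=1.10, d_upper=1.37))  # H0 accepted: no autocorrelation
```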

To check the residuals for autocorrelation, you can simply calculate the autocorrelation coefficients for the residuals. This problem is solved similarly to the problem of assessing the autocorrelation of time series. The only difference: the initial data in this case is the residuals for the optimal trend (taken from the reports)

The absence of autocorrelation in the residuals is judged by the value of the coefficient (a value below 0.5 indicates no autocorrelation). Solving this problem additionally confirms the quality of the trend fit.

Time series cross-correlation is a correlation dependence between time series with a given time shift (lag). Attention! The cross-correlation coefficients are calculated on the residuals from the optimal trends of the time series. The need to exclude the trend component is explained by the fact that when the levels of unidirectional series are correlated directly, the results are significantly distorted (overestimated).

The residuals for the two time series are taken from the best-trend reports.

The offset (lag) is set by analogy with the autocorrelation problem.

The second difference is the need to consider direct and inverse relationships.

The sequence of specifying the initial data does not matter in this case, since in any case a direct relationship is considered - import to export, and the reverse - export to import, respectively.

The third difference is that no offset is set at zero lag.

The obtained cross-correlation coefficients are used to construct a correlogram
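A sketch of the lag-k cross-correlation of two detrended series; the series names x, z and the data are illustrative:

```python
import math

# Lag-k cross-correlation of two detrended series: corr(x_t, z_{t-k}).
# Swapping the arguments gives the "inverse" direction mentioned above.
def cross_corr(x, z, k):
    a, b = (x[k:], z[:len(z) - k]) if k > 0 else (x, z)
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den

# x reproduces z with a one-step delay, so the lag-1 coefficient is maximal
z = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0]
x = [0.0] + z[:-1]                     # x_t = z_{t-1}
print(round(cross_corr(x, z, 1), 3))   # 1.0
```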

By analogy with solving the autocorrelation problem, it is necessary to assess the significance of the maximum cross-correlation coefficient (as a rule, this is the coefficient at zero lag).

The presence of a high cross-correlation, combined with the significance of the coefficient, makes it possible to consider a regression model in which the optimal trend (in this case, linear) is included together with the second series. Such a model is called a regression model with the inclusion of a time factor and allows the problems of extrapolation and forecasting to be solved.

The levels of the second time series are taken with the given shift by the lag value.

When processing time series, it is necessary to take into account the presence of autocorrelation and autoregression, in which the values of each subsequent level of the series depend on the previous ones.

Autocorrelation is the interrelation between series: the original series and the same series shifted relative to the initial position by h points in time.

Autocorrelation can be quantitatively measured using the linear correlation coefficient between the levels of the original time series and the levels of this series, shifted by several steps in time.

The formula for calculating the autocorrelation coefficient is:

r_1 = Σ_{t=2}^{n} (y_t − ȳ1)(y_{t−1} − ȳ2) / √[ Σ_{t=2}^{n} (y_t − ȳ1)² · Σ_{t=2}^{n} (y_{t−1} − ȳ2)² ]

This value is called the first-order autocorrelation coefficient of the levels of the series, since it measures the relationship between adjacent levels of the series y_t and y_{t−1}.

Similarly, one can determine the autocorrelation coefficients of the second and higher orders. Thus, the second-order autocorrelation coefficient characterizes the tightness of the connection between the levels y_t and y_{t−2} and is determined by the formula:

r_2 = Σ_{t=3}^{n} (y_t − ȳ3)(y_{t−2} − ȳ4) / √[ Σ_{t=3}^{n} (y_t − ȳ3)² · Σ_{t=3}^{n} (y_{t−2} − ȳ4)² ],

where ȳ3 = (1/(n−2)) Σ_{t=3}^{n} y_t and ȳ4 = (1/(n−2)) Σ_{t=3}^{n} y_{t−2}.

The shift between the levels being correlated, whether adjacent or separated by any number of time periods, is called the time lag. As the lag increases, the number of pairs of values used to calculate the autocorrelation coefficient decreases. To ensure the statistical reliability of the autocorrelation coefficients, it is considered advisable to use the rule that the maximum lag should not exceed n/4.

Autocorrelation coefficient properties.

1. The correlation coefficient is constructed by analogy with the linear correlation coefficient and thus characterizes the tightness of only the linear relationship between the current and previous levels of the series. Therefore, the autocorrelation coefficient can be used to judge the presence of a linear (or close to linear) trend. For some time series with a strong non-linear trend (for example, a second-order parabola or exponential), the autocorrelation coefficient of the levels of the original series may approach zero.

2. By the sign of the autocorrelation coefficient it is impossible to draw a conclusion about an increasing or decreasing trend in the levels of the series. Most time series of economic data contain a positive autocorrelation of levels, however, they may have a decreasing trend.

The sequence of autocorrelation coefficients of the levels of the first, second, and subsequent orders is called the autocorrelation function of the time series. The graph of its values against the lag value (the order of the autocorrelation coefficient) is called the correlogram.

The analysis of the autocorrelation function and the correlogram makes it possible to determine the lag at which the autocorrelation is the highest, and, consequently, the lag, at which the relationship between the current and previous levels of the series is the closest, i.e. by analyzing the autocorrelation function and the correlogram, it is possible to reveal the structure of the series.


If the first-order autocorrelation coefficient turns out to be the highest, the series under study contains only a trend. If the autocorrelation coefficient of order τ turns out to be the highest, the series contains cyclical fluctuations with a period of τ points in time. If none of the autocorrelation coefficients is significant, one of two assumptions about the structure of the series can be made: either the series contains no trend and no cyclical fluctuations, or it contains a strong non-linear trend, which requires additional analysis to identify. Therefore, the autocorrelation coefficient of the levels and the autocorrelation function should be used to identify the presence or absence of a trend component and a cyclical (seasonal) component in the time series.

Example 3.

Suppose there are some conditional data (table 11) on the total amount of goods received at the warehouse of the enterprise.

Table 11 - The total number of goods received at the warehouse.