Coefficient of variation of a random variable: formula and calculation

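The square root of the variance is called the standard deviation from the mean. For a series with frequencies $f_i$ it is calculated as follows:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$

An elementary algebraic transformation of the standard deviation formula brings it to the following form:

$$\sigma = \sqrt{\overline{x^2} - \bar{x}^2}$$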

This formula is often more convenient in the practice of calculations.
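As a quick check of that equivalence, the sketch below computes the standard deviation both ways: from the squared deviations and as the square root of the mean of squares minus the square of the mean. The sample values and function names are illustrative assumptions, not data from the text.

```python
import math

def std_by_definition(values):
    """Population standard deviation: root of the mean squared deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

def std_by_shortcut(values):
    """Same quantity via the transformed form: sqrt(mean(x^2) - mean(x)^2)."""
    mean = sum(values) / len(values)
    mean_of_squares = sum(x * x for x in values) / len(values)
    return math.sqrt(mean_of_squares - mean ** 2)

ages = [17, 18, 19, 20, 21, 22]  # hypothetical sample, not from the text
print(std_by_definition(ages), std_by_shortcut(ages))
```

The shortcut form needs only the running sums of x and x², which is why it tends to be more convenient in manual calculations.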

The standard deviation, like the mean linear deviation, shows how much, on average, the specific values of the attribute deviate from their mean. The standard deviation is never smaller than the mean linear deviation $\bar{d}$. For distributions close to normal, the following ratio holds between them:

$$\sigma \approx 1.25\,\bar{d}$$

Knowing this ratio, one indicator can be determined from the other, for example $\sigma$ from $\bar{d}$ and vice versa. The standard deviation measures the absolute size of the variability of the attribute and is expressed in the same units of measurement as the values of the attribute (rubles, tons, years, etc.). It is an absolute measure of variation.
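For a normal distribution the theoretical ratio of the standard deviation to the mean linear deviation is sqrt(pi/2), about 1.25. The sketch below checks this numerically on a deterministic grid of normal quantiles; the grid size and helper names are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def mean_linear_deviation(values):
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

def std_deviation(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

# Deterministic "sample": evenly spaced quantiles of the standard normal law.
n = 2000
sample = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

ratio = std_deviation(sample) / mean_linear_deviation(sample)
theoretical = math.sqrt(math.pi / 2)
print(ratio, theoretical)
```

For strongly skewed or heavy-tailed data the ratio can differ noticeably from 1.25, so the conversion is only a rough guide.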

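For alternative attributes, such as the presence or absence of higher education or insurance, the variance and standard deviation formulas are as follows:

$$\sigma^2 = pq, \qquad \sigma = \sqrt{pq},$$

where $p$ is the share of units that possess the attribute and $q = 1 - p$ is the share of units that do not.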
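For an alternative (yes/no) attribute coded as 1 and 0, the variance reduces to the product of the share p of units with the attribute and the share q = 1 - p without it. A minimal sketch, with a hypothetical population of 100 people:

```python
import math

def alternative_attribute_spread(flags):
    """Return (p, variance, std) for a list of booleans."""
    n = len(flags)
    p = sum(1 for flag in flags if flag) / n  # share with the attribute
    q = 1 - p                                 # share without it
    variance = p * q
    return p, variance, math.sqrt(variance)

# Hypothetical data: 30 of 100 people have higher education.
flags = [True] * 30 + [False] * 70
p, variance, std = alternative_attribute_spread(flags)
print(p, variance, std)
```

The variance is largest (0.25) when p = 0.5, i.e. when the population is split evenly.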

Let us show the calculation of the standard deviation using a discrete series that characterizes the distribution of students of one university faculty by age (Table 6.2).

Table 6.2.

The results of auxiliary calculations are given in columns 2-5 of Table 6.2.

The average student age, in years, is determined by the weighted arithmetic mean formula (column 2):

The squared deviations of individual student ages from the mean are given in columns 3-4, and the products of the squared deviations and the corresponding frequencies are in column 5.

The variance of student age is found by formula (6.2): $\sigma^2 = 3.43$.

Then $\sigma = \sqrt{3.43} \approx 1.85$ years, i.e. on average each specific value of student age deviates from the mean by 1.85 years.
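The column-by-column procedure for a discrete series can be sketched as follows. The age distribution below is hypothetical (the data of Table 6.2 are not reproduced here), and the function names are assumptions.

```python
import math

def weighted_mean(values, freqs):
    """Weighted arithmetic mean: sum(x * f) / sum(f)."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)

def weighted_variance(values, freqs):
    """Mean of squared deviations from the weighted mean, weighted by frequency."""
    mean = weighted_mean(values, freqs)
    return sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / sum(freqs)

# Hypothetical age distribution (NOT the data of Table 6.2).
ages = [17, 18, 19, 20, 21]
students = [10, 20, 40, 20, 10]

mean_age = weighted_mean(ages, students)
variance = weighted_variance(ages, students)
sigma = math.sqrt(variance)
print(mean_age, variance, sigma)
```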

The coefficient of variation

In absolute terms, the standard deviation depends not only on the degree of variation of the attribute, but also on the absolute levels of the variants and the mean. Therefore, the standard deviations of variational series with different mean levels cannot be compared directly. To make such a comparison possible, one needs to find the share of the mean deviation (linear or standard) in the arithmetic mean, expressed as a percentage, i.e. to calculate relative indicators of variation.

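The linear coefficient of variation is calculated by the formula

$$V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%$$

The coefficient of variation is determined by the following formula:

$$V = \frac{\sigma}{\bar{x}} \cdot 100\% \qquad (6.3)$$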

The coefficients of variation eliminate not only the incomparability caused by different units of measurement of the studied attribute, but also the incomparability arising from differences in the arithmetic means. In addition, the indicators of variation characterize the homogeneity of the population. The population is considered homogeneous if the coefficient of variation does not exceed 33%.

Using Table 6.2 and the calculation results obtained above, we determine the coefficient of variation, in percent, by formula (6.3):

If the coefficient of variation exceeds 33%, this indicates that the studied population is heterogeneous. The value obtained in our case indicates that the student population is homogeneous in terms of age. Thus, an important function of the generalizing indicators of variation is to assess the reliability of the mean. The smaller $\bar{d}$, $\sigma^2$ and $V$, the more homogeneous the set of phenomena and the more reliable the obtained mean.

According to the "three-sigma rule" of mathematical statistics, in normally distributed series or series close to them, deviations from the arithmetic mean not exceeding $\pm 3\sigma$ occur in 997 cases out of 1000. Thus, knowing $\bar{x}$ and $\sigma$, one can get a general initial idea of the variation series. If, for example, the average wage of a company's employees is 25,000 rubles and $\sigma$ is 100 rubles, then with a probability close to certainty it can be stated that the wages of the company's employees range within $25{,}000 \pm 3 \times 100$, i.e. from 24,700 to 25,300 rubles.
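The wage example can be reproduced in a few lines. The sketch assumes, as the rule itself does, that wages are approximately normally distributed; the function name is illustrative.

```python
from statistics import NormalDist

def three_sigma_interval(mean, sigma):
    """Bounds mean - 3*sigma and mean + 3*sigma."""
    return mean - 3 * sigma, mean + 3 * sigma

low, high = three_sigma_interval(25_000, 100)

# Share of a normal population that lies within three standard deviations.
coverage = NormalDist().cdf(3) - NormalDist().cdf(-3)
print(low, high, round(coverage, 4))
```

The computed coverage of about 0.9973 matches the "997 cases out of 1000" quoted above.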

Researchers often encounter variability of the attribute under study across individual units of the population, its fluctuation around a certain value, that is, its variation. Variation must be taken into account in order to obtain the most reliable information from a study.

When determining the interval over which a parameter changes, most researchers resort to absolute and relative indicators. Among the latter, the coefficient of variation is the most widespread; if the studied quantity is normally distributed, it serves as a criterion for the homogeneity of the population. This indicator shows the degree of dispersion of the values of the studied parameter regardless of the scale and unit of measurement.

The coefficient of variation is calculated by dividing the standard deviation by the arithmetic mean of the variable and expressing the result as a percentage. For a positive-valued attribute, the result can range from zero to infinity, increasing as the variation of the attribute increases. If the obtained value is less than 33.3%, the variation of the attribute is weak; if it is greater, the variation is strong. In the latter case, the studied data set is heterogeneous, and its mean is considered atypical and therefore cannot serve as a generalizing indicator. For such a population, other indicators should be used.
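A minimal sketch of this calculation and of the 33% homogeneity check is shown below. It uses the population standard deviation from Python's `statistics` module; the sample data, the function names and the choice to reject non-positive means are assumptions.

```python
import statistics

def coefficient_of_variation(values):
    """Return sigma / mean * 100 using the population standard deviation."""
    mean = statistics.fmean(values)
    if mean <= 0:
        raise ValueError("the coefficient is meaningful only for a positive mean")
    return statistics.pstdev(values) / mean * 100

def is_homogeneous(values, threshold=33.0):
    """Apply the conventional threshold quoted in the text."""
    return coefficient_of_variation(values) <= threshold

steady = [10, 12, 11, 9, 8]   # hypothetical, weak variation
scattered = [1, 2, 30]        # hypothetical, strong variation

print(coefficient_of_variation(steady), is_homogeneous(steady))
print(coefficient_of_variation(scattered), is_homogeneous(scattered))
```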

It should be noted that the coefficient of variation not only characterizes the homogeneity of a population, but is also used for comparative assessment. For example, it is used when the fluctuations of a particular attribute need to be compared across populations with different mean values. In this case, the absolute scatter of the data does not allow an objective comparison. The coefficient of variation characterizes the relative variability of a variable and can therefore serve as a relative measure of fluctuations in the value of the parameter under study.

However, there are some limitations here. In particular, the degree of fluctuation of parameter values can be assessed only for a specific attribute and only for a population of a specific composition. Moreover, equal values of the coefficient may indicate strong variation in one case and weak variation in another. This happens if the attributes are different or the studies are carried out on different populations. Such a result arises for objective reasons, and this should be taken into account when processing the experimental data.

The coefficient of variation is widely used in various fields of science and technology. In particular, it is used to assess fluctuations of parameters in economics and sociology. However, the coefficient cannot be used to assess the variability of variables that can change sign. In that case the calculation yields misleading values of the indicator: the mean may be close to zero, which makes the coefficient arbitrarily large, or negative, which gives the coefficient a negative sign. In the latter case, it is worth checking the correctness of the calculations performed.
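The problem with sign-changing variables is easy to see numerically: when positive and negative values nearly cancel, the mean is close to zero and the ratio becomes unstable or negative. The two data sets below are hypothetical (for instance, profits and losses), and `naive_cv` is an illustrative name.

```python
import statistics

def naive_cv(values):
    """sigma / mean * 100 with no check on the sign of the mean."""
    return statistics.pstdev(values) / statistics.fmean(values) * 100

near_zero_mean = [-10, -5, 0, 5, 12]   # mean is 0.4
negative_mean = [-12, -5, 0, 5, 10]    # mean is -0.4

print(naive_cv(near_zero_mean))   # very large, although the spread is ordinary
print(naive_cv(negative_mean))    # negative, which has no meaningful reading
```

Both sets have almost the same spread, yet the coefficient differs wildly, so for such variables an absolute measure such as the standard deviation is the safer choice.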

Thus, the coefficient of variation is a parameter that makes it possible to estimate the degree of variation and the variability relative to the mean. Using this indicator helps to identify the most significant factors, and focusing on them makes it possible to achieve the goals and solve the tasks set.

INTRODUCTION

These methodological guidelines for practical and laboratory work in statistics contain the requirements for carrying out the work and the procedure for calculations performed manually and with MS Excel and the Statistica application package.

Part II of the guidelines covers the calculation of the indicators of variation: the range of variation, quartiles and the quartile deviation, the mean linear deviation, the variance and standard deviation, and the coefficients of oscillation, variation, asymmetry, kurtosis and others.

The calculation of the indicators of variation, along with the construction of interval and discrete variation series and the calculation of the averages presented in Part I of the guidelines, is of great importance for the analysis of distribution series.

CALCULATION OF VARIATION INDICATORS

Purpose of the work: obtaining practical skills in calculating various indicators (measures) of variation depending on the tasks set by the study.

Work order:

Determine the type and form (simple or weighted) of the indicators of variation.

Formulate conclusions.

An example of calculating the indicators of variation

Determination of the type and form of indicators of variation.

Variation indicators are divided into two groups: absolute and relative. The absolute ones include the range of variation, the quartile deviation, the mean linear deviation, the variance and the standard deviation. The relative indicators are the coefficients of oscillation and variation, the relative linear deviation, etc.

The range of variation (R) is the simplest measure of the variation of an attribute and is determined by the following formula:

$$R = x_{\max} - x_{\min},$$

where $x_{\max}$ is the largest value of the varying attribute;

$x_{\min}$ is the smallest value of the varying attribute.

The quartile deviation (Q) is used to characterize the variation of an attribute in the population. It can be used in place of the range of variation to avoid the disadvantages of relying on extreme values.

Quartiles are the values of the attribute in a ranked distribution series chosen so that 25% of the population units are smaller than $Q_1$; 25% of the units lie between $Q_1$ and $Q_2$; 25% of the units lie between $Q_2$ and $Q_3$; and the remaining 25% exceed $Q_3$.

$$Q_1 = x_{Q_1} + i \cdot \frac{\frac{1}{4}\sum f - S_{Q_1 - 1}}{f_{Q_1}},$$

where $x_{Q_1}$ is the lower boundary of the interval in which the first quartile is located;

$i$ is the width of the interval;

$S_{Q_1 - 1}$ is the sum of the accumulated frequencies of the intervals preceding the interval in which the first quartile is located;

$f_{Q_1}$ is the frequency of the interval in which the first quartile is located.

$$Q_2 = Me,$$

where Me is the median of the series;

$$Q_3 = x_{Q_3} + i \cdot \frac{\frac{3}{4}\sum f - S_{Q_3 - 1}}{f_{Q_3}},$$

the notation is the same as for $Q_1$.

In symmetric or moderately asymmetric distributions, $Q \approx \frac{2}{3}\sigma$. Since the quartile deviation does not reflect the deviations of all values of the attribute, its use should be limited to cases when determining the standard deviation is difficult or impossible.
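The quartiles of an interval series are found by locating the interval in which the cumulative frequency first reaches the required share of the total and interpolating linearly inside it. The sketch below assumes equal-width intervals; the data and the name `interval_quantile` are hypothetical.

```python
def interval_quantile(lower_bounds, width, freqs, share):
    """Value below which the given share of units lies (linear interpolation)."""
    target = share * sum(freqs)
    cumulative = 0  # accumulated frequency of the preceding intervals
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= target:
            return lower + width * (target - cumulative) / f
        cumulative += f
    raise ValueError("share must lie between 0 and 1")

# Hypothetical interval series: 10-20, 20-30, 30-40, 40-50.
lower_bounds = [10, 20, 30, 40]
freqs = [10, 30, 40, 20]

q1 = interval_quantile(lower_bounds, 10, freqs, 0.25)
q2 = interval_quantile(lower_bounds, 10, freqs, 0.50)  # the median
q3 = interval_quantile(lower_bounds, 10, freqs, 0.75)
quartile_deviation = (q3 - q1) / 2
print(q1, q2, q3, quartile_deviation)
```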

The mean linear deviation ($\bar{d}$) is the average of the absolute deviations of the attribute variants from their mean. It can be calculated by the arithmetic mean formula, either unweighted or weighted, depending on the absence or presence of frequencies in the distribution series.

$$\bar{d} = \frac{\sum |x_i - \bar{x}|}{n} \qquad (6)$$

(6) - unweighted mean linear deviation,

$$\bar{d} = \frac{\sum |x_i - \bar{x}|\, f_i}{\sum f_i} \qquad (7)$$

(7) - weighted mean linear deviation.
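Both forms of the mean linear deviation can be covered by one function in which the frequencies default to 1. This is a minimal sketch with hypothetical data; the function name and signature are assumptions.

```python
def mean_linear_deviation(values, freqs=None):
    """Average absolute deviation from the mean, optionally frequency-weighted."""
    if freqs is None:
        freqs = [1] * len(values)  # the unweighted case
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum(abs(x - mean) * f for x, f in zip(values, freqs)) / total

unweighted = mean_linear_deviation([2, 4, 6, 8])
weighted = mean_linear_deviation([1, 2, 3], freqs=[1, 2, 1])
print(unweighted, weighted)
```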

The variance ($\sigma^2$) is the mean square of the deviations of individual values of the attribute from their mean. The variance is calculated by the unweighted and weighted formulas.

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} \qquad (8)$$

(8) - unweighted,

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i} \qquad (9)$$

(9) - weighted.

The standard deviation ($\sigma$) is the most common measure of variation and is the square root of the variance.

The range of variation, the quartile deviation, and the mean linear and standard deviations are denominate quantities: they have the same dimension as the attribute being averaged.

To compare the variability of different attributes in the same population, or the variability of the same attribute in several populations, relative indicators of variation are calculated. The basis for comparison is the arithmetic mean. Most often, relative indicators are expressed as a percentage; they provide not only a comparative assessment of variation but also characterize the homogeneity of the population.

The oscillation coefficient is calculated by the formula:

$$V_R = \frac{R}{\bar{x}} \cdot 100\%$$

Linear Relative Deviation (Linear Coefficient of Variation):

(13) or (14)

The coefficient of variation:

$$V = \frac{\sigma}{\bar{x}} \cdot 100\%$$

The most commonly used indicator of relative variability in statistics is the coefficient of variation. It is used not only for the comparative assessment of variation, but also as a characteristic of the homogeneity of the population. The population is considered homogeneous if the coefficient of variation does not exceed 33% (Efimova M.R., Ryabtsev V.M. General Theory of Statistics: Textbook. Moscow: Finance and Statistics, 1991, p. 105).
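The three relative indicators differ only in the absolute measure placed in the numerator. The sketch below applies them to the summary values reported in the conclusions of the turnover example later in the text (mean 30, range 40, mean linear deviation 6.5, standard deviation 8.1, all in million rubles); the function name and dictionary keys are assumptions.

```python
def relative_indicators(mean, value_range, mean_linear_dev, std_dev):
    """Each absolute measure divided by the mean, as a percentage."""
    return {
        "oscillation": value_range / mean * 100,
        "relative_linear_deviation": mean_linear_dev / mean * 100,
        "coefficient_of_variation": std_dev / mean * 100,
    }

result = relative_indicators(mean=30, value_range=40,
                             mean_linear_dev=6.5, std_dev=8.1)
for name, value in result.items():
    print(name, round(value, 1))
```

The rounded values (133.3%, 21.7% and 27.0%) agree with the figures quoted in that example's conclusions.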

To obtain an approximate idea of ​​the shape of the distribution, distribution graphs are plotted (polygon and histogram).

In the practice of statistical research, one encounters a wide variety of distributions. When studying homogeneous populations, we deal, as a rule, with single-peaked distributions. Multiple peaks indicate that the studied population is heterogeneous; the appearance of two or more peaks indicates the need to regroup the data in order to identify more homogeneous groups. Determining the general nature of the distribution involves assessing its degree of homogeneity and calculating the indicators of asymmetry and kurtosis.

A distribution is symmetric if the frequencies of any two variants equally distant from the distribution center on either side are equal. For symmetric distributions, the arithmetic mean, mode and median are equal. Accordingly, the simplest indicator of asymmetry is based on the relationship between the indicators of the distribution center: the greater the difference between them, the greater the asymmetry of the series.

For a comparative analysis of the degree of asymmetry of several distributions, the relative indicator As is calculated:

$$As = \frac{\bar{x} - Mo}{\sigma}$$

The value of As can be positive or negative. A positive value indicates right-sided asymmetry (the right branch is more elongated relative to the maximum ordinate than the left). With right-sided asymmetry, the indicators of the distribution center are related as follows: $Mo < Me < \bar{x}$. A negative sign indicates left-sided asymmetry (Figure 1). In this case, the indicators of the distribution center are related as follows: $Mo > Me > \bar{x}$.

Figure 1. Distribution: 1 - with right-sided asymmetry; 2 - with left-sided asymmetry.
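The relative asymmetry indicator based on the difference between the mean and the mode can be checked against the turnover example later in the text, where the mean is 30, the mode is 31.4 and the standard deviation is 8.1 million rubles. The formula used here, (mean - mode) / sigma, is the usual Pearson form and is an assumption about the lost original.

```python
def pearson_asymmetry(mean, mode, sigma):
    """(mean - mode) / sigma; negative values mean left-sided asymmetry."""
    return (mean - mode) / sigma

a_s = pearson_asymmetry(mean=30, mode=31.4, sigma=8.1)
side = "left-sided" if a_s < 0 else "right-sided"
print(round(a_s, 3), side)
```

The small negative value is consistent with the mild left-sided asymmetry stated in that example's conclusions.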

Another indicator, proposed by the Swedish mathematician Lindberg, is calculated by the formula:

$$As = P - 50,$$

where P is the percentage of the values of the attribute that exceed the arithmetic mean.

The most accurate and widespread indicator is based on the third-order central moment (in a symmetric distribution, its value is zero):

$$As = \frac{\mu_3}{\sigma^3},$$

where $\mu_3$ is the third-order central moment:

$$\mu_3 = \frac{\sum (x_i - \bar{x})^3}{n} \qquad (19)$$

(19) - for ungrouped data;

$$\mu_3 = \frac{\sum (x_i - \bar{x})^3 f_i}{\sum f_i} \qquad (20)$$

(20) - for grouped data.

$\sigma$ is the standard deviation.

This indicator makes it possible not only to measure the amount of asymmetry, but also to answer the question of whether asymmetry is present in the distribution of the attribute in the general population. The significance of this indicator is assessed using its mean square error $\sigma_{As}$, which depends on the number of observations n and is calculated by the formula:

If the ratio $|As| / \sigma_{As} > 3$, the asymmetry is significant and the distribution of the attribute in the general population is not symmetric. If $|As| / \sigma_{As} < 3$, the asymmetry is insignificant, and its presence can be explained by random circumstances.
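A moment-based asymmetry coefficient and its significance ratio can be sketched as follows. The error formula in the source did not survive, so the code assumes the normal-theory standard error of the sample skewness, sqrt(6(n - 2) / ((n + 1)(n + 3))), and the conventional threshold of 3; both are assumptions, as are the data and function names.

```python
import math

def central_moment(values, order):
    """Central moment of the given order for ungrouped data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def asymmetry(values):
    """Third central moment divided by the cube of the standard deviation."""
    sigma = math.sqrt(central_moment(values, 2))
    return central_moment(values, 3) / sigma ** 3

def asymmetry_error(n):
    # Assumption: normal-theory standard error of the sample skewness.
    return math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))

data = [1, 2, 3, 4, 10]  # hypothetical sample with a long right tail
a_s = asymmetry(data)
ratio = abs(a_s) / asymmetry_error(len(data))
print(round(a_s, 4), round(ratio, 2), "significant" if ratio > 3 else "insignificant")
```

With only five observations the error is large, so even a visibly skewed sample does not pass the significance threshold.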

For symmetric distributions, the kurtosis (peakedness) indicator is calculated. Lindberg proposed the following indicator for assessing kurtosis:

$$Ex = P - 38.29,$$

where P is the share (%) of the variants lying within half a standard deviation on either side of the arithmetic mean.

The most accurate indicator uses the fourth-order central moment:

$$Ex = \frac{\mu_4}{\sigma^4} - 3,$$

where $\mu_4$ is the fourth-order central moment:

$$\mu_4 = \frac{\sum (x_i - \bar{x})^4}{n} \qquad (24)$$

(24) - for ungrouped data;

$$\mu_4 = \frac{\sum (x_i - \bar{x})^4 f_i}{\sum f_i} \qquad (25)$$

(25) - for grouped data.

Figure 2 shows two distributions: one is peaked (kurtosis is positive), the other is flat-topped (kurtosis is negative). Kurtosis is the displacement of the peak of the empirical distribution above or below the peak of the normal distribution curve. In a normal distribution, the ratio $\mu_4 / \sigma^4 = 3$.

Figure 2. Distribution: 1, 4 - normal; 2 - peaked; 3 - flat-topped.

The mean square error of kurtosis, $\sigma_{Ex}$, is calculated by the formula:

where n is the number of observations.

If $|Ex| / \sigma_{Ex} > 3$, the kurtosis is significant; if $|Ex| / \sigma_{Ex} < 3$, it is insignificant.
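The kurtosis indicator based on the fourth central moment can be sketched in the same way. Since the source's error formula is not shown, the code assumes the normal-theory standard error of the sample excess kurtosis, sqrt(24 n (n - 2)(n - 3) / ((n + 1)^2 (n + 3)(n + 5))), and the threshold of 3; the data and names are hypothetical.

```python
import math

def central_moment(values, order):
    """Central moment of the given order for ungrouped data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def kurtosis(values):
    """Fourth central moment over the squared variance, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

def kurtosis_error(n):
    # Assumption: normal-theory standard error of the sample excess kurtosis.
    return math.sqrt(24 * n * (n - 2) * (n - 3)
                     / ((n + 1) ** 2 * (n + 3) * (n + 5)))

data = [1, 2, 3, 4, 5]  # hypothetical, evenly spread (flat-topped) sample
ex = kurtosis(data)
ratio = abs(ex) / kurtosis_error(len(data))
print(round(ex, 2), round(ratio, 2), "significant" if ratio > 3 else "insignificant")
```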

Assessing the significance of the asymmetry and kurtosis indicators makes it possible to conclude whether the empirical distribution can be treated as normal.

Consider the methodology for calculating the indicators of variation.

Table 1. Data on the volume of currency sales of several branches of the Central Bank.

Determine the average volume of currency sales for the aggregate of branches, calculate the absolute and relative indicators of variation.

Let's calculate the range of variation:

$R = x_{\max} - x_{\min} = 24.3 - 10.2 = 14.1$ million rubles.


To determine the deviations of the attribute values ​​from the mean and their squares, we build an auxiliary table:

Table 2. Calculation table

We find the average value using the simple arithmetic mean formula:

Average linear deviation:

Variance:

Oscillation coefficient:

The coefficient of variation:

To calculate the indicators of the distribution form, we build an auxiliary table:

Table 3. Calculation table


Table 4. Data on the turnover of enterprises in one of the industries.

Determine the average turnover, the structural averages, the absolute and relative indicators of variation, and the extent to which the actual distribution is consistent with the normal one (judging by the indicators of distribution shape).

To calculate the indicators, we will build an auxiliary table.

Table 5. Calculation table

Range of variation:

We find the average value using the formula of the arithmetic weighted average:

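In an interval distribution series, the mode is determined by the formula:

$$Mo = x_{Mo} + i \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})},$$

where $x_{Mo}$ is the lower boundary of the modal interval, $i$ is its width, and $f_{Mo}$, $f_{Mo-1}$ and $f_{Mo+1}$ are the frequencies of the modal, preceding and following intervals.

In our case, the mode is equal to $Mo = 31.4$ million rubles.

In an interval variation series, the median is determined by the formula:

$$Me = x_{Me} + i \cdot \frac{\frac{1}{2}\sum f - S_{Me-1}}{f_{Me}},$$

where $x_{Me}$ is the lower boundary of the median interval, $S_{Me-1}$ is the accumulated frequency of the preceding intervals and $f_{Me}$ is the frequency of the median interval.

In our case, the median is $Me = 30.5$ million rubles.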
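The mode and median of an interval series can be sketched as follows, using the standard interpolation formulas for equal-width intervals. The series below is hypothetical (it is not Table 4), and the function names are assumptions.

```python
def interval_mode(lower_bounds, width, freqs):
    """Mode of an interval series from the modal interval and its neighbours."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # modal interval index
    f = freqs[k]
    prev_f = freqs[k - 1] if k > 0 else 0
    next_f = freqs[k + 1] if k + 1 < len(freqs) else 0
    # Note: undefined if the modal frequency equals both neighbours.
    return lower_bounds[k] + width * (f - prev_f) / ((f - prev_f) + (f - next_f))

def interval_median(lower_bounds, width, freqs):
    """Median of an interval series by linear interpolation."""
    half = sum(freqs) / 2
    cumulative = 0  # accumulated frequency of the preceding intervals
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= half:
            return lower + width * (half - cumulative) / f
        cumulative += f

# Hypothetical interval series: 10-20, 20-30, 30-40, 40-50.
lower_bounds = [10, 20, 30, 40]
freqs = [10, 30, 40, 20]

mode = interval_mode(lower_bounds, 10, freqs)
median = interval_median(lower_bounds, 10, freqs)
print(round(mode, 2), median)
```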

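Quartile deviation:

$$Q = \frac{Q_3 - Q_1}{2},$$

where $Q_1$ and $Q_3$ are the first and third quartiles of the distribution, respectively.

Quartiles are determined by the formulas:

$$Q_1 = x_{Q_1} + i \cdot \frac{\frac{1}{4}\sum f - S_{Q_1 - 1}}{f_{Q_1}}, \qquad Q_3 = x_{Q_3} + i \cdot \frac{\frac{3}{4}\sum f - S_{Q_3 - 1}}{f_{Q_3}}$$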

Average linear deviation:

Variance:

Standard deviation:

Let's calculate the relative indicators of variation.

Oscillation coefficient:

Relative linear deviation:

Relative Quartile Variation Indicator:

The coefficient of variation:

Let's define the indicators of the distribution form:

Formulation of conclusions.

Let us formulate conclusions from the calculated indicators of variation for example 2, which presents an interval series of the distribution of enterprises by turnover, million rubles.

The range of variation indicates that the difference between the maximum and minimum values ​​is 40 million rubles. The average turnover is 30 million rubles. The most frequently encountered value of the volume of turnover in the considered set of enterprises is 31.4 million rubles, and 50% (40 enterprises) have a turnover of less than 30.5 million rubles, and 50% more.

A quartile deviation of 5 indicates moderate asymmetry of the distribution, since in symmetric or moderately asymmetric distributions $Q \approx \frac{2}{3}\sigma$ (in the example under consideration, $Q = 5$ against $\frac{2}{3}\sigma \approx 5.4$).

The mean linear and standard deviations show how much, on average, the value of the attribute fluctuates across the units of the studied population. Thus, the average fluctuation of the turnover of enterprises in the industry is 6.5 million rubles by the mean linear deviation (absolute deviation) and 8.1 million rubles by the standard deviation. The mean square of the deviations of individual values of the attribute from their mean (the variance) is 65.

The difference between the extreme values of the attribute exceeds the mean by 33.3% ($V_R = 133.3\%$).

The relative linear deviation ($V_{\bar{d}} = 21.7\%$) and the relative indicator of quartile variation ($V_Q = 16.4\%$) characterize the homogeneity of the studied population, which is confirmed by the calculated coefficient of variation of 27% ($V = 27\%$, less than 33%).

According to the calculated indicators of asymmetry and kurtosis, it can be concluded that the distribution is flat-topped (Ex < 0) and that left-sided asymmetry is observed (As < 0). The asymmetry and kurtosis are insignificant.

The variation of an attribute is determined by various factors; some of these factors can be isolated if the statistical population is divided into groups according to a certain criterion. Then, along with studying the variation of the attribute in the population as a whole, it is possible to study the variation within each of its constituent groups and between these groups. In the simplest case, when the population is divided into groups according to one factor, variation is studied by calculating and analyzing three types of variance: total, intergroup and intragroup.

Empirical coefficient of determination

The empirical coefficient of determination is widely used in statistical analysis. It represents the share of the intergroup variance in the total variance of the resultant attribute and characterizes the strength of the influence of the grouping attribute on the formation of the total variation. It can be calculated by the formula:

$$\eta^2 = \frac{\delta^2}{\sigma^2},$$

where $\delta^2$ is the intergroup variance and $\sigma^2$ is the total variance.

It shows the share of the variation of the resultant attribute y that is due to the factor attribute x, and it is related to the empirical correlation ratio by a quadratic dependence. In the absence of a relationship, the empirical coefficient of determination is zero, and with a functional relationship it equals one.

For example, if a study of the dependence of workers' labor productivity on their qualifications yields a coefficient of determination of 0.7, then 70% of the variation in labor productivity is due to differences in qualifications and 30% is due to the influence of other factors.

The empirical correlation ratio is the square root of the coefficient of determination. It shows the closeness of the relationship between the grouping attribute and the resultant attribute. The empirical correlation ratio takes values from 0 to 1. If there is no relationship, the correlation ratio is zero, i.e. all group means are equal and there is no intergroup variation. This means that the grouping attribute does not affect the formation of the total variation.

If the relationship is functional, the correlation ratio is equal to one. In this case, the variance of the group means equals the total variance, i.e. there is no intragroup variation. This means that the grouping attribute completely determines the variation of the resultant attribute.
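The decomposition of the total variance into intergroup and intragroup parts, and the coefficient of determination built on it, can be sketched as follows. The productivity figures and group labels are hypothetical; the function name is an assumption.

```python
import math

def variance_decomposition(groups):
    """Return (total, intergroup, intragroup) population variances."""
    all_values = [x for g in groups.values() for x in g]
    n = len(all_values)
    grand_mean = sum(all_values) / n
    total = sum((x - grand_mean) ** 2 for x in all_values) / n
    # Spread of the group means around the grand mean, weighted by group size.
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups.values()) / n
    # Spread of the values around their own group mean.
    within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                 for g in groups.values()) / n
    return total, between, within

# Hypothetical output per worker, grouped by qualification grade.
groups = {"lower grade": [10, 12], "higher grade": [20, 22]}

total, between, within = variance_decomposition(groups)
eta_squared = between / total   # empirical coefficient of determination
eta = math.sqrt(eta_squared)    # empirical correlation ratio
print(total, between, within, round(eta_squared, 4), round(eta, 4))
```

Here the intergroup part accounts for almost all of the total variance, so the correlation ratio is close to one.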

The closer the value of the correlation ratio is to one, the stronger the relationship between the attributes and the closer it is to a functional dependence. For a qualitative assessment of the strength of the relationship based on the empirical correlation ratio, the Chaddock scale can be used.

Chaddock scale

  • The connection is very close - the correlation coefficient is in the range 0.9 - 0.99
  • Close connection - Rxy = 0.7 - 0.9
  • The connection is noticeable - Rxy = 0.5 - 0.7
  • Moderate relationship - Rxy = 0.3 - 0.5
  • The connection is weak - Rxy = 0.1 - 0.3
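The scale above can be turned into a small lookup. The list gives ranges with shared endpoints, so the sketch assumes each range includes its lower bound, treats everything from 0.9 upward as "very close", and returns a separate label for values below 0.1; these boundary choices and the function name are assumptions.

```python
def chaddock_label(eta):
    """Qualitative strength of a relationship for a ratio between 0 and 1."""
    if not 0 <= eta <= 1:
        raise ValueError("the correlation ratio must lie between 0 and 1")
    scale = [
        (0.9, "very close"),
        (0.7, "close"),
        (0.5, "noticeable"),
        (0.3, "moderate"),
        (0.1, "weak"),
    ]
    for lower_bound, label in scale:
        if eta >= lower_bound:
            return label
    return "below the scale"

# A coefficient of determination of 0.7 gives a correlation ratio of about 0.84.
print(chaddock_label(0.7 ** 0.5))
print(chaddock_label(0.25))
```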

Variation is the discrepancy between the values of the same statistical quantity for different objects, caused by the peculiarities of their own development as well as by differences in the conditions in which they exist. Variation is objective and helps to understand the essence of the phenomenon under study. While the mean smooths out individual differences, variation, on the contrary, emphasizes them, establishing whether the found mean is typical or atypical for a particular statistical population. Thus, variation makes it possible to judge the quality of the selected statistical data.

Variation is measured using relative quantities called coefficients of variation, defined as the ratio of the mean deviation to the mean. Since the mean deviation can be determined by the linear and quadratic methods, there are corresponding coefficients of variation, determined by the formulas

$$\lambda = \frac{\bar{d}}{\bar{X}} \quad \text{(linear);} \qquad (1.28)$$

$$\nu = \frac{\sigma}{\bar{X}} \quad \text{(quadratic),} \qquad (1.29)$$

where $\bar{d}$ is the mean linear deviation, $\sigma$ is the standard deviation and $\bar{X}$ is the mean.

The values of the coefficient of variation usually lie between 0 and 1 (with very strong variation they can exceed 1), and the closer the value is to zero, the more typical the found mean is for the studied statistical population, and hence the better the statistical data are selected. The criterion value of the coefficient of variation is 1/3.

That is, the mean is considered typical for a given population if λ ≤ 0.333 or ν ≤ 0.333. Otherwise, the mean is not typical, and the statistical population must be revised in order to include more objective statistical values in it.

Typically, the quadratic coefficient of variation is somewhat (by about 25%) greater than the linear one calculated from the same data. This means that the case λ < 0.333 and ν > 0.333 is possible; then the average of these two coefficients should be taken and the final conclusion about the typicality or non-typicality of the found mean drawn from its value.

With the help of the linear coefficient of variation, the fundamental conclusion about the typical or non-typical of the mean value can be obtained easier and faster than using the quadratic one. However, the quadratic coefficient is used more often because there are several ways to calculate variance.

This method of evaluating variation also has a significant drawback. Suppose, for example, that an initial population of workers with an average length of service of 15 years and a standard deviation σ = 10 years has "aged" by another 15 years. Now $\bar{X} = 30$ years, while the standard deviation is still 10. The population, which was previously heterogeneous (10/15 × 100 = 66.7%), turns out to be quite homogeneous over time (10/30 × 100 = 33.3%).
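The drawback is purely arithmetic: adding a constant to every value shifts the mean but leaves the standard deviation unchanged, so the coefficient shrinks. A minimal sketch of the example above (the function name is an assumption):

```python
def cv_percent(mean, sigma):
    """Coefficient of variation as a percentage."""
    return sigma / mean * 100

shift = 15  # every worker gains 15 more years of service
before = cv_percent(mean=15, sigma=10)
after = cv_percent(mean=15 + shift, sigma=10)  # sigma is unaffected by the shift
print(round(before, 1), round(after, 1))
```

The same workers, with exactly the same spread of service lengths, move from 66.7% to 33.3% only because the mean has doubled.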

Therefore, the statistical population can be analyzed additionally using the oscillation coefficient, determined by the formula

$$V_R = \frac{R}{\bar{X}}, \qquad (1.30)$$

where R is the range of variation, i.e. the difference between the largest and smallest values in the set of statistical values:

$$R = X_{\max} - X_{\min}, \qquad (1.31)$$

where $X_{\max}$ and $X_{\min}$ are the maximum and minimum values in the population.

By ordering the statistical values in the population, grouping intervals are formed. Then ∆X denotes the width of the interval, and the mid-interval value is denoted $X_{\mathrm{I}}$. When only the quadratic coefficient of variation is used, different methods for determining the variance can be applied.