Standard deviation. Calculating Standard Deviation in Microsoft Excel

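Variance. Standard deviation

Variance (also called dispersion) is the arithmetic mean of the squared deviations of each value of a feature from the overall mean. Depending on the source data, the variance can be unweighted (simple) or weighted.

The variance is calculated using the following formulas:

for ungrouped data

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

for grouped data

$$S^2 = \frac{\sum_{i} (x_i - \bar{x})^2 f_i}{\sum_{i} f_i}$$

where $x_i$ are the values of the feature, $\bar{x}$ is their arithmetic mean, $n$ is the number of values and $f_i$ are the weights (frequencies).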

The procedure for calculating the weighted variance:

1. determine the weighted arithmetic mean

2. determine the deviation of each value from the mean

3. square each deviation

4. multiply the squared deviations by the weights (frequencies)

5. sum the resulting products

6. divide the resulting sum by the sum of the weights
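The six steps can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the function name `weighted_variance` and the sample data are hypothetical.

```python
def weighted_variance(values, weights):
    """Weighted variance, following the six steps of the procedure."""
    total_weight = sum(weights)
    # Step 1: weighted arithmetic mean
    mean = sum(x * f for x, f in zip(values, weights)) / total_weight
    # Steps 2-3: deviation of each value from the mean, squared
    squared_devs = [(x - mean) ** 2 for x in values]
    # Steps 4-5: multiply by the weights and sum the products
    weighted_sum = sum(d * f for d, f in zip(squared_devs, weights))
    # Step 6: divide by the sum of the weights
    return weighted_sum / total_weight


# Hypothetical data: three distinct values with frequencies 1, 2, 1
values = [2.0, 4.0, 6.0]
weights = [1, 2, 1]
variance = weighted_variance(values, weights)
print(variance)
```

With these numbers the weighted mean is 4, the squared deviations are 4, 0 and 4, and the result is 8 / 4 = 2.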

The formula for the simple variance can be transformed into the following form:

$$S^2 = \overline{x^2} - \bar{x}^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$$

The procedure for calculating the simple variance by this formula:

1. determine the arithmetic mean

2. square the arithmetic mean

3. square each value in the series

4. find the sum of the squared values

5. divide the sum of the squared values by their number, i.e. determine the mean square

6. determine the difference between the mean square of the feature and the square of the mean

The formula for the weighted variance can be transformed in the same way:

$$S^2 = \frac{\sum x_i^2 f_i}{\sum f_i} - \left(\frac{\sum x_i f_i}{\sum f_i}\right)^2$$

i.e. the variance is equal to the difference between the mean of the squared feature values and the square of the arithmetic mean. The transformed formula removes the extra step of calculating the deviations of the individual values from $\bar{x}$, and with it the calculation error caused by rounding those deviations.
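To check that the definition and the transformed formula agree, here is a small Python sketch. The function names and the data are hypothetical and chosen only for illustration.

```python
def variance_by_definition(values):
    """Mean of the squared deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_by_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x * x for x in values) / n
    return mean_of_squares - mean ** 2


data = [3.0, 5.0, 7.0, 9.0]  # hypothetical values
direct = variance_by_definition(data)
shortcut = variance_by_shortcut(data)
print(direct, shortcut)
```

Both functions give 5 for this data (mean 6, mean of squares 41). The shortcut saves work in manual calculations, but in floating-point arithmetic it can lose precision when the mean is very large compared with the spread, so the definition is usually the safer choice in code.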

The variance has a number of properties, some of which make it easier to calculate:

1) the variance of a constant is zero;

2) if all values of the feature are reduced by the same number, the variance does not change;

3) if all values of the feature are reduced by the same factor (k times), the variance decreases by a factor of $k^2$.
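The three properties can be checked numerically. The sketch below uses `pvariance` from Python's standard `statistics` module (population variance); the data set is hypothetical.

```python
from statistics import pvariance

data = [4, 7, 9, 12]                 # hypothetical values
constant = [5, 5, 5, 5]
shifted = [x - 3 for x in data]      # every value reduced by the same number
k = 2
scaled = [x / k for x in data]       # every value reduced k times

var_data = pvariance(data)
var_constant = pvariance(constant)   # property 1: zero
var_shifted = pvariance(shifted)     # property 2: unchanged
var_scaled = pvariance(scaled)       # property 3: reduced k**2 times

print(var_data, var_constant, var_shifted, var_scaled)
```

Property 2 is what allows calculations to be simplified by subtracting a convenient constant from every value first.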

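The standard deviation S is the square root of the variance:

for ungrouped data:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

for a variation series:

$$S = \sqrt{\frac{\sum_{i} (x_i - \bar{x})^2 f_i}{\sum_{i} f_i}}$$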

The range of variation, the mean linear deviation and the standard deviation are dimensional quantities: they are expressed in the same units of measurement as the individual values of the feature.

Variance and standard deviation are the most widely used measures of variation. This is because they appear in most theorems of probability theory, which is the foundation of mathematical statistics. In addition, the variance can be decomposed into components, which makes it possible to assess the influence of the various factors that cause the variation of the feature.

The calculation of the variation indicators for banks grouped by profit is shown in the table.

Profit,          Number of   Interval      x·f      x - x̄     |x - x̄|·f   (x - x̄)²·f
million rubles   banks, f    midpoint, x
3.7 - 4.6        2           4.15          8.30     -1.935    3.870       7.489
4.6 - 5.5        4           5.05          20.20    -1.035    4.140       4.285
5.5 - 6.4        6           5.95          35.70    -0.135    0.810       0.109
6.4 - 7.3        5           6.85          34.25    +0.765    3.825       2.926
7.3 - 8.2        3           7.75          23.25    +1.665    4.995       8.317
Total:           20                        121.70             17.640      23.126

The mean linear deviation and the standard deviation show how much the value of the feature fluctuates, on average, across the units of the population under study. In this case the average fluctuation of profit is 0.882 million rubles by the mean linear deviation (d = 17.640 / 20) and 1.075 million rubles by the standard deviation (S = sqrt(23.126 / 20)). The standard deviation is never smaller than the mean linear deviation. If the distribution of the feature is close to normal, S and d are related approximately as S ≈ 1.25d, or d ≈ 0.8S. The standard deviation shows how the bulk of the population units are located relative to the arithmetic mean. Regardless of the form of the distribution, at least 75% of the values of the feature fall within the interval x̄ ± 2S, and at least 89% of all values fall within the interval x̄ ± 3S (P. L. Chebyshev's theorem).
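The bank example can be reproduced with a short Python sketch. The interval midpoints and the x·f products are taken from the table; the number of banks in each group is assumed here to be the x·f product divided by the midpoint, and the variable names are hypothetical.

```python
from math import sqrt

midpoints = [4.15, 5.05, 5.95, 6.85, 7.75]      # interval midpoints, million rubles
products = [8.30, 20.20, 35.70, 34.25, 23.25]   # the x*f column of the table

# Assumption: the bank counts are recovered as x*f divided by x
counts = [round(xf / x) for xf, x in zip(products, midpoints)]
n = sum(counts)

mean = sum(x * f for x, f in zip(midpoints, counts)) / n
mean_linear_dev = sum(abs(x - mean) * f for x, f in zip(midpoints, counts)) / n
variance = sum((x - mean) ** 2 * f for x, f in zip(midpoints, counts)) / n
std_dev = sqrt(variance)

# Chebyshev check: share of banks whose midpoint lies within mean +/- 2S
low, high = mean - 2 * std_dev, mean + 2 * std_dev
share_within_2s = sum(f for x, f in zip(midpoints, counts) if low <= x <= high) / n

print(counts, n)
print(round(mean, 3), round(mean_linear_dev, 3), round(std_dev, 3))
print(share_within_2s)
```

The ratio of S to d here is about 1.22, close to the 1.25 expected for a near-normal distribution, and every group midpoint falls within x̄ ± 2S, which is consistent with the 75% lower bound.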

The standard deviation is used in statistical hypothesis testing and in measuring the linear relationship between random variables.

Root-mean-square deviation:

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Standard deviation (an estimate of the root-mean-square deviation of the random variable x from its mathematical expectation, based on an unbiased estimate of its variance):

$$s = \sqrt{\frac{n}{n-1}\,\sigma^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where $\sigma^2$ is the variance; $x_i$ is the i-th sample element; $n$ is the sample size; $\bar{x}$ is the arithmetic mean of the sample:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Note that both estimates are biased. In the general case an unbiased estimate cannot be constructed. However, the estimate based on the unbiased variance estimate is consistent.
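The difference between the two estimates can be seen with Python's standard `statistics` module, where `pstdev` divides by n and `stdev` divides by n - 1. The sample below is hypothetical.

```python
from math import sqrt
from statistics import pstdev, stdev

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample, mean 5
n = len(sample)

sigma = pstdev(sample)              # divides the sum of squares by n
s = stdev(sample)                   # divides the sum of squares by n - 1

# The two are linked by the factor sqrt(n / (n - 1))
rescaled = sigma * sqrt(n / (n - 1))

print(sigma, s, rescaled)
```

For this sample sigma is exactly 2, while s is slightly larger (about 2.14). The gap shrinks as the sample size grows.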

Three-sigma rule

The three-sigma rule ($3\sigma$): almost all values of a normally distributed random variable lie in the interval $(\bar{x} - 3\sigma,\ \bar{x} + 3\sigma)$. More precisely, a normally distributed random variable falls within this interval with a probability of about 99.7% (provided that the value of $\bar{x}$ is the true one, and not obtained from processing a sample).

If the true value of $\sigma$ is unknown, then $s$ should be used instead of $\sigma$. In this way the three-sigma rule becomes the rule of three $s$.
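The 99.7% figure can be verified with `NormalDist` from Python's standard `statistics` module (available since Python 3.8). This is only a numerical check for the standard normal distribution; the helper name `prob_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_within(k):
    """Probability that a normal variable lies within k sigma of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


p1 = prob_within(1)
p2 = prob_within(2)
p3 = prob_within(3)
print(round(p1, 4), round(p2, 4), round(p3, 4))
```

The same calculation gives roughly 68% for one sigma and 95% for two sigma.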

Interpretation of the value of the standard deviation

A large standard deviation indicates a wide spread of the values in the set around its mean; a small one indicates that the values are clustered close to the mean.

For example, take three sets of numbers: (0, 0, 14, 14), (0, 6, 8, 14) and (6, 6, 8, 8). All three sets have a mean of 7 and standard deviations of 7, 5 and 1, respectively. The last set has a small standard deviation because its values are clustered around the mean; the first set has the largest standard deviation because its values diverge strongly from the mean.
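These three sets are easy to check in Python. The sketch uses `pstdev` (the population formula, dividing by n), which is the version that yields 7, 5 and 1.

```python
from statistics import mean, pstdev

sets = [(0, 0, 14, 14), (0, 6, 8, 14), (6, 6, 8, 8)]

means = [mean(s) for s in sets]
deviations = [pstdev(s) for s in sets]

for numbers, m, sd in zip(sets, means, deviations):
    print(numbers, m, sd)
```

With the sample formula (`stdev`, dividing by n - 1) the three values would be somewhat larger, but their ordering would stay the same.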

In a general sense, the standard deviation can be considered a measure of uncertainty. In physics, for example, the standard deviation is used to determine the error of a series of successive measurements of a quantity. This value is very important for judging how plausible the observed phenomenon is in comparison with the value predicted by theory: if the mean of the measurements differs from the predicted value by many standard deviations, then the measurements or the method of obtaining them should be rechecked.

Practical use

In practice, the standard deviation lets you estimate how much the values in a set may differ from the mean.

Climate

Suppose there are two cities with the same average daily maximum temperature, one on the coast and the other inland. Coastal cities are known to have a narrower range of daily maximum temperatures than inland cities. Therefore the standard deviation of the daily maximum temperatures in the coastal city will be smaller than in the inland city, even though the average is the same for both. In practice this means that on any particular day of the year the maximum air temperature is more likely to differ strongly from the average in the city located inland.

Sport

Suppose there are several football teams that are rated on a set of parameters, for example the number of goals scored and conceded, scoring chances, and so on. The best team in the group will most likely have the best values on the largest number of parameters. The smaller the team's standard deviation on each of the parameters, the more predictable its results; such teams are balanced. The results of a team with a large standard deviation, on the other hand, are hard to predict, which is explained by imbalance, for example a strong defense but a weak attack.

Using the standard deviation of the team parameters makes it possible, to some extent, to predict the result of a match between two teams by evaluating the strengths and weaknesses of the teams, and hence the tactics they are likely to choose.

The standard deviation (root-mean-square deviation) is a statistical indicator that measures how much a numerical sample fluctuates around its mean. For many distributions, most of the values lie within plus or minus one standard deviation of the mean (about 68% for a normal distribution).

Definition

The standard deviation is the square root of the arithmetic mean of the squared deviations from the mean. This is rigorous and mathematical, but hard to grasp. It is simply a verbal description of the formula for the standard deviation; to understand the meaning of this statistical term, let's take everything in order.

Imagine a shooting range, a target and a shooter. The shooter fires at a standard target, where hitting the center gives 10 points, the number of points decreases with distance from the center, and hitting the outer area gives only 1 point. Each shot is a random integer value from 1 to 10. A bullet-riddled target is a good illustration of the distribution of a random variable.

Expected value

Our novice shooter has been practicing for a long time and has noticed that he hits the different values with certain probabilities. Say that, based on a large number of shots, he found that he hits 10 with a probability of 15%. The other values have the following probabilities:

  • 9 - 25 %;
  • 8 - 20 %;
  • 7 - 15 %;
  • 6 - 15 %;
  • 5 - 5 %;
  • 4 - 5 %.

Now he is getting ready to fire another shot. What score should he expect on average? The expected value helps us answer this question. Knowing all these probabilities, we can determine the average outcome of a shot over the long run. The formula for the mathematical expectation is quite simple. Let's denote the value of a shot by C and its probability by p. The mathematical expectation is the sum of the products of the values and their probabilities:

$$M[X] = \sum_{i} C_i \, p_i$$

Let's calculate the expectation for our example:

  • M[X] = 10 × 0.15 + 9 × 0.25 + 8 × 0.2 + 7 × 0.15 + 6 × 0.15 + 5 × 0.05 + 4 × 0.05
  • M[X] = 7.75

So, on average, the shooter scores 7.75 points per shot. Note that this is not the single most likely outcome (that is 9 points, with a probability of 25%): for any random variable, the expected value is the center around which all its values are distributed, weighted by their probabilities.
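The expectation for the shooter can be computed in a few lines of Python. The sketch stores the distribution from the list above as a dictionary; the variable names are hypothetical.

```python
# Score -> probability, as listed for the shooter
distribution = {10: 0.15, 9: 0.25, 8: 0.20, 7: 0.15, 6: 0.15, 5: 0.05, 4: 0.05}

# Expected value: sum of value * probability
expected = sum(score * p for score, p in distribution.items())

# The single most likely score is a different quantity
most_likely = max(distribution, key=distribution.get)

print(round(expected, 2), most_likely)
```

The expected value of 7.75 is not even a score that can be hit; it is the long-run average, while the most frequent single score is 9.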

Dispersion

Variance (dispersion) is another statistical indicator, one that describes the spread of a value. Our target is riddled with bullet holes, and the variance lets us express how widely they are scattered as a number. If the mathematical expectation shows the center of the shots, the variance shows their spread. In essence, the variance is the mathematical expectation of the squared deviations of the values from the expected value, that is, the mean squared deviation. Each deviation is squared so that all the terms are positive and deviations of equal size with opposite signs do not cancel each other out.

$$D[X] = M[X^2] - (M[X])^2$$

Let's calculate the spread of the shots for our case:

  • M[X²] = 10² × 0.15 + 9² × 0.25 + 8² × 0.2 + 7² × 0.15 + 6² × 0.15 + 5² × 0.05 + 4² × 0.05
  • M[X²] = 62.85
  • D[X] = M[X²] − (M[X])² = 62.85 − (7.75)² = 62.85 − 60.0625 ≈ 2.79

So the variance is about 2.79. This is not yet the typical distance of the bullet holes from the 7.75 zone, because the variance is expressed in squared units: in our example these are squared points, and in other cases they may be squared kilograms or squared dollars. As a squared quantity the variance is hard to interpret directly, so it serves as an intermediate step on the way to the standard deviation, the hero of our article.

Standard deviation

The standard deviation, which is the square root of the variance, converts the variance back into meaningful points, kilograms or dollars. Let's calculate it for our example:

S = sqrt(D[X]) = sqrt(2.7875) ≈ 1.67

We now have a value in points and can relate it to the mathematical expectation. The typical outcome of a shot can be expressed as 7.75 plus or minus 1.67, that is, between 6.08 and 9.42. Scores in this range (7, 8 and 9) account for 60% of all shots.
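The variance and standard deviation of the shooter's results can be checked with the sketch below. It repeats the shooter's probability distribution so that it runs on its own; the variable names are hypothetical.

```python
from math import sqrt

# Score -> probability for the shooter
distribution = {10: 0.15, 9: 0.25, 8: 0.20, 7: 0.15, 6: 0.15, 5: 0.05, 4: 0.05}

expected = sum(c * p for c, p in distribution.items())             # M[X]
second_moment = sum(c ** 2 * p for c, p in distribution.items())   # M[X^2]
variance = second_moment - expected ** 2                            # D[X]
std_dev = sqrt(variance)

# Share of shots that land within one standard deviation of the mean
low, high = expected - std_dev, expected + std_dev
share_within = sum(p for c, p in distribution.items() if low <= c <= high)

print(round(second_moment, 2), round(variance, 4), round(std_dev, 3))
print(round(low, 2), round(high, 2), round(share_within, 2))
```

For this distribution only the scores 7, 8 and 9 fall inside the one-sigma band, so the band covers 60% of the shots rather than "almost all" of them.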

The standard deviation, or sigma, is an informative indicator that describes the spread of a value around its center. The larger the sigma, the more scattered the sample. It is a well-studied measure, and for the normal distribution there is the well-known three-sigma rule: 99.7% of the values of a normally distributed quantity lie within plus or minus three sigma of the arithmetic mean.

Let's look at an example

Currency pair volatility

Methods of mathematical statistics are widely used in the foreign exchange market. Many trading terminals have built-in tools for calculating the volatility of an asset, which measures how strongly the price of a currency pair fluctuates. Financial markets have their own conventions for calculating volatility, such as the use of the opening and closing prices of exchanges, but as an example we can calculate the sigma of the last seven daily candles and roughly estimate the weekly volatility.

The pound/yen currency pair is considered to be the most volatile asset in the Forex market. Suppose, hypothetically, that during one week the closing price on the Tokyo Stock Exchange takes the following values:

145, 147, 146, 150, 152, 149, 148.

Entering this data into a calculator gives a sigma of 2.23 (the mean closing price is about 148.14). This means that over the week the daily closing prices deviated from their mean by about 2.23 yen in the root-mean-square sense; it is not the average day-to-day change in the rate. If real markets were this orderly, traders would earn millions on such movements.
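The sigma for the seven closing prices can be reproduced in Python. The sketch assumes the population formula (`pstdev`, dividing by n), which is the one that gives 2.23; the average absolute day-to-day change is computed alongside it only to show that it is a different quantity.

```python
from statistics import mean, pstdev

closes = [145, 147, 146, 150, 152, 149, 148]   # daily closing prices from the example

weekly_mean = mean(closes)
sigma = pstdev(closes)                          # population standard deviation

# Average absolute change from one day to the next (a different measure)
daily_changes = [abs(b - a) for a, b in zip(closes, closes[1:])]
mean_abs_change = mean(daily_changes)

print(round(weekly_mean, 2), round(sigma, 2), round(mean_abs_change, 2))
```

With the sample formula (`stdev`, dividing by n - 1) the result would be about 2.41, so it is worth checking which version a given calculator or trading terminal uses.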

Conclusion

The standard deviation is used in the statistical analysis of numerical samples. It is a useful measure for estimating the scatter of data, since two sets with seemingly the same mean can differ completely in their scatter.

The standard deviation is defined as a generalizing characteristic of the size of the variation of a feature in a population. It is equal to the square root of the mean squared deviation of the individual values of the feature from the arithmetic mean, i.e. the square root of the variance, and can be found as follows:

1. For a primary (ungrouped) series:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

2. For a variation series:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$

Transforming the standard deviation formula brings it to a form that is more convenient for practical calculations:

$$\sigma = \sqrt{\overline{x^2} - \bar{x}^2}$$

The standard deviation shows how much, on average, the individual values deviate from their mean. It is an absolute measure of the variability of the feature and is expressed in the same units as the values themselves, so it is easy to interpret.

For alternative (binary) features, the formula for the standard deviation is:

$$\sigma = \sqrt{p q}$$

where p is the proportion of units in the population that have a given attribute;

q is the proportion of units that do not have it (q = 1 − p).
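For an alternative feature the formula can be checked against a direct calculation on 0/1 data. In this sketch the share p = 0.2 and the population of 100 units are hypothetical.

```python
from math import sqrt
from statistics import pstdev

p = 0.2          # hypothetical share of units that have the attribute
q = 1 - p        # share of units that do not have it

sigma_formula = sqrt(p * q)

# The same population written out as 1 (has the attribute) and 0 (does not)
flags = [1] * 20 + [0] * 80
sigma_direct = pstdev(flags)

print(sigma_formula, sigma_direct)
```

Both approaches give 0.4. The standard deviation of an alternative feature is largest when p = q = 0.5, where it equals 0.5.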

The concept of mean linear deviation

The mean linear deviation is defined as the arithmetic mean of the absolute values of the deviations of the individual values from their mean $\bar{x}$.

1. For a primary (ungrouped) series:

$$d = \frac{\sum |x_i - \bar{x}|}{n}$$

2. For a variation series:

$$d = \frac{\sum |x_i - \bar{x}| f_i}{\sum f_i}$$

where $\sum f_i$ is the sum of the frequencies of the variation series.

The advantage of the mean absolute (linear) deviation over the range of variation as a measure of dispersion is clear: it takes all the deviations into account. But this indicator has significant drawbacks. Discarding the algebraic signs of the deviations makes its mathematical properties far from simple, which greatly complicates its use in problems involving probabilistic calculations.

Therefore the mean linear deviation is rarely used in statistical practice as a measure of variation, namely only when summing the indicators without regard to sign makes economic sense. It is used, for example, to analyze foreign trade turnover, the composition of employees, the rhythm of production, and so on.

Root mean square

The root mean square (quadratic mean) is used, for example, to calculate the average side length of n square plots, or the average diameter of trunks, pipes, and so on. It comes in two forms.

Simple root mean square. If, when the individual values of a feature are replaced by an average, the sum of the squares of the original values must remain unchanged, then that average is the quadratic mean.

It is the square root of the sum of the squares of the individual values of the feature divided by their number:

$$\bar{x}_{q} = \sqrt{\frac{\sum x_i^2}{n}}$$

The weighted root mean square is calculated by the formula:

$$\bar{x}_{q} = \sqrt{\frac{\sum x_i^2 f_i}{\sum f_i}}$$

where f is the weight.

Cubic mean

The cubic mean is used, for example, to determine the average side length of cubes. It comes in two forms.
Simple cubic mean:

$$\bar{x}_{c} = \sqrt[3]{\frac{\sum x_i^3}{n}}$$
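Both the quadratic and the cubic mean are special cases of the power mean, so one small function can illustrate them. The function name `power_mean`, its optional `weights` argument and the side lengths are hypothetical; positive values are assumed.

```python
def power_mean(values, k, weights=None):
    """Power mean of order k (k=2 quadratic, k=3 cubic) for positive values."""
    if weights is None:
        weights = [1] * len(values)          # the simple (unweighted) mean
    total_weight = sum(weights)
    mean_of_powers = sum(f * x ** k for x, f in zip(values, weights)) / total_weight
    return mean_of_powers ** (1 / k)


sides = [3.0, 4.0, 5.0]                      # hypothetical side lengths

quadratic = power_mean(sides, 2)             # keeps the total area of the squares
cubic = power_mean(sides, 3)                 # keeps the total volume of the cubes

print(round(quadratic, 4), round(cubic, 4))
print(round(len(sides) * quadratic ** 2, 6), round(len(sides) * cubic ** 3, 6))
```

Three squares with the quadratic-mean side have the same total area as the originals (9 + 16 + 25 = 50), and three cubes with the cubic-mean side have the same total volume (27 + 64 + 125 = 216).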

When the mean and the variance are calculated from an interval distribution series, the true values of the feature are replaced by the midpoints of the intervals, which differ from the arithmetic mean of the values that fall within each interval. This introduces a systematic error into the calculated variance. W. F. Sheppard determined that the error in the variance caused by using grouped data is 1/12 of the square of the interval width, and that it can act either to increase or to decrease the variance.

Sheppard's correction should be used if the distribution is close to normal, the feature varies continuously, and the series is built on a large amount of initial data (n > 500). However, since in a number of cases the two errors act in opposite directions and compensate each other, the correction can sometimes be omitted.
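As an arithmetic illustration, the sketch below applies the correction in its usual form, in which one twelfth of the squared interval width is subtracted from the variance computed from interval midpoints. That form is an assumption here, and the function name and the input numbers are hypothetical.

```python
def sheppard_corrected_variance(grouped_variance, interval_width):
    """Variance from interval midpoints minus h**2 / 12 (usual form, assumed)."""
    return grouped_variance - interval_width ** 2 / 12


grouped_variance = 4.0      # hypothetical variance computed from midpoints
interval_width = 1.2        # hypothetical width h of every interval

corrected = sheppard_corrected_variance(grouped_variance, interval_width)
print(round(corrected, 4))
```

The correction only makes sense when all intervals have the same width and the conditions listed above hold.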

The smaller the variance and the standard deviation, the more homogeneous the population and the more typical the mean.
In statistical practice it is often necessary to compare the variation of different features. For example, it is of great interest to compare the variation in workers' age and qualifications, length of service and wages, cost and profit, length of service and labor productivity, and so on. Absolute measures of variability are unsuitable for such comparisons: the variability of length of service, expressed in years, cannot be compared with the variation of wages, expressed in rubles.

For such comparisons, and for comparing the variability of the same feature in several populations with different arithmetic means, a relative measure of variation is used: the coefficient of variation.
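The coefficient of variation is commonly computed as the standard deviation divided by the arithmetic mean and expressed as a percentage. The sketch below assumes that definition; the function name and both data sets are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the arithmetic mean."""
    return pstdev(values) / mean(values) * 100


experience_years = [2, 5, 8, 11, 14]                  # hypothetical, in years
wages_rubles = [30000, 32000, 35000, 38000, 40000]    # hypothetical, in rubles

cv_experience = coefficient_of_variation(experience_years)
cv_wages = coefficient_of_variation(wages_rubles)

print(round(cv_experience, 1), round(cv_wages, 1))
```

Because the units cancel out, the two percentages can be compared directly: in this made-up example length of service varies far more than wages.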

Structural averages

To characterize the central tendency of a statistical distribution it is often useful to employ, together with the arithmetic mean, a particular value of the feature X which, because of its position in the distribution series, can characterize the level of the series.

This is especially important when the extreme values of the feature in the distribution series have fuzzy boundaries. In that case an exact determination of the arithmetic mean is, as a rule, impossible or very difficult. The average level can then be determined by taking, for example, the value of the feature that lies in the middle of the frequency series or the value that occurs most often in the series.

Such values depend only on the nature of the frequencies, i.e. on the structure of the distribution. They are typical in terms of their position in the frequency series, so they are treated as characteristics of the center of the distribution and are therefore called structural averages. They are used to study the internal structure of the distribution series of the feature values. These indicators include the median and the mode.
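The value in the middle of the series (the median) and the value that occurs most often (the mode) can be found with Python's standard `statistics` module. The data set is hypothetical and includes one extreme value to show why these measures are useful.

```python
from statistics import mean, median, mode

values = [3, 5, 5, 6, 7, 9, 40]     # hypothetical series with one extreme value

middle_value = median(values)        # value in the middle of the ordered series
most_frequent = mode(values)         # value that occurs most often
arithmetic_mean = mean(values)

print(middle_value, most_frequent, round(arithmetic_mean, 2))
```

The single extreme value pulls the arithmetic mean up to about 10.7, while the median (6) and the mode (5) stay close to the bulk of the data.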