Standard deviation. Calculating the standard deviation in Microsoft Excel

Variance. Standard deviation

Variance is the arithmetic mean of the squared deviations of each value of the attribute from the overall mean. Depending on the source data, the variance may be unweighted (simple) or weighted.

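The variance is calculated using the following formulas:

· For ungrouped data:

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

· For grouped data:

$$\sigma^2 = \frac{\sum_i (x_i - \bar{x})^2 f_i}{\sum_i f_i}$$

where $x_i$ are the individual values (variants), $\bar{x}$ is their arithmetic mean, $n$ is the number of values and $f_i$ are the weights (frequencies).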

The procedure for calculating the weighted variance:

1. Determine the weighted arithmetic mean

2. Determine the deviation of each variant from the mean

3. Square the deviation of each variant from the mean

4. Multiply the squared deviations by the weights (frequencies)

5. Sum the products

6. Divide the resulting sum by the sum of the weights
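The six steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function name `weighted_variance` and the sample data are assumptions made for the example.

```python
def weighted_variance(values, weights):
    """Weighted variance computed step by step, as in the procedure above."""
    total_weight = sum(weights)
    # Step 1: weighted arithmetic mean
    mean = sum(x * f for x, f in zip(values, weights)) / total_weight
    # Step 2: deviation of each variant from the mean
    deviations = [x - mean for x in values]
    # Step 3: square each deviation
    squared = [d ** 2 for d in deviations]
    # Step 4: multiply the squared deviations by the weights (frequencies)
    products = [s * f for s, f in zip(squared, weights)]
    # Step 5: sum the products
    total = sum(products)
    # Step 6: divide by the sum of the weights
    return total / total_weight


# Hypothetical data: variants 1, 2, 3 observed 1, 2 and 1 times.
print(weighted_variance([1, 2, 3], [1, 2, 1]))  # 0.5
```

With all weights equal to 1, the same function gives the simple (unweighted) variance.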

The formula for the variance can be transformed into the following form:

- Simple:

$$\sigma^2 = \overline{x^2} - \bar{x}^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$$

The procedure for calculating the simple variance:

1. Determine the arithmetic mean

2. Square the arithmetic mean

3. Square each variant of the series

4. Find the sum of the squared variants

5. Divide the sum of the squared variants by their number, i.e. determine the mean square

6. Determine the difference between the mean square of the attribute and the square of the mean
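A short Python sketch of this shortcut, compared against the direct definition. The function names and the data set are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def variance_direct(values):
    """Mean of the squared deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_shortcut(values):
    """Mean of the squares minus the square of the mean (steps 1 to 6)."""
    n = len(values)
    mean = sum(values) / n                  # step 1
    mean_squared = mean ** 2                # step 2
    squares = [x ** 2 for x in values]      # step 3
    sum_of_squares = sum(squares)           # step 4
    mean_of_squares = sum_of_squares / n    # step 5
    return mean_of_squares - mean_squared   # step 6


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical series
print(variance_direct(data), variance_shortcut(data))  # 4.0 4.0
```

In floating-point arithmetic the shortcut can lose precision when the mean is large relative to the spread, so the direct form is usually the safer choice in code even though the shortcut is handier by hand.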

The formula for the weighted variance can likewise be transformed into the following form:

$$\sigma^2 = \frac{\sum x_i^2 f_i}{\sum f_i} - \left(\frac{\sum x_i f_i}{\sum f_i}\right)^2$$

i.e. the variance equals the difference between the mean of the squared attribute values and the square of the arithmetic mean. The transformed formula removes the extra step of calculating the deviations of individual attribute values from $\bar{x}$ and eliminates the calculation error caused by rounding those deviations.

The variance has a number of properties, some of which simplify its calculation:

1) the variance of a constant is zero;

2) if all values of the attribute are reduced by the same number, the variance does not change;

3) if all values of the attribute are reduced by the same factor (k times), the variance decreases by a factor of k².
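These three properties can be checked numerically. The sketch below uses `statistics.pvariance` from the Python standard library (the population variance, with divisor n); the data set and the shift and scale constants are hypothetical.

```python
import math
import statistics

data = [3, 7, 7, 19]  # hypothetical values
base = statistics.pvariance(data)

# Property 1: the variance of a constant is zero.
constant = statistics.pvariance([5, 5, 5, 5])

# Property 2: subtracting the same number from every value changes nothing.
shifted = statistics.pvariance([x - 3 for x in data])

# Property 3: dividing every value by k divides the variance by k squared.
k = 2
scaled = statistics.pvariance([x / k for x in data])

print(constant == 0)
print(math.isclose(shifted, base))
print(math.isclose(scaled, base / k ** 2))
```

Properties 2 and 3 are what make it legitimate to subtract a convenient constant and divide by the interval width before calculating the variance by hand.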

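The standard deviation σ is the square root of the variance:

· For ungrouped data:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

· For a variation series:

$$\sigma = \sqrt{\frac{\sum_i (x_i - \bar{x})^2 f_i}{\sum_i f_i}}$$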

The range of variation, the mean linear deviation and the standard deviation are dimensional quantities: they have the same units of measurement as the individual values of the attribute.

The variance and the standard deviation are the most widely used measures of variation. This is because they appear in most theorems of probability theory, which is the foundation of mathematical statistics. In addition, the variance can be decomposed into component parts, which makes it possible to assess the influence of the various factors that cause the attribute to vary.

The calculation of the measures of variation for banks grouped by profit is shown in the table.

Profit,          Number of   Interval      x·f      x − x̄     |x − x̄|·f   (x − x̄)²·f
million rubles   banks, f    midpoint, x
3.7 - 4.6        2           4.15          8.30     −1.935    3.870       7.489
4.6 - 5.5        4           5.05          20.20    −1.035    4.140       4.285
5.5 - 6.4        6           5.95          35.70    −0.135    0.810       0.109
6.4 - 7.3        5           6.85          34.25    +0.765    3.825       2.926
7.3 - 8.2        3           7.75          23.25    +1.665    4.995       8.317
TOTAL:           20                        121.70             17.640      23.126

Here x̄ = 121.70 / 20 = 6.085 million rubles.

The mean linear deviation d and the standard deviation σ show by how much, on average, the attribute value fluctuates across the units of the population. In this case the average fluctuation in profit is 0.882 million rubles by the mean linear deviation and 1.075 million rubles by the standard deviation. The standard deviation is never smaller than the mean linear deviation. If the distribution of the attribute is close to normal, the two are related by σ ≈ 1.25d, or d ≈ 0.8σ. The standard deviation shows how the bulk of the units of the population is located relative to the arithmetic mean. Regardless of the shape of the distribution, at least 75% of the attribute values fall within the interval x̄ ± 2σ, and at least 89% of all values fall within the interval x̄ ± 3σ (Chebyshev's theorem).
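The two figures quoted for the banks can be reproduced with a few lines of Python. This is a sketch under one assumption: the bank counts are not legible in the table as extracted, so they are recovered here by dividing each x·f entry by the interval midpoint (for example 8.30 / 4.15 = 2).

```python
import math

# Interval midpoints (million rubles) taken from the table.
midpoints = [4.15, 5.05, 5.95, 6.85, 7.75]
# Number of banks per interval: an assumption, derived as (x*f) / x.
banks = [2, 4, 6, 5, 3]

n = sum(banks)
mean = sum(x * f for x, f in zip(midpoints, banks)) / n

# Mean linear deviation: weighted mean of the absolute deviations.
mean_linear = sum(abs(x - mean) * f for x, f in zip(midpoints, banks)) / n

# Standard deviation: square root of the weighted variance.
variance = sum((x - mean) ** 2 * f for x, f in zip(midpoints, banks)) / n
std = math.sqrt(variance)

print(round(mean, 3))                        # 6.085
print(round(mean_linear, 3), round(std, 3))  # 0.882 1.075
print(round(std / mean_linear, 2))           # 1.22, near the 1.25 expected for a normal shape
```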

The standard deviation is also used in statistical hypothesis testing and when measuring the linear relationship between random variables.

Root-mean-square deviation:

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Standard deviation (an estimate of the root-mean-square deviation of the random variable x about its expected value, based on an unbiased estimate of its variance):

$$s = \sqrt{\frac{n}{n-1}\,\sigma^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where $\sigma^2$ is the variance; $x_i$ is the i-th element of the sample; $n$ is the sample size; $\bar{x}$ is the arithmetic mean of the sample:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

It should be noted that both estimates are biased. In the general case an unbiased estimate cannot be constructed. However, the estimate based on the unbiased estimate of the variance is consistent.
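The difference between the two estimates comes down to the divisor, n versus n − 1. A minimal Python sketch, using a hypothetical sample; `statistics.pstdev` and `statistics.stdev` are the standard-library counterparts of the two formulas.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(sample)
mean = sum(sample) / n
sum_sq_dev = sum((x - mean) ** 2 for x in sample)

sigma = math.sqrt(sum_sq_dev / n)      # root-mean-square deviation, divisor n
s = math.sqrt(sum_sq_dev / (n - 1))    # standard deviation, divisor n - 1

# The two are linked by s = sigma * sqrt(n / (n - 1)).
print(math.isclose(s, sigma * math.sqrt(n / (n - 1))))

# The standard library gives the same values.
print(math.isclose(sigma, statistics.pstdev(sample)))
print(math.isclose(s, statistics.stdev(sample)))
```

For large n the two values are practically identical; the distinction matters mainly for small samples.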

Three-sigma rule

The three-sigma rule (3σ): almost all values of a normally distributed random variable lie in the interval (x̄ − 3σ, x̄ + 3σ). More precisely, with approximately 99.7% probability the value of a normally distributed random variable lies in this interval (provided that x̄ is the true value and not one obtained by processing a sample).

If the true value of x̄ is unknown, s should be used instead of σ. The three-sigma rule then becomes the three-s rule.
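The 99.7% figure follows from the normal distribution function and can be checked with the error function from Python's standard library. The helper name `prob_within` is an assumption for this sketch.

```python
import math


def prob_within(k):
    """Probability that a normal variable lies within k sigma of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

These probabilities hold only for a normal distribution; for an arbitrary distribution only the weaker Chebyshev bounds are guaranteed.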

Interpretation of the size of the standard deviation

A large standard deviation shows that the values in the set are widely spread around the mean of the set; a small value, correspondingly, shows that the values are grouped close to the mean.

For example, take three numeric sets: (0, 0, 14, 14), (0, 6, 8, 14) and (6, 6, 8, 8). All three sets have a mean of 7, and their standard deviations are 7, 5 and 1 respectively. The last set has a small standard deviation because its values are grouped around the mean; the first set has the largest standard deviation because its values diverge strongly from the mean.
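The three sets can be checked directly. The sketch below uses `statistics.pstdev` (divisor n), which is the variant that yields the values 7, 5 and 1.

```python
import statistics

sets = [(0, 0, 14, 14), (0, 6, 8, 14), (6, 6, 8, 8)]

for values in sets:
    # Same mean for every set, but very different spread.
    print(values, statistics.mean(values), statistics.pstdev(values))
```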

In the most general sense, the standard deviation can be regarded as a measure of uncertainty. In physics, for example, the standard deviation is used to determine the error of a series of successive measurements of a quantity. This value is very important for judging the plausibility of the phenomenon under study against the value predicted by theory: if the mean of the measurements differs strongly from the predicted value (by a large number of standard deviations), then the measured values or the method of obtaining them should be rechecked.

Practical use

In practice, the standard deviation shows how much the values in a set may differ from the mean.

Climate

Suppose there are two cities with the same average daily maximum temperature, one located on the coast and the other inland. It is known that in coastal cities the spread of daily maximum temperatures is smaller than in inland cities. Therefore the standard deviation of the daily maximum temperature will be smaller for the coastal city than for the inland one, even though the mean is the same. In practice this means that the probability that the maximum temperature on any given day differs strongly from the average is higher for the inland city.

Sport

Suppose there are several football teams that are rated on a set of parameters, for example the number of goals scored and conceded, scoring chances, and so on. Most likely the best team in the group will have the best values on more of the parameters. The smaller a team's standard deviation on each parameter, the more predictable its results; such teams are balanced. Conversely, the results of a team with a large standard deviation are hard to predict, which in turn is explained by an imbalance, for example a strong defence but a weak attack.

Using the standard deviation of team parameters makes it possible, to some extent, to predict the result of a match between two teams by assessing their strengths and weaknesses, and hence the tactics they are likely to choose.

The root-mean-square, or standard, deviation is a statistical indicator that measures how much the values of a numerical sample fluctuate around their mean. For data that are roughly normally distributed, about two thirds of the values lie within plus or minus one standard deviation of the mean.

Definition

The standard deviation is the square root of the arithmetic mean of the squared deviations from the mean. Strict and mathematical, but hard to grasp. This is a verbal description of the formula for the standard deviation; to understand what the term means, let us take everything in order.

Imagine a shooting range, a target and a shooter. The shooter fires at a standard target, where a hit in the centre scores 10 points, the score decreases with distance from the centre, and a hit in the outermost ring scores only 1 point. Each shot is a random integer from 1 to 10. A target riddled with bullet holes is an excellent illustration of a random variable.

Expected value

Our novice shooter practised for a long time and noticed that he hits the different values with certain probabilities. Suppose that, based on a large number of shots, he found that he hits 10 with a probability of 15%. The remaining values have the following probabilities:

  • 9 - 25 %;
  • 8 - 20 %;
  • 7 - 15 %;
  • 6 - 15 %;
  • 5 - 5 %;
  • 4 - 5 %.

Now he is preparing to take another shot. What score should he expect? The expected value answers this question. Knowing all the probabilities, we can determine the average outcome of a shot. The formula for the expected value is quite simple. Denote the value of a shot by C and its probability by p. The expected value equals the sum of the products of the values and their probabilities:

$$M[X] = \sum_i C_i p_i$$

Let us compute the expected value for our example:

  • M[X] = 10 × 0.15 + 9 × 0.25 + 8 × 0.2 + 7 × 0.15 + 6 × 0.15 + 5 × 0.05 + 4 × 0.05
  • M[X] = 7.75

So, on average the shooter scores 7.75 points per shot, which places the centre of his hits between the 7 and 8 zones. Note that this is not the most frequent hit: the single most probable score here is 9, at 25%. For any random variable, the expected value is the probability-weighted centre of all its values.
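The calculation is easy to reproduce in Python. The dictionary below simply restates the probabilities listed for the shooter; the variable names are assumptions for this sketch. It also shows that the expected value (a weighted average) and the most probable score are different things.

```python
# Score -> probability, as listed for the shooter.
scores = {10: 0.15, 9: 0.25, 8: 0.20, 7: 0.15, 6: 0.15, 5: 0.05, 4: 0.05}

# Expected value: sum of value * probability.
expected = sum(value * p for value, p in scores.items())

# Most probable single score (the mode of the distribution).
most_likely = max(scores, key=scores.get)

print(round(expected, 2))  # 7.75
print(most_likely)         # 9
```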

Variance

Variance is another statistical indicator; it describes the spread of a quantity. Our target is densely riddled with bullet holes, and the variance lets us express this spread numerically. If the expected value shows the centre of the shots, the variance shows their scatter. In essence, the variance is the expected value of the squared deviations of the values from the expected value, that is, the mean squared deviation. Each deviation is squared so that all terms are positive and equal deviations with opposite signs do not cancel each other out.

D[X] = M[X²] − (M[X])²

Let's calculate the scatter of the shots for our case:

  • M[X²] = 10² × 0.15 + 9² × 0.25 + 8² × 0.2 + 7² × 0.15 + 6² × 0.15 + 5² × 0.05 + 4² × 0.05
  • M[X²] = 62.85
  • D[X] = M[X²] − (M[X])² = 62.85 − 7.75² = 62.85 − 60.0625 ≈ 2.79

So our variance is about 2.79. It measures how widely the bullet holes are scattered around the centre at 7.75, but it is not used in its raw form, because the result is the square of the original unit: in our example square points, and in other cases square kilograms or square dollars. As a squared quantity the variance is not very informative, so it serves as an intermediate step towards the standard deviation, the hero of our article.

Standard deviation

To convert the variance back into meaningful points, kilograms or dollars we use the standard deviation, which is the square root of the variance. Let's compute it for our example:

σ = √D[X] = √2.7875 ≈ 1.67

We are back in points and can now relate the result to the expected value. The typical outcome of a shot can be expressed as 7.75 plus or minus 1.67. That is enough for an answer, but we can also say that the shooter most often hits the target area between 6.08 and 9.42 (the scores 7, 8 and 9, which together account for 60% of his shots).
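A minimal Python sketch of the same chain, from the probabilities to the variance and then to the standard deviation. The variable names are assumptions for this example; the probabilities are those listed for the shooter.

```python
import math

# Score -> probability, as listed for the shooter.
scores = {10: 0.15, 9: 0.25, 8: 0.20, 7: 0.15, 6: 0.15, 5: 0.05, 4: 0.05}

mean = sum(v * p for v, p in scores.items())             # M[X]
mean_of_squares = sum(v ** 2 * p for v, p in scores.items())  # M[X^2]

variance = mean_of_squares - mean ** 2   # D[X] = M[X^2] - (M[X])^2
sigma = math.sqrt(variance)              # standard deviation, in points

# Share of shots that land within one sigma of the mean.
share = sum(p for v, p in scores.items() if mean - sigma <= v <= mean + sigma)

print(round(mean_of_squares, 2))  # 62.85
print(round(variance, 4))         # 2.7875
print(round(sigma, 2))            # 1.67
print(round(share, 2))            # 0.6
```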

The standard deviation, or sigma, is an informative indicator that illustrates the scatter of a quantity around its centre. The larger the sigma, the greater the scatter of the sample. It is a well-studied quantity, and for the normal distribution the three-sigma rule is known: 99.7% of the values of a normally distributed quantity lie within plus or minus three sigma of the arithmetic mean.

An example

Volatility of a currency pair

Methods of mathematical statistics are widely used in the foreign exchange market. Many trading terminals have built-in tools for calculating the volatility of an asset, which measures the variability of the price of a currency pair. Financial markets have their own specifics for calculating volatility, such as the opening and closing prices of trading sessions, but as an example we can calculate the sigma for the last seven daily candles and roughly estimate the weekly volatility.

The pound/yen currency pair is considered the most volatile asset on the Forex market. Suppose, hypothetically, that during the week the closing price on the Tokyo exchange took the following values:

145, 147, 146, 150, 152, 149, 148.

Entering these data into a calculator gives a sigma of 2.23. This means that during the week the closing prices deviated from their weekly mean by about 2.23 yen in the root-mean-square sense. If things were really that simple, traders would be making millions on such movements.
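The figure of 2.23 corresponds to the population formula (divisor n). A short Python check using the seven closing prices from the example; the sample formula (divisor n − 1) is shown alongside for comparison.

```python
import statistics

closes = [145, 147, 146, 150, 152, 149, 148]  # closing prices from the example

sigma_population = statistics.pstdev(closes)  # divisor n
sigma_sample = statistics.stdev(closes)       # divisor n - 1

print(round(sigma_population, 2))  # 2.23
print(round(sigma_sample, 2))      # 2.41
```

With only seven observations the choice of divisor changes the result noticeably, so it is worth knowing which one a given calculator or trading terminal uses.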

Conclusion

The standard deviation is used in the statistical analysis of numerical samples. It is a useful measure of the spread of data, since two sets with seemingly identical means can differ completely in how their values are scattered.

The standard deviation is defined as a summary characteristic of the size of the variation of an attribute in a population. It equals the square root of the mean squared deviation of the individual attribute values from the arithmetic mean, i.e. the square root of the variance, and can be found as follows:

1. For a primary (ungrouped) series:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

2. For a variation series:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$

Transforming the standard deviation formula gives a form that is more convenient for practical calculations:

$$\sigma = \sqrt{\overline{x^2} - \bar{x}^2}$$

The standard deviation shows by how much, on average, specific variants deviate from their mean. It is an absolute measure of the variability of the attribute and is expressed in the same units as the variants, which makes it easy to interpret.

For alternative (binary) attributes, the standard deviation formula is:

$$\sigma = \sqrt{p\,q}$$

where p is the proportion of units in the population that have a given attribute;

q is the proportion of units that do not have this attribute (q = 1 − p).
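For an alternative attribute coded as 1 (present) and 0 (absent), the general standard deviation formula reduces to the square root of p·q. The Python sketch below checks this on a hypothetical population in which 3 units out of 10 have the attribute.

```python
import math
import statistics

# Hypothetical population: 1 = unit has the attribute, 0 = it does not.
units = [1] * 3 + [0] * 7

p = sum(units) / len(units)  # proportion with the attribute
q = 1 - p                    # proportion without it

sigma_formula = math.sqrt(p * q)
sigma_direct = statistics.pstdev(units)  # general formula applied to the 0/1 data

print(round(sigma_formula, 4), round(sigma_direct, 4))
```

The value is largest when p = q = 0.5, where σ = 0.5.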

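The concept of the mean linear deviation

The mean linear deviation is defined as the arithmetic mean of the absolute values of the deviations of individual variants from their arithmetic mean.

1. For a primary (ungrouped) series:

$$d = \frac{\sum |x_i - \bar{x}|}{n}$$

2. For a variation series:

$$d = \frac{\sum |x_i - \bar{x}|\, f_i}{\sum f_i}$$

where $\sum f_i$ is the sum of the frequencies of the variation series.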

The advantage of the mean absolute deviation over the range of variation as a measure of spread is obvious, since it takes all deviations into account. But this indicator has significant drawbacks. Arbitrarily discarding the algebraic signs of the deviations means that its mathematical properties are far from elementary. This makes the mean absolute deviation much harder to use in problems that involve probabilistic calculations.

Therefore the mean linear deviation is rarely used in statistical practice as a measure of variation, namely only when summing the indicators without regard to sign makes economic sense. It is used, for example, to analyse foreign trade turnover, the composition of the workforce, the rhythm of production, etc.

Quadratic mean

The quadratic mean is used, for example, to calculate the average side length of n square plots, or the average diameter of trunks, pipes, etc. It comes in two types.

Simple quadratic mean. If, when the individual attribute values are replaced by their mean, the sum of the squares of the original values must remain unchanged, then the mean is a quadratic mean.

It is the square root of the sum of the squared individual attribute values divided by their number:

$$\bar{x}_q = \sqrt{\frac{\sum x_i^2}{n}}$$

The weighted quadratic mean is calculated by the formula:

$$\bar{x}_q = \sqrt{\frac{\sum x_i^2 f_i}{\sum f_i}}$$

where f is the weight.
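Both forms of the quadratic mean are one-liners in Python. The function names and the side lengths below are hypothetical; the final check illustrates the defining property, namely that replacing every value by the quadratic mean leaves the sum of squares unchanged.

```python
import math


def quadratic_mean(values):
    """Simple quadratic mean: root of the mean of the squares."""
    return math.sqrt(sum(x ** 2 for x in values) / len(values))


def weighted_quadratic_mean(values, weights):
    """Weighted quadratic mean with weights (frequencies) f."""
    total = sum(x ** 2 * f for x, f in zip(values, weights))
    return math.sqrt(total / sum(weights))


sides = [3, 4, 5]  # hypothetical side lengths of square plots
q = quadratic_mean(sides)

print(round(q, 3))
# Three squares of side q have the same total area as the originals.
print(math.isclose(len(sides) * q ** 2, sum(x ** 2 for x in sides)))
```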

Cubic mean

The cubic mean is used, for example, to determine the average side length of cubes. It comes in two types.
Simple cubic mean:

$$\bar{x}_c = \sqrt[3]{\frac{\sum x_i^3}{n}}$$

When the mean and the variance are calculated from interval distribution series, the true attribute values are replaced by the interval midpoints, which differ from the arithmetic means of the values that fall within each interval. This leads to a systematic error in the variance. W. F. Sheppard determined that the error in the variance caused by using grouped data amounts to 1/12 of the square of the interval width, and that it can act both to increase and to decrease the variance.

Sheppard's correction should be applied if the distribution is close to normal, the attribute varies continuously, and the series is built from a large amount of source data (n > 500). However, since in some cases the two errors act in opposite directions and compensate each other, the correction can sometimes be omitted.

The smaller the variance and the standard deviation, the more homogeneous the population and the more typical the mean.
In statistical practice it is often necessary to compare the variation of different attributes. For example, it is of great interest to compare the variation in workers' age and qualifications, length of service and wages, cost and profit, length of service and labour productivity, etc. Absolute measures of variability are unsuitable for such comparisons: one cannot compare the variability of length of service, expressed in years, with the variation of wages, expressed in rubles.

For such comparisons, and for comparing the variability of the same attribute in several populations with different arithmetic means, a relative measure of variation is used: the coefficient of variation.
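The coefficient of variation is usually defined as the standard deviation divided by the arithmetic mean, expressed as a percentage. The text above does not spell out the formula, so the Python sketch below assumes that usual definition; the length-of-service and wage figures are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (assumed definition)."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


experience_years = [2, 5, 8, 11, 14]                  # hypothetical, in years
wages_rubles = [30000, 35000, 40000, 45000, 50000]    # hypothetical, in rubles

cv_experience = coefficient_of_variation(experience_years)
cv_wages = coefficient_of_variation(wages_rubles)

# Both results are unit-free percentages, so they can be compared directly.
print(round(cv_experience, 2))  # 53.03
print(round(cv_wages, 2))       # 17.68
```

In this made-up example length of service varies about three times as much as wages in relative terms, even though its standard deviation in absolute numbers is far smaller.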

Structural averages

To characterise the central tendency in statistical distributions it is often rational to use, together with the arithmetic mean, a certain value of the attribute X which, owing to particular features of its position in the distribution series, can characterise its level.

This is especially important when the extreme values of the attribute in the distribution series have fuzzy boundaries. In that case a precise determination of the arithmetic mean is, as a rule, impossible or very difficult. The average level can then be determined by taking, for example, the attribute value located in the middle of the frequency series, or the value that occurs most often in the series.

Such values depend only on the nature of the frequencies, i.e. on the structure of the distribution. They are typical by their position in the frequency series, so they are regarded as characteristics of the centre of the distribution and are therefore called structural averages. They are used to study the internal structure of the distribution series of attribute values. These indicators include.