What dispersion characterizes. Absolute variation indicators

This page works through a standard example of finding the dispersion; other problems on the same topic are listed as well.

Example 1. Determining the group dispersion, the average of the group dispersions, the intergroup dispersion and the total dispersion

Example 2. Finding the dispersion and the coefficient of variation in a grouping table

Example 3. Finding the dispersion of a discrete series

Example 4. The following data are available on a group of 20 correspondence students. It is required to build an interval distribution series for the feature, calculate the mean value of the feature and study its dispersion.

Build the interval grouping. The interval width is determined by the formula:

$$h = \frac{x_{max} - x_{min}}{n}$$

where $x_{max}$ is the maximum value of the grouping feature;
$x_{min}$ is the minimum value of the grouping feature;
$n$ is the number of intervals.

Take n = 5. The step is h = (192 - 159) / 5 = 6.6.

Construct the interval grouping.

For further calculations, build an auxiliary table:

$x'_i$ is the midpoint of the interval. (For example, the midpoint of the interval 159 - 165.6 is 162.3.)
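The grouping step can be checked with a short Python sketch. It uses only the numbers given in the example (minimum 159, maximum 192, five intervals); the variable names are illustrative.

```python
# Interval grouping for the example: x_min, x_max and n come from the text.
x_min, x_max, n = 159, 192, 5

h = (x_max - x_min) / n  # interval width (step)

# Build (lower bound, upper bound, midpoint) for each of the n intervals.
intervals = []
for k in range(n):
    lower = x_min + k * h
    upper = lower + h
    intervals.append((round(lower, 1), round(upper, 1), round((lower + upper) / 2, 1)))

for lower, upper, mid in intervals:
    print(f"{lower} - {upper}: midpoint {mid}")
```

The first interval printed is 159.0 - 165.6 with midpoint 162.3, matching the example.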

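The average height of the students is determined by the weighted arithmetic mean formula:

$$\bar{x} = \frac{\sum x'_i f_i}{\sum f_i}$$

where $f_i$ is the frequency of the i-th interval.

The dispersion is determined by the formula:

$$\sigma^2 = \frac{\sum (x'_i - \bar{x})^2 f_i}{\sum f_i}$$

The formula can be transformed as follows:

$$\sigma^2 = \overline{x^2} - (\bar{x})^2$$

It follows from this formula that the dispersion is equal to the difference between the mean of the squares of the values and the square of the mean.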
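The equivalence of the two forms of the dispersion formula can be illustrated numerically. In the sketch below the midpoints follow from the intervals of the example, but the frequencies are hypothetical (the original table is not reproduced on this page); they are chosen only so that they sum to 20.

```python
# Midpoints follow from the example; the frequencies are HYPOTHETICAL.
midpoints = [162.3, 168.9, 175.5, 182.1, 188.7]
freqs = [2, 5, 8, 4, 1]

total = sum(freqs)
mean = sum(x * f for x, f in zip(midpoints, freqs)) / total

# Definition: weighted mean of the squared deviations from the mean.
disp_def = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs)) / total

# Transformed formula: mean of the squares minus the square of the mean.
mean_sq = sum(x ** 2 * f for x, f in zip(midpoints, freqs)) / total
disp_short = mean_sq - mean ** 2

print(mean, disp_def, disp_short)
```

Both values agree up to floating-point rounding, whatever frequencies are used.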

For variation series with equal intervals, the dispersion can be calculated by the method of moments, using the second property of the dispersion (dividing all the values by the interval width). Calculated this way, the dispersion is obtained from the following, less laborious formula:

$$\sigma^2 = i^2 (m_2 - m_1^2)$$

where $i$ is the interval width;
$A$ is the conditional zero, for which it is convenient to use the midpoint of the interval with the highest frequency;
$m_1 = \frac{\sum \frac{x - A}{i} f}{\sum f}$ is the moment of the first order;
$m_2 = \frac{\sum \left(\frac{x - A}{i}\right)^2 f}{\sum f}$ is the moment of the second order.
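A minimal sketch of the method of moments follows. It assumes the usual form of the formula, dispersion = i² · (m2 − m1²), since the formula image is missing from this page, and it again uses hypothetical frequencies. The result is compared with the direct calculation.

```python
midpoints = [162.3, 168.9, 175.5, 182.1, 188.7]
freqs = [2, 5, 8, 4, 1]  # HYPOTHETICAL frequencies
i = 6.6                  # interval width
A = 175.5                # conditional zero: midpoint of the most frequent interval

total = sum(freqs)

# Coded values (x - A) / i are close to small integers, which saves effort by hand.
coded = [(x - A) / i for x in midpoints]
m1 = sum(c * f for c, f in zip(coded, freqs)) / total       # first-order moment
m2 = sum(c ** 2 * f for c, f in zip(coded, freqs)) / total  # second-order moment

mean_moments = A + i * m1
disp_moments = i ** 2 * (m2 - m1 ** 2)

# Direct calculation for comparison.
mean_direct = sum(x * f for x, f in zip(midpoints, freqs)) / total
disp_direct = sum((x - mean_direct) ** 2 * f for x, f in zip(midpoints, freqs)) / total

print(mean_moments, disp_moments)
```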

The dispersion of an alternative feature (a feature is called alternative if, within a statistical population, it takes only two mutually exclusive values) can be calculated by the formula:

$$\sigma^2 = p \cdot q$$

where $p$ is the share of units that have the feature and $q$ is the share of units that do not.

Substituting q = 1 - p into this formula, we get:

$$\sigma^2 = p (1 - p)$$
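As an illustration, the product p · (1 − p) can be compared with the dispersion computed directly from a 0/1 coded feature. The data below are hypothetical.

```python
from statistics import pvariance

# HYPOTHETICAL alternative feature: 1 = the unit has the attribute, 0 = it does not.
values = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]

p = sum(values) / len(values)  # share of units with the attribute
q = 1 - p                      # share of units without it

disp_formula = p * q             # dispersion of the alternative feature
disp_direct = pvariance(values)  # population dispersion computed from the raw data

print(disp_formula, disp_direct)
```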

Types of dispersion

The total dispersion measures the variation of the feature over the entire population under the influence of all the factors that cause this variation. It is equal to the mean square of the deviations of the individual values of the feature x from the overall mean and can be calculated as a simple or a weighted dispersion.

The intragroup dispersion characterizes random variation, i.e. the part of the variation that is due to unaccounted factors and does not depend on the factor feature underlying the grouping. It is equal to the mean square of the deviations of the individual values of the feature within a group from the arithmetic mean of that group and can be calculated as a simple or a weighted dispersion.

Thus, the intragroup dispersion measures the variation of the feature inside a group and is determined by the formula:

$$\sigma_i^2 = \frac{\sum (x - \bar{x}_i)^2}{n_i}$$

where $\bar{x}_i$ is the group mean;
$n_i$ is the number of units in the group.

For example, the intragroup dispersions that have to be determined when studying the effect of workers' qualification on labor productivity in a workshop show the variation of output in each group caused by all possible factors (technical condition of the equipment, availability of tools and materials, age of the workers, labor intensity, etc.) except differences in qualification grade (within a group all workers have the same qualification).

Types of dispersion:

The total dispersion characterizes the variation of the feature over the entire population under the influence of all the factors that cause this variation. It is determined by the formula

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$

where $\bar{x}$ is the overall arithmetic mean of the entire population under study.

The average intragroup dispersion reflects random variation that arises under the influence of unaccounted factors and does not depend on the factor feature underlying the grouping. It is calculated as follows: first the dispersions of the individual groups ($\sigma_i^2$) are calculated, then the average intragroup dispersion:

$$\overline{\sigma_i^2} = \frac{\sum \sigma_i^2 n_i}{\sum n_i}$$

where $n_i$ is the number of units in the group.

The intergroup dispersion (dispersion of the group means) characterizes systematic variation, i.e. differences in the value of the studied feature that arise under the influence of the factor feature underlying the grouping:

$$\delta^2 = \frac{\sum (\bar{x}_i - \bar{x})^2 n_i}{\sum n_i}$$

where $\bar{x}_i$ is the mean of an individual group.

All three types of dispersion are related: the total dispersion is equal to the sum of the average intragroup dispersion and the intergroup dispersion:

$$\sigma^2 = \overline{\sigma_i^2} + \delta^2$$

Properties:

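25. Relative variation indicators

Oscillation coefficient: $V_R = \frac{R}{\bar{x}} \cdot 100\%$, where $R = x_{max} - x_{min}$ is the range of variation.

Relative linear deviation: $V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%$, where $\bar{d}$ is the average linear deviation.

Coefficient of variation: $V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%$

The oscillation coefficient reflects the relative spread of the extreme values of the feature around the mean. The relative linear deviation characterizes the share of the averaged absolute deviations in the mean. The coefficient of variation is the most common indicator of variability and is used to assess how typical the mean is.

In statistics, a population with a coefficient of variation greater than 30-35% is considered non-homogeneous.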
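The three relative indicators can be computed as in the sketch below. The observations are hypothetical, and the homogeneity check uses the lower end (30%) of the 30-35% threshold mentioned above.

```python
from statistics import mean, pstdev

data = [12, 15, 11, 14, 18, 13, 16, 13]  # HYPOTHETICAL observations

x_bar = mean(data)
value_range = max(data) - min(data)                           # range of variation R
mean_abs_dev = sum(abs(x - x_bar) for x in data) / len(data)  # average linear deviation
sigma = pstdev(data)                                          # standard deviation

oscillation = value_range / x_bar * 100      # oscillation coefficient, %
rel_linear_dev = mean_abs_dev / x_bar * 100  # relative linear deviation, %
variation = sigma / x_bar * 100              # coefficient of variation, %

homogeneous = variation <= 30

print(oscillation, rel_linear_dev, round(variation, 2), homogeneous)
```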

    Regularities of distribution series. Moments of distribution. Distribution shape indicators

In variation series there is a relationship between the frequencies and the values of the varying feature: as the feature increases, the frequency first grows up to a certain limit and then decreases. Such changes are called regularities of distribution.

The shape of the distribution is studied using the asymmetry and excess indicators. These indicators are calculated from the moments of the distribution.

The moment of the k-th order is the mean of the k-th powers of the deviations of the feature values from some constant. The order of the moment is determined by the value of k. In the analysis of variation series, one is usually limited to the moments of the first four orders. Frequencies or relative frequencies can be used as weights when calculating moments. Depending on the choice of the constant, initial, conditional and central moments are distinguished.

Distribution shape indicators:

- Asymmetry (As) is an indicator characterizing the degree of asymmetry of the distribution.

With left-sided (negative) asymmetry As < 0; with right-sided (positive) asymmetry As > 0.

Central moments can be used to calculate the asymmetry. Then:

$$A_s = \frac{\mu_3}{\sigma^3},$$

where $\mu_3$ is the central moment of the third order.

- Excess (Ek) characterizes the steepness of the graph of the function in comparison with the normal distribution with the same variation:

$$E_k = \frac{\mu_4}{\sigma^4} - 3,$$

where $\mu_4$ is the central moment of the fourth order.
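The shape indicators can be sketched with central moments as follows. The formulas used here, asymmetry = μ3/σ³ and excess = μ4/σ⁴ − 3, are the common textbook convention and are an assumption, since the original formula images are missing. The data sets are hypothetical.

```python
def central_moment(data, k):
    """Mean of the k-th powers of the deviations from the arithmetic mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)


def asymmetry(data):
    """Third central moment divided by the cube of the standard deviation."""
    sigma = central_moment(data, 2) ** 0.5
    return central_moment(data, 3) / sigma ** 3


def excess(data):
    """Fourth central moment over the squared dispersion, minus 3 (assumed convention)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]        # HYPOTHETICAL symmetric data
right_skewed = [1, 1, 2, 2, 3, 9]  # HYPOTHETICAL data with a long right tail

print(asymmetry(symmetric), asymmetry(right_skewed), excess(symmetric))
```

The symmetric set gives zero asymmetry, and the set with the long right tail gives a positive value, in line with the sign rule above.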

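    The normal distribution law

For the normal (Gaussian) distribution, the density function has the following form:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \bar{x})^2}{2\sigma^2}}$$

where $\bar{x}$ is the mathematical expectation and $\sigma$ is the standard deviation.

The normal distribution is symmetric and is characterized by the following relation: $\bar{x} = Me = Mo$ (the mean, the median and the mode coincide).

For the normal distribution the ratio $\mu_4 / \sigma^4$ equals 3, so the excess measured relative to the normal curve is 0; the asymmetry coefficient is also 0.

The normal distribution curve is a symmetric bell-shaped curve.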

    Types of dispersion. The rule of addition of dispersions. The meaning of the empirical coefficient of determination.

If the original population is divided into groups according to some significant feature, the following types of dispersion are calculated:

    Total dispersion of the original population:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$$

where $\bar{x}$ is the overall mean of the original population and $f$ are the frequencies of the original population. The total dispersion characterizes the deviation of the individual values of the feature from the overall mean of the original population.

    Intragroup dispersions:

$$\sigma_j^2 = \frac{\sum (x - \bar{x}_j)^2 f_j}{\sum f_j}$$

where $j$ is the group number, $\bar{x}_j$ is the mean of each group and $f_j$ are the frequencies within the group. Intragroup dispersions characterize the deviation of the individual values of the feature in each group from the group mean. The average of all the intragroup dispersions is calculated by the formula $\overline{\sigma_j^2} = \frac{\sum \sigma_j^2 n_j}{\sum n_j}$, where $n_j$ is the number of units in each group.

    Intergroup dispersion:

$$\delta^2 = \frac{\sum (\bar{x}_j - \bar{x})^2 n_j}{\sum n_j}$$

The intergroup dispersion characterizes the deviation of the group means from the overall mean of the original population.

The rule of addition of dispersions states that the total dispersion of the original population is equal to the sum of the intergroup dispersion and the average of the intragroup dispersions:

$$\sigma^2 = \delta^2 + \overline{\sigma_j^2}$$

The empirical coefficient of determination shows the share of the variation of the studied feature that is due to the variation of the grouping feature, and is calculated by the formula:

$$\eta^2 = \frac{\delta^2}{\sigma^2}$$
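The rule of addition of dispersions can be verified on raw observations. The sketch below uses two hypothetical groups and computes the total dispersion, the average of the intragroup dispersions, the intergroup dispersion and the empirical coefficient of determination.

```python
from statistics import mean, pvariance

# HYPOTHETICAL observations split into two groups by some grouping feature.
groups = {
    "group 1": [10, 12, 14],
    "group 2": [20, 22, 24, 26],
}

all_values = [x for g in groups.values() for x in g]
n = len(all_values)
overall_mean = mean(all_values)

total_disp = pvariance(all_values)

# Average of the intragroup dispersions, weighted by group size.
within_avg = sum(pvariance(g) * len(g) for g in groups.values()) / n

# Intergroup dispersion: squared deviations of the group means from the overall mean.
between = sum((mean(g) - overall_mean) ** 2 * len(g) for g in groups.values()) / n

eta_squared = between / total_disp  # empirical coefficient of determination

print(total_disp, within_avg + between, eta_squared)
```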

    Calculating the mean and the dispersion by counting from a conditional zero (method of moments)

The calculation of the dispersion by the method of moments is based on the dispersion formula and on properties 3 and 4 of the dispersion.

(3. If all the values of the feature are increased (decreased) by some constant A, the dispersion of the new population does not change.

4. If all the values of the feature are multiplied (divided) by k, where k is a constant, the dispersion of the new population increases (decreases) by a factor of k².)
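Both properties are easy to check numerically. The values and the constants A and k below are arbitrary, hypothetical choices.

```python
from statistics import pvariance

data = [3, 7, 8, 12, 15]  # HYPOTHETICAL values of the feature
A, k = 100, 4             # arbitrary constants

base = pvariance(data)
shifted = pvariance([x + A for x in data])  # property 3: shifting leaves the dispersion unchanged
scaled = pvariance([x * k for x in data])   # property 4: scaling multiplies it by k squared

print(base, shifted, scaled)
```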

We obtain the formula for calculating the dispersion in variation series with equal intervals by the method of moments:

$$\sigma^2 = i^2 (m_2 - m_1^2)$$

where $A$ is the conditional zero, equal to the value with the maximum frequency (the midpoint of the interval with the maximum frequency), $i$ is the interval width, and $m_1$, $m_2$ are the first- and second-order moments of the values $(x - A) / i$.

The calculation of the mean by the method of moments is likewise based on the properties of the mean.

    The concept of sample observation. Stages of studying economic phenomena by the sampling method

Sample observation is observation in which not all units of the original population are examined and studied, but only a part of them, while the result of examining that part is extended to the entire original population. The population from which units are selected for further examination and study is called the general population, and all the indicators characterizing it are called general.

The possible limits of deviation of the sample mean from the general mean are called the sampling error.

The set of selected units is called the sample population, and all the indicators characterizing it are called sample indicators.

A sample study includes the following stages:

Characterization of the object of study (mass economic phenomena). If the general population is small, sampling is not recommended and a complete study is necessary;

Calculation of the sample size. It is important to determine the optimal size that yields a sampling error within the permissible limits at the lowest cost;

Selection of the observation units, taking into account the requirements of randomness and proportionality;

Proof of representativeness based on an estimate of the sampling error. For a random sample the error is calculated by formula. For a purposive sample, representativeness is assessed by qualitative methods (comparison, experiment);

Analysis of the sample population. If the sample meets the requirements of representativeness, it is analyzed using analytical indicators (averages, relative values, etc.).

Probability theory is a special branch of mathematics that is studied only by students of higher educational institutions. Do you like calculations and formulas? Are you not frightened by the prospect of getting acquainted with the normal distribution, the entropy of an ensemble, and the mathematical expectation and dispersion of a discrete random variable? Then this subject will be very interesting to you. Let's get acquainted with several of the most important basic concepts of this branch of science.

Recalling the basics

Even if you remember the simplest concepts of probability theory, do not skip the first paragraphs of the article. Without a clear understanding of the basics you will not be able to work with the formulas considered below.

So, there is some random event, some experiment. As a result of our actions we can get several outcomes; some of them occur more often, others less often. The probability of an event is the ratio of the number of outcomes of a given type actually obtained to the total number of possible outcomes. Only knowing the classical definition of this concept can you start studying the mathematical expectation and dispersion of continuous random variables.

Arithmetic mean

Back at school, in mathematics lessons, you started working with the arithmetic mean. This concept is widely used in probability theory, so it cannot be bypassed. The main thing for us at the moment is that we will meet it in the formulas for the mathematical expectation and dispersion of a random variable.

We have a sequence of numbers and want to find the arithmetic mean. All that is required is to sum all the numbers and divide by the number of elements in the sequence. Suppose we have the numbers from 1 to 9. The sum of the elements is 45, and we divide this value by 9. Answer: 5.

Dispersion

In scientific terms, dispersion is the mean square of the deviations of the obtained values of the feature from the arithmetic mean. It is denoted by the capital Latin letter D. What is needed to calculate it? For each element of the sequence we calculate the difference between the number and the arithmetic mean and square it. There will be exactly as many values as there are outcomes of the event under consideration. Next, we sum everything obtained and divide by the number of elements in the sequence. If we have five possible outcomes, we divide by five.

Dispersion has properties that must be remembered in order to apply them when solving problems. For example, when a random variable is increased X times, the dispersion increases X squared times (i.e. X * X). It is never less than zero and does not depend on shifting the values up or down by the same amount. In addition, for independent trials the dispersion of a sum is equal to the sum of the dispersions.

Now we need to consider examples of the dispersion of a discrete random variable and of the mathematical expectation.

Suppose we conducted 21 experiments and obtained 7 different outcomes. We observed them 1, 2, 2, 3, 4, 4 and 5 times respectively. What is the dispersion?

First, calculate the arithmetic mean: the sum of the elements is, of course, 21. Dividing it by 7 gives 3. Now subtract 3 from each number of the original sequence, square each value and add the results together. The result is 12. Now it remains to divide this number by the number of elements, and it would seem that is all. But there is a catch! Let's discuss it.

Dependence on the number of experiments

It turns out that when calculating the dispersion, the denominator may be one of two numbers: either N or N-1. Here N is the number of experiments conducted or the number of elements in the sequence (which is essentially the same thing). What does the choice depend on?

If the number of trials is in the hundreds, we put N in the denominator; if it is in the units, N-1. Scientists decided to draw the boundary quite symbolically: today it runs at the number 30. If we conducted fewer than 30 experiments, we divide the sum by N-1, and if more, by N. Strictly speaking, N-1 is the divisor for data that are a sample and N is the divisor for a complete population; for large N the two results are almost identical, which is why the boundary is only a convention.

Problem

Let's return to our example of solving the problem on dispersion and mathematical expectation. We obtained the intermediate number 12, which had to be divided by N or N-1. Here N is the number of elements in the sequence, 7. Since we conducted 21 experiments, which is fewer than 30, we choose the second option. So the answer is: the dispersion is 12 / (7 - 1) = 2.
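The arithmetic of this problem can be reproduced with Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by N-1. The data are the seven observation counts from the example.

```python
from statistics import mean, pvariance, variance

counts = [1, 2, 2, 3, 4, 4, 5]  # how many times each of the 7 outcomes was observed

avg = mean(counts)                            # 21 / 7
sum_sq = sum((c - avg) ** 2 for c in counts)  # sum of squared deviations

disp_n = pvariance(counts)         # divides the sum by N
disp_n_minus_1 = variance(counts)  # divides the sum by N - 1

print(avg, sum_sq, disp_n, disp_n_minus_1)
```

With N-1 = 6 in the denominator the result is 2, the answer given above; with N = 7 it would be 12/7.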

Mathematical expectation

Let us move on to the second concept that we must consider in this article. The mathematical expectation is the result of summing all possible outcomes multiplied by the corresponding probabilities. It is important to understand that this value, like the result of calculating the dispersion, is obtained only once for the whole problem, no matter how many outcomes are considered in it.

The formula for the mathematical expectation is quite simple: we take an outcome, multiply it by its probability, add the same for the second and third outcomes, and so on. Everything related to this concept is easy to calculate. For example, the expectation of a sum is equal to the sum of the expectations. The same is true for a product (of independent random variables). Far from every quantity in probability theory allows such simple operations. Let's take a problem and calculate the values of both concepts we have studied. Besides, we have been distracted by theory; it is time to practice.

One more example

We conducted 50 trials and obtained 10 kinds of outcomes, the numbers from 0 to 9, which appeared in different percentages. These are, respectively: 2%, 10%, 4%, 14%, 2%, 18%, 6%, 16%, 10%, 18%. Recall that to obtain the probabilities, the percentage values must be divided by 100. Thus we get 0.02, 0.1 and so on. Let us present an example of solving the problem for the dispersion of a random variable and the mathematical expectation.

The arithmetic mean is calculated by the formula remembered from primary school: 50 / 10 = 5.

Now let us convert the probabilities into numbers of outcomes "in pieces" so that it is more convenient to count. We get 1, 5, 2, 7, 1, 9, 3, 8, 5 and 9. From each value obtained we subtract the arithmetic mean and then square each result. See how to do this using the first element as an example: 1 - 5 = (-4). Next: (-4) * (-4) = 16. Perform these operations for the remaining values yourself. If you did everything correctly, after adding them all up you get 90.

We continue calculating the dispersion and the mathematical expectation by dividing 90 by N. Why do we choose N and not N-1? Correct, because the number of experiments conducted exceeds 30. So: 90 / 10 = 9. We have obtained the dispersion. If you got a different number, do not despair. Most likely you made a simple mistake in the calculations. Check what you wrote, and everything will surely fall into place.

Finally, recall the formula for the mathematical expectation. We will not give all the calculations, only the answer, which you can check after completing all the required steps. The expectation is 5.48. We only recall how to carry out the operations, using the first elements as an example: 0 * 0.02 + 1 * 0.1 ... and so on. As you can see, we simply multiply the value of the outcome by its probability.
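The whole example can be reproduced in a few lines of Python. The percentages and the number of trials are taken from the text; the variable names are illustrative. As in the text, the dispersion here is computed over the ten outcome counts and divided by N = 10.

```python
from statistics import pvariance

outcomes = list(range(10))  # the numbers 0..9
percent = [2, 10, 4, 14, 2, 18, 6, 16, 10, 18]
trials = 50

probs = [p / 100 for p in percent]
counts = [p * trials // 100 for p in percent]  # outcomes "in pieces"

# Mathematical expectation: each outcome multiplied by its probability, then summed.
expectation = sum(x * p for x, p in zip(outcomes, probs))

# Sum of squared deviations of the counts from their mean, and the dispersion.
mean_count = sum(counts) / len(counts)
sum_sq = sum((c - mean_count) ** 2 for c in counts)
counts_disp = pvariance(counts)

print(counts, sum_sq, counts_disp, round(expectation, 2))
```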

Deviation

Another concept closely related to dispersion and mathematical expectation is the standard (root-mean-square) deviation. It is denoted either by the Latin letters sd or by the Greek lowercase "sigma". This concept shows how much, on average, the values deviate from the central value. To find it, you need to take the square root of the dispersion.

If you plot a normal distribution and want to see the standard deviation directly on the chart, this can be done in several steps. Take the half of the figure to the left or right of the mode (the central value) and draw a perpendicular to the horizontal axis so that the areas of the resulting figures are equal. The segment between the middle of the distribution and the resulting projection on the horizontal axis gives a visual estimate of the spread. Note that this construction is only approximate: the segment obtained is about 0.67 of the standard deviation, while the standard deviation itself equals the distance from the center to the inflection point of the curve.

Software

As can be seen from the descriptions of the formulas and the examples presented, calculating the dispersion and the mathematical expectation is not the simplest procedure from an arithmetic point of view. To save time, it makes sense to use the program used in higher educational institutions; it is called "R". It has functions that calculate values for many concepts from statistics and probability theory.

For example, you define a vector of values. This is done as follows: vector<-c(1,5,2…). Now, when you need to calculate some value for this vector, you write a function and pass the vector to it as an argument. To find the dispersion you will need the var function. An example of its use: var(vector). Then you simply press Enter and get the result. Note that var in R returns the sample dispersion, i.e. it divides by N-1.

In conclusion

Dispersion and mathematical expectation are concepts without which it is difficult to calculate anything further. In the main university lecture course they are covered in the first months of studying the subject. It is precisely because of a lack of understanding of these simplest concepts and an inability to calculate them that many students immediately begin to fall behind the program and later get bad marks in the exam session, which deprives them of their scholarships.

Practice for at least a week, half an hour a day, solving problems similar to those presented in this article. Then in any test on probability theory you will cope with the examples without outside hints or cheat sheets.

Along with studying the variation of a feature over the entire population as a whole, it is often necessary to trace the quantitative changes of the feature within the groups into which the population is divided, as well as between the groups. Such a study of variation is achieved by calculating and analyzing various types of dispersion.
Total, intergroup and intragroup dispersion are distinguished.
The total dispersion $\sigma^2$ measures the variation of the feature over the entire population under the influence of all the factors that cause this variation.

The intergroup dispersion ($\delta^2$) characterizes systematic variation, i.e. differences in the value of the studied feature that arise under the influence of the factor feature underlying the grouping. It is calculated by the formula:
$$\delta^2 = \frac{\sum (\bar{x}_i - \bar{x})^2 n_i}{\sum n_i}$$
where $\bar{x}_i$ and $n_i$ are the mean and the size of group i, and $\bar{x}$ is the overall mean.

The intragroup dispersion ($\sigma_i^2$) reflects random variation, i.e. the part of the variation that occurs under the influence of unaccounted factors and does not depend on the factor feature underlying the grouping. It is calculated by the formula:
$$\sigma_i^2 = \frac{\sum (x - \bar{x}_i)^2}{n_i}$$

The average of the intragroup dispersions: $\overline{\sigma_i^2} = \frac{\sum \sigma_i^2 n_i}{\sum n_i}$.

There is a law connecting the three types of dispersion. The total dispersion is equal to the sum of the average of the intragroup dispersions and the intergroup dispersion: $\sigma^2 = \overline{\sigma_i^2} + \delta^2$.
This relation is called the rule of addition of dispersions.

An indicator widely used in analysis is the share of the intergroup dispersion in the total dispersion. It is called the empirical coefficient of determination ($\eta^2$): $\eta^2 = \frac{\delta^2}{\sigma^2}$.
The square root of the empirical coefficient of determination is called the empirical correlation ratio ($\eta$):
$$\eta = \sqrt{\frac{\delta^2}{\sigma^2}}$$
It characterizes the influence of the feature underlying the grouping on the variation of the resulting feature. The empirical correlation ratio varies from 0 to 1.
We show its practical use with the following example (Table 1).

Example 1. Table 1 - Labor productivity of two groups of workers in one of the workshops of NPO "Cyclone"

Calculate the overall and group means and dispersions:




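The initial data for calculating the average of the intragroup dispersions and the intergroup dispersion are presented in Table 2.
Table 2
Calculation of $\overline{\sigma_i^2}$ and $\delta^2$ for two groups of workers.

Groups of workers | Number of workers, persons | Mean output, parts per shift | Dispersion
Completed technical training | 5 | 95 | 42.0
Did not complete technical training | 5 | 81 | 231.2
All workers | 10 | 88 | 185.6

Calculate the indicators. Average of the intragroup dispersions:
$$\overline{\sigma_i^2} = \frac{42.0 \cdot 5 + 231.2 \cdot 5}{10} = 136.6$$
Intergroup dispersion:
$$\delta^2 = \frac{(95 - 88)^2 \cdot 5 + (81 - 88)^2 \cdot 5}{10} = 49$$
Total dispersion: $\sigma^2 = 136.6 + 49 = 185.6$
Thus, the empirical correlation ratio is $\eta = \sqrt{49 / 185.6} \approx 0.51$.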
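The indicators of this example can be recomputed from the group summaries in Table 2 (number of workers, mean output and dispersion of each group). The sketch below is only a check of the arithmetic; the variable names are illustrative.

```python
from math import sqrt

# (number of workers, mean output, dispersion) for the two groups in Table 2.
groups = [(5, 95, 42.0), (5, 81, 231.2)]

n_total = sum(n for n, _, _ in groups)
overall_mean = sum(n * m for n, m, _ in groups) / n_total

# Average of the intragroup dispersions, weighted by group size.
within_avg = sum(n * d for n, _, d in groups) / n_total

# Intergroup dispersion: squared deviations of group means from the overall mean.
between = sum(n * (m - overall_mean) ** 2 for n, m, _ in groups) / n_total

total = within_avg + between  # rule of addition of dispersions

eta_squared = between / total  # empirical coefficient of determination
eta = sqrt(eta_squared)        # empirical correlation ratio

print(overall_mean, within_avg, between, total, round(eta, 2))
```

The total of 185.6 matches the dispersion reported for all workers in the table.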

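Along with the variation of quantitative features, the variation of qualitative features can also be observed. Such a study of variation is achieved by calculating the following types of dispersion:

The intragroup dispersion of a share is determined by the formula

$$\sigma_{p_i}^2 = p_i (1 - p_i)$$

where $p_i$ is the share of the studied feature in an individual group. The average of the intragroup dispersions is $\overline{\sigma_{p_i}^2} = \frac{\sum p_i (1 - p_i) n_i}{\sum n_i}$, where $n_i$ is the number of units in the individual groups.
The share of the studied feature in the entire population is determined by the formula: $\bar{p} = \frac{\sum p_i n_i}{\sum n_i}$
The three types of dispersion are related to each other:
$$\sigma_p^2 = \overline{\sigma_{p_i}^2} + \delta_{p_i}^2$$
where $\sigma_p^2$ is the total dispersion of the share and $\delta_{p_i}^2$ is the intergroup dispersion of the share.

This relation is called the theorem of addition of dispersions of the share of a feature.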

Dispersion is a measure of scatter that describes the relative deviation of the data values from the mean. It is the most widely used measure of scatter in statistics, calculated by summing the squared deviations of each data value from the mean. The formula for calculating the dispersion is given below:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

$s^2$ is the sample dispersion;

$\bar{x}$ is the sample mean;

$n$ is the sample size (the number of data values);

$(x_i - \bar{x})$ is the deviation from the mean for each value in the data set.

For a better understanding of the formula, let us work through an example. I do not really like cooking, so I do it extremely rarely. Nevertheless, in order not to die of hunger, from time to time I have to approach the stove to carry out the plan of saturating my body with proteins, fats and carbohydrates. The data set below shows how many times Renat cooks each month:

The first step in calculating the dispersion is to determine the sample mean, which in our example is 7.8 times a month. The remaining calculations can be simplified using the following table.

The final phase of calculating the dispersion looks like this:

For those who like to do all the calculations in one go, the equation looks like this:

Using the "raw score" method (the cooking example)

There is a more efficient way to calculate the dispersion, known as the "raw score" method. Although at first glance the equation may seem very cumbersome, it is actually not that scary. You can see this for yourself and then decide which method you like better.

$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$$

$\sum x_i^2$ is the sum of the squares of the data values,

$(\sum x_i)^2$ is the square of the sum of all data values.

Do not panic just yet. Let me present all this in the form of a table, and then you will see that fewer calculations are needed here than in the previous example.

As you can see, the result is the same as with the previous method. The advantages of this method become apparent as the sample size (n) grows.
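The agreement of the two methods can be checked with a short sketch. The monthly counts below are hypothetical (the article's own data table is not reproduced on this page), and the sample dispersion is assumed to use the n-1 denominator.

```python
from statistics import variance

data = [4, 7, 9, 10, 6, 8, 11, 7]  # HYPOTHETICAL monthly counts, not the article's data
n = len(data)
x_bar = sum(data) / n

# Definitional method: squared deviations from the sample mean.
s2_def = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# "Raw score" method: needs only the sum of squares and the square of the sum.
sum_of_squares = sum(x ** 2 for x in data)
square_of_sum = sum(data) ** 2
s2_raw = (sum_of_squares - square_of_sum / n) / (n - 1)

print(s2_def, s2_raw, variance(data))
```

The raw score form needs a single pass over the data and no table of deviations, which is why it is convenient for larger samples.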

Calculating dispersion in Excel

As you have probably already guessed, Excel has a formula for calculating the dispersion. Moreover, starting with Excel 2010 there are four varieties of the dispersion formula:

1) VAR.S returns the sample dispersion. Logical values and text are ignored.

2) VAR.P returns the dispersion of the general population. Logical values and text are ignored.

3) VARA returns the sample dispersion, taking logical and text values into account.

4) VARPA returns the dispersion of the general population, taking logical and text values into account.

To begin with, let us clarify the difference between a sample and a general population. The purpose of descriptive statistics is to summarize or display data so as to quickly get the overall picture, an overview so to speak. Statistical inference allows conclusions to be drawn about a population based on a sample of data from that population. The population is all the possible outcomes or measurements that are of interest to us. A sample is a subset of the population.

For example, suppose we are interested in a group of students at one of the Russian universities and need to determine the group's average grade. We can calculate the average performance of the students, and the resulting figure will be a parameter, since the whole population is involved in our calculation. However, if we want to estimate the average grade of all students in our country, this group will be our sample.

The difference between the formulas for the dispersion of a sample and of a population lies in the denominator: for a sample it is (n-1), and for the general population it is just n.

Now let us look at the dispersion functions whose names end in A, whose descriptions say that the calculation takes text and logical values into account. In this case, when calculating the dispersion of a data array that contains non-numeric values, Excel interprets text and FALSE logical values as 0, and TRUE logical values as 1.
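The difference between the two families of functions can be imitated in Python. The sketch below is only an emulation of the behavior described above, not Excel's implementation; the function names `vara_like` and `var_s_like` and the cell values are hypothetical.

```python
from statistics import variance


def vara_like(cells):
    """Sample dispersion where text and False count as 0 and True counts as 1."""
    numeric = []
    for cell in cells:
        if isinstance(cell, bool):  # check bool first: bool is a subclass of int
            numeric.append(1 if cell else 0)
        elif isinstance(cell, (int, float)):
            numeric.append(cell)
        elif isinstance(cell, str):
            numeric.append(0)
    return variance(numeric)


def var_s_like(cells):
    """Sample dispersion that skips text and logical values."""
    numeric = [c for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool)]
    return variance(numeric)


cells = [4, 6, "n/a", True, 8]  # HYPOTHETICAL cell range

print(var_s_like(cells), vara_like(cells))
```

Here the first function sees only 4, 6 and 8, while the second sees 4, 6, 0, 1 and 8, so the two results differ noticeably.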

So, if you have an array of data, calculating its dispersion with one of the Excel functions listed above is not difficult.