Testing the hypothesis that the population is normally distributed using the Pearson criterion. Pearson's goodness-of-fit test (the chi-square test)

Let us consider how Pearson's chi-square goodness-of-fit test is applied in MS Excel to test simple hypotheses.

After experimental data have been obtained (i.e., when there is some sample), the distribution law that best describes the random variable represented by this sample is usually chosen. Checking how well the experimental data are described by the chosen theoretical distribution law is carried out using goodness-of-fit tests. The null hypothesis is usually the hypothesis that the distribution of the random variable equals some theoretical law.

First we consider Pearson's χ² (chi-square) goodness-of-fit test applied to simple hypotheses (the parameters of the theoretical distribution are assumed known). Then we consider the case when only the form of the distribution is specified, while the parameters of this distribution and the value of the χ² statistic are estimated from the same sample.

Note: in the English-language literature the procedure for applying Pearson's χ² goodness-of-fit test is called the chi-square goodness of fit test.

Recall the hypothesis-testing procedure:

  • based on the sample, the value of the statistic corresponding to the type of hypothesis being tested is calculated. For example, for a hypothesis about the mean the t-statistic is used (if the variance is not known);
  • assuming the null hypothesis is true, the distribution of this statistic is known and can be used to calculate probabilities (for the t-statistic this is Student's t-distribution);
  • the value of the statistic calculated from the sample is compared with the critical value for a given significance level;
  • the null hypothesis is rejected if the value of the statistic is greater than the critical value (or, equivalently, if the probability of obtaining this value of the statistic, the p-value, is less than the significance level). A minimal sketch of this procedure is given below.
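To illustrate these four steps outside Excel, here is a minimal Python sketch (not from the original article; the sample, the hypothesized mean of 0 and the use of scipy are assumptions made for illustration):

# Minimal sketch of the general hypothesis-testing procedure
# (hypothetical data; the hypothesized mean of 0 is an arbitrary assumption).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.2, scale=1.0, size=30)  # hypothetical sample
alpha = 0.05                                      # significance level

# 1. Compute the test statistic from the sample (here a one-sample t-statistic).
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)

# 2. Under H0 the statistic follows Student's t-distribution with n-1 degrees
#    of freedom, which is what ttest_1samp uses to obtain the p-value.
# 3-4. Reject H0 if the p-value is below the significance level.
print(t_stat, p_value, "reject H0" if p_value < alpha else "do not reject H0")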

Let us carry out such hypothesis testing for various distributions.

Discrete case

Suppose that two people are playing dice. Each player has his own set of dice. The players take turns throwing 3 dice at once. Each round is won by the one who throws more sixes. The results are recorded. After 100 rounds one of the players began to suspect that his opponent's dice are asymmetric, because the opponent wins too often (throws sixes too often). He decided to analyse how likely such a number of the opponent's outcomes is.

Note: since there are 3 dice, in one throw one can get 0, 1, 2 or 3 sixes, i.e. the random variable can take 4 values.

From probability theory we know that if the dice are symmetric, the number of sixes obeys the binomial distribution. Therefore, the expected frequencies of sixes after 100 rounds can be calculated using the formula
=BINOM.DIST(A7; 3; 1/6; FALSE)*100

The formula assumes that cell A7 contains the corresponding number of sixes thrown in one round.

Note: the calculations are given in the example file on the Discrete sheet.

To compare the observed and the theoretical (expected) frequencies, it is convenient to use a histogram.

If the observed frequencies deviate significantly from the theoretical distribution, the null hypothesis that the random variable is distributed according to the theoretical law should be rejected. That is, if the opponent's dice are asymmetric, the observed frequencies will "differ significantly" from the binomial distribution.

In our case, at first glance the frequencies are quite close, and it is difficult to draw an unambiguous conclusion without computation. Let us apply Pearson's χ² goodness-of-fit test, so that instead of the subjective statement "differ significantly", which can be made by comparing the histograms, we use a mathematically correct statement.

We use the fact that, by the law of large numbers, the observed frequency tends, as the sample size n grows, to the probability corresponding to the theoretical law (in our case, the binomial law). In our case the sample size n is 100.

We introduce the test statistic, which we denote χ²:

χ² = Σ_{l=1..L} (O_l − E_l)² / E_l,

where O_l is the observed frequency of the event "the random variable took the l-th of its admissible values", E_l is the corresponding theoretical (expected) frequency, and L is the number of values the random variable can take (in our case 4).

As can be seen from the formula, this statistic is a measure of the closeness of the observed frequencies to the theoretical ones, i.e. it can be used to estimate the "distances" between these frequencies. If the sum of these "distances" is "too large", the frequencies "differ significantly". It is clear that if our dice are symmetric (i.e. the binomial law applies), the probability that the sum of the "distances" turns out "too large" is small. To calculate this probability we need to know the distribution of the χ² statistic (the χ² statistic is calculated from a random sample, so it is a random variable and therefore has its own probability distribution).

From a multidimensional analogue of the de Moivre-Laplace integral theorem it is known that, as n → ∞, our random variable χ² asymptotically follows the χ²-distribution with L − 1 degrees of freedom.

So, if the calculated value of the χ² statistic (the sum of the "distances" between the frequencies) turns out to be greater than a certain limit value, we will reject the null hypothesis. As in testing parametric hypotheses, the limit value is set via the significance level. If the probability that the χ² statistic takes a value greater than or equal to the calculated one (the p-value) is less than the significance level, the null hypothesis can be rejected.

In our case the value of the statistic is 22.757. The probability that the χ² statistic takes a value greater than or equal to 22.757 is very small (0.000045) and can be calculated by the formulas
=CHISQ.DIST.RT(22.757; 4-1) or
=CHISQ.TEST(Observed; Expected)

Note: the CHISQ.TEST() function is specifically designed to test the association between two categorical variables (see the related article).

The probability 0.000045 is much smaller than the usual significance level of 0.05. So the player has every reason to suspect his opponent of dishonesty (the null hypothesis of his honesty is rejected).
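The same calculation can be sketched in Python (the observed counts below are hypothetical placeholders, since the article reports only the resulting statistic of 22.757; scipy is assumed to be available):

# Chi-square goodness-of-fit test for the dice example.
# The observed counts are hypothetical placeholders; the article reports only
# the resulting statistic (22.757), not the raw counts.
import numpy as np
from scipy import stats

n_rounds = 100
observed = np.array([45, 35, 15, 5])              # hypothetical counts of 0, 1, 2, 3 sixes
p_theor = stats.binom.pmf([0, 1, 2, 3], 3, 1/6)   # analogue of BINOM.DIST(...; FALSE)
expected = p_theor * n_rounds                     # theoretical (expected) frequencies

chi2_stat = ((observed - expected) ** 2 / expected).sum()
p_value = stats.chi2.sf(chi2_stat, df=len(observed) - 1)  # analogue of CHISQ.DIST.RT
# Equivalent one-liner: stats.chisquare(observed, expected)
print(chi2_stat, p_value)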

When applying the χ² test it is necessary to make sure that the sample size n is large enough, otherwise the approximation of the distribution of the χ² statistic will be invalid. It is usually considered sufficient that the observed frequencies be greater than 5. If this is not so, small frequencies are combined with one another or attached to other frequencies; the combined value is assigned the total probability and, accordingly, the number of degrees of freedom of the χ²-distribution is reduced.

To improve the quality of the χ² test, it is desirable to narrow the partition intervals (increase L and, accordingly, the number of degrees of freedom); however, this is hindered by the restriction on the number of observations falling into each interval (it should be greater than 5).

Continuous case

Pearson's χ² goodness-of-fit test can also be applied in the case of a continuous distribution.

Consider a sample consisting of 200 values. The null hypothesis states that the sample is drawn from the standard normal distribution.

Note: the sample values in the example file on the Continuous sheet are generated with the formula =NORM.S.INV(RAND()). Therefore, new sample values are generated every time the sheet is recalculated.

Whether the available data set corresponds to the normal distribution can be assessed visually using a normal probability plot.

As can be seen from the chart, the sample values lie quite well along the straight line. Nevertheless, as in the discrete case, let us apply Pearson's χ² goodness-of-fit test to test the hypothesis.

To do this, we split the range of variation of the random variable into intervals with a step of 0.5 and calculate the observed and theoretical frequencies. The observed frequencies are calculated using the FREQUENCY() function, and the theoretical ones using the NORM.S.DIST() function.

Note: as in the discrete case, it is necessary to make sure that the sample is large enough and that more than 5 values fall into each interval.

Let us calculate the χ² statistic and compare it with the critical value for the given significance level (0.05). Since we split the range of variation of the random variable into 10 intervals, the number of degrees of freedom is 9. The critical value can be calculated by the formula
=CHISQ.INV.RT(0.05; 9) or
=CHISQ.INV(1-0.05; 9)

The chart shows that the value of the statistic is 8.19, which is noticeably lower than the critical value, so the null hypothesis is not rejected.
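A rough Python sketch of this continuous-case check (the sample and the bin edges below are assumptions mirroring the description above, not the exact example file):

# Chi-square goodness-of-fit test of a sample against the standard normal law.
# The sample and the bin edges are assumptions mirroring the description above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.standard_normal(200)              # analogue of =NORM.S.INV(RAND())

inner = np.arange(-2.0, 2.5, 0.5)              # inner bin edges with step 0.5
bins = np.concatenate(([-np.inf], inner, [np.inf]))   # 10 intervals in total
observed = np.array([((sample >= lo) & (sample < hi)).sum()
                     for lo, hi in zip(bins[:-1], bins[1:])])   # analogue of FREQUENCY()

expected = np.diff(stats.norm.cdf(bins)) * len(sample)  # analogue of NORM.S.DIST()
# Ideally each expected count should exceed 5; merge tail bins if it does not.

chi2_stat = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 1                         # simple hypothesis, no estimated parameters
p_value = stats.chi2.sf(chi2_stat, df)
critical = stats.chi2.isf(0.05, df)            # analogue of CHISQ.INV.RT(0.05; 9)
print(chi2_stat, critical, p_value)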

Below we show a sample for which an unlikely value of the statistic was obtained and, based on Pearson's χ² test, the null hypothesis was rejected (even though the random values were generated using the formula =NORM.S.INV(RAND()), which provides a sample from the standard normal distribution).

The null hypothesis is rejected, although visually the data lie quite close to the straight line.

As another example, let us take a sample from the uniform distribution U(-3; 3). In this case it is obvious even from the chart that the null hypothesis must be rejected.

Pearson's χ² test also confirms that the null hypothesis must be rejected.

A goodness-of-fit criterion is used to test a hypothesis about the distribution law of the random variable under study. In many practical problems the exact distribution law is unknown. Therefore a hypothesis that the available empirical law, built from observations, corresponds to some theoretical law requires a statistical check, which will either confirm or refute it.

Let X be the random variable under study. It is required to test the hypothesis H0 that this random variable obeys the distribution law F(x). To do this, a sample of n independent observations is taken and the empirical distribution law F*(x) is constructed. To compare the empirical and hypothetical laws, a rule called a goodness-of-fit criterion is used. One of the most popular is K. Pearson's chi-square goodness-of-fit criterion.

The chi-square statistic is calculated as:

χ² = N · Σ_{i=1..n} (p_ei − p_ti)² / p_ti,

where N is the number of observations, n is the number of intervals used to build the empirical distribution law (the number of columns of the corresponding histogram), i is the interval number, p_ti is the probability of the random variable falling into the i-th interval according to the theoretical distribution law, and p_ei is the probability of the random variable falling into the i-th interval according to the empirical distribution law. Under the null hypothesis this statistic obeys the chi-square distribution.

If the calculated value of the statistic exceeds the quantile of the chi-square distribution with n − p − 1 degrees of freedom for the given significance level, the hypothesis H0 is rejected. Otherwise it is accepted at the given significance level. Here n is the number of intervals, as above, and p is the number of parameters of the distribution law estimated from the observations.
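A small sketch (with hypothetical counts and probabilities) showing that this probability-based form of the statistic coincides with the usual frequency-based form, together with the quantile decision rule:

# Equivalence of the probability form and the frequency form of the statistic,
# plus the quantile-based decision rule (hypothetical data).
import numpy as np
from scipy import stats

observed = np.array([18, 30, 28, 24])            # hypothetical interval counts
N = observed.sum()
p_emp = observed / N                             # empirical interval probabilities p_ei
p_theor = np.array([0.25, 0.25, 0.25, 0.25])     # hypothetical theoretical probabilities p_ti

chi2_prob_form = N * ((p_emp - p_theor) ** 2 / p_theor).sum()
chi2_freq_form = ((observed - N * p_theor) ** 2 / (N * p_theor)).sum()
assert np.isclose(chi2_prob_form, chi2_freq_form)

n_intervals, n_params = len(observed), 0         # no parameters estimated here
df = n_intervals - n_params - 1
critical = stats.chi2.isf(0.05, df)
print(chi2_prob_form, critical, chi2_prob_form > critical)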

Pearson's χ² test allows one to compare the empirical and theoretical (or two empirical) distributions of one characteristic. This criterion is used mainly in two cases:

To compare the empirical distribution of a characteristic with a theoretical distribution (normal, exponential, uniform or some other law);

To compare two empirical distributions of the same characteristic.

The idea of the method is to determine the degree of divergence between the corresponding frequencies n_i and n'_i (empirical and theoretical); the greater this divergence, the greater the value of χ².

The sample sizes should be at least 50, and the sums of the frequencies must be equal.

Null hypothesis H0: the two distributions practically do not differ from each other; alternative hypothesis H1: the divergence between the distributions is significant.

Let us give the scheme for applying the criterion to compare two empirical distributions:

The criterion is a statistical test of the hypothesis that the observed random variable obeys a certain theoretical distribution law.


Depending on the value of the χ² criterion, the hypothesis can be accepted or rejected:

§ If the value of χ² is not too large, the hypothesis holds.

§ If χ² is too small (falls into the left "tail" of the distribution), the theoretical and practical values are suspiciously close to each other. If, for example, a random number generator is being checked that generated n numbers from a segment, and the hypothesis is "the sample is uniformly distributed on that segment", then the generator cannot be called random (the hypothesis of randomness fails), since the sample is distributed too evenly, although the uniformity hypothesis itself holds.

§ If χ² is too large (falls into the right "tail" of the distribution), the hypothesis is rejected.

Definition. Let a random variable X be given.

Hypothesis: the random variable X obeys a given distribution law F(x).

To test the hypothesis, consider a sample consisting of n independent observations of the random variable X: x1, x2, ..., xn. From the sample, construct the empirical distribution F*(x) of the random variable X. The comparison of the empirical and the theoretical distribution (the one assumed in the hypothesis) is carried out using a specially chosen function, a goodness-of-fit criterion. Consider Pearson's goodness-of-fit criterion (the χ² criterion):

Hypothesis H0: the sample x1, ..., xn is generated by the distribution function F(x).

Divide the range of values of X into k non-intersecting intervals;

Let n_j be the number of observations falling into the j-th interval, j = 1, ..., k;

p_j is the probability of an observation falling into the j-th interval when the hypothesis holds;

n·p_j is the expected number of observations in the j-th interval;

Statistic: χ² = Σ_{j=1..k} (n_j − n·p_j)² / (n·p_j), which asymptotically follows the chi-square distribution with k − 1 degrees of freedom.

The criterion performs poorly on samples with low-frequency (rare) events. This problem can be avoided by discarding the low-frequency events or by combining them with other events. (A separate small-sample adjustment, Yates' continuity correction, is sometimes applied in addition.)

Pearson's goodness-of-fit criterion (χ²) is used to test the hypothesis that the empirical distribution corresponds to the assumed theoretical distribution F(x) when the sample size is large (n ≥ 100). The criterion is applicable to any form of F(x), even with unknown values of its parameters, which is usually the case when analysing the results of mechanical tests. This is its versatility.

Using the χ² criterion involves splitting the range of variation of the sample into intervals and counting the number of observations (frequencies) n_j in each of the e intervals. For convenience of estimating the distribution parameters, the intervals are chosen to be of equal length.

The number of intervals depends on the sample size. Usually one takes: for n = 100, e = 10-15; for n = 200, e = 15-20; for n = 400, e = 25-30; for n = 1000, e = 35-40.

Intervals containing fewer than five observations are combined with neighbouring ones. However, if the number of such intervals is less than 20% of their total number, intervals with frequency n_j ≥ 2 are allowed.

The Pearson criterion statistic is
χ² = Σ_{j=1..e} (n_j − n·p_j)² / (n·p_j), (3.91)
where n is the sample size and p_j is the probability of the random variable falling into the j-th interval, calculated in accordance with the hypothetical distribution law F(x). When calculating the probabilities p_j it should be kept in mind that the left boundary of the first interval and the right boundary of the last one must coincide with the boundaries of the range of possible values of the random variable. For example, for a normal distribution the first interval extends to −∞ and the last to +∞.

The null hypothesis that the sample distribution corresponds to the theoretical law F(x) is tested by comparing the value calculated by formula (3.91) with the critical value χ²_α found from Table VI of the appendix for the significance level α and the number of degrees of freedom k = e1 − m − 1. Here e1 is the number of intervals after combining and m is the number of parameters estimated from the sample under consideration. If the inequality
χ² ≤ χ²_α (3.92)
holds, the null hypothesis is not rejected. If the specified inequality does not hold, the alternative hypothesis is accepted, namely that the sample belongs to an unknown distribution.

A drawback of Pearson's goodness-of-fit criterion is the loss of part of the original information caused by the need to group the observation results into intervals and to combine individual intervals with a small number of observations. In this connection it is recommended to supplement the χ² check of the correspondence of distributions with other criteria, especially when the sample size is relatively small (n ≈ 100).

The table shows the critical values of the chi-square distribution for a given number of degrees of freedom. The required value lies at the intersection of the column with the corresponding probability value and the row with the number of degrees of freedom. For example, the critical value of the chi-square distribution with 4 degrees of freedom for probability 0.25 is 5.38527. This means that the area under the density curve of the chi-square distribution with 4 degrees of freedom to the right of 5.38527 equals 0.25.
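This table value can be reproduced, for instance, with a short scipy sketch (scipy being an assumption, not part of the original text):

# Reproduce a critical value of the chi-square distribution:
# upper-tail probability 0.25 with 4 degrees of freedom.
from scipy import stats

value = stats.chi2.isf(0.25, df=4)   # inverse of the upper-tail probability
print(round(value, 5))               # 5.38527
print(stats.chi2.sf(value, df=4))    # area to the right of it: 0.25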

The Pearson correlation criterion is a method of parametric statistics that allows one to determine the presence or absence of a linear relationship between two quantitative indicators, as well as to evaluate its closeness and statistical significance. In other words, the Pearson correlation criterion makes it possible to determine whether there is a linear relationship between changes in the values of two variables. In statistical calculations and inferences the correlation coefficient is usually denoted r_xy or R_xy.

1. The history of the development of the correlation criterion

The Pearson correlation criterion was developed by a team of British scientists led by Karl Pearson (1857-1936) in the 1890s to simplify the analysis of the covariance of two random variables. Besides Karl Pearson, Francis Edgeworth and Raphael Weldon also worked on the correlation criterion.

2. What does the Pearson correlation criterion show?

The Pearson correlation criterion allows one to determine how close (or strong) the correlation is between two indicators measured on a quantitative scale. With additional calculations one can also determine whether the detected relationship is statistically significant.

For example, using the Pearson correlation criterion one can answer the question of whether there is a relationship between body temperature and the leukocyte content of the blood in acute respiratory infections, between the height and weight of a patient, or between the fluoride content of drinking water and the incidence of caries in the population.

3. Conditions and restrictions on applying the Pearson correlation criterion

  1. The compared indicators must be measured on a quantitative scale (for example, heart rate, body temperature, leukocyte content per 1 ml of blood, systolic blood pressure).
  2. The Pearson correlation criterion can only establish the presence and strength of a linear relationship between the quantities. Other features of the relationship, including its direction (direct or inverse), the nature of the change (linear or curvilinear), as well as whether one variable depends on the other, are determined using regression analysis.
  3. The number of compared quantities must equal two. When analysing the interrelation of three or more parameters, factor analysis should be used.
  4. The Pearson correlation criterion is parametric, so a condition of its application is that the compared variables are normally distributed. If a correlation analysis is needed for indicators whose distribution differs from the normal one, including indicators measured on an ordinal scale, Spearman's rank correlation coefficient should be used.
  5. One should clearly distinguish the concepts of dependence and correlation. Dependence between quantities implies the presence of a correlation between them, but not vice versa.

For example, a child's height depends on his age: the older the child, the taller he is. If we take two children of different ages, then with high probability the height of the older child will be greater than that of the younger. This phenomenon is called dependence, implying a cause-and-effect relationship between the indicators. Of course, there is also a correlation between them, meaning that changes in one indicator are accompanied by changes in the other.

In another situation, consider the relationship between a child's height and his heart rate (HR). As is known, both of these values depend directly on age, so in most cases children of greater height (and hence of older age) will have lower heart rate values. That is, a correlation will be observed, and it may be quite close. However, if we take children of the same age but different height, then most likely their heart rates will differ insignificantly, and we may conclude that heart rate does not depend on height.

The above example shows how important it is to distinguish the fundamental statistical concepts of relationship and dependence between indicators in order to draw correct conclusions.

4. How to calculate the Pearson correlation coefficient?

The Pearson correlation coefficient is calculated by the following formula:

r_xy = Σ (x_i − x̄)(y_i − ȳ) / sqrt( Σ (x_i − x̄)² · Σ (y_i − ȳ)² )

5. How to interpret the value of the Pearson correlation coefficient?

The Pearson correlation coefficient is interpreted on the basis of its absolute value. Possible values of the correlation coefficient range from 0 to ±1. The greater the absolute value of r_xy, the closer the relationship between the two quantities. r_xy = 0 indicates a complete absence of relationship; r_xy = ±1 indicates an absolute (functional) relationship. If the value of the Pearson correlation coefficient turns out to be greater than 1 or less than -1, an error has been made in the calculations.

To evaluate the closeness, or strength, of the correlation, generally accepted criteria are usually applied, according to which absolute values of r_xy < 0.3 indicate a weak relationship, values of r_xy from 0.3 to 0.7 a relationship of moderate closeness, and values of r_xy > 0.7 a strong relationship.

A more accurate assessment of the strength of the correlation can be obtained by using the Chaddock scale:

The statistical significance of the correlation coefficient r_xy is evaluated using the t-criterion, calculated by the following formula:

t_r = r_xy · sqrt(n − 2) / sqrt(1 − r_xy²)

The obtained value t_r is compared with the critical value t_crit at a certain significance level and with n − 2 degrees of freedom. If t_r exceeds t_crit, a conclusion is drawn about the statistical significance of the detected correlation.
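These calculations can be sketched in Python (the x and y values below are hypothetical illustrations, not the study data from the table in the next section):

# Pearson correlation coefficient and its significance test (hypothetical data).
import numpy as np
from scipy import stats

x = np.array([5.2, 6.1, 7.3, 8.4, 9.5])        # hypothetical testosterone levels
y = np.array([40.0, 42.5, 44.1, 47.8, 49.3])   # hypothetical muscle-mass percentages

r, p_value = stats.pearsonr(x, y)              # r and its two-sided p-value

# The same significance check "by hand" via the t-criterion:
n = len(x)
t_r = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
t_crit = stats.t.isf(0.05 / 2, df=n - 2)       # two-sided test, alpha = 0.05
print(r, p_value, t_r, t_r > t_crit)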

6. Example of calculating the Pearson correlation coefficient

The purpose of the study was to detect, and to determine the closeness and statistical significance of, a correlation between two quantitative indicators: the level of testosterone in the blood (x) and the percentage of muscle mass in the body (y). The initial data for a sample of 5 subjects (n = 5) are given in the table.

Pearson's criterion for testing the hypothesis about the form of the distribution law of a random variable. Testing hypotheses about normal, exponential and uniform distributions using the Pearson criterion. Kolmogorov's criterion. An approximate method of checking the normality of a distribution, based on estimates of the skewness and kurtosis coefficients.

In the previous lecture we considered hypotheses in which the distribution law of the general population was assumed to be known. Now we shall test hypotheses about the assumed law of an unknown distribution, that is, we shall test the null hypothesis that the general population is distributed according to some known law. Statistical criteria for testing such hypotheses are usually called goodness-of-fit criteria.

The advantage of the Pearson criterion is its versatility: it can be used to test hypotheses about various distribution laws.

1. Checking the hypothesis about the normal distribution.

Let a sufficiently large sample of size n with many different values of the variants be obtained. For convenience of processing, divide the interval from the smallest to the largest value of the variants into s equal parts and assume that the values of the variants falling into each interval are approximately equal to the number specifying the middle of the interval. Counting the number of variants falling into each interval, we obtain the so-called grouped sample:

variants x_1, x_2, ..., x_s

frequencies n_1, n_2, ..., n_s,

where x_i are the midpoints of the intervals and n_i is the number of variants falling into the i-th interval (the empirical frequencies).

From the data obtained one can calculate the sample mean x̄_B and the sample standard deviation σ_B. Let us check the assumption that the general population is distributed according to the normal law with parameters M(X) = x̄_B, D(X) = σ_B². Then we can find the number of values out of the sample of size n that should fall into each interval under this assumption (that is, the theoretical frequencies). To do this, using the table of values of the Laplace function, we find the probability of falling into the i-th interval:

p_i = Φ((b_i − x̄_B)/σ_B) − Φ((a_i − x̄_B)/σ_B),

where a_i and b_i are the boundaries of the i-th interval. Multiplying the obtained probabilities by the sample size n, we find the theoretical frequencies: n'_i = n·p_i. Our goal is to compare the empirical and theoretical frequencies, which of course differ from each other, and to find out whether these differences are insignificant, so that they do not disprove the hypothesis that the random variable under study is normally distributed, or whether they are so large that they contradict this hypothesis. For this, a criterion in the form of a random variable is used:

χ² = Σ_{i=1..s} (n_i − n'_i)² / n'_i. (20.1)

Its meaning is obvious: the squares of the deviations of the empirical frequencies from the theoretical ones, divided by the corresponding theoretical frequencies, are summed. It can be proved that, regardless of the actual distribution law of the general population, the distribution law of the random variable (20.1) tends, as n → ∞, to the chi-square distribution (see Lecture 12) with the number of degrees of freedom k = s − 1 − r, where r is the number of parameters of the assumed distribution estimated from the sample data. The normal distribution is characterised by two parameters, so k = s − 3. For the chosen criterion a right-sided critical region is constructed, determined by the condition

P(χ² > χ²_cr(α, k)) = α, (20.2)

where α is the significance level. Consequently, the critical region is given by the inequality χ² > χ²_cr(α, k), and the region of acceptance of the hypothesis by χ² ≤ χ²_cr(α, k).

So, to test the null hypothesis H0: the general population is normally distributed, it is necessary to calculate the observed value of the criterion from the sample:

χ²_obs = Σ_{i=1..s} (n_i − n'_i)² / n'_i, (20.1`)

and from the table of critical points of the χ² distribution to find the critical point χ²_cr(α, k), using the known values of α and k = s − 3. If χ²_obs < χ²_cr the null hypothesis is accepted, and if χ²_obs > χ²_cr it is rejected.
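A compact Python sketch of this procedure (the grouped sample below is hypothetical, and the standard normal CDF is used in place of the Laplace-function table):

# Pearson's test of normality on a grouped sample (hypothetical data).
import numpy as np
from scipy import stats

edges = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])   # interval boundaries a_i, b_i
counts = np.array([4, 21, 72, 76, 23, 4])                   # empirical frequencies n_i (hypothetical)
mids = (edges[:-1] + edges[1:]) / 2                         # interval midpoints x_i
n = counts.sum()

x_bar = (mids * counts).sum() / n                           # sample mean
sigma = np.sqrt(((mids - x_bar) ** 2 * counts).sum() / n)   # sample standard deviation

# p_i = Phi((b_i - x_bar)/sigma) - Phi((a_i - x_bar)/sigma),
# with the outermost intervals extended to -inf and +inf.
lo = np.concatenate(([-np.inf], edges[1:-1]))
hi = np.concatenate((edges[1:-1], [np.inf]))
p = stats.norm.cdf(hi, x_bar, sigma) - stats.norm.cdf(lo, x_bar, sigma)
expected = n * p                                            # theoretical frequencies n'_i

chi2_obs = ((counts - expected) ** 2 / expected).sum()      # formula (20.1`)
s = len(counts)
chi2_crit = stats.chi2.isf(0.05, df=s - 3)                  # k = s - 3 for the normal law
print(chi2_obs, chi2_crit, "reject H0" if chi2_obs > chi2_crit else "accept H0")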

2. Testing the hypothesis about a uniform distribution.

When using the Pearson criterion to test the hypothesis that the general population is uniformly distributed with the assumed probability density

f(x) = 1/(b − a) for x in [a, b] and f(x) = 0 otherwise,

it is necessary, having calculated the value x̄_B from the available sample, to estimate the parameters a and b by the formulas

a* = x̄_B − √3·σ_B,  b* = x̄_B + √3·σ_B, (20.3)

where a* and b* are the estimates of a and b. Indeed, for the uniform distribution M(X) = (a + b)/2 and σ(X) = (b − a)/(2√3), from which we obtain a system for determining a* and b*: (a* + b*)/2 = x̄_B, (b* − a*)/(2√3) = σ_B, whose solution gives the expressions (20.3).

Then, assuming that the density is f(x) = 1/(b* − a*), the theoretical frequencies can be found as n'_i = n·p_i, where p_i is the length of the i-th interval divided by (b* − a*), the first and last intervals being extended to a* and b* respectively.

Here s is the number of intervals into which the sample is divided.

The observed value of the Pearson criterion is calculated by formula (20.1`), and the critical value is taken from the table, bearing in mind that the number of degrees of freedom is k = s − 3. After that, the boundaries of the critical region are determined in the same way as for testing the hypothesis about a normal distribution.
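A short Python sketch of this check for the uniform law (the sample is hypothetical; the parameter estimates follow the method-of-moments formulas (20.3)):

# Pearson's test of uniformity with parameters estimated by the method of moments.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.uniform(2.0, 8.0, size=300)       # hypothetical sample

x_bar, sigma = sample.mean(), sample.std()
a_est = x_bar - np.sqrt(3) * sigma             # a* = x_bar - sqrt(3)*sigma  (20.3)
b_est = x_bar + np.sqrt(3) * sigma             # b* = x_bar + sqrt(3)*sigma

s = 8                                           # number of intervals
edges = np.linspace(sample.min(), sample.max(), s + 1)
observed, _ = np.histogram(sample, bins=edges)

# p_i = interval length / (b* - a*), with the outer intervals extended to a* and b*
lo, hi = edges[:-1].copy(), edges[1:].copy()
lo[0], hi[-1] = a_est, b_est
p = (hi - lo) / (b_est - a_est)
expected = len(sample) * p

chi2_obs = ((observed - expected) ** 2 / expected).sum()
chi2_crit = stats.chi2.isf(0.05, df=s - 3)      # two estimated parameters
print(chi2_obs, chi2_crit)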

3. Testing the hypothesis about an exponential distribution.

In this case, having divided the available sample into intervals of equal length, we consider a sequence of equally spaced variants (we assume that all variants falling into the i-th interval take a value coinciding with its midpoint) and the corresponding frequencies n_i (the number of sample values falling into the i-th interval). From these data we calculate x̄_B and take the value λ* = 1/x̄_B as an estimate of the parameter λ. Then the theoretical frequencies are calculated by the formula n'_i = n·p_i, where p_i = e^(−λ*·a_i) − e^(−λ*·b_i) is the probability of falling into the i-th interval with boundaries a_i and b_i.

Then the observed and critical values of the Pearson criterion are compared in the same way, bearing in mind that the number of degrees of freedom is k = s − 2.
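An analogous sketch for the exponential law (hypothetical sample; λ is estimated as 1/x̄):

# Pearson's test for an exponential law with lambda estimated as 1/x_bar.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.exponential(scale=2.0, size=300)   # hypothetical sample
lam = 1.0 / sample.mean()                        # lambda* = 1 / x_bar

s = 8
edges = np.linspace(0.0, sample.max(), s + 1)
observed, _ = np.histogram(sample, bins=edges)

# p_i = exp(-lambda*a_i) - exp(-lambda*b_i), the last interval extended to +inf
hi = edges[1:].copy()
hi[-1] = np.inf
p = np.exp(-lam * edges[:-1]) - np.exp(-lam * hi)
expected = len(sample) * p

chi2_obs = ((observed - expected) ** 2 / expected).sum()
chi2_crit = stats.chi2.isf(0.05, df=s - 2)       # one estimated parameter
print(chi2_obs, chi2_crit)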

The width of the interval is:
h = (X_max − X_min) / (number of groups) = (60 − 43) / 6 = 2.83
X_max is the maximum value of the grouping characteristic in the population.
X_min is the minimum value of the grouping characteristic.
Let us determine the group boundaries.

Group number  Lower boundary  Upper boundary
1 43 45.83
2 45.83 48.66
3 48.66 51.49
4 51.49 54.32
5 54.32 57.15
6 57.15 60

The same value of the characteristic serves as the upper boundary of one group and the lower boundary of the next (adjacent) group.
For each value of the series let us count how many times it falls into a particular interval. To do this, we sort the series in ascending order.
43 43 - 45.83 1
48.5 45.83 - 48.66 1
49 48.66 - 51.49 1
49 48.66 - 51.49 2
49.5 48.66 - 51.49 3
50 48.66 - 51.49 4
50 48.66 - 51.49 5
50.5 48.66 - 51.49 6
51.5 51.49 - 54.32 1
51.5 51.49 - 54.32 2
52 51.49 - 54.32 3
52 51.49 - 54.32 4
52 51.49 - 54.32 5
52 51.49 - 54.32 6
52 51.49 - 54.32 7
52 51.49 - 54.32 8
52 51.49 - 54.32 9
52.5 51.49 - 54.32 10
52.5 51.49 - 54.32 11
53 51.49 - 54.32 12
53 51.49 - 54.32 13
53 51.49 - 54.32 14
53.5 51.49 - 54.32 15
54 51.49 - 54.32 16
54 51.49 - 54.32 17
54 51.49 - 54.32 18
54.5 54.32 - 57.15 1
54.5 54.32 - 57.15 2
55.5 54.32 - 57.15 3
57 54.32 - 57.15 4
57.5 57.15 - 59.98 1
57.5 57.15 - 59.98 2
58 57.15 - 59.98 3
58 57.15 - 59.98 4
58.5 57.15 - 59.98 5
60 57.15 - 59.98 6

Let us present the grouping results in the form of a table:
Groups  Observation numbers  Frequency f_i
43 - 45.83 1 1
45.83 - 48.66 2 1
48.66 - 51.49 3,4,5,6,7,8 6
51.49 - 54.32 9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26 18
54.32 - 57.15 27,28,29,30 4
57.15 - 59.98 31,32,33,34,35,36 6

Table for calculating the indicators.
Groups  Midpoint x_i  Frequency f_i  x_i·f_i  Cumulative frequency S  |x − x̄|·f  (x − x̄)²·f  Relative frequency f_i/n
43 - 45.83 44.42 1 44.42 1 8.88 78.91 0.0278
45.83 - 48.66 47.25 1 47.25 2 6.05 36.64 0.0278
48.66 - 51.49 50.08 6 300.45 8 19.34 62.33 0.17
51.49 - 54.32 52.91 18 952.29 26 7.07 2.78 0.5
54.32 - 57.15 55.74 4 222.94 30 9.75 23.75 0.11
57.15 - 59.98 58.57 6 351.39 36 31.6 166.44 0.17
Total  36  1918.73  82.7  370.86  1

To characterise the distribution series, we compute the following indicators:
Indicators of the distribution centre.
Weighted mean
x̄ = Σ x_i·f_i / Σ f_i = 1918.73 / 36 = 53.3

Mode
The mode is the most frequently occurring value of the characteristic among the units of the given population.
Mo = x0 + h·(f2 − f1) / ((f2 − f1) + (f2 − f3)),
where x0 is the beginning of the modal interval; h is the interval width; f2 is the frequency corresponding to the modal interval; f1 is the frequency of the preceding interval; f3 is the frequency of the following interval.
We choose 51.49 as the beginning of the modal interval, since this interval has the greatest frequency.
Mo = 51.49 + 2.83·(18 − 6) / ((18 − 6) + (18 − 4)) = 52.8
The most frequently occurring value of the series is 52.8.
Median
The median divides the sample into two halves: half of the variants are smaller than the median, half are larger.
In an interval distribution series one can immediately indicate only the interval in which the mode or the median will lie. The median corresponds to the variant standing in the middle of the ranked series. The median interval is 51.49 - 54.32, because in this interval the cumulative frequency S exceeds half of the total sum of frequencies (the median interval is the first interval whose cumulative frequency exceeds half of the total sum of frequencies).
Me = x0 + h·(n/2 − S_prev) / f_Me = 51.49 + 2.83·(18 − 8) / 18 = 53.06,
where S_prev = 8 is the cumulative frequency before the median interval and f_Me = 18 is its frequency.

Thus 50% of the units of the population are smaller than 53.06.
Indicators of variation.
Absolute variation indicators.
The range of variation is the difference between the maximum and minimum values of the characteristic of the primary series.
R = X_max − X_min
R = 60 − 43 = 17
Mean linear deviation - calculated in order to take into account the differences of all units of the studied population.
d = Σ |x_i − x̄|·f_i / n = 82.7 / 36 = 2.3
On average, each value of the series differs from the mean by 2.3.
Variance - characterises the measure of scatter around the mean value (a measure of dispersion, i.e. deviation from the mean).
D = Σ (x_i − x̄)²·f_i / n = 370.86 / 36 = 10.3

Unbiased estimate of the variance - a consistent estimate of the variance:
S² = Σ (x_i − x̄)²·f_i / (n − 1) = 370.86 / 35 = 10.6

Standard deviation:
σ = √D = √10.3 = 3.21
On average, each value of the series differs from the mean value of 53.3 by 3.21.
Estimate of the standard deviation:
s = √S² = √10.6 = 3.26
Relative indicators of variation.
The relative indicators of variation include: the coefficient of oscillation, the linear coefficient of variation, the relative linear deviation.
The coefficient of variation is a measure of the relative scatter of the values of the population: it shows what proportion of the mean value of this quantity its average scatter makes up.
V = σ / x̄ · 100% = 3.21 / 53.3 · 100% ≈ 6%
Since V ≤ 30%, the population is homogeneous and the variation is weak. The results obtained can be trusted.
The linear coefficient of variation, or relative linear deviation, characterises the share of the average value of the absolute deviations from the mean value.
K_d = d / x̄ · 100% = 2.3 / 53.3 · 100% ≈ 4.3%
Testing hypotheses about the form of the distribution.
1. Let us test the hypothesis that X is distributed according to the normal law, using Pearson's goodness-of-fit criterion
K = Σ (n_i − n·p_i)² / (n·p_i),
where n_i is the observed frequency and p_i is the probability of falling into the i-th interval for a random variable distributed according to the hypothetical law.
To calculate the probabilities p_i we apply the table of the Laplace function Φ and the formula
p_i = Φ(x2) − Φ(x1), where x1 = (x_i − x̄)/s, x2 = (x_{i+1} − x̄)/s,
s = 3.21, x̄ = 53.3.
The theoretical (expected) frequency is n·p_i, where n = 36.
Grouping intervals  Observed frequency n_i  x1 = (x_i − x̄)/s  x2 = (x_{i+1} − x̄)/s  Φ(x1)  Φ(x2)  Probability p_i = Φ(x2) − Φ(x1)  Expected frequency 36·p_i  Pearson statistic term K_i
43 - 45.83 1 -3.16 -2.29 -0.5 -0.49 0.01 0.36 1.14
45.83 - 48.66 1 -2.29 -1.42 -0.49 -0.42 0.0657 2.37 0.79
48.66 - 51.49 6 -1.42 -0.56 -0.42 -0.21 0.21 7.61 0.34
51.49 - 54.32 18 -0.56 0.31 -0.21 0.13 0.34 12.16 2.8
54.32 - 57.15 4 0.31 1.18 0.13 0.38 0.26 9.27 3
57.15 - 59.98 6 1.18 2.06 0.38 0.48 0.0973 3.5 1.78
Total  36  9.84

Let us determine the boundary of the critical region. Since the Pearson statistic measures the difference between the empirical and theoretical distributions, the larger its observed value K_obs, the stronger the argument against the null hypothesis.
Therefore the critical region for this statistic is always right-sided.
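The Pearson statistic from the table above can be reproduced with the following sketch (the standard normal CDF replaces the Laplace-function table; the table's intermediate values appear to correspond to the unbiased estimate s ≈ 3.26, and its rounding explains small differences in the total):

# Reproduce the Pearson statistic K from the table above.
import numpy as np
from scipy import stats

edges = np.array([43, 45.83, 48.66, 51.49, 54.32, 57.15, 59.98])   # interval boundaries
observed = np.array([1, 1, 6, 18, 4, 6])                            # observed frequencies n_i
n, x_bar, s = observed.sum(), 53.3, 3.26

z_lo = (edges[:-1] - x_bar) / s
z_hi = (edges[1:] - x_bar) / s
p = stats.norm.cdf(z_hi) - stats.norm.cdf(z_lo)    # p_i = Phi(x2) - Phi(x1)
expected = n * p                                    # expected frequencies 36*p_i

K_i = (observed - expected) ** 2 / expected         # per-interval terms
K_obs = K_i.sum()                                   # roughly 9.8-10.0 (the table gives 9.84)
print(np.round(expected, 2), np.round(K_i, 2), round(K_obs, 2))

# Right-sided critical region: compare K_obs with the chi-square quantile, e.g.
# stats.chi2.isf(0.05, df=len(observed) - 3) for two estimated parameters.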