Table of critical values of Student's t-test. Student's t-distribution for testing the hypothesis about the mean and calculating the confidence interval in MS Excel

Testing a statistical hypothesis allows us to draw a rigorous conclusion about the characteristics of a population from sample data. There are different hypotheses; one of them is the hypothesis about the mean (mathematical expectation). Its essence is to draw a correct conclusion, based only on the available sample, about where the population mean may or may not be (we will never know the exact truth, but we can narrow the search).

The general approach to testing hypotheses has been described elsewhere, so let's get straight to the point. Assume for a start that the sample is drawn from a normal population of a random variable X with population mean μ and variance σ² (I know, I know, it doesn't work that way, but there's no need to interrupt me!). The arithmetic mean of this sample is obviously itself a random variable: if we drew many such samples and calculated their means, those means would also have mathematical expectation μ and variance σ²/n.

Then the random variable

Z = (X̄ − μ) / (σ/√n)

has the standard normal distribution N(0, 1).

The question arises: will the population mean, with probability 95%, lie within X̄ ± 1.96·s_X̄? In other words, are the distributions of the random variables

(X̄ − μ) / (σ/√n) and (X̄ − μ) / (s/√n)

equivalent?

This question was first raised (and resolved) by a chemist who worked at the Guinness brewery in Dublin, Ireland. The chemist's name was William Sealy Gosset, and he took samples of beer for chemical analysis. At some point, apparently, William began to be tormented by vague doubts about the distribution of the means: it turned out to be somewhat more spread out than the normal distribution should be.

After working out the mathematical justification and calculating the values of the distribution function he had discovered, William Gosset wrote a note that was published in the March 1908 issue of the journal Biometrika (edited by Karl Pearson). Because Guinness strictly forbade giving away brewing secrets, Gosset signed it with the pseudonym Student.

Although K. Pearson had already invented the chi-squared distribution, the general idea of normality still dominated; nobody expected the distribution of sample estimates to be anything but normal. So W. Gosset's article remained practically unnoticed and forgotten. Only Ronald Fisher appreciated Gosset's discovery: he used the new distribution in his work and gave it the name Student's t-distribution, and the criterion for testing hypotheses accordingly became Student's t-test. Thus a "revolution" took place in statistics, which stepped into the era of sample data analysis. That was a short excursion into history.

Let's see what W. Gosset could see. Let's generate 20 thousand normal samples of 6 observations each, with mean (μ) 50 and standard deviation (σ) 10. Then we normalize the sample means using the population parameters:

Z = (X̄ − 50) / (10/√6)

We group the resulting 20 thousand means into intervals of length 0.1 and calculate the frequencies. Let us plot the actual (Norm) and theoretical (ENorm) frequency distributions of the sample means on a chart.

The points (observed frequencies) practically coincide with the line (theoretical frequencies). This is understandable: the data are taken from the same population, and the differences are just sampling error.

Let's run a new experiment: normalize the means using the sample standard deviation,

t = (X̄ − 50) / (s/√6)

Let's count the frequencies again and plot them on the chart as dots, leaving the standard normal distribution line for comparison. Let us denote the empirical frequencies of the means, say, by the letter t.

It can be seen that this time the distributions do not quite coincide. Close, yes, but not the same: the tails have become heavier.
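For readers who prefer code to spreadsheets, here is a minimal sketch of this experiment in Python (the numbers match the article: 20,000 samples of 6 observations from a normal law with μ = 50 and σ = 10; numpy and scipy are assumed to be available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, trials = 50, 10, 6, 20_000

samples = rng.normal(mu, sigma, size=(trials, n))
means = samples.mean(axis=1)

# Normalized by the known population sigma: follows N(0, 1)
z = (means - mu) / (sigma / np.sqrt(n))

# Normalized by each sample's own s: follows t with n - 1 d.f.
s = samples.std(axis=1, ddof=1)
t = (means - mu) / (s / np.sqrt(n))

# The heavier tails show up immediately in the tail frequencies
print("P(|z| > 1.96):", np.mean(np.abs(z) > 1.96))          # close to 0.05
print("P(|t| > 1.96):", np.mean(np.abs(t) > 1.96))          # noticeably larger
print("theoretical t tail:", 2 * stats.t.sf(1.96, df=n - 1))
```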

Gosset-Student did not have the latest version of MS Excel, but this is exactly the effect he noticed. Why does it happen? The explanation is that the random variable

t = (X̄ − μ) / (s/√n)

depends not only on the sampling error in the numerator, but also on the standard error of the mean in the denominator, which is itself a random variable.

Let's take a quick look at what distribution such a random variable should have. First, we have to recall (or learn) something from mathematical statistics. There is a theorem of Fisher which states that in a sample from a normal distribution:

1. the sample mean X̄ and the sample variance s² are independent quantities;

2. the ratio of the sample variance to the population variance, multiplied by the number of degrees of freedom, has a χ² (chi-square) distribution with the same number of degrees of freedom, i.e.

k·s²/σ² ~ χ²_k

where k is the number of degrees of freedom (in English, degrees of freedom, d.f.); for a sample of n observations, k = n − 1.

Many other results in the statistics of normal models are based on this law.

Let's go back to the distribution of the mean. Divide the numerator and denominator of the expression

t = (X̄ − μ) / (s/√n)

by σ_X̄ = σ/√n. We get

[(X̄ − μ) / (σ/√n)] / (s/σ)

The numerator is a standard normal random variable (denote it ξ, xi). The denominator can be expressed from Fisher's theorem: s/σ = √(χ²_k / k).

Then the original expression takes the form

t = ξ / √(χ²_k / k)

This is the Student ratio in general form. Its distribution function can be derived directly, since the distributions of both random variables in this expression are known. Let's leave this pleasure to the mathematicians.

The Student's t-distribution function has a formula that is rather hard to digest, so there is no point in dissecting it. Nobody uses it directly anyway, since the probabilities are given in special tables of the Student distribution (sometimes called tables of Student coefficients) or are built into software functions.

So, armed with this new knowledge, you should be able to understand the official definition of the Student distribution.
A random variable obeying the Student distribution with k degrees of freedom is the ratio of independent random variables

t = ξ / √(χ²_k / k)

where ξ is distributed according to the standard normal law and χ²_k obeys the χ² distribution with k degrees of freedom.
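The definition is easy to verify numerically. Below is a small sketch (an illustration, not a proof): we build the ratio ξ/√(χ²_k/k) from independent draws and compare its quantiles with those of the t-distribution with k degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, trials = 5, 200_000

xi = rng.standard_normal(trials)        # standard normal numerator
chi2 = rng.chisquare(k, size=trials)    # chi-square denominator, k d.f.
ratio = xi / np.sqrt(chi2 / k)

# Empirical quantiles of the ratio vs. theoretical t(k) quantiles
for q in (0.90, 0.95, 0.975):
    print(q, round(np.quantile(ratio, q), 3), round(stats.t.ppf(q, df=k), 3))
```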

Thus, the formula of Student's t-test for the arithmetic mean,

t = (X̄ − μ) / (s/√n)

is a special case of the Student ratio.

From the formula and definition it follows that the distribution of the Student's t-test depends only on the number of degrees of freedom.

For k > 30, the t-distribution is practically indistinguishable from the standard normal distribution.

Unlike the chi-square test, the t-test can be one-sided or two-sided. Usually the two-sided version is used, assuming that deviations can occur in either direction from the mean. But if the problem allows deviations in only one direction, it is reasonable to apply a one-sided test. This slightly increases power, because at a fixed significance level the critical value moves a little closer to zero.
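The effect on the critical value is easy to see numerically; here is a sketch (df = 8 is chosen to match the cement example below):

```python
from scipy import stats

alpha, k = 0.05, 8
two_sided = stats.t.ppf(1 - alpha / 2, df=k)  # 2.306
one_sided = stats.t.ppf(1 - alpha, df=k)      # 1.860, closer to zero,
print(two_sided, one_sided)                   # hence slightly more power
```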

Conditions for using Student's t-test

Although Student's discovery made a revolution in statistics in its time, the t-test is still quite limited in its applicability, since it derives from the assumption that the original data are normally distributed. If the data are not normal (which is usually the case), the t-statistic will no longer have a Student distribution. However, thanks to the central limit theorem, the mean of even non-normal data quickly acquires a bell-shaped distribution.

Consider, for example, data that is skewed to the right, like the chi-square distribution with 5 degrees of freedom.

Now let's generate 20 thousand samples and observe how the distribution of the means changes depending on the sample size.

The difference is quite noticeable in small samples of up to 15-20 observations, but then it rapidly disappears. Thus, non-normality of the distribution is, of course, not good, but not critical.
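A sketch of this experiment (the chi-square parameters are the article's; the skewness of the sample means is used as a rough gauge of normality, since it is 0 for a normal law):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
trials = 20_000

for n in (5, 15, 50):
    samples = rng.chisquare(5, size=(trials, n))   # right-skewed data
    means = samples.mean(axis=1)
    print(n, round(stats.skew(means), 3))          # approaches 0 as n grows
```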

Most of all, the t-test is "afraid" of outliers, i.e. anomalous deviations. Let's take 20 thousand normal samples of 15 observations each and add one random outlier to some of them.

The picture is not a happy one. The actual frequencies of the means differ markedly from the theoretical ones. Using the t-distribution in such a situation becomes a very risky undertaking.

So, in not-too-small samples (from 15 observations) the t-test is relatively robust to non-normality of the initial data. But outliers strongly distort the distribution of the t-statistic, which in turn can lead to errors in statistical inference, so anomalous observations should be removed. Often, all values lying more than 2 standard deviations from the mean are removed from the sample.
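A minimal sketch of that crude trimming rule (the data values are made up for illustration):

```python
import numpy as np

def trim_2sd(x):
    """Drop observations farther than 2 sample standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std(ddof=1)
    return x[np.abs(x - m) <= 2 * s]

data = np.array([50.1, 49.8, 50.3, 49.9, 50.2, 55.0])  # 55.0 is an outlier
print(trim_2sd(data))                                  # 55.0 is removed
```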

An example of testing the hypothesis about the expected value using the Student's t-test in MS Excel

Excel has several t-distribution related functions. Let's consider them.

T.DIST is the "classical" left-tailed Student's t-distribution. The inputs are the t value, the number of degrees of freedom, and a flag (0 or 1) that determines what to calculate: the density or the cumulative distribution function. The output is, respectively, the density or the probability that the random variable will be less than the specified t value.

T.DIST.2T is the two-tailed distribution. The arguments are the absolute value of the t statistic and the number of degrees of freedom. The output is the probability of obtaining this or an even larger |t| value, i.e. the actual significance level (p-value).

T.DIST.RT is the right-tailed t-distribution, so 1 − T.DIST(2; 5; 1) = T.DIST.RT(2; 5) = 0.05097. If the t value is positive, the resulting probability is the p-value.

T.INV is used to calculate the left-tailed inverse of the t-distribution. The arguments are the probability and the number of degrees of freedom. The output is the t value corresponding to that probability. The probability accumulates from the left, so for the left tail you feed in the significance level α itself, and for the right tail 1 − α.

T.INV.2T is the inverse of the two-tailed Student distribution, i.e. it returns the t value (in absolute terms). The significance level α is again fed in, but this time the probability is split between the two tails. Thus, T.INV(1 − 0.025; 5) = T.INV.2T(0.05; 5) = 2.57058.

T.TEST is a function for testing the hypothesis of equal mathematical expectations in two samples. It replaces a pile of calculations: it is enough to specify two data ranges and a couple of parameters. The output is the p-value.

CONFIDENCE.T calculates the confidence interval for the mean, taking the t-distribution into account.
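For cross-checking outside Excel, the same quantities can be obtained from scipy.stats.t (a sketch; the correspondences to the Excel functions are noted in the comments):

```python
from scipy import stats

print(stats.t.cdf(2, df=5))          # ~ T.DIST(2; 5; 1): left tail
print(stats.t.sf(2, df=5))           # ~ T.DIST.RT(2; 5) = 0.05097
print(2 * stats.t.sf(2, df=5))       # ~ T.DIST.2T(2; 5): both tails
print(stats.t.ppf(0.975, df=5))      # ~ T.INV(0.975; 5)
print(stats.t.ppf(1 - 0.05 / 2, 5))  # ~ T.INV.2T(0.05; 5) = 2.57058
```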

Let's consider a tutorial example. An enterprise packs cement in 50 kg bags. Due to randomness, some deviation from the expected weight is allowed in a single bag, but the population mean should remain 50 kg. In the quality control department, 9 bags were weighed at random and the following results were obtained: the mean weight (X̄) was 50.3 kg, and the standard deviation (s) was 0.5 kg.

Does this result agree with the null hypothesis that the population mean is 50 kg? In other words, could such a result be obtained by pure chance if the equipment is working properly and fills an average of 50 kg? If the hypothesis is not rejected, the observed difference fits within the range of random fluctuation; if it is rejected, then most likely the settings of the bag-filling machine have drifted, and it needs to be checked and adjusted.

The brief statement of the problem in generally accepted notation looks like this.

H0: μ = 50 kg

H1: μ ≠ 50 kg

There is reason to believe that the distribution of bag weights is normal (or does not differ much from normal). Hence, to test the hypothesis about the mathematical expectation, we can use Student's t-test. Random deviations can occur in either direction, so a two-sided t-test is needed.

First, we will use antediluvian means: manual calculation of the t statistic and comparison with the critical table value. The calculated t statistic:

t = (50.3 − 50) / (0.5/√9) = 0.3 / 0.1667 = 1.8

Now let's determine whether the resulting number exceeds the critical level at significance level α = 0.05. Let's use the Student's t-distribution table (available in any statistics textbook).

The columns give the right-tail probability of the distribution, and the rows give the number of degrees of freedom. We are interested in a two-tailed t-test at significance level 0.05, which is equivalent to the t value for half the significance level on the right: 1 − 0.05/2 = 0.975. The number of degrees of freedom is the sample size minus 1, i.e. 9 − 1 = 8. At the intersection we find the table value of the t-test: 2.306. If we used the standard normal distribution, the critical point would be 1.96, but here it is larger, because the t-distribution is flatter on small samples.

We compare the actual value (1.8) with the table value (2.306). The calculated statistic turned out to be smaller than the table one. Consequently, the available data do not contradict the hypothesis H0 that the population mean is 50 kg (but do not prove it either). That is all we can learn from tables. You could, of course, try to find the p-value, but it would be approximate. And, as a rule, it is the p-value that is used to test hypotheses. So let's move on to Excel.

There is no ready-made function for calculating this t statistic in Excel. But that is not a problem: the formula of Student's t-test is quite simple and can easily be built right in an Excel cell.

We get the same 1.8. Let's find the critical value first. We take alpha 0.05 and a two-sided criterion. We need the inverse t-distribution function for the two-sided hypothesis: T.INV.2T.

The resulting value cuts off the critical region. The observed t statistic does not fall into it, so the hypothesis is not rejected.

However, this is the same approach as testing the hypothesis with a table value. It is more informative to calculate the p-value, i.e. the probability of obtaining the observed or an even greater deviation from the mean of 50 kg if the hypothesis is true. We need the Student's distribution function for the two-sided hypothesis: T.DIST.2T.

The p-value is 0.1096, which is greater than the acceptable significance level of 0.05, so we do not reject the hypothesis. But now we can judge the strength of the evidence. The p-value turned out to be quite close to the level at which the hypothesis would be rejected, and that suggests various thoughts: for example, that the sample was too small to detect a real deviation.

After a while, the quality control department again decided to check whether the bag-filling standard was being maintained. This time, for greater reliability, not 9 but 25 bags were selected. It is intuitively clear that the spread of the mean will decrease, and so the chances of detecting a drift in the system will increase.

Suppose the sample gave the same mean and standard deviation as the first time (50.3 and 0.5, respectively). Let's calculate the t statistic:

t = (50.3 − 50) / (0.5/√25) = 0.3 / 0.1 = 3.0
The critical value for 24 degrees of freedom and α = 0.05 is 2.064. The picture below shows that the t-test falls into the hypothesis rejection area.

It can be concluded that, with a confidence level of more than 95%, the population mean differs from 50 kg. To be more convincing, let's look at the p-value (the last row in the table). If the hypothesis is true, the probability of obtaining a mean with such or an even greater deviation from 50 is 0.0062, or 0.62%, which is practically impossible in a single measurement. In general, we reject the hypothesis as improbable.
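The whole cement example can be reproduced in a few lines of Python (a sketch; the summary statistics are the article's):

```python
import numpy as np
from scipy import stats

mu0, mean, s = 50, 50.3, 0.5
for n in (9, 25):
    t = (mean - mu0) / (s / np.sqrt(n))
    crit = stats.t.ppf(0.975, df=n - 1)          # two-sided, alpha = 0.05
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    print(f"n={n}: t={t:.2f}, critical={crit:.3f}, p={p:.4f}")

# n=9:  t=1.80, critical=2.306, p=0.1096 -> H0 is not rejected
# n=25: t=3.00, critical=2.064, p=0.0062 -> H0 is rejected
```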

Calculating the Confidence Interval Using Student's t-Distribution

Another statistical method, closely related to hypothesis testing, is the calculation of confidence intervals. If the resulting interval contains the value corresponding to the null hypothesis, this is equivalent to the null hypothesis not being rejected; otherwise the hypothesis is rejected at the corresponding confidence level. In some cases, analysts do not test hypotheses in the classic form at all, but only calculate confidence intervals. This approach extracts even more useful information.

Let's calculate the confidence intervals for the mean for 9 and 25 observations. For this we will use the Excel function CONFIDENCE.T. Here, oddly enough, everything is pretty simple: the function arguments are just the significance level α, the sample standard deviation, and the sample size. The output is the half-width of the confidence interval, that is, the value to be laid off on both sides of the mean. After performing the calculations and drawing a visual chart, we get the following.

As you can see, with a sample of 9 observations the value 50 falls inside the confidence interval (the hypothesis is not rejected), but with 25 observations it does not (the hypothesis is rejected). Moreover, in the experiment with 25 bags it can be asserted that with probability 97.5% the population mean exceeds 50.09 kg (the lower limit of the confidence interval is 50.094 kg). And that is quite valuable information.
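The same intervals can be computed directly (a sketch reproducing what CONFIDENCE.T returns, namely the half-width, plus the resulting bounds):

```python
import numpy as np
from scipy import stats

mean, s = 50.3, 0.5
for n in (9, 25):
    half = stats.t.ppf(0.975, df=n - 1) * s / np.sqrt(n)   # half-width
    print(f"n={n}: ({mean - half:.3f}, {mean + half:.3f})")

# n=9:  (49.916, 50.684) -> contains 50, H0 not rejected
# n=25: (50.094, 50.506) -> does not contain 50, H0 rejected
```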

Thus, we have solved the same problem in three ways:

1. An old-school approach: comparing the calculated t value with the tabular one.
2. A more modern one: calculating the p-value, which adds a degree of confidence when the hypothesis is rejected.
3. An even more informative one: calculating the confidence interval and obtaining the minimum value of the population mean.

It is important to remember that the t-test is a parametric method, since it is based on the normal distribution (which has two parameters: mean and variance). Therefore, for its successful application, at least approximate normality of the initial data and the absence of outliers are important.


Student's t-test is the general name for a class of statistical hypothesis testing methods (statistical tests) based on the Student's distribution. The most common cases of using the t-test are associated with checking the equality of the mean values ​​in two samples.

1. History of the development of the t-criterion

This criterion was developed by William Gosset to assess the quality of beer at the Guinness company. Because of non-disclosure obligations to the company, Gosset's article was published in 1908 in the journal Biometrika under the pseudonym Student.

2. What is Student's t-test used for?

Student's t-test is used to determine the statistical significance of differences between mean values. It can be applied both to independent samples (for example, a group of patients with diabetes mellitus and a group of healthy people) and to related populations (for example, the average heart rate in the same patients before and after taking an antiarrhythmic drug).

3. When can you use Student's t-test?

To apply Student's t-test, the initial data must have a normal distribution. When a two-sample test is used for independent samples, the condition of equality (homoscedasticity) of variances must also be satisfied.

If these conditions are not met, similar methods of nonparametric statistics should be used to compare sample means, the best known of which are the Mann-Whitney U-test (as a two-sample test for independent samples) and the sign test and Wilcoxon test (used for dependent samples).

4. How to calculate Student's t-test?

To compare mean values, Student's t-test is calculated by the following formula:

t = (M1 − M2) / √(m1² + m2²)

where M1 is the arithmetic mean of the first compared population (group), M2 is the arithmetic mean of the second compared population (group), m1 is the standard error of the first mean, and m2 is the standard error of the second mean.

5. How to interpret the value of the Student's t-test?

The obtained value of Student's t-test must be interpreted correctly. To do this, we need to know the number of subjects in each group (n1 and n2) and find the number of degrees of freedom f by the following formula:

f = (n1 + n2) − 2

After that, we determine the critical value of Student's t-test for the required significance level (for example, p = 0.05) and the given number of degrees of freedom f from the table (see below).

We compare the critical and calculated values ​​of the criterion:

  • If the calculated value of Student's t-test is greater than or equal to the critical value found in the table, we conclude that the differences between the compared values are statistically significant.
  • If the calculated value of Student's t-test is less than the tabular value, the differences between the compared values are statistically insignificant.

6. An example of calculating the Student's t-test

To study the effectiveness of a new iron preparation, two groups of patients with anemia were selected. In the first group, patients received the new drug, and in the second group they received a placebo. After that, the hemoglobin level in peripheral blood was measured. In the first group, the average hemoglobin level was 115.4 ± 1.2 g/l, and in the second, 103.7 ± 2.3 g/l (data are presented as M ± m); the compared populations have a normal distribution. The first group numbered 34 patients, and the second 40. It is necessary to draw a conclusion about the statistical significance of the observed differences and the effectiveness of the new iron preparation.

Solution: to assess the significance of the differences, we use Student's t-test, calculated as the difference of the means divided by the square root of the sum of the squared errors:

t = (115.4 − 103.7) / √(1.2² + 2.3²) = 11.7 / √6.73 ≈ 4.51

After performing the calculations, the t value turned out to be 4.51. The number of degrees of freedom is (34 + 40) − 2 = 72. We compare the obtained Student's t value of 4.51 with the critical value at p = 0.05 given in the table: 1.993. Since the calculated value of the criterion is greater than the critical one, we conclude that the observed differences are statistically significant (significance level p < 0.05).
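Since only the summary statistics (M ± m) are given, the calculation is easy to reproduce directly; a sketch:

```python
import math
from scipy import stats

M1, m1, n1 = 115.4, 1.2, 34    # drug group: mean and standard error
M2, m2, n2 = 103.7, 2.3, 40    # placebo group

t = (M1 - M2) / math.sqrt(m1**2 + m2**2)   # 11.7 / sqrt(6.73) ~ 4.51
f = (n1 + n2) - 2                          # 72 degrees of freedom
crit = stats.t.ppf(0.975, df=f)            # ~ 1.993 at p = 0.05, two-sided
print(round(t, 2), f, round(crit, 3))
```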

In this example we will use fictitious data so that the reader can carry out the necessary calculations on their own.

Suppose that, in the course of research, we studied the effect of drug A on the content of substance B (in mmol/g) in tissue C and on the concentration of substance D in the blood (in mmol/l) in patients divided by some criterion E into 3 groups of equal size (n = 10). The results of this fictitious study are shown in the table:

(Table: substance B content, mmol/g; substance D concentration, mmol/l; increase in concentration. Only the first column is used below; its values are 12, 13, 14, 15, 14, 13, 13, 10, 11, 16.)
We would like to warn that samples of size 10 are considered here only for simplicity of presentation and calculation; in practice, such a sample size is usually not enough for a sound statistical conclusion.

As an example, consider the data of the 1st column of the table.

Descriptive statistics

Sample mean

The arithmetic mean, often referred to simply as "the average", is obtained by adding all the values and dividing that sum by the number of values in the set. This can be shown with an algebraic formula: a set of n observations of a variable x is represented as x1, x2, x3, ..., xn.

The formula for the arithmetic mean of the observations (pronounced "x bar"):

X̄ = (x1 + x2 + ... + xn) / n

X̄ = (12 + 13 + 14 + 15 + 14 + 13 + 13 + 10 + 11 + 16) / 10 = 13.1

Sample variance

One way to measure the scatter of the data is to determine the degree to which each observation deviates from the arithmetic mean. Obviously, the greater the deviations, the greater the variability of the observations. However, we cannot use the mean of these deviations as a measure of scatter, because positive deviations cancel negative ones (their sum is zero). To solve this problem, we square each deviation and find the mean of the squared deviations; this quantity is called the variation, or variance. Take n observations x1, x2, x3, ..., xn with mean X̄. The variance of these observations, usually denoted s², is:

s² = Σ(xi − X̄)² / (n − 1)

The sample variance of this indicator is s 2 = 3.2.

Root mean square deviation

The standard (root mean square) deviation is the positive square root of the variance. For n observations it looks like this:

s = √( Σ(xi − X̄)² / (n − 1) )

We can think of standard deviation as a kind of mean deviation of observations from the mean. It is calculated in the same units (dimensions) as the original data.

s = √s² = √3.2 ≈ 1.79.

The coefficient of variation

If you divide the standard deviation by the arithmetic mean and express the result as a percentage, you get the coefficient of variation:

CV = (1.79 / 13.1) × 100% ≈ 13.7%

Standard error of the mean

m = s / √n = 1.79 / √10 ≈ 0.57

Student's t coefficient (one-sample t-test)

It is used to test the hypothesis that the mean value differs from some known value m:

t = (X̄ − m) / (s/√n)

The number of degrees of freedom is calculated as f = n-1.

In this case, the 95% confidence interval for the mean lies between 11.82 and 14.38.

For the 95% confidence level, the half-width of the interval is t · s/√n = 2.26 × 0.57 ≈ 1.28, so the boundaries are |13.1 − 11.82| = |13.1 − 14.38| = 1.28 away from the mean.

Here, for f = 10 − 1 = 9 degrees of freedom and a 95% confidence level, t = 2.26.
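All of the above numbers can be reproduced at once; a sketch for the first column of the fictitious table:

```python
import numpy as np
from scipy import stats

x = np.array([12, 13, 14, 15, 14, 13, 13, 10, 11, 16], dtype=float)
n = len(x)

mean = x.mean()            # 13.1
var = x.var(ddof=1)        # ~ 3.2  (sample variance)
sd = x.std(ddof=1)         # ~ 1.79
cv = sd / mean * 100       # ~ 13.7 %
sem = sd / np.sqrt(n)      # ~ 0.57 (standard error of the mean)

t, p = stats.ttest_1samp(x, 11)    # one-sample test against m = 11
lo, hi = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(mean, round(var, 2), round(sd, 2), round(cv, 1), round(sem, 2))
print(round(t, 2), round(p, 4), (round(lo, 2), round(hi, 2)))  # CI ~ (11.82, 14.38)
```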

The Basic Statistics and Tables dialog

In the Basic Statistics and Tables module, choose Descriptive statistics.

The Descriptive statistics dialog box will open.

In the Variables field, choose Group 1.

Pressing OK, we get result tables with descriptive statistics of the selected variables.

The One-sample t-test dialog box will open.

Suppose we know that the average content of substance B in tissue C is 11.

The results table with descriptive statistics and Student's t-test looks as follows:

We had to reject the hypothesis that the average content of substance B in tissue C is 11.

Since the calculated value of the criterion is greater than the tabular one (2.26), the null hypothesis is rejected at the chosen level of significance, and the differences between the sample and the known value are recognized as statistically significant. Thus, the conclusion about the existence of differences made using the Student's test is confirmed using this method.


The Fisher distribution is the distribution of the random variable

F = (X1/k1) / (X2/k2)

where the random variables X1 and X2 are independent and have chi-square distributions with k1 and k2 degrees of freedom, respectively. The pair (k1, k2) is the pair of "degrees of freedom" of the Fisher distribution: k1 is the numerator degrees of freedom and k2 the denominator degrees of freedom. The distribution of the random variable F is named after the great English statistician R. Fisher (1890-1962), who actively used it in his work.

The Fisher distribution is used to test hypotheses about the adequacy of the model in regression analysis, about the equality of variances, and in other problems of applied statistics.

Table of critical values of Student's t-test.

Number of degrees of freedom, f    Student's t value at p = 0.05 (two-sided)
1    12.706
2    4.303
3    3.182
4    2.776
5    2.571
6    2.447
7    2.365
8    2.306
9    2.262
10    2.228
11    2.201
12    2.179
13    2.160
14    2.145
15    2.131
16    2.120
17    2.110
18    2.101
19    2.093
20    2.086
21    2.080
22    2.074
23    2.069
24    2.064
25    2.060
26    2.056
27    2.052
28    2.048
29    2.045
30    2.042
31    2.040
32    2.037
33    2.035
34    2.032
35    2.030
36    2.028
37    2.026
38    2.024
40-41    2.021
42-43    2.018
44-45    2.015
46-47    2.013
48-49    2.011
50-51    2.009
52-53    2.007
54-55    2.005
56-57    2.003
58-59    2.002
60-61    2.000
62-63    1.999
64-65    1.998
66-67    1.997
68-69    1.995
70-71    1.994
72-73    1.993
74-75    1.993
76-77    1.992
78-79    1.991
80-89    1.990
90-99    1.987
100-119    1.984
120-139    1.980
140-159    1.977
160-179    1.975
180-199    1.973
200    1.972
∞    1.960

The method allows testing the hypothesis that the means of the two populations from which the compared dependent samples are drawn differ from each other. The assumption of dependence most often means that the trait was measured in the same sample twice, for example before and after an intervention. In the general case, each member of one sample is matched with a member of the other sample (they are combined in pairs), so that the two data series are positively correlated with each other. Weaker types of sample dependence: sample 1 consists of husbands, sample 2 of their wives; sample 1 consists of one-year-old children, sample 2 of their twins; etc.

The statistical hypothesis to be tested, as in the previous case, is H0: M1 = M2 (the means in samples 1 and 2 are equal). If it is rejected, the alternative hypothesis is accepted that M1 is greater (or less) than M2.

Initial assumptions for statistical testing:

□ each member of one sample (from one population) is matched with a member of the other sample (from another population);

□ the data of the two samples are positively correlated (they form pairs);

□ the distribution of the studied trait in both samples corresponds to the normal law.

Source data structure: there are two values of the studied trait for each object (for each pair).

Restrictions: the distribution of the trait in both samples should not differ significantly from normal; the data of the two measurements corresponding to the two samples are positively correlated.

Alternatives: Wilcoxon's T-test, if the distribution for at least one sample differs significantly from normal; Student's t-test for independent samples, if the data for the two samples do not correlate positively.

The formula for the empirical value of Student's t-test reflects the fact that the unit of analysis is the difference (shift) between the trait values for each pair of observations. Accordingly, for each of the N pairs, the difference d_i = x1i − x2i is calculated first, and then

t = M_d / (σ_d / √N)    (3)

where M_d is the mean of the differences and σ_d is the standard deviation of the differences.
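A sketch of formula (3) on made-up before/after scores (the raw data of Table 3 are not reproduced in this text, so these numbers are invented purely for illustration; scipy's paired test is shown as a cross-check):

```python
import numpy as np
from scipy import stats

before = np.array([5, 6, 4, 7, 5, 6, 5, 4], dtype=float)  # invented scores
after  = np.array([6, 6, 5, 8, 6, 7, 5, 5], dtype=float)

d = before - after                                  # per-pair differences d_i
t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))    # formula (3)
print(round(t, 2))
print(stats.ttest_rel(before, after))               # equivalent paired t-test
```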

Calculation example:

Suppose that, in a check of the effectiveness of a training, each of the 8 members of the group was asked the question "How often do your opinions coincide with those of the group?" twice: before and after the training. A 10-point scale was used for the answers: 1 means never, 5 half the time, 10 always. The hypothesis tested was that as a result of the training, the participants' self-assessment of conformity (the desire to be like others in the group) would increase (α = 0.05). Let's make a table for the intermediate calculations (Table 3).

Table 3

The arithmetic mean of the differences is M_d = (−6)/8 = −0.75. Subtract this value from each d (the penultimate column of the table).

The formula for the standard deviation differs only in that d appears in it instead of X. Substituting all the required values, we get σ_d = 0.886.

Step 1. Calculate the empirical value of the criterion using formula (3): the mean difference M_d = −0.75; the standard deviation σ_d = 0.886; t_e = 2.39; df = 7.

Step 2. Determine the p-level of significance from the table of critical values of Student's t-test. For df = 7, the empirical value lies between the critical values for p = 0.05 and p = 0.01. Therefore, p < 0.05.

df = 7; critical values: t(0.05) = 2.365, t(0.01) = 3.499, t(0.001) = 5.408.

Step 3. We make a statistical decision and formulate a conclusion. The statistical hypothesis of equal means is rejected. Conclusion: the participants' self-assessment of conformity increased statistically significantly after the training (at significance level p < 0.05).

Parametric methods also include the comparison of the variances of two samples by Fisher's F-test. Sometimes this method leads to valuable substantive conclusions, and when comparing means for independent samples, comparing the variances is an obligatory preliminary procedure.

To calculate F_emp, it is necessary to find the ratio of the variances of the two samples, so that the larger variance is in the numerator and the smaller one in the denominator.

Comparison of variances. The method allows testing the hypothesis that the variances of the two populations from which the compared samples are drawn differ from each other. The tested statistical hypothesis is H0: σ1² = σ2² (the variance in sample 1 equals the variance in sample 2). If it is rejected, the alternative hypothesis that one variance is greater than the other is accepted.

Initial assumptions: two samples are taken randomly from different general populations with a normal distribution of the trait under study.

Source data structure: the trait under study is measured in objects (subjects), each of which belongs to one of the two compared samples.

Restrictions: the distribution of the trait in both samples does not differ significantly from the normal one.

Alternative to the method: Levene's test, which does not require the normality assumption (used in the SPSS program).

Formula for the empirical value of Fisher's F-test:

F = σ1² / σ2²    (4)

where σ1² is the larger variance and σ2² the smaller variance. Since it is not known in advance which variance is larger, the table of critical values for non-directional alternatives is used to determine the p-level. If F_e > F_cr for the corresponding numbers of degrees of freedom, then p < 0.05 and the statistical hypothesis of equal variances can be rejected (for α = 0.05).
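A sketch of this ratio in code (the helper name and the demo data are made up; the critical value is taken from scipy instead of the printed table):

```python
import numpy as np
from scipy import stats

def f_test(x, y, alpha=0.05):
    # Larger variance in the numerator, as formula (4) requires
    v1, v2 = np.var(x, ddof=1), np.var(y, ddof=1)
    if v1 >= v2:
        F, df_num, df_den = v1 / v2, len(x) - 1, len(y) - 1
    else:
        F, df_num, df_den = v2 / v1, len(y) - 1, len(x) - 1
    # Critical value for a non-directional alternative (alpha split in two)
    crit = stats.f.ppf(1 - alpha / 2, df_num, df_den)
    return F, crit

rng = np.random.default_rng(3)
x = rng.normal(0, 2, 12)    # group 1, larger spread
y = rng.normal(0, 1, 12)    # group 2
print(f_test(x, y))
```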

Calculation example:

The children were given ordinary arithmetic problems, after which one randomly selected half of the students were told that they had failed the test, and the rest the opposite. Each child was then asked how many seconds it would take to solve a similar problem. The experimenter calculated the difference between the time the child named and the result of the completed task (in seconds). It was expected that the report of failure would cause some inadequacy in the child's self-assessment. The hypothesis tested (at the level α = 0.05) was that the variance of the aggregate of self-assessments does not depend on the report of success or failure (H0: σ1² = σ2²).

The following data were obtained:


Step 1. Let us calculate the empirical value of the criterion and the number of degrees of freedom by the formulas (4):

Step 2. In the table of critical values of Fisher's F-test for non-directional alternatives, we look for the critical value for df_numerator = 11 and df_denominator = 11. However, the table has critical values only for df_numerator = 10 and 12. A larger number of degrees of freedom cannot be taken, so we take the critical value for df_numerator = 10: for p = 0.05, F_cr = 3.526; for p = 0.01, F_cr = 5.418.

Step 3. We make a statistical decision and a substantive conclusion. Since the empirical value exceeds the critical value for p = 0.01 (and all the more so for p = 0.05), in this case p < 0.01 and the alternative hypothesis is accepted: the variance in group 1 exceeds the variance in group 2 (p < 0.01). Consequently, after a report of failure the inadequacy of self-assessment is higher than after a report of success.


Table: values of Student's t-test at significance levels 0.10, 0.05 and 0.01 (ν is the number of degrees of freedom of variation), by number of degrees of freedom and significance level.

Table XI: standard values of Fisher's F-test used to assess the significance of differences between two samples, by degrees of freedom and significance level.

Student's t-criterion

Student's t-test- a general name for a class of methods for statistical testing of hypotheses (statistical tests) based on the Student's distribution. The most common cases of using the t-test are associated with checking the equality of the mean values ​​in two samples.

The t-statistic is usually constructed according to the following general principle: the numerator is a random variable with zero mathematical expectation under the null hypothesis, and the denominator is the sample standard deviation of this random variable, obtained as the square root of an unbiased variance estimate.

Story

This criterion was developed by William Gosset to assess the quality of beer at Guinness. Because of non-disclosure obligations to the company (the Guinness management considered the use of statistical methods in its work to be such a secret), Gosset's article was published in 1908 in the journal Biometrika under the pseudonym "Student".

Data requirements

To apply this criterion, it is necessary that the original data have a normal distribution. In the case of using a two-sample test for independent samples, the condition of equality of variances must also be met. There are, however, alternatives to the Student's test for situations with unequal variances.

The normality requirement on the data distribution is necessary for an exact t-test. However, the t-statistic can be used even with other data distributions. In many cases it asymptotically has the standard normal distribution N(0, 1), so the quantiles of that distribution can be used. However, even in this case the quantiles are often taken not from the standard normal distribution but from the corresponding Student distribution, as in the exact t-test. They are asymptotically equivalent, but on small samples the confidence intervals of the Student distribution are wider and more reliable.

One-sample t-test

Used to test the null hypothesis H0: E(X) = m that the mathematical expectation E(X) equals some known value m.

Obviously, under the null hypothesis E(X̄) = m. Given the assumed independence of the observations, V(X̄) = σ²/n. Using the unbiased variance estimate s²_X = Σ(X_t − X̄)²/(n − 1), we obtain the following t-statistic:

t = (X̄ − m) / (s_X / √n)

Under the null hypothesis, this statistic has the distribution t(n − 1). Therefore, if the absolute value of the statistic exceeds the critical value of this distribution (at the given significance level), the null hypothesis is rejected.

Two-sample t-test for independent samples

Let there be two independent samples of sizes n1 and n2 of normally distributed random variables X1 and X2. It is required to test, from the sample data, the null hypothesis of equality of the mathematical expectations of these random variables, H0: M1 = M2.

Consider the difference of the sample means, Δ = X̄1 − X̄2. Obviously, if the null hypothesis is true, E(Δ) = M1 − M2 = 0. By the independence of the samples, the variance of this difference is V(Δ) = σ1²/n1 + σ2²/n2. Then, using the unbiased variance estimate s² = Σ(X_t − X̄)²/(n − 1) for each sample, we obtain an unbiased estimate of the variance of the difference of the sample means: s_Δ² = s1²/n1 + s2²/n2. Therefore, the t-statistic for testing the null hypothesis is

t = (X̄1 − X̄2) / √(s1²/n1 + s2²/n2)

Under the null hypothesis, this statistic has the distribution t(df), where

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]

The case of equal variances

If the variances of the samples are assumed equal, then

V(Δ) = σ² (1/n1 + 1/n2)

Then the t-statistic is:

t = (X̄1 − X̄2) / (s_X √(1/n1 + 1/n2)),   where s_X = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]

This statistic has the distribution t(n1 + n2 − 2).
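Both variants are available in scipy as a one-liner each (a sketch on simulated data; equal_var switches between the pooled formula above and the Welch approximation from the previous section):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x1 = rng.normal(50, 2, 30)
x2 = rng.normal(51, 4, 40)

print(stats.ttest_ind(x1, x2, equal_var=True))   # pooled, t(n1 + n2 - 2)
print(stats.ttest_ind(x1, x2, equal_var=False))  # Welch's t-test
```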

Two-sample t-test for dependent samples

To calculate the empirical value of the t-test when testing a hypothesis about differences between two dependent samples (for example, two measurements of the same test separated by a time interval), the following formula is used:

t = M_d / (s_d / √n)

where M_d is the mean difference, s_d is the standard deviation of the differences, and n is the number of observations.

This statistic has the distribution t(n − 1).

Linear Constraint Test on Linear Regression Parameters

The t-test can also be used to test an arbitrary (single) linear constraint on the parameters of a linear regression estimated by ordinary least squares. Suppose we want to test the hypothesis H0: cᵀb = a. Obviously, under the null hypothesis E(cᵀb̂ − a) = cᵀE(b̂) − a = 0. Here we used the unbiasedness of the OLS estimates of the model parameters, E(b̂) = b. In addition, V(cᵀb̂ − a) = cᵀV(b̂)c = σ²cᵀ(XᵀX)⁻¹c. Substituting the unbiased estimate s² = ESS/(n − k) for the unknown variance, we obtain the following t-statistic:

t = (cᵀb̂ − a) / (s √(cᵀ(XᵀX)⁻¹c))

Under the null hypothesis, this statistic has the distribution t(n − k), so if the statistic exceeds the critical value, the null hypothesis of the linear constraint is rejected.

Linear Regression Ratio Hypothesis Testing

A special case of a linear constraint is testing the hypothesis that the regression coefficient b_j equals some value a. In this case, the corresponding t-statistic is

t = (b̂_j − a) / s_{b̂_j}

where s_{b̂_j} is the standard error of the coefficient estimate: the square root of the corresponding diagonal element of the covariance matrix of the coefficient estimates.

Under the null hypothesis, this statistic has the distribution t(n − k). If the absolute value of the statistic is above the critical value, the difference between the coefficient and a is statistically significant (non-random); otherwise it is insignificant (random, that is, the true coefficient is probably equal or very close to the assumed value a).
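A sketch of this coefficient test using statsmodels (assuming the package is installed; the data are simulated, and the default column name "x1" comes from add_constant):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=100)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=100)

X = sm.add_constant(x)         # design matrix with an intercept column
res = sm.OLS(y, X).fit()
print(res.tvalues)             # t-statistics for H0: b_j = 0
print(res.t_test("x1 = 0.5"))  # arbitrary hypothesized value a = 0.5
```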

Comment

The one-sample test for the mathematical expectation can be reduced to testing a linear constraint on the parameters of a linear regression. In the one-sample test this is a "regression" on a constant. Then the s² of the regression is the sample estimate of the variance of the random variable under study, the matrix XᵀX equals n, and the estimate of the model "coefficient" is the sample mean. From this we obtain the expression for the t-statistic given above for the general case.

Similarly, it can be shown that the two-sample test with equal sample variances also reduces to testing a linear constraint. In the two-sample test this is a "regression" on a constant and a dummy variable identifying the subsample by its value (0 or 1): y = a + bD. The hypothesis of equal mathematical expectations of the samples can be formulated as the hypothesis that the coefficient b of this model is zero. It can be shown that the corresponding t-statistic for testing this hypothesis equals the t-statistic given for the two-sample test.

The case of different variances can also be reduced to testing a linear constraint. In this case, the variance of the model errors takes two values. From this one can likewise obtain a t-statistic similar to the one given for the two-sample test.

Nonparametric analogs

An analogue of the two-sample test for independent samples is the Mann-Whitney U-test. For the situation with dependent samples, the analogues are the sign test and the Wilcoxon T-test.

Literature

Student (1908). The probable error of a mean. Biometrika, 6(1), 1-25.

Links

On the criteria for testing hypotheses about the homogeneity of the means on the website of the Novosibirsk State Technical University