Definition of confidence interval example. Confidence interval for mathematical expectation

Instruction

Note that the interval (l1, l2) that is centered on the estimate l* and contains the true value of the parameter with probability alpha is the confidence interval corresponding to the confidence level alpha. The value l* itself is a point estimate. For example, suppose that from the sample values (x1, x2, ..., xn) of a random variable X we need to estimate an unknown parameter l on which the distribution depends. Obtaining the estimate l* means assigning a parameter value to each sample, that is, constructing a function Q of the observations whose value is taken as the estimate: l* = Q(x1, x2, ..., xn).

Any function of the observations is called a statistic. If it fully describes the parameter (phenomenon) under consideration, it is called a sufficient statistic. Because the observations are random, l* is also a random variable. A statistic should be chosen with the criteria for its quality in mind. Here it should be kept in mind that the distribution law of the estimate is fully determined once the probability density W(x, l) is known.

The confidence interval is easy to calculate if the distribution law of the estimate is known. Consider, for example, the confidence interval for the estimate of the mathematical expectation (the mean of the random variable): mx* = (1/n)*(x1 + x2 + … + xn). This estimate is unbiased, that is, its mathematical expectation equals the true value of the parameter: M(mx*) = mx.

The variance of this estimate of the mathematical expectation is σx*^2 = Dx/n. By the central limit theorem, the distribution law of this estimate is approximately Gaussian (normal). Therefore, the calculations can use the probability integral Ф(z). Choosing the length of the confidence interval as 2ld, you get: alpha = P(mx - ld < mx* < mx + ld) = 2Ф(ld/sqrt(Dx/n)) - 1 (using the property of the probability integral Ф(-z) = 1 - Ф(z)).

Build the confidence interval for the mathematical expectation:
- compute the value (alpha + 1)/2;
- in the probability integral table, find the argument z at which Ф(z) = (alpha + 1)/2; this z equals ld/sqrt(Dx/n);
- replace the unknown true variance with its estimate Dx* = (1/n)*((x1 - mx*)^2 + (x2 - mx*)^2 + … + (xn - mx*)^2);
- compute ld = z*sqrt(Dx*/n) and the interval (mx* - ld, mx* + ld).
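The steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `mean_confidence_interval` is hypothetical, the probability integral Ф is taken to be the standard normal distribution function (consistent with Ф(-z) = 1 - Ф(z)), and the table lookup is replaced by `NormalDist().inv_cdf` from the standard library.

```python
from math import sqrt
from statistics import NormalDist, fmean


def mean_confidence_interval(sample, alpha):
    """Normal-approximation interval (mx* - ld, mx* + ld) for the mean.

    alpha is the confidence level, e.g. 0.95.
    """
    n = len(sample)
    mx = fmean(sample)                              # point estimate mx*
    dx = sum((x - mx) ** 2 for x in sample) / n     # variance estimate Dx* (1/n form, as in the text)
    z = NormalDist().inv_cdf((alpha + 1) / 2)       # argument where Ф(z) = (alpha + 1)/2
    ld = z * sqrt(dx / n)                           # half-length of the interval
    return mx - ld, mx + ld


# Illustrative data, not taken from the text.
low, high = mean_confidence_interval([1, 2, 3, 4, 5], 0.95)
print(low, high)
```

The 1/n form of the variance estimate is kept to match the formula in the steps; for small samples the unbiased 1/(n-1) form would normally be preferred.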

Suppose we have a large number of items with a normal distribution of some characteristics (for example, a full warehouse of vegetables of the same type, the size and weight of which varies). You want to know the average characteristics of the entire batch of goods, but you have neither the time nor the inclination to measure and weigh each vegetable. You understand that this is not necessary. But how many pieces would you need to take for random inspection?

Before giving some formulas useful for this situation, we recall some notation.

First, if we did measure the entire warehouse of vegetables (this set of elements is called the general population), we would know the average weight of the whole batch as accurately as our measurements allow. Let us call this average $\bar X_{\text{gen}}$, the general average. We already know that a normal distribution is completely determined once its mean value and standard deviation $s$ are known. So far, however, we know neither $\bar X_{\text{gen}}$ nor $s$ for the general population. We can only take a sample, measure the values we need, and calculate for this sample both the mean $\bar X_{\text{sample}}$ and the standard deviation $S_{\text{sample}}$.

It is known that if our sample contains a large number of elements (usually n greater than 30) and they are chosen truly at random, then the population value $s$ will differ very little from $S_{\text{sample}}$.

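Moreover, for a normal distribution we can use the following formulas.

With a probability of 95%:

$$\bar X_{\text{sample}} - 1.96\,\frac{S_{\text{sample}}}{\sqrt{n}} < \bar X_{\text{gen}} < \bar X_{\text{sample}} + 1.96\,\frac{S_{\text{sample}}}{\sqrt{n}}$$

With a probability of 99%:

$$\bar X_{\text{sample}} - 2.58\,\frac{S_{\text{sample}}}{\sqrt{n}} < \bar X_{\text{gen}} < \bar X_{\text{sample}} + 2.58\,\frac{S_{\text{sample}}}{\sqrt{n}}$$

In general form, with probability P(t):

$$\bar X_{\text{sample}} - t\,\frac{S_{\text{sample}}}{\sqrt{n}} < \bar X_{\text{gen}} < \bar X_{\text{sample}} + t\,\frac{S_{\text{sample}}}{\sqrt{n}} \qquad (1)$$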

The relationship between the value of t and the value of the probability P (t), with which we want to know the confidence interval, can be taken from the following table:


Thus, we have determined in what range the average value for the general population is (with a given probability).

If the sample is not large enough, we cannot claim that the population value $s$ equals $S_{\text{sample}}$. In addition, the closeness of the sample to the normal distribution is then questionable. In this case we still use $S_{\text{sample}}$ in place of $s$ in the formula

$$\bar X_{\text{sample}} - t\,\frac{S_{\text{sample}}}{\sqrt{n}} < \bar X_{\text{gen}} < \bar X_{\text{sample}} + t\,\frac{S_{\text{sample}}}{\sqrt{n}},$$

but the value of t for a fixed probability P(t) now depends on the number of elements n in the sample. The larger n is, the closer the resulting confidence interval is to the one given by formula (1). The t values in this case are taken from another table (Student's t-distribution), which we provide below:

Student's t values for probabilities 0.95 and 0.99


Example 3. Thirty people were randomly selected from the employees of a company. In the sample, the average monthly salary turned out to be 30 thousand rubles, with a standard deviation of 5 thousand rubles. Determine the average salary in the company with a probability of 0.99.

Solution: By the condition, n = 30, $\bar X_{\text{sample}} = 30000$, S = 5000, P = 0.99. To find the confidence interval, we use the formula based on Student's distribution. From the table, for n = 30 and P = 0.99 we find t = 2.756; therefore,

$$30000 - 2.756\cdot\frac{5000}{\sqrt{30}} < \bar X_{\text{gen}} < 30000 + 2.756\cdot\frac{5000}{\sqrt{30}},$$

i.e., the desired confidence interval is $27484 < \bar X_{\text{gen}} < 32516$.

So, with a probability of 0.99, it can be argued that the interval (27484; 32516) contains the average salary in the company.
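The arithmetic of this example can be checked with a few lines of Python. This is a minimal sketch: the function name `student_interval` is hypothetical, and the value t = 2.756 is simply taken from the example rather than computed.

```python
from math import sqrt


def student_interval(sample_mean, sample_sd, n, t_value):
    """Interval sample_mean ± t * S / sqrt(n) for the population mean."""
    half_width = t_value * sample_sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Values from Example 3: n = 30, mean 30000, S = 5000, t = 2.756.
low, high = student_interval(30000, 5000, 30, 2.756)
print(round(low), round(high))  # 27484 32516
```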

We hope that you will be able to use this method without having to keep a table at hand every time. The calculations can be carried out automatically in Excel. In an Excel file, click the fx button on the top menu. Then select the "statistical" category and, from the list offered, the function STEUDRASP. At the prompt, place the cursor in the "probability" field and type the complementary probability (that is, in our case, instead of the probability 0.95 you type 0.05). Apparently, the spreadsheet is designed so that the result answers the question of how likely we are to be wrong. Similarly, in the "degrees of freedom" field, enter the value (n-1) for your sample.

In the previous subsections, we considered the problem of estimating an unknown parameter $a$ by a single number. Such an estimate is called a point estimate. In many problems it is required not only to find a suitable numerical value for the parameter $a$, but also to assess its accuracy and reliability. We need to know what errors can result from replacing the parameter $a$ by its point estimate $\tilde a$, and with what degree of confidence we can expect these errors to stay within known limits.

Problems of this kind are especially relevant when the number of observations is small, so that the point estimate $\tilde a$ is largely random and the approximate replacement of $a$ by $\tilde a$ can lead to serious errors.

To give an idea of the accuracy and reliability of the estimate $\tilde a$, mathematical statistics uses the so-called confidence intervals and confidence probabilities.

Suppose an unbiased estimate $\tilde a$ of the parameter $a$ has been obtained from experiment. We want to estimate the possible error. Let us assign some sufficiently large probability $p$ (for example, p = 0.9, 0.95, or 0.99) such that an event with probability $p$ can be considered practically certain, and find a value $\varepsilon$ for which

$$P(|\tilde a - a| < \varepsilon) = p. \qquad (14.3.1)$$

Then the range of practically possible values of the error that arises when $a$ is replaced by $\tilde a$ will be $\pm\varepsilon$; errors larger in absolute value will appear only with the small probability $\alpha = 1 - p$. Let us rewrite (14.3.1) as

$$P(\tilde a - \varepsilon < a < \tilde a + \varepsilon) = p. \qquad (14.3.2)$$

Equality (14.3.2) means that with probability $p$ the unknown value of the parameter $a$ falls within the interval

$$I_p = (\tilde a - \varepsilon;\ \tilde a + \varepsilon).$$

One circumstance should be noted here. Previously, we repeatedly considered the probability that a random variable falls into a given non-random interval. Here the situation is different: the quantity $a$ is not random, but the interval $I_p$ is. Its position on the x-axis is random, being determined by its center $\tilde a$; in general, the length $2\varepsilon$ of the interval is also random, since $\varepsilon$ is calculated, as a rule, from experimental data. Therefore, in this case it is better to interpret $p$ not as the probability that the point $a$ "falls" into the interval $I_p$, but as the probability that the random interval $I_p$ covers the point $a$ (Fig. 14.3.1).

Fig. 14.3.1

The probability $p$ is called the confidence probability (confidence level), and the interval $I_p$ the confidence interval. The boundaries of the interval, $a_1 = \tilde a - \varepsilon$ and $a_2 = \tilde a + \varepsilon$, are called the confidence limits.

Let us give one more interpretation of the concept of a confidence interval: it can be regarded as the interval of values of the parameter $a$ that are compatible with the experimental data and do not contradict them. Indeed, if we agree to consider an event with probability $\alpha = 1 - p$ practically impossible, then those values of the parameter $a$ for which $|\tilde a - a| > \varepsilon$ must be recognized as contradicting the experimental data, and those for which $|\tilde a - a| < \varepsilon$ as compatible with them.

Suppose there is an unbiased estimate $\tilde a$ of the parameter $a$. If we knew the distribution law of the quantity $\tilde a$, the problem of finding the confidence interval would be quite simple: it would be enough to find a value $\varepsilon$ for which

$$P(|\tilde a - a| < \varepsilon) = p.$$

The difficulty is that the distribution law of the estimate $\tilde a$ depends on the distribution law of the quantity $X$ and, consequently, on its unknown parameters (in particular, on the parameter $a$ itself).

To get around this difficulty, one can apply the following rough approximation: replace the unknown parameters in the expression for $\varepsilon$ with their point estimates. With a comparatively large number of experiments $n$ (about 20 to 30), this technique usually gives results of satisfactory accuracy.

As an example, consider the problem of the confidence interval for the mathematical expectation.

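Suppose $n$ independent experiments have been performed on a random variable $X$ whose characteristics, the mathematical expectation $m$ and the variance $D$, are unknown. For these parameters, the following estimates were obtained:

$$\tilde m = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \tilde D = \frac{\sum_{i=1}^{n}(X_i - \tilde m)^2}{n-1}. \qquad (14.3.4)$$

It is required to construct the confidence interval $I_p$, corresponding to the confidence probability $p$, for the mathematical expectation $m$ of the quantity $X$.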

In solving this problem, we use the fact that the quantity $\tilde m$ is the sum of $n$ independent identically distributed random variables $X_i/n$, and by the central limit theorem, for sufficiently large $n$ its distribution law is close to normal. In practice, even with a relatively small number of terms (of the order of 10 to 20), the distribution law of the sum can be considered approximately normal. We will assume that $\tilde m$ is distributed according to the normal law. The characteristics of this law, the mathematical expectation and the variance, are equal to $m$ and $D/n$, respectively (see Chapter 13, Subsection 13.3). Let us assume that the value $D$ is known and find a value $\varepsilon_p$ for which

$$P(|\tilde m - m| < \varepsilon_p) = p. \qquad (14.3.5)$$

Applying formula (6.3.5) of Chapter 6, we express the probability on the left side of (14.3.5) in terms of the normal distribution function:

$$P(|\tilde m - m| < \varepsilon_p) = 2\Phi^*\!\left(\frac{\varepsilon_p}{\sigma_{\tilde m}}\right) - 1,$$

where $\sigma_{\tilde m} = \sqrt{D/n}$ is the standard deviation of the estimate $\tilde m$.

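From the equation

$$2\Phi^*\!\left(\frac{\varepsilon_p}{\sigma_{\tilde m}}\right) - 1 = p$$

we find the value $\varepsilon_p$:

$$\varepsilon_p = \sigma_{\tilde m}\,\arg\Phi^*\!\left(\frac{1+p}{2}\right), \qquad (14.3.7)$$

where $\arg\Phi^*(x)$ is the inverse function of $\Phi^*(x)$, i.e. the value of the argument at which the normal distribution function equals $x$.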

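The variance $D$, through which the value $\sigma_{\tilde m}$ is expressed, is not known exactly; as its approximate value one can use the estimate $\tilde D$ (14.3.4) and put approximately:

$$\sigma_{\tilde m} \approx \sqrt{\frac{\tilde D}{n}}.$$

Thus, the problem of constructing a confidence interval is approximately solved; the interval is

$$I_p = (\tilde m - \varepsilon_p;\ \tilde m + \varepsilon_p),$$

where $\varepsilon_p$ is defined by formula (14.3.7).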

To avoid reverse interpolation in the tables of the function $\Phi^*(x)$ when calculating $\varepsilon_p$, it is convenient to compile a special table (Table 14.3.1) that lists the values of the quantity

$$t_p = \arg\Phi^*\!\left(\frac{1+p}{2}\right)$$

depending on $p$. For the normal law, the value $t_p$ is the number of standard deviations that must be laid off to the right and left of the center of dispersion so that the probability of falling into the resulting region equals $p$.

In terms of $t_p$, the confidence interval is expressed as:

$$I_p = (\tilde m - t_p\,\sigma_{\tilde m};\ \tilde m + t_p\,\sigma_{\tilde m}).$$

Table 14.3.1
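The entries of such a table can be reproduced numerically. The sketch below assumes that $\Phi^*$ is the standard normal distribution function, so that its inverse is available as `NormalDist().inv_cdf` in the Python standard library; the function name `t_p` and the list of probabilities are chosen here for illustration and are not the contents of the original table.

```python
from statistics import NormalDist


def t_p(p):
    """Number of standard deviations t_p = arg Phi*((1 + p) / 2)."""
    return NormalDist().inv_cdf((1 + p) / 2)


# A few illustrative confidence probabilities.
for p in (0.80, 0.90, 0.95, 0.99):
    print(f"p = {p:.2f}   t_p = {t_p(p):.3f}")
```

For p = 0.8 this gives approximately 1.282, and for p = 0.95 approximately 1.960.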

Example 1. Twenty experiments were carried out on the quantity $X$; the results are shown in Table 14.3.2.

Table 14.3.2

It is required to find an estimate $\tilde m$ for the mathematical expectation $m$ of the quantity $X$ and to construct the confidence interval corresponding to the confidence probability p = 0.8.

Solution. We have:

Choosing x = 10 as the reference origin, we find the unbiased estimate $\tilde D$ by the third formula (14.2.14):

According to the table 14.3.1 we find

Confidence limits:

Confidence interval:

Values of the parameter $m$ lying in this interval are compatible with the experimental data given in Table 14.3.2.

In a similar way, a confidence interval can be constructed for the variance.

Suppose $n$ independent experiments have been performed on a random variable $X$ with unknown parameters $m$ and $D$, and the unbiased estimate of the variance $D$ has been obtained:

$$\tilde D = \frac{\sum_{i=1}^{n}(X_i - \tilde m)^2}{n-1}, \qquad (14.3.11)$$

where $\tilde m = \frac{1}{n}\sum_{i=1}^{n} X_i$.

It is required to construct an approximate confidence interval for the variance.

From formula (14.3.11) it can be seen that the quantity $\tilde D$ is a sum of $n$ random variables of the form $\frac{(X_i - \tilde m)^2}{n-1}$. These variables are not independent, since each of them includes the quantity $\tilde m$, which depends on all the others. However, it can be shown that as $n$ increases, the distribution law of their sum also approaches the normal law. In practice, for n = 20 to 30 it can already be considered normal.

Let us assume that this is so and find the characteristics of this law: the mathematical expectation and the variance. Since the estimate $\tilde D$ is unbiased, $M[\tilde D] = D$.

Calculating the variance $D[\tilde D]$ involves relatively complex computations, so we give its expression without derivation:

$$D[\tilde D] = \frac{1}{n}\mu_4 - \frac{n-3}{n(n-1)}D^2, \qquad (14.3.12)$$

where $\mu_4$ is the fourth central moment of the quantity $X$.

To use this expression, one must substitute into it the values of $\mu_4$ and $D$ (at least approximate ones). Instead of $D$ one can use its estimate $\tilde D$. In principle, the fourth central moment can also be replaced by its estimate, for example, by a value of the form

$$\tilde\mu_4 = \frac{\sum_{i=1}^{n}(X_i - \tilde m)^4}{n},$$

but such a replacement gives extremely low accuracy, since in general, with a limited number of experiments, high-order moments are determined with large errors. However, in practice it often happens that the form of the distribution law of the quantity $X$ is known in advance and only its parameters are unknown. Then we can try to express $\mu_4$ in terms of $D$.

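Let us take the most common case, when the quantity $X$ is distributed according to the normal law. Then its fourth central moment is expressed in terms of the variance (see Chapter 6, Subsection 6.2):

$$\mu_4 = 3D^2,$$

and formula (14.3.12) gives

$$D[\tilde D] = \frac{3}{n}D^2 - \frac{n-3}{n(n-1)}D^2,$$

or

$$D[\tilde D] = \frac{2}{n-1}D^2. \qquad (14.3.14)$$

Replacing in (14.3.14) the unknown $D$ by its estimate $\tilde D$, we get

$$D[\tilde D] \approx \frac{2}{n-1}\tilde D^2,$$

whence

$$\sigma_{\tilde D} \approx \sqrt{\frac{2}{n-1}}\,\tilde D. \qquad (14.3.16)$$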

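The moment $\mu_4$ can also be expressed in terms of $D$ in some other cases, when the distribution of the quantity $X$ is not normal but its form is known. For example, for the law of uniform density (see Chapter 5) we have:

$$\mu_4 = \frac{(\beta-\alpha)^4}{80}, \qquad D = \frac{(\beta-\alpha)^2}{12},$$

where $(\alpha, \beta)$ is the interval on which the law is given.

Hence,

$$\mu_4 = 1.8\,D^2.$$

According to formula (14.3.12) we get

$$D[\tilde D] = \frac{0.8n + 1.2}{n(n-1)}D^2,$$

from which we find approximately

$$\sigma_{\tilde D} \approx \sqrt{\frac{0.8n + 1.2}{n(n-1)}}\,\tilde D.$$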

In cases where the form of the distribution law of the quantity $X$ is unknown, it is still recommended to use formula (14.3.16) to estimate $\sigma_{\tilde D}$, unless there are special grounds for believing that this law differs greatly from the normal one (has noticeable positive or negative kurtosis).

If the approximate value of $\sigma_{\tilde D}$ has been obtained in one way or another, then a confidence interval for the variance can be constructed in the same way as for the mathematical expectation:

$$I_p = (\tilde D - t_p\,\sigma_{\tilde D};\ \tilde D + t_p\,\sigma_{\tilde D}), \qquad (14.3.18)$$

where the value $t_p$, depending on the given probability $p$, is found in Table 14.3.1.
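The approximate construction for the variance can be sketched as follows. The code assumes the normal-law case, so that $\sigma_{\tilde D}$ is taken from formula (14.3.16), and it computes $t_p$ with `NormalDist().inv_cdf` instead of reading it from Table 14.3.1. The function name and the numbers in the usage line are illustrative assumptions, not data from the text.

```python
from math import sqrt
from statistics import NormalDist


def approx_variance_interval(d_est, n, p):
    """Approximate interval (D~ - t_p*sigma, D~ + t_p*sigma) for the variance.

    d_est : unbiased variance estimate D~
    n     : number of experiments
    p     : confidence probability
    """
    t_p = NormalDist().inv_cdf((1 + p) / 2)
    sigma_d = sqrt(2 / (n - 1)) * d_est   # formula (14.3.16), valid for a near-normal X
    return d_est - t_p * sigma_d, d_est + t_p * sigma_d


# Hypothetical input: D~ = 1.0 from n = 51 experiments, p = 0.95.
print(approx_variance_interval(1.0, 51, 0.95))
```

For small n the lower bound produced this way can become negative, which is one reason to prefer the exact construction based on the $\chi^2$ distribution when X is known to be normal.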

Example 2. Find an approximate 80% confidence interval for the variance of the random variable $X$ under the conditions of Example 1, if it is known that $X$ is distributed according to a law close to normal.

Solution. The value $t_p$, taken from Table 14.3.1, remains the same:

According to the formula (14.3.16)

According to the formula (14.3.18) we find the confidence interval:

Corresponding interval for the standard deviation: (0.21; 0.29).

14.4. Exact methods for constructing confidence intervals for the parameters of a random variable distributed according to the normal law

In the previous subsection, we considered rough approximate methods for constructing confidence intervals for the mathematical expectation and the variance. Here we give an idea of the exact methods for solving the same problem. We emphasize that to find the confidence intervals exactly, it is absolutely necessary to know in advance the form of the distribution law of the quantity $X$, whereas this is not necessary for the approximate methods.

The idea of the exact methods for constructing confidence intervals is as follows. Any confidence interval is found from a condition expressing the probability that certain inequalities hold, and these inequalities involve the estimate $\tilde a$ of interest to us. The distribution law of the estimate $\tilde a$ in general depends on the unknown parameters of the quantity $X$. However, it is sometimes possible to pass in the inequalities from the random variable $\tilde a$ to some other function of the observed values $X_1, X_2, \ldots, X_n$ whose distribution law does not depend on the unknown parameters, but only on the number of experiments $n$ and on the form of the distribution law of $X$. Random variables of this kind play an important role in mathematical statistics; they have been studied in most detail for the case of a normally distributed $X$.

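For example, it has been proved that, when the quantity $X$ is normally distributed, the random variable

$$T = \sqrt{n}\,\frac{\tilde m - m}{\sqrt{\tilde D}} \qquad (14.4.1)$$

obeys the so-called Student's distribution law with $n - 1$ degrees of freedom; the density of this law has the form

$$S_{n-1}(t) = \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{(n-1)\pi}\;\Gamma\!\left(\frac{n-1}{2}\right)}\left(1 + \frac{t^2}{n-1}\right)^{-\frac{n}{2}}, \qquad (14.4.2)$$

where $\Gamma(x)$ is the gamma function:

$$\Gamma(x) = \int_0^\infty u^{x-1}e^{-u}\,du.$$

It has also been proved that the random variable

$$V = \frac{(n-1)\tilde D}{D} \qquad (14.4.3)$$

has the $\chi^2$ distribution with $n - 1$ degrees of freedom (see Chapter 7), whose density is expressed by the formula

$$k_{n-1}(v) = \begin{cases} \dfrac{1}{2^{\frac{n-1}{2}}\,\Gamma\!\left(\frac{n-1}{2}\right)}\, v^{\frac{n-1}{2}-1} e^{-\frac{v}{2}}, & v > 0, \\[2mm] 0, & v < 0. \end{cases} \qquad (14.4.4)$$

Without dwelling on the derivations of distributions (14.4.2) and (14.4.4), we will show how they can be applied in constructing confidence intervals for the parameters $m$ and $D$.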

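Suppose $n$ independent experiments have been performed on a random variable $X$, normally distributed with unknown parameters $m$ and $D$. For these parameters, the estimates

$$\tilde m = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \tilde D = \frac{\sum_{i=1}^{n}(X_i - \tilde m)^2}{n-1}$$

have been obtained. It is required to construct confidence intervals for both parameters corresponding to the confidence probability $p$.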

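Let us first construct a confidence interval for the mathematical expectation. It is natural to take this interval symmetric with respect to $\tilde m$; denote by $\varepsilon_p$ half the length of the interval. The value $\varepsilon_p$ must be chosen so that the condition

$$P(|\tilde m - m| < \varepsilon_p) = p \qquad (14.4.5)$$

is satisfied. Let us try to pass on the left side of equality (14.4.5) from the random variable $\tilde m$ to the random variable $T$, distributed according to Student's law. To do this, we multiply both sides of the inequality $|\tilde m - m| < \varepsilon_p$ by the positive quantity $\sqrt{n}/\sqrt{\tilde D}$:

$$P\!\left(\frac{\sqrt{n}\,|\tilde m - m|}{\sqrt{\tilde D}} < \frac{\varepsilon_p}{\sqrt{\tilde D/n}}\right) = p,$$

or, using the notation (14.4.1),

$$P\!\left(|T| < \frac{\varepsilon_p}{\sqrt{\tilde D/n}}\right) = p.$$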

Let us find a number $t_p$ such that

$$P(|T| < t_p) = p.$$

The value $t_p$ can be found from the condition

$$\int_{-t_p}^{t_p} S_{n-1}(t)\,dt = p. \qquad (14.4.8)$$

From formula (14.4.2) it can be seen that $S_{n-1}(t)$ is an even function, so (14.4.8) gives

$$2\int_0^{t_p} S_{n-1}(t)\,dt = p. \qquad (14.4.9)$$

Equality (14.4.9) determines the value $t_p$ depending on $p$. If you have at your disposal a table of values of the integral

$$\Psi(x) = 2\int_0^{x} S_{n-1}(t)\,dt,$$

then the value $t_p$ can be found by reverse interpolation in the table. However, it is more convenient to compile a table of values of $t_p$ in advance. Such a table is given in the Appendix (Table 5). It lists the values of $t_p$ depending on the confidence probability $p$ and the number of degrees of freedom $n - 1$. Having determined $t_p$ from Table 5 and setting

$$\varepsilon_p = t_p\sqrt{\frac{\tilde D}{n}},$$

we find half the width of the confidence interval $I_p$ and the interval itself:

$$I_p = \left(\tilde m - t_p\sqrt{\frac{\tilde D}{n}};\ \tilde m + t_p\sqrt{\frac{\tilde D}{n}}\right).$$
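When Table 5 is not at hand, $t_p$ can be found numerically from condition (14.4.9). The sketch below is one possible way to do it using only the Python standard library: it integrates the Student density (14.4.2) with Simpson's rule and solves for $t_p$ by bisection. All function names, the number of integration steps, and the search bracket are assumptions made for this illustration.

```python
import math
from statistics import fmean


def student_pdf(t, k):
    """Density of Student's distribution with k = n - 1 degrees of freedom."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + t * t / k) ** (-(k + 1) / 2)


def central_probability(t, k, steps=2000):
    """P(|T| < t) = 2 * integral of the density from 0 to t (Simpson's rule)."""
    h = t / steps
    total = student_pdf(0.0, k) + student_pdf(t, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_pdf(i * h, k)
    return 2 * total * h / 3


def t_quantile(p, k):
    """Solve central_probability(t, k) = p for t by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_probability(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_mean_interval(sample, p):
    """Interval m~ ± t_p * sqrt(D~ / n) for a normally distributed X."""
    n = len(sample)
    m = fmean(sample)
    d = sum((x - m) ** 2 for x in sample) / (n - 1)   # unbiased estimate D~
    eps = t_quantile(p, n - 1) * math.sqrt(d / n)
    return m - eps, m + eps


print(round(t_quantile(0.8, 19), 3))
```

For 19 degrees of freedom and p = 0.8 the result should be close to the tabulated value 1.328 used in Example 2 of this subsection.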

Example 1. Five independent experiments were performed on a random variable $X$, normally distributed with unknown parameters $m$ and $\sigma$. The results of the experiments are given in Table 14.4.1.

Table 14.4.1

Find an estimate $\tilde m$ for the mathematical expectation and construct a 90% confidence interval $I_p$ for it (i.e., the interval corresponding to the confidence probability p = 0.9).

Solution. We have:

According to Table 5 of the Appendix, for $n - 1 = 4$ and $p = 0.9$ we find $t_p$, whence

The confidence interval will be

Example 2. For the conditions of Example 1 of Subsection 14.3, assuming the quantity $X$ to be normally distributed, find the exact confidence interval.

Solution. According to Table 5 of the Appendix, for $n - 1 = 19$ and $p = 0.8$ we find $t_p = 1.328$; hence

Comparing with the solution of Example 1 of Subsection 14.3 ($\varepsilon_p = 0.072$), we see that the discrepancy is very small. If we keep accuracy to the second decimal place, the confidence intervals found by the exact and approximate methods coincide:

Let us move on to constructing a confidence interval for the variance. Consider the unbiased variance estimate

$$\tilde D = \frac{\sum_{i=1}^{n}(X_i - \tilde m)^2}{n-1}$$

and express the random variable $\tilde D$ through the quantity $V$ (14.4.3), which has the $\chi^2$ distribution (14.4.4):

$$\tilde D = V\,\frac{D}{n-1}.$$

Knowing the distribution law of the quantity $V$, one can find the interval $I_p$ into which it falls with the given probability $p$.

The distribution law $k_{n-1}(v)$ of the quantity $V$ has the form shown in Fig. 14.4.1.

Fig. 14.4.1

The question arises: how should the interval $I_p$ be chosen? If the distribution law of the quantity $V$ were symmetric (like the normal law or Student's distribution), it would be natural to take the interval $I_p$ symmetric with respect to the mathematical expectation. In this case, however, the law $k_{n-1}(v)$ is asymmetric. Let us agree to choose the interval $I_p$ so that the probabilities of the quantity $V$ falling outside the interval to the right and to the left (the shaded areas in Fig. 14.4.1) are the same and equal to $\frac{\alpha}{2} = \frac{1-p}{2}$.

To construct an interval $I_p$ with this property, we use Table 4 of the Appendix: it lists numbers $\chi^2$ such that

$$P(V > \chi^2) = q$$

for a quantity $V$ having the $\chi^2$ distribution with $r$ degrees of freedom. In our case $r = n - 1$. Fix $r = n - 1$ and find in the corresponding row of Table 4 two values of $\chi^2$: one corresponding to the probability $q_1 = \frac{\alpha}{2}$, the other to the probability $q_2 = 1 - \frac{\alpha}{2}$. Denote these values $\chi_1^2$ and $\chi_2^2$. The interval $I_p$ has $\chi_2^2$ as its left end and $\chi_1^2$ as its right end.

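Now, from the interval $I_p$ for $V$, let us find the required confidence interval for the variance with boundaries $D_1$ and $D_2$, which covers the point $D$ with probability $p$:

$$P(D_1 < D < D_2) = p.$$

Let us construct an interval $(D_1, D_2)$ that covers the point $D$ if and only if the quantity $V$ falls into the interval $I_p$. Let us show that the interval

$$\left(\frac{(n-1)\tilde D}{\chi_1^2};\ \frac{(n-1)\tilde D}{\chi_2^2}\right) \qquad (14.4.13)$$

satisfies this condition. Indeed, the inequalities

$$\frac{(n-1)\tilde D}{\chi_1^2} < D, \qquad \frac{(n-1)\tilde D}{\chi_2^2} > D$$

are equivalent to the inequalities

$$V < \chi_1^2, \qquad V > \chi_2^2,$$

and these inequalities hold with probability $p$. Thus, the confidence interval for the variance is found and is expressed by formula (14.4.13).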
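Formula (14.4.13) is simple enough to apply directly once the two tabulated $\chi^2$ values are known. The sketch below takes them as inputs, as one would after reading Table 4; the function name is hypothetical, the variance estimate 0.05 is an illustrative number, and 27.2 and 11.65 are the standard chi-square table values for 19 degrees of freedom at tail probabilities 0.1 and 0.9.

```python
def exact_variance_interval(d_est, n, chi2_1, chi2_2):
    """Interval ((n-1)*D~/chi2_1, (n-1)*D~/chi2_2) from formula (14.4.13).

    chi2_1 : table value with P(V > chi2_1) = alpha / 2      (the larger value)
    chi2_2 : table value with P(V > chi2_2) = 1 - alpha / 2  (the smaller value)
    """
    if chi2_1 <= chi2_2:
        raise ValueError("chi2_1 must be the larger of the two table values")
    return (n - 1) * d_est / chi2_1, (n - 1) * d_est / chi2_2


# Hypothetical estimate D~ = 0.05 from n = 20 experiments, p = 0.8 (alpha / 2 = 0.1).
low, high = exact_variance_interval(0.05, 20, 27.2, 11.65)
print(round(low, 4), round(high, 4))
```

Note that the interval is not symmetric about the estimate: the upper bound lies farther from 0.05 than the lower one, reflecting the asymmetry of the $\chi^2$ law.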

Example 3. Find the confidence interval for the variance under the conditions of Example 2 of Subsection 14.3, if it is known that the quantity $X$ is normally distributed.

Solution. We have . According to Table 4 of the Appendix, for $r = n - 1 = 19$ we find

According to formula (14.4.13), we find the confidence interval for the variance

Corresponding interval for standard deviation: (0.21; 0.32). This interval only slightly exceeds the interval (0.21; 0.29) obtained in Example 2 of Subsection 14.3 by the approximate method.

  • Figure 14.3.1 considers a confidence interval that is symmetric about $\tilde a$. In general, as we will see later, this is not necessary.

Estimation of confidence intervals

Learning objectives

Statistics deals with the following two main tasks:

    We have some estimate based on sample data and we want to make some probabilistic statement about where the true value of the parameter being estimated is.

    We have a specific hypothesis that needs to be tested based on sample data.

In this topic, we consider the first problem. We also introduce the definition of a confidence interval.

A confidence interval is an interval that is built around the estimated value of a parameter and shows where the true value of the estimated parameter lies with an a priori given probability.

After studying the material on this topic, you:

    learn what is the confidence interval of the estimate;

    learn to classify statistical problems;

    master the technique of constructing confidence intervals, both using statistical formulas and using software tools;

    learn to determine the sample sizes required to achieve a given accuracy of statistical estimates.

Distributions of sample characteristics

T-distribution

As discussed above, the distribution of the random variable $\frac{\bar X - \mu}{\sigma/\sqrt{n}}$ is close to the standardized normal distribution with parameters 0 and 1. Since we do not know the value of σ, we replace it with an estimate s. The quantity $\frac{\bar X - \mu}{s/\sqrt{n}}$ then has a different distribution, namely the t-distribution, or Student's distribution, which is determined by the parameter n - 1 (the number of degrees of freedom). This distribution is close to the normal distribution (the larger n, the closer the two distributions).

Fig. 95 shows Student's distribution with 30 degrees of freedom. As you can see, it is very close to the normal distribution.

Similar to the functions NORMDIST and NORMINV for working with the normal distribution, there are functions for working with the t-distribution: STUDIST (TDIST) and STUDRASPBR (TINV). An example of the use of these functions can be found in the STUDRIST.XLS file (template and solution) and in Fig. 96.

Distributions of other characteristics

As we already know, to determine the accuracy of the expectation estimate, we need the t-distribution. To estimate other parameters, such as the variance, other distributions are required. Two of them are the F-distribution and the χ²-distribution.

Confidence interval for the mean

Confidence interval is an interval that is built around the estimated value of the parameter and shows where the true value of the estimated parameter lies with a priori given probability.

The construction of a confidence interval for the mean value occurs in the following way:

Example

The fast food restaurant plans to expand its assortment with a new type of sandwich. In order to estimate the demand for it, the manager plans to randomly select 40 visitors from among those who have already tried it and ask them to rate their attitude towards the new product on a scale from 1 to 10. The manager wants to estimate the expected number of points that the new product will receive and construct a 95% confidence interval for this estimate. How to do it? (see file SANDWICH1.XLS (template and solution).

Solution

To solve this problem, you can use . The results are presented in Fig. 97.

Confidence interval for the total value

Sometimes, according to sample data, it is required to estimate not the mathematical expectation, but the total sum of values. For example, in a situation with an auditor, it may be of interest to estimate not the average value of an invoice, but the sum of all invoices.

Let N be the total number of elements, n the sample size, $T_s$ the sum of the values in the sample, and $T'$ the estimate of the sum over the entire population. Then $T' = N\,T_s/n = N\bar X$, and the confidence interval is calculated by the formula $N\bar X \pm t_{cr}\,N\,s/\sqrt{n}$, where s is the sample estimate of the standard deviation, $\bar X$ is the sample estimate of the mean, and $t_{cr}$ is the critical value of the t-distribution with n - 1 degrees of freedom.

Example

Suppose some tax office wants to estimate the size of the total tax refunds for 10,000 taxpayers. The taxpayer either receives a refund or pays additional taxes. Find the 95% confidence interval for the refund amount, assuming a sample size of 500 people (see file REFUND AMOUNT.XLS (template and solution).

Solution

There is no special procedure in StatPro for this case; however, the bounds can be obtained from the bounds for the mean using the above formulas (Fig. 98).

Confidence interval for proportion

Let p be the true share of customers, and $\hat p$ an estimate of this share obtained from a sample of size n. It can be shown that for sufficiently large n the distribution of the estimate is close to normal with mean p and standard deviation $\sqrt{p(1-p)/n}$. The standard error of the estimate in this case is expressed as $\sqrt{\hat p(1-\hat p)/n}$, and the confidence interval as $\hat p \pm z_{cr}\sqrt{\hat p(1-\hat p)/n}$.

Example

The fast food restaurant plans to expand its assortment with a new type of sandwich. To estimate the demand for it, the manager randomly selected 40 visitors from among those who had already tried it and asked them to rate their attitude towards the new product on a scale from 1 to 10. The manager wants to estimate the expected proportion of customers who rate the new product at least 6 points (he expects these customers to be the consumers of the new product).

Solution

First, we create a new column that contains 1 if the customer's score was at least 6 points and 0 otherwise (see the SANDWICH2.XLS file, template and solution).

Method 1

Counting the number of 1s, we estimate the share and then apply the formulas.

The value of z cr is taken from special normal distribution tables (for example, 1.96 for a 95% confidence interval).

Using this approach and the specific data to construct a 95% interval, we obtain the following results (Fig. 99). The critical value $z_{cr}$ is 1.96. The standard error of the estimate is 0.077. The lower limit of the confidence interval is 0.475, and the upper limit is 0.775. Thus, the manager can state with 95% confidence that the percentage of customers who rate the new product 6 points or more is between 47.5 and 77.5.
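The reported figures can be reproduced with a short calculation. This is a sketch under an assumption: the text does not state how many of the 40 visitors gave at least 6 points, so the count 25 used below is inferred from the midpoint of the reported interval (0.625 × 40). The function name is hypothetical, and the critical value is computed with `NormalDist().inv_cdf` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation interval p^ ± z_cr * sqrt(p^(1 - p^)/n)."""
    p_hat = successes / n
    z_cr = NormalDist().inv_cdf((1 + confidence) / 2)
    std_error = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_cr * std_error, p_hat + z_cr * std_error, std_error


# 25 of 40 is an inferred count, not a figure given in the text.
low, high, se = proportion_interval(25, 40)
print(round(se, 3), round(low, 3), round(high, 3))  # 0.077 0.475 0.775
```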

Method 2

This problem can be solved using standard StatPro tools. To do this, it suffices to note that the share in this case coincides with the average value of the Type column. Next apply StatPro/Statistical Inference/One-Sample Analysis to build a confidence interval for the mean value (expectation estimate) for the Type column. The results obtained in this case will be very close to the result of the 1st method (Fig. 99).

Confidence interval for standard deviation

s is used as an estimate of the standard deviation (the formula is given in Section 1). For normally distributed data, the quantity $(n-1)s^2/\sigma^2$ follows the chi-square distribution, which, like the t-distribution, has n-1 degrees of freedom. There are special functions for working with this distribution: CHI2DIST (CHIDIST) and CHI2OBR (CHIINV).

The confidence interval in this case will no longer be symmetric. A schematic of its bounds is shown in Fig. 100.

Example

The machine should produce parts with a diameter of 10 cm. However, due to various circumstances, errors occur. The quality controller is concerned about two things: first, the average value should be 10 cm; secondly, even in this case, if the deviations are large, then many details will be rejected. Every day he makes a sample of 50 parts (see file QUALITY CONTROL.XLS (template and solution). What conclusions can such a sample give?

Solution

We construct 95% confidence intervals for the mean and for the standard deviation using StatPro/Statistical Inference/One-Sample Analysis (Fig. 101).

Further, using the assumption of a normal distribution of diameters, we calculate the proportion of defective products, setting a maximum deviation of 0.065. Using the capabilities of the lookup table (the case of two parameters), we construct the dependence of the percentage of rejects on the mean value and standard deviation (Fig. 102).

Confidence interval for the difference of two means

This is one of the most important applications of statistical methods. Some example situations:

    A clothing store manager would like to know how much more or less the average female shopper spends in the store than a male.

    The two airlines fly similar routes. A consumer organization would like to compare the difference between the average expected flight delay times for both airlines.

    The company sends out coupons for certain types of goods in one city and does not send them out in another. Managers want to compare the average purchases of these items over the next two months.

    A car dealer often deals with married couples at presentations. To understand their personal reactions to the presentation, couples are often interviewed separately. The manager wants to evaluate the difference in ratings given by men and women.

Case of independent samples

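Assuming equal population variances, the standardized difference of the sample means has a t-distribution with $n_1 + n_2 - 2$ degrees of freedom. The confidence interval for $\mu_1 - \mu_2$ is expressed by the relation:

$$\bar{X}_1 - \bar{X}_2 \pm t_{cr}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$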
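A minimal Python sketch of the pooled-variance interval for two independent samples is given below. It assumes equal population variances, and the critical value `t_crit` must be supplied by the caller (for example from a t-table with $n_1 + n_2 - 2$ degrees of freedom). The function name and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_diff_ci(x1, x2, t_crit):
    """Confidence interval for mu1 - mu2 from two independent samples,
    using the pooled standard deviation (equal variances assumed)."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance: weighted average of the two sample variances.
    sp_sq = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2)
    diff = mean(x1) - mean(x2)
    return diff - t_crit * se, diff + t_crit * se


# Made-up data; 2.306 is the tabulated 97.5% t quantile for 8 degrees of freedom.
low, high = pooled_diff_ci([12, 15, 11, 14, 13], [10, 9, 12, 8, 11], t_crit=2.306)
print(low, high)
```

Passing the critical value in explicitly keeps the sketch within the standard library, which has no t-distribution quantile function.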

This problem can be solved not only by the above formulas, but also by standard StatPro tools. To do this, it is enough to apply

Confidence interval for difference between proportions

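Let $p_1$ and $p_2$ be the mathematical expectations of the proportions, and let $\hat{p}_1$ and $\hat{p}_2$ be their sample estimates built on samples of size $n_1$ and $n_2$, respectively. Then $\hat{p}_1 - \hat{p}_2$ is an estimate of the difference $p_1 - p_2$. Therefore, the confidence interval for this difference is expressed as:

$$\hat{p}_1 - \hat{p}_2 \pm z_{cr} \cdot SE(\hat{p}_1 - \hat{p}_2)$$

Here $z_{cr}$ is the critical value of the standard normal distribution, obtained from special tables (for example, 1.96 for a 95% confidence interval).

The standard error of the estimate is expressed in this case by the relation:

$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.$$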

Example

In preparation for a big sale, the store undertook the following marketing research. The 300 best buyers were chosen and randomly divided into two groups of 150 members each. All of the selected buyers were sent invitations to participate in the sale, but only the invitations to members of the first group included a coupon giving the right to a 5% discount. During the sale, the purchases of all 300 selected buyers were recorded. How can a manager interpret the results and judge the effectiveness of the coupons? (See COUPONS.XLS file (template and solution)).

Solution

In our particular case, out of 150 customers who received a discount coupon, 55 made a purchase during the sale, while among the 150 who did not receive a coupon, only 35 made a purchase (Fig. 103). The sample proportions are therefore 0.3667 and 0.2333, and the sample difference between them is 0.1333. For a 95% confidence level, the normal distribution table gives $z_{cr}$ = 1.96. The standard error of the sample difference is 0.0524. Finally, the lower limit of the 95% confidence interval is 0.0307 and the upper limit is 0.2359. These results can be interpreted as follows: for every 100 customers who receive a discount coupon, we can expect from 3 to 23 additional buyers. However, this conclusion does not by itself mean that the coupons are effective (because by providing a discount, we lose profit!). Let us demonstrate this on concrete data. Suppose that the average purchase amount is 400 rubles, of which 50 rubles is the store's profit. Then the expected profit per 100 customers who did not receive a coupon is:

50 · 0.2333 · 100 = 1166.50 rubles.

Similar calculations for 100 buyers who received a coupon give:

30 · 0.3667 · 100 = 1100.10 rubles.

The decrease in the average profit to 30 is explained by the fact that, using the discount, buyers who received a coupon will, on average, make a purchase for 380 rubles.

Thus, the final conclusion indicates the inefficiency of using such coupons in this particular situation.
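The figures in this example can be checked with a short Python sketch. The function name `diff_proportions_ci` is hypothetical. The profit figures differ slightly from the text because the code uses unrounded proportions and the exact normal quantile rather than the table value 1.96.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_ci(k1, n1, k2, n2, confidence=0.95):
    """Normal-approximation confidence interval for p1 - p2,
    given k1 successes out of n1 and k2 successes out of n2."""
    p1, p2 = k1 / n1, k2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Data from the coupon example: 55 of 150 with a coupon, 35 of 150 without.
low, high = diff_proportions_ci(55, 150, 35, 150)

# Expected profit per 100 customers (50 rubles per purchase without a coupon,
# 30 rubles with one), using unrounded proportions.
profit_no_coupon = 50 * (35 / 150) * 100
profit_coupon = 30 * (55 / 150) * 100
print(round(low, 4), round(high, 4), profit_no_coupon > profit_coupon)
```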

Comment. This problem can also be solved using standard StatPro tools. To do this, it suffices to reduce it to the problem of estimating the difference of two means (a proportion is the mean of a 0/1 indicator column), and then apply StatPro/Statistical Inference/Two-Sample Analysis to build a confidence interval for the difference between the two mean values.

Confidence interval control

The length of the confidence interval depends on the following conditions:

    the data themselves (standard deviation);

    significance level;

    sample size.

Sample size for estimating the mean

Let us first consider the problem in the general case. Let B denote the given half-length of the confidence interval (Fig. 104). We know that the confidence interval for the mean of a random variable X is expressed as $\bar{X} \pm t_{cr} \cdot SE(\bar{X})$, where $SE(\bar{X}) = s/\sqrt{n}$. Setting

$$B = t_{cr} \frac{s}{\sqrt{n}}$$

and solving for n, we get $n = \left(t_{cr}\, s / B\right)^2$.

Unfortunately, we do not know the exact value of the variance of the random variable X. In addition, we do not know the value of $t_{cr}$, since it depends on n through the number of degrees of freedom. In this situation, we can do the following. Instead of the standard deviation s, we use some estimate $\sigma_{est}$ obtained from available realizations of the random variable under study. Instead of the $t_{cr}$ value, we use the $z_{cr}$ value for the normal distribution. This is quite acceptable, since the density functions of the normal and t-distributions are very close (except for small n). Thus, the desired formula takes the form:

$$n = \left(\frac{z_{cr}\,\sigma_{est}}{B}\right)^2.$$

Since the formula generally gives a non-integer result, the result rounded up to the next integer is taken as the desired sample size.
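The rule can be sketched in Python as follows, assuming the large-sample form $n = (z_{cr}\,\sigma_{est}/B)^2$ rounded up. The function name `sample_size_mean` and the numbers in the usage line are hypothetical and serve only as illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma_est, half_width, confidence=0.95):
    """Smallest n for which the normal-approximation interval for a mean
    has half-length at most half_width, given a guess sigma_est."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma_est / half_width) ** 2)


# Hypothetical inputs: guessed standard deviation 1.0, desired half-length 0.3.
print(sample_size_mean(1.0, 0.3))
```

Doubling the guessed standard deviation roughly quadruples the required sample size, since n grows with the square of $\sigma_{est}$.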

Example

A fast food restaurant plans to expand its assortment with a new type of sandwich. To estimate the demand for it, the manager plans to randomly select a number of visitors from among those who have already tried it and ask them to rate the new product on a scale from 1 to 10. The manager wants to estimate the expected number of points that the new product will receive and to construct a 95% confidence interval for that estimate. However, he wants the half-width of the confidence interval not to exceed 0.3. How many visitors does he need to poll?

The sample size for estimating a proportion is determined as follows:

$$n = \frac{z_{cr}^2\; p_{est}(1 - p_{est})}{B^2}.$$

Here $p_{est}$ is an estimate of the proportion p, and B is the given half-length of the confidence interval. A conservative (overestimated) value of n is obtained with $p_{est}$ = 0.5. In this case, the half-length of the confidence interval will not exceed the given value B for any true value of p.

Example

Let the manager from the previous example plan to estimate the proportion of customers who prefer a new type of product. He wants to construct a 90% confidence interval whose half length is less than or equal to 0.05. How many clients should be randomly sampled?

Solution

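In our case, $z_{cr}$ = 1.645. Taking the conservative value 0.5 for the proportion, the required quantity is calculated as $n = 1.645^2 \cdot 0.5 \cdot 0.5 / 0.05^2 \approx 270.6$, that is, 271 clients after rounding up.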

If the manager had reason to believe that the desired value of p is, for example, about 0.3, then substituting this value into the above formula would give a smaller sample size, namely 228.
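Both figures can be reproduced with a short Python sketch, assuming the formula $n = z_{cr}^2\, p_{est}(1 - p_{est}) / B^2$ rounded up. The function name `sample_size_proportion` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(p_est, half_width, confidence):
    """Sample size for estimating a proportion to within half_width,
    using the normal approximation and a prior guess p_est."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    return ceil(z ** 2 * p_est * (1 - p_est) / half_width ** 2)


n_conservative = sample_size_proportion(0.5, 0.05, 0.90)  # worst case p = 0.5
n_informed = sample_size_proportion(0.3, 0.05, 0.90)      # prior guess p = 0.3
print(n_conservative, n_informed)
```

The product p(1 - p) is largest at p = 0.5, which is why that choice gives a sample size sufficient for any true proportion.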

The formula for the size n of each of the two random samples in the case of the difference between two means is written as:

$$n = \frac{2\, z_{cr}^2\, \sigma_{est}^2}{B^2}.$$

Example

A computer company has a customer service center. Lately, the number of customer complaints about poor service quality has increased. The service center employs mainly two types of staff: those with little experience who have completed special preparatory courses, and those with extensive practical experience who have not completed such courses. The company wants to analyze customer complaints over the past six months and compare the average number of complaints per employee in each of the two groups. It is assumed that the sample sizes for both groups will be the same. How many employees must be included in each sample to get a 95% interval with a half-length of no more than 2?

Solution

Here $\sigma_{est}$ is an estimate of the standard deviation of both random variables, under the assumption that the two deviations are close. Thus, in our problem, we need to obtain this estimate somehow. This can be done, for example, as follows. Looking at customer complaint data over the past six months, a manager may notice that there are generally between 6 and 36 complaints per employee. Knowing that for a normal distribution almost all values lie within three standard deviations of the mean, he may reasonably believe that:

$6\,\sigma_{est} = 36 - 6 = 30$, whence $\sigma_{est} = 5$.

Substituting this value into the formula $n = 2 z_{cr}^2 \sigma_{est}^2 / B^2$ with $z_{cr}$ = 1.96 and B = 2, we get $n = 2 \cdot 1.96^2 \cdot 5^2 / 2^2 \approx 48.02$, that is, 49 employees in each group after rounding up.
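A minimal Python sketch of this calculation follows. It assumes equal group sizes, a common standard deviation and the large-sample formula $n = 2 z_{cr}^2 \sigma_{est}^2 / B^2$. The function name `sample_size_diff_means` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_diff_means(sigma_est, half_width, confidence=0.95):
    """Per-group sample size for a confidence interval on mu1 - mu2,
    assuming both groups share the standard deviation sigma_est."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(2 * (z * sigma_est / half_width) ** 2)


# Range rule from the example: (36 - 6) / 6 = 5 complaints.
sigma_guess = (36 - 6) / 6
n_per_group = sample_size_diff_means(sigma_guess, half_width=2)
print(n_per_group)
```

The factor 2 appears because the variance of the difference of two independent sample means is the sum of their variances.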

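The formula for the size n of each random sample in the case of estimating the difference between proportions looks like:

$$n = \frac{z_{cr}^2 \left[\, p_{1,est}(1 - p_{1,est}) + p_{2,est}(1 - p_{2,est}) \,\right]}{B^2}.$$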

Example

A company has two factories producing similar products. A company manager wants to compare the defect rates of both factories. According to available information, the rejection rate at both factories is from 3 to 5%. The plan is to build a 99% confidence interval with a half-length of no more than 0.005 (or 0.5%). How many products should be selected from each factory?

Solution

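Here $p_{1,est}$ and $p_{2,est}$ are estimates of the two unknown proportions of rejects at the 1st and 2nd factories. If we put $p_{1,est} = p_{2,est} = 0.5$, we get an overestimated value for n. But since in our case we have some a priori information about these proportions, we take their upper estimate, namely 0.05. With $z_{cr} \approx 2.576$ and B = 0.005, the formula $n = z_{cr}^2 \left[ p_{1,est}(1 - p_{1,est}) + p_{2,est}(1 - p_{2,est}) \right] / B^2$ gives roughly 25,200 items from each factory (the exact figure depends on how many digits of $z_{cr}$ are kept).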
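The calculation can be sketched in Python, assuming the normal-approximation formula $n = z_{cr}^2 [p_1(1 - p_1) + p_2(1 - p_2)] / B^2$ with equal sample sizes. The function name `sample_size_diff_proportions` is hypothetical, and the code uses the exact normal quantile rather than a rounded table value.

```python
from math import ceil
from statistics import NormalDist


def sample_size_diff_proportions(p1_est, p2_est, half_width, confidence):
    """Per-group sample size for a confidence interval on p1 - p2,
    given prior guesses p1_est and p2_est for the two proportions."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    spread = p1_est * (1 - p1_est) + p2_est * (1 - p2_est)
    return ceil(z ** 2 * spread / half_width ** 2)


# Upper estimate 0.05 for both factories, 99% confidence, half-length 0.005.
n_informed = sample_size_diff_proportions(0.05, 0.05, 0.005, 0.99)
# Conservative choice 0.5 for both proportions, for comparison.
n_worst_case = sample_size_diff_proportions(0.5, 0.5, 0.005, 0.99)
print(n_informed, n_worst_case)
```

Using the prior information reduces the required sample size by roughly a factor of five compared with the conservative choice.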

When population parameters are estimated from sample data, it is useful to provide not only a point estimate of the parameter, but also a confidence interval that shows where the true value of the parameter being estimated may lie.

In this chapter, we also got acquainted with quantitative relationships that allow us to build such intervals for various parameters; learned ways to control the length of the confidence interval.

We also note that the problem of estimating the sample size (experiment planning problem) can be solved using standard StatPro tools, namely StatPro/Statistical Inference/Sample Size Selection.

Confidence interval

Confidence interval: a term used in mathematical statistics for interval (as opposed to point) estimation of statistical parameters, which is preferable with a small sample size. The confidence interval is the interval that covers the unknown parameter with a given reliability.

The method of confidence intervals was developed by the American statistician Jerzy Neyman, based on the ideas of the English statistician Ronald Fisher.

Definition

The confidence interval for the parameter θ of the distribution of a random variable X with confidence level 100p%, generated by the sample $(x_1, \ldots, x_n)$, is the interval with boundaries $l(x_1, \ldots, x_n)$ and $u(x_1, \ldots, x_n)$, which are realizations of random variables $L(X_1, \ldots, X_n)$ and $U(X_1, \ldots, X_n)$ such that

$$P(L \le \theta \le U) = p.$$

The boundary points of the confidence interval are called confidence limits.

An intuition-based interpretation of the confidence interval would be: if p is large (say 0.95 or 0.99), then the confidence interval almost certainly contains the true value θ .

Another interpretation of the concept of a confidence interval: it can be considered as an interval of parameter values θ compatible with experimental data and not contradicting them.
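The defining property can be illustrated with a small simulation. The sketch below assumes a normal population with known standard deviation and uses the z-interval for the mean; the function name `coverage` and all parameter values are hypothetical. Over many repeated samples, the fraction of intervals that cover the true parameter should be close to p.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage(theta=10.0, sigma=2.0, n=25, p=0.95, trials=2000, seed=0):
    """Fraction of simulated samples whose z-interval for the mean
    (known sigma) contains the true mean theta."""
    rng = random.Random(seed)
    half = NormalDist().inv_cdf(0.5 + p / 2) * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = mean([rng.gauss(theta, sigma) for _ in range(n)])
        if xbar - half <= theta <= xbar + half:
            hits += 1
    return hits / trials


print(coverage())  # expected to be close to 0.95
```

The probability statement refers to the random boundaries L and U, not to θ, which stays fixed throughout the simulation.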

Examples

  • Confidence interval for the mathematical expectation of a normal sample;
  • Confidence interval for the variance of a normal sample.

Bayesian Confidence Interval

In Bayesian statistics, there is a definition of a confidence interval that is similar but differs in some key details. Here, the estimated parameter itself is considered a random variable with some given prior distribution (uniform in the simplest case), and the sample is fixed (in classical statistics it is exactly the opposite). The Bayesian confidence interval is the interval $[L, U]$ that covers the parameter value with posterior probability p:

$$P(L \le \theta \le U \mid x_1, \ldots, x_n) = p.$$

Generally, classical and Bayesian confidence intervals are different. In the English-language literature, the Bayesian interval is usually called a credible interval, and the classical one a confidence interval.
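A minimal sketch of a credible interval for a proportion is shown below, assuming a uniform prior, so that the posterior after k successes in n trials is Beta(1 + k, 1 + n - k). The interval is approximated by sorting Monte Carlo draws from the posterior; the function name `credible_interval` and the choice of an equal-tailed interval are assumptions made here for illustration.

```python
import random


def credible_interval(successes, n, p=0.95, draws=20000, seed=0):
    """Equal-tailed Bayesian credible interval for a proportion under a
    uniform prior, approximated from sorted posterior draws."""
    rng = random.Random(seed)
    a, b = 1 + successes, 1 + n - successes  # Beta posterior parameters
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo_idx = int((1 - p) / 2 * draws)
    hi_idx = int((1 + p) / 2 * draws) - 1
    return samples[lo_idx], samples[hi_idx]


# Illustrative data: 55 successes in 150 trials.
lower, upper = credible_interval(55, 150)
print(lower, upper)
```

Here the data are fixed and the probability statement is about the parameter, which is the reverse of the classical construction.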

Sources

Wikimedia Foundation. 2010 .
