Constructing an interval variation series

Posted on http://www.allbest.ru/

TASK 1

The following data are available on the wages of employees at the enterprise:

Table 1.1

Wages, in conventional monetary units

It is required to construct an interval distribution series and use it to find:

1) average wages;

2) the average linear deviation;

4) standard deviation;

5) the range of variation;

6) oscillation coefficient;

7) linear coefficient of variation;

8) simple coefficient of variation;

10) median;

11) the coefficient of asymmetry;

12) Pearson asymmetry index;

13) coefficient of kurtosis.

Solution

As you know, variants (observed values of the attribute) arranged in ascending order form a discrete variation series. When the number of variants is large (more than 10), interval series are built even in the case of discrete variation.

If an interval series is compiled with equal intervals, the range of variation is divided by the specified number of intervals. If the resulting value is a whole, unambiguous number (which is rare), the interval length is taken equal to it. Otherwise the value is rounded, always upward, so that the last retained digit is even. Obviously, increasing the interval length expands the range of variation by an amount equal to the number of intervals multiplied by the difference between the rounded and the initial interval length.

a) If the expansion of the range of variation is insignificant, it is either added to the largest or subtracted from the smallest value of the feature;

b) If the expansion of the range of variation is appreciable, then, so that the center of the range does not shift, it is split roughly in half, adding one part to the largest and subtracting the other from the smallest value of the feature.

If an interval series is compiled with unequal intervals, the process is simpler, but the interval lengths should still be expressed as numbers whose last digit is even, which greatly simplifies the subsequent calculation of numerical characteristics.

n = 30 is the sample size.

Let's compose an interval distribution series using the Sturges formula:

$K = 1 + 3.322 \lg n$,

where K is the number of groups and lg is the base-10 logarithm:

$K = 1 + 3.322 \lg 30 \approx 5.91 \approx 6$

We find the range of the attribute x (the wages of workers at the enterprise) by the formula

$R = x_{\max} - x_{\min} = 195 - 112 = 83$

and divide it by 6. The length of the interval is then $l = 83 / 6 \approx 13.83$.

The first interval starts at 112. Adding l = 13.83 to 112 gives its upper bound, 125.83, which is also the lower bound of the second interval, and so on; the sixth interval ends at 195.
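The grouping step can be sketched in Python. This is only an illustration of the arithmetic in the text: the names `k_exact`, `length` and `bounds` are introduced here, and pinning the last boundary to the maximum value (instead of 112 + 6 × 13.83 = 194.98) is an assumption that mirrors the boundaries quoted in the solution.

```python
import math

n = 30                    # sample size from the task
x_min, x_max = 112, 195   # extreme wages quoted in the solution

# Sturges formula with a base-10 logarithm, rounded to the nearest integer
k_exact = 1 + 3.322 * math.log10(n)
k = round(k_exact)        # 6 groups

# Equal interval length, rounded to two decimals as in the text
length = round((x_max - x_min) / k, 2)

# Boundaries: start at x_min, step by `length`,
# and pin the last boundary to x_max (assumption, see lead-in)
bounds = [round(x_min + i * length, 2) for i in range(k)] + [x_max]

print(round(k_exact, 2), k)
print(length)
print(bounds)
```

Rounding the interval length before building the boundaries is what makes the last computed boundary fall slightly short of 195, which is why it is set explicitly.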

When finding frequencies, one should be guided by the rule: "if the value of a feature coincides with the boundary of an internal interval, then it should be assigned to the preceding interval."

We get an interval series of frequencies and cumulative frequencies.

Table 1.2

Consequently, 3 workers earn wages from 112 to 125.83 conventional monetary units. The highest wages, from 181.15 to 195 conventional monetary units, are earned by only 6 employees.

To calculate the numerical characteristics, we transform the interval series into a discrete one, taking the midpoints of the intervals as the variants:

Table 1.3

14131,83

By the formula of the weighted arithmetic mean,

$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = 159.485$ conventional monetary units.

Average linear deviation:

$L = \frac{\sum |x_i - \bar{x}| f_i}{\sum f_i}$,

where $x_i$ is the value of the trait under study in the i-th unit of the population, $f_i$ is the frequency of that value, and $\bar{x}$ is the average value of the trait under study.

L = 17.765 conventional monetary units.

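Standard deviation:

$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}} = \sqrt{\frac{14131.83}{30}} \approx 21.704$

Variance:

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i} = \frac{14131.83}{30} \approx 471.06$

Relative range (oscillation coefficient): $c = R / \bar{x} \cdot 100\%$

Relative linear deviation: $q = L / \bar{x} \cdot 100\%$

Coefficient of variation: $V = \sigma / \bar{x} \cdot 100\%$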

The oscillation coefficient shows the relative fluctuation of the extreme values of the attribute around the arithmetic mean, and the coefficient of variation characterizes the degree of variation and the homogeneity of the population.

$c = R / \bar{x} \cdot 100\% = 83 / 159.485 \cdot 100\% = 52.043\%$

Thus, the difference between the extreme values is 47.957% (= 100% − 52.043%) less than the average wage of workers at the enterprise.

$q = L / \bar{x} \cdot 100\% = 17.765 / 159.485 \cdot 100\% = 11.139\%$

$V = \sigma / \bar{x} \cdot 100\% = 21.704 / 159.485 \cdot 100\% = 13.609\%$

The coefficient of variation is less than 33%, which indicates weak variation in the wages of employees at the enterprise, i.e. the average value is a typical characteristic of the wages of workers (the population is homogeneous).
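The calculations above can be checked with a short Python sketch. The frequency table itself did not survive in the text, so the frequencies below are a reconstruction: 3, 3, 5, 7 and the final 6 are quoted in the solution, while the fifth value (6) is an assumption inferred from n = 30. With that assumption the sketch reproduces the quoted mean, linear deviation and standard deviation.

```python
import math

# Midpoints of the intervals 112, 125.83, ..., 181.15, 195
mids = [118.915, 132.745, 146.575, 160.405, 174.235, 188.075]
# Frequencies: the fifth value is inferred from n = 30 (assumption)
freqs = [3, 3, 5, 7, 6, 6]

n = sum(freqs)
mean = sum(x * f for x, f in zip(mids, freqs)) / n
lin_dev = sum(abs(x - mean) * f for x, f in zip(mids, freqs)) / n
variance = sum((x - mean) ** 2 * f for x, f in zip(mids, freqs)) / n
std = math.sqrt(variance)

value_range = 195 - 112           # R
osc = value_range / mean * 100    # oscillation coefficient, %
lin_coef = lin_dev / mean * 100   # relative linear deviation, %
var_coef = std / mean * 100       # coefficient of variation, %

print(round(mean, 3), round(lin_dev, 3), round(std, 3))
print(round(osc, 3), round(lin_coef, 3), round(var_coef, 3))
```

All three dispersion measures are weighted by frequency and divided by n, matching the population (not sample) formulas used in the solution.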

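In an interval distribution series the mode is determined by the formula

$Mo = x_{Mo} + h_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$,

where $f_{Mo}$ is the frequency of the modal interval, i.e. the interval containing the greatest number of variants;

$f_{Mo-1}$ is the frequency of the interval preceding the modal one;

$f_{Mo+1}$ is the frequency of the interval following the modal one;

$h_{Mo}$ is the length of the modal interval;

$x_{Mo}$ is the lower bound of the modal interval.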

To determine the median in an interval series, we use the formula

$Me = x_{Me} + h_{Me} \cdot \frac{\frac{1}{2} \sum f_i - S_{Me-1}}{f_{Me}}$,

where $S_{Me-1}$ is the cumulative (accumulated) frequency of the interval preceding the median one;

$x_{Me}$ is the lower bound of the median interval;

$f_{Me}$ is the frequency of the median interval;

$h_{Me}$ is the length of the median interval.

The median interval is the first interval whose accumulated frequency (3 + 3 + 5 + 7 = 18) exceeds half of the sum of frequencies (30 / 2 = 15): (153.49; 167.32).
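The two interpolation formulas can be sketched as small Python functions. The function names are introduced here for illustration, and the frequencies are the same reconstruction as before (the fifth value, 6, is inferred from n = 30 and is an assumption). The resulting mode and median are computed values; the text itself does not quote them.

```python
bounds = [112, 125.83, 139.66, 153.49, 167.32, 181.15, 195]
freqs = [3, 3, 5, 7, 6, 6]   # fifth value inferred from n = 30 (assumption)


def grouped_mode(bounds, freqs):
    """Mode by linear interpolation inside the modal interval."""
    m = max(range(len(freqs)), key=freqs.__getitem__)
    f_m = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    h = bounds[m + 1] - bounds[m]
    # Note: undefined if the modal frequency equals both neighbours
    return bounds[m] + h * (f_m - f_prev) / ((f_m - f_prev) + (f_m - f_next))


def grouped_median(bounds, freqs):
    """Median by linear interpolation inside the median interval."""
    half = sum(freqs) / 2
    cum = 0   # accumulated frequency before the current interval
    for i, f in enumerate(freqs):
        if cum + f >= half:
            h = bounds[i + 1] - bounds[i]
            return bounds[i] + h * (half - cum) / f
        cum += f
    raise ValueError("empty frequency table")


mode = grouped_mode(bounds, freqs)
median = grouped_median(bounds, freqs)
print(round(mode, 2), round(median, 2))
```

Both values fall inside the interval (153.49; 167.32), which is the modal and the median interval at the same time under these frequencies.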

Let's calculate the skewness and kurtosis, for which we will create a new worksheet:

Table 1.4

Actual data

Calculated data

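Let's calculate the central moment of the third order:

$\mu_3 = \frac{\sum (x_i - \bar{x})^3 f_i}{\sum f_i}$

The asymmetry coefficient is therefore

$As = \mu_3 / \sigma^3$

Since |As| = 0.3553 > 0.25, the asymmetry is considered significant.

Let's calculate the central moment of the fourth order:

$\mu_4 = \frac{\sum (x_i - \bar{x})^4 f_i}{\sum f_i}$

The kurtosis is therefore

$Ex = \mu_4 / \sigma^4 - 3$

Since Ex < 0, the distribution is flat-topped (platykurtic).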

The degree of skewness can also be determined using the Pearson skewness coefficient (As):

$As = \frac{\bar{x} - Mo}{\sigma}$,

where $\bar{x}$ is the arithmetic mean of the distribution series, Mo is the mode, and $\sigma$ is the standard deviation.

With a symmetric (normal) distribution $\bar{x} = Mo$, so the asymmetry coefficient is zero. If As > 0, then $\bar{x}$ is greater than the mode, so there is right-sided asymmetry.

If As < 0, then $\bar{x}$ is less than the mode, so there is left-sided asymmetry. The asymmetry coefficient can vary from −3 to +3.

The distribution is not symmetrical, but has left-sided asymmetry.
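The shape measures can be checked with the following Python sketch. It reuses the reconstructed frequency table (the fifth frequency, 6, is an assumption inferred from n = 30), and the helper `central_moment` is a name introduced here. The mode is computed from the interpolation formula because its value is not quoted in the text.

```python
import math

mids = [118.915, 132.745, 146.575, 160.405, 174.235, 188.075]
freqs = [3, 3, 5, 7, 6, 6]   # fifth value inferred from n = 30 (assumption)

n = sum(freqs)
mean = sum(x * f for x, f in zip(mids, freqs)) / n


def central_moment(order):
    """Frequency-weighted central moment of the given order."""
    return sum((x - mean) ** order * f for x, f in zip(mids, freqs)) / n


sigma = math.sqrt(central_moment(2))
skew = central_moment(3) / sigma ** 3        # moment-based asymmetry
excess = central_moment(4) / sigma ** 4 - 3  # excess kurtosis

# Mode from the interpolation formula with f = 7, neighbours 5 and 6
mode = 153.49 + 13.83 * (7 - 5) / ((7 - 5) + (7 - 6))
pearson = (mean - mode) / sigma              # Pearson asymmetry index

print(round(skew, 4), round(excess, 4), round(pearson, 4))
```

Under these assumptions the moment-based coefficient has magnitude 0.3553 and a negative sign, and the Pearson index is negative as well, so both measures point to left-sided asymmetry.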

TASK 2

What should be the size of the sample so that, with a probability of 0.954, the sampling error does not exceed 0.04, if it is known from previous surveys that the variance is 0.24?

Solution

The sample size for sampling without replacement is calculated by the formula:

$n = \frac{t^2 \sigma^2 N}{\Delta_x^2 N + t^2 \sigma^2}$,

where t is the confidence coefficient (for a probability of 0.954 it equals 2.0; determined from tables of the probability integral),

$\sigma^2 = 0.24$ is the variance;

N = 10,000 people is the size of the general population;

$\Delta_x = 0.04$ is the marginal error of the sample mean.

$n = \frac{2^2 \cdot 0.24 \cdot 10000}{0.04^2 \cdot 10000 + 2^2 \cdot 0.24} = \frac{9600}{16.96} \approx 566$

With a probability of 95.4%, it can be stated that the sample size ensuring a sampling error of no more than 0.04 should be at least 566 families.
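The substitution can be verified with a few lines of Python. The variable names are introduced here; the values are those listed in the solution.

```python
t = 2.0           # confidence coefficient for probability 0.954
variance = 0.24   # variance from previous surveys
N = 10_000        # size of the general population
delta = 0.04      # marginal sampling error

# Sample size for sampling without replacement
n_exact = t ** 2 * variance * N / (delta ** 2 * N + t ** 2 * variance)

print(round(n_exact, 2))   # exact value before rounding
print(round(n_exact))      # rounded to the nearest whole unit, as in the text
```

The exact value is slightly above 566, so a stricter convention that always rounds the sample size up would give one unit more than the figure quoted in the solution.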

TASK 3

The following data are available on the income from the main activity of the enterprise, million rubles.

To analyze the time series, determine the following indicators:

1) chain and basic:

Absolute gains;

Growth rates;

Rates of increase;

2) average:

Level of the time series;

Absolute gain;

Growth rate;

Rate of increase;

3) the absolute value of a 1% increase.

Solution

1. Absolute gain ($\Delta y$) is the difference between a given level of the series and the previous (or base) level:

chain: $\Delta y = y_i - y_{i-1}$,

basic: $\Delta y = y_i - y_0$,

where $y_i$ is the level of the series,

i is the number of the level,

$y_0$ is the level of the base year.

2. Growth rate ($T_g$) is the ratio of a given level of the series to the previous level (or to the level of the base year, 2001), expressed in %:

chain: $T_g = \frac{y_i}{y_{i-1}} \cdot 100\%$;

basic: $T_g = \frac{y_i}{y_0} \cdot 100\%$

3. Rate of increase ($T_{inc}$) is the ratio of the absolute gain to the previous (or base) level, expressed in %:

chain: $T_{inc} = \frac{y_i - y_{i-1}}{y_{i-1}} \cdot 100\%$;

basic: $T_{inc} = \frac{y_i - y_0}{y_0} \cdot 100\%$

4. The absolute value of a 1% increase (A) is the ratio of the chain absolute gain to the chain rate of increase, expressed in %:

$A = \frac{\Delta y}{T_{inc}} = \frac{y_{i-1}}{100}$

The average level of the series is calculated by the arithmetic mean formula:

$\bar{y} = \frac{\sum y_i}{n}$

Average level of income from core activities for 4 years:

The average absolute gain is calculated by the formula:

$\overline{\Delta y} = \frac{y_n - y_0}{n - 1}$,

where n is the number of levels in the series.

On average, income from core activities increased by 3.333 million rubles per year.

The average annual growth rate is calculated by the geometric mean formula:

$\bar{T}_g = \sqrt[n-1]{\frac{y_n}{y_0}} \cdot 100\%$,

where $y_n$ is the final level of the series,

$y_0$ is the initial level of the series.

$\bar{T}_g = 102.174\%$

The average annual rate of increase is calculated by the formula:

$\bar{T}_{inc} = \bar{T}_g - 100\% = 102.174\% - 100\% = 2.174\%$.

Thus, on average, the income from the main activities of the enterprise increased by 2.174% per year.
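The indicators defined above can be sketched in Python. The source table with the income figures did not survive, so the series `levels` below is purely hypothetical and is not the data of the task; all variable names are introduced here for illustration.

```python
# Hypothetical four-year series (NOT the data of the task)
levels = [100.0, 110.0, 121.0, 133.1]
base = levels[0]
pairs = list(zip(levels, levels[1:]))   # (previous, current) levels

# 1) Chain and basic indicators
chain_abs = [cur - prev for prev, cur in pairs]
base_abs = [cur - base for cur in levels[1:]]
chain_growth = [cur / prev * 100 for prev, cur in pairs]
base_growth = [cur / base * 100 for cur in levels[1:]]
chain_inc = [g - 100 for g in chain_growth]
base_inc = [g - 100 for g in base_growth]

# 3) Absolute value of a 1% increase: chain gain / chain rate of increase,
#    which simplifies to the previous level divided by 100
one_percent = [prev / 100 for prev, _ in pairs]

# 2) Averages; n is the number of levels
n = len(levels)
mean_level = sum(levels) / n
mean_abs = (levels[-1] - levels[0]) / (n - 1)
mean_growth = (levels[-1] / levels[0]) ** (1 / (n - 1)) * 100
mean_inc = mean_growth - 100

print(chain_abs, base_abs)
print(round(mean_abs, 3), round(mean_growth, 3), round(mean_inc, 3))
```

The average growth rate uses only the first and last levels, so any series with the same endpoints and length gives the same geometric mean.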

TASK 4

Calculate:

1. Individual price indices;

2. General index of turnover;

3. Aggregate price index;

4. Aggregate index of the physical volume of sales of goods;

5. The absolute increase in the value of turnover, decomposed by factors (due to changes in prices and in the quantity of goods sold);

6. Make brief conclusions on all the indicators obtained.

Solution

1. By condition, the individual price indices for goods A, B and C were

$i_{pA} = 1.20$; $i_{pB} = 1.15$; $i_{pC} = 1.00$.

2. The general index of turnover is calculated by the formula:

$I_w = \frac{\sum p_1 q_1}{\sum p_0 q_0} \cdot 100\% = 1470 / 1045 \cdot 100\% = 140.67\%$

Trade turnover increased by 40.67% (140.67% − 100%).

3. Aggregate price index:

$I_p = \frac{\sum p_1 q_1}{\sum p_0 q_1} \cdot 100\% = 1470 / 1333.478 \cdot 100\% = 110.24\%$

On average, commodity prices rose by 10.24%.

The additional costs of buyers due to rising prices:

$\Delta w(p) = \sum p_1 q_1 - \sum p_0 q_1 = 1470 - 1333.478 = 136.522$ million rubles.

As a result of the rise in prices, buyers had to spend an additional 136.522 million rubles.

4. General index of the physical volume of trade:

$I_q = \frac{\sum p_0 q_1}{\sum p_0 q_0} \cdot 100\% = 1333.478 / 1045 \cdot 100\% = 127.61\%$

The physical volume of trade turnover increased by 27.61%.

5. Determine the overall change in turnover in the second period compared to the first period:

$\Delta w = 1470 - 1045 = 425$ million rubles,

due to price changes:

$\Delta w(p) = 1470 - 1333.478 = 136.522$ million rubles,

due to changes in physical volume:

$\Delta w(q) = 1333.478 - 1045 = 288.478$ million rubles.

The turnover of goods increased by 40.67%. Prices for the 3 goods increased by 10.24% on average. The physical volume of trade increased by 27.61%.

In total, turnover increased by 425 million rubles, of which 136.522 million rubles was due to the increase in prices and 288.478 million rubles to the increase in sales volumes.
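The index system can be checked with a short Python sketch that starts from the three aggregates used in the solution. The variable names are introduced here; the per-item prices and quantities are not available, so only the totals are used.

```python
# Aggregates quoted in the solution, million rubles
sum_p1q1 = 1470.0      # turnover of the second period
sum_p0q0 = 1045.0      # turnover of the first period
sum_p0q1 = 1333.478    # second-period quantities at first-period prices

I_w = sum_p1q1 / sum_p0q0 * 100   # general turnover index, %
I_p = sum_p1q1 / sum_p0q1 * 100   # aggregate price index, %
I_q = sum_p0q1 / sum_p0q0 * 100   # physical volume index, %

d_total = sum_p1q1 - sum_p0q0     # overall change in turnover
d_price = sum_p1q1 - sum_p0q1     # part due to prices
d_volume = sum_p0q1 - sum_p0q0    # part due to physical volume

print(round(I_w, 2), round(I_p, 2), round(I_q, 2))
print(round(d_total, 3), round(d_price, 3), round(d_volume, 3))
```

Two identities serve as a consistency check: the price and volume indices multiply to the turnover index, and the two partial changes add up to the total change.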

TASK 5

The following data are available for 10 plants in the same industry.

Plant no.

Production output, thousand units (x)

Based on the data provided:

1) to confirm the conclusions of the logical analysis that there is a linear correlation between the factor attribute (production volume) and the resulting attribute (power consumption), plot the initial data on the correlation field and draw conclusions about the form of the relationship, indicating its formula;

2) determine the parameters of the regression equation and plot the resulting theoretical line on the correlation field;

3) calculate the linear correlation coefficient;

4) explain the values of the indicators obtained in items 2) and 3);

5) using the resulting model, forecast the possible electricity consumption at a plant with a production volume of 4.5 thousand units.

Solution

We denote the factor attribute, the volume of output, by $x_i$, and the resulting attribute, power consumption, by $y_i$; the points with coordinates (x, y) are plotted on the correlation field OXY.

The points of the correlation field lie along a straight line. Therefore the relationship is linear, and we look for the regression equation in the form of a straight line $\hat{y}_x = ax + b$. To find it, we use the system of normal equations:

$\begin{cases} a \overline{x^2} + b \bar{x} = \overline{xy} \\ a \bar{x} + b = \bar{y} \end{cases}$

Let's make a calculation table.

Using the means found, we compose the system and solve it for the parameters a and b:

So, we get the regression equation of y on x: $\hat{y}_x = 3.57692x + 3.19231$

We plot the regression line on the correlation field.

Substituting the values of x from column 2 into the regression equation, we obtain the calculated values $\hat{y}$ (column 7) and compare them with the data for y; the comparison is shown in column 8. Incidentally, the correctness of the calculations is also confirmed by the coincidence of the mean values of y and $\hat{y}$.

The linear correlation coefficient assesses the closeness of the relationship between the attributes x and y and is calculated by the formula

$r = \frac{\overline{xy} - \bar{x} \bar{y}}{\sigma_x \sigma_y}$

The slope of the regression line a (the coefficient at x) characterizes the direction of the revealed dependence between the attributes: for a > 0 they change in the same direction, for a < 0 in opposite directions. Its absolute value is a measure of the change in the resulting attribute when the factor attribute changes by one unit of measurement.

The sign of the free term of the regression line reveals the direction, and its absolute value is a quantitative measure, of the influence of all other factors on the resulting attribute.

If the deviation of the actual value from the calculated one (column 8) is negative, then the resource of the factor attribute of an individual object is used with less efficiency, and if it is positive, with greater efficiency than the average for the entire set of objects.

Let's carry out a post-regression analysis.

1. The coefficient at x of the regression line is 3.57692 > 0; therefore, as production increases (decreases), electricity consumption increases (decreases). An increase in output of 1 thousand units gives an average increase in electricity consumption of 3.57692 thousand kWh.

2. The free term of the regression line is 3.19231; therefore, the influence of other factors increases the effect of output on electricity consumption in absolute terms by 3.19231 thousand kWh.

3. The correlation coefficient of 0.8235 reveals a very close dependence of power consumption on output.

Predictions are easy to make with the equation of the regression model: the value of x, the volume of production, is substituted into the regression equation, and electricity consumption is predicted. The values of x can be taken not only within the observed range but also outside it.

Let's make a forecast of the possible electricity consumption at a plant with a production volume of 4.5 thousand units.

$\hat{y} = 3.57692 \cdot 4.5 + 3.19231 = 19.28845$ thousand kWh.
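The regression steps can be sketched in Python. The table for the 10 plants did not survive, so the helper `fit_line` (a name introduced here) is demonstrated on a small hypothetical data set that is not the data of the task; only the final forecast uses the coefficients quoted in the solution.

```python
import math


def fit_line(xs, ys):
    """Least-squares line y = a*x + b and correlation r, via means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    mxy = sum(x * y for x, y in zip(xs, ys)) / n
    mx2 = sum(x * x for x in xs) / n
    my2 = sum(y * y for y in ys) / n
    cov = mxy - mx * my
    a = cov / (mx2 - mx ** 2)   # slope from the normal equations
    b = my - a * mx             # free term
    r = cov / (math.sqrt(mx2 - mx ** 2) * math.sqrt(my2 - my ** 2))
    return a, b, r


# Hypothetical illustration data (NOT the plants from the task)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [6.0, 11.0, 14.0, 17.0, 22.0]
a, b, r = fit_line(xs, ys)
print(round(a, 3), round(b, 3), round(r, 4))

# Forecast with the coefficients quoted in the solution
a_src, b_src = 3.57692, 3.19231
forecast = a_src * 4.5 + b_src   # thousand kWh at 4.5 thousand units
print(round(forecast, 5))
```

Working with means rather than sums keeps the code aligned with the system of normal equations as written in terms of averages.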

Condition:

There is data on the age composition of workers (years): 18, 38, 28, 29, 26, 38, 34, 22, 28, 30, 22, 23, 35, 33, 27, 24, 30, 32, 28, 25, 29, 26, 31, 24, 29, 27, 32, 25, 29, 29.

    1. Construct an interval distribution series.
    2. Plot the series graphically.
    3. Determine the mode and the median graphically.

Solution:

1) According to the Sturges formula, the population should be divided into $1 + 3.322 \lg 30 \approx 5.91 \approx 6$ groups.

The maximum age is 38, the minimum is 18.

With 6 groups the bin width would be (38 − 18) / 6 ≈ 3.33. Since the ends of the bins must be integers, we divide the population into 5 groups instead. The interval width is then (38 − 18) / 5 = 4.

To facilitate the calculations, we arrange the data in ascending order: 18, 22, 22, 23, 24, 24, 25, 25, 26, 26, 27, 27, 28, 28, 28, 29, 29, 29, 29, 29, 30, 30, 31, 32, 32, 33, 34, 35, 38, 38.

Age distribution of workers
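The grouping into five intervals of width 4 can be sketched in Python. The original table did not survive, so the counts produced here are a reconstruction under one assumption: a value that coincides with an inner boundary is assigned to the preceding interval (right-closed intervals), with the minimum value 18 included in the first interval. The helper `bin_index` is a name introduced here.

```python
import math

ages = [18, 38, 28, 29, 26, 38, 34, 22, 28, 30, 22, 23, 35, 33, 27,
        24, 30, 32, 28, 25, 29, 26, 31, 24, 29, 27, 32, 25, 29, 29]

low, width, k = 18, 4, 5
edges = [low + i * width for i in range(k + 1)]   # 18, 22, ..., 38


def bin_index(x):
    """Index of the right-closed interval (edge, edge + width] holding x."""
    return max(0, math.ceil((x - low) / width) - 1)


counts = [0] * k
for age in ages:
    counts[bin_index(age)] += 1

# Accumulated frequencies, as needed for the cumulative curve
cumulative = []
total = 0
for c in counts:
    total += c
    cumulative.append(total)

for i in range(k):
    print(f"{edges[i]}-{edges[i + 1]}: {counts[i]} (cumulative {cumulative[i]})")
```

A left-closed convention would move the boundary values 22, 26, 30 and 34 into the following interval and change the counts, so the convention should be fixed before reading the mode or median off the graphs.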

Graphically, a series can be displayed as a histogram or a polygon. A histogram is a bar chart: the base of each bar is the width of the interval, and the height of the bar equals the frequency.

A polygon (or distribution polygon) is a graph of frequencies. To build it from the histogram, connect the midpoints of the upper sides of the rectangles. We close the polygon on the Ox axis at distances of half an interval from the extreme values of x.

The mode (Mo) is the value of the trait under study that occurs most often in a given population.

To determine the mode from the histogram, select the highest rectangle, draw a line from its upper right vertex to the upper right corner of the previous rectangle, and a line from its upper left vertex to the upper left vertex of the next rectangle. From the point of intersection of these lines, drop a perpendicular to the x-axis. Its abscissa is the mode: Mo ≈ 27.5. This means that the most common age in this population is 27-28 years.

The median (Me) is the value of the trait under study that lies in the middle of the ordered variation series.

We find the median from the cumulative curve (ogive), the graph of accumulated frequencies. The abscissas are the variants of the series; the ordinates are the accumulated frequencies.

To determine the median from the cumulative curve, we find on the ordinate axis the point corresponding to 50% of the accumulated frequencies (in our case, 15), draw through it a straight line parallel to the Ox axis, and from the point of its intersection with the cumulative curve drop a perpendicular to the x-axis. Its abscissa is the median: Me ≈ 25.9. This means that half of the workers in this population are less than 26 years old.

When processing large amounts of information, which is especially important in modern research, the researcher faces the serious task of grouping the initial data correctly. If the data are discrete, then, as we have seen, no problems arise: you just need to count the frequency of each value. If the feature under study is continuous (which is more common in practice), then choosing the optimal number of intervals for grouping is by no means a trivial task.

To group continuous random variables, the entire range of variation of the attribute is split into a certain number of intervals k.

A grouped interval (continuous) variation series is the set of intervals, ranked by the value of the feature, given together with the corresponding frequencies ($m_i$), i.e. the numbers of observations that fall into the i-th interval, or with the relative frequencies:

Characteristic value intervals

Frequency $m_i$

The histogram and the cumulative curve (ogive), already discussed in detail, are an excellent data visualization tool that gives a first idea of the structure of the data. Such graphs (Fig. 1.15) are constructed for continuous data in the same way as for discrete data, taking into account only that continuous data completely fill the range of their possible values and can take any value within it.

Fig. 1.15.

That is why the bars of the histogram and of the cumulative curve must touch each other and must not leave gaps inside the range of possible values of the characteristic (i.e., the histogram and the cumulative curve should not have "holes" along the abscissa that exclude values of the studied variable, as in Fig. 1.16). The height of a bar corresponds to the frequency, i.e. the number of observations that fell into the given interval, or to the relative frequency, i.e. the proportion of observations. Intervals should not intersect and are generally of the same width.

Fig. 1.16.

The histogram and the polygon are approximations of the probability density curve (differential function) f(x) of the theoretical distribution considered in the course of probability theory. That is why their construction is so important in the primary statistical processing of quantitative continuous data: from their appearance one can judge the hypothetical distribution law.

The cumulative curve is the curve of accumulated frequencies (or relative frequencies) of the interval variation series. It is compared with the graph of the cumulative distribution function F(x), also considered in the course of probability theory.

Basically, the concepts of the histogram and the cumulative curve are associated with continuous data and their interval variation series, since their graphs are empirical estimates of the probability density function and the distribution function, respectively.

The construction of an interval variation series begins with determining the number of intervals k. This task is perhaps the most difficult, important and controversial part of the issue under study.

The number of intervals should not be too small, since in that case the histogram turns out to be oversmoothed and loses all the features of the variability of the initial data: Fig. 1.17 (left graph) shows a histogram with a smaller number of intervals built from the same data as the graphs in Fig. 1.15.

At the same time, the number of intervals should not be too large, otherwise we will not be able to estimate the distribution density of the studied data along the number axis: the histogram will turn out to be undersmoothed, uneven and with unfilled intervals (see Fig. 1.17, right graph).

Fig. 1.17.

How do you determine the most suitable number of intervals?

Back in 1926, Herbert Sturges proposed a formula for calculating the number of intervals into which the original set of values of the trait under study should be split. This formula has become extremely popular: most statistics textbooks offer it, and many statistical packages use it by default. How far this is justified, and whether it is justified in all cases, is a very serious question.

So what is the Sturges formula based on?

Consider the binomial distribution