Application of the Spearman and Pearson correlations

The Pearson correlation is a measure of the linear relationship between two variables. It allows you to determine how proportional the variability of two variables is. If the variables are proportional to each other, then graphically the relationship between them can be represented as a straight line with a positive (direct proportion) or negative (inverse proportion) slope.

In practice, the relationship between two variables, if any, is probabilistic and graphically looks like an ellipsoidal scatter cloud. This ellipsoid, however, can be represented (approximated) by a straight line, or regression line. The regression line is a straight line constructed by the method of least squares: the sum of the squared distances (calculated along the y-axis) from each point of the scatter plot to the line is minimal.

Of particular importance for assessing the accuracy of the prediction is the variance of the estimates of the dependent variable. In essence, the variance of the estimates of the dependent variable Y is that part of its total variance that is due to the influence of the independent variable X. In other words, the ratio of the variance of the estimates of the dependent variable to its total variance is equal to the square of the correlation coefficient.

The square of the correlation coefficient of the dependent and independent variables represents the proportion of the variance of the dependent variable due to the influence of the independent variable, and is called the coefficient of determination. The coefficient of determination, therefore, shows the extent to which the variability of one variable is due (determined) by the influence of another variable.

The coefficient of determination has an important advantage over the correlation coefficient. The correlation coefficient is not a linear function of the relationship between two variables. Therefore, the arithmetic mean of the correlation coefficients for several samples does not coincide with the correlation calculated at once for all subjects from these samples (i.e., the correlation coefficient is not additive). In contrast, the coefficient of determination reflects the relationship linearly and is therefore additive: it can be averaged over several samples.

Additional information about the strength of the connection is given by the value of the correlation coefficient squared - the coefficient of determination: this is the part of the variance of one variable that can be explained by the influence of another variable. In contrast to the correlation coefficient, the coefficient of determination increases linearly with an increase in the strength of the connection.
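
For readers who want to check these quantities numerically, here is a minimal sketch (assuming Python with NumPy; the study-hours/exam-score data are invented for illustration) that computes Pearson's r and the coefficient of determination r².

```python
import numpy as np

# Invented data: hours of study (x) and exam score (y)
x = np.array([2.0, 3.5, 1.0, 4.0, 5.5, 6.0, 3.0, 4.5])
y = np.array([55.0, 62.0, 48.0, 70.0, 78.0, 85.0, 60.0, 72.0])

r = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
r_squared = r ** 2            # coefficient of determination: share of Var(y) explained by x

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```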

Spearman and τ-Kendall correlation coefficients (rank correlations)

If both variables between which the relationship is being studied are presented on an ordinal scale, or one of them is on an ordinal scale and the other on a metric scale, then rank correlation coefficients are applied: Spearman's or Kendall's τ. Both coefficients require prior ranking of both variables.

Spearman's rank correlation coefficient is a non-parametric method used to study the relationship between phenomena statistically. It determines the actual degree of parallelism between two quantitative series of the studied features and gives an estimate of the closeness of the established relationship using a quantitatively expressed coefficient.

If the members of a group were ranked first by the x variable and then by the y variable, then the correlation between the x and y variables can be obtained by simply calculating the Pearson coefficient for the two rank series. Provided there are no ties (i.e., no repeated ranks) in either variable, Pearson's formula can be significantly simplified computationally and converted into the formula known as Spearman's.

The power of the Spearman rank correlation coefficient is somewhat inferior to the power of the parametric correlation coefficient.

It is advisable to use the rank correlation coefficient in the presence of a small number of observations. This method can be used not only for quantified data, but also in cases where the recorded values are determined by descriptive features of varying intensity.

Spearman's rank correlation coefficient gives less accurate (coarsened) values when there is a large number of identical ranks for one or both of the compared variables. Ideally, both correlated series should be two sequences of non-coincident values.

An alternative to the Spearman correlation for ranks is the τ-Kendall correlation. The correlation proposed by M. Kendall is based on the idea that the direction of the relationship can be judged by comparing the subjects in pairs: if for a pair of subjects the change in x coincides in direction with the change in y, this indicates a positive relationship; if it does not coincide, a negative relationship.
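
As a quick illustration of the two rank coefficients (a sketch assuming SciPy is available; the rater data are invented), both can be computed as follows:

```python
from scipy import stats

# Invented ordinal data: ratings of the same eight items by two raters
rater_x = [1, 2, 3, 4, 5, 6, 7, 8]
rater_y = [2, 1, 4, 3, 6, 5, 8, 7]

rho, p_rho = stats.spearmanr(rater_x, rater_y)    # Spearman's rank correlation
tau, p_tau = stats.kendalltau(rater_x, rater_y)   # Kendall's tau, based on pairwise comparisons

print(f"Spearman rho = {rho:.3f} (p = {p_rho:.3f})")
print(f"Kendall tau  = {tau:.3f} (p = {p_tau:.3f})")
```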

Pearson correlation coefficient

The Pearson coefficient r is used to study the relationship between two metric variables measured on the same sample. There are many situations in which its use is appropriate. Does intelligence affect undergraduate performance? Is an employee's salary related to his goodwill towards colleagues? Does a student's mood affect the success of solving a complex arithmetic problem? To answer such questions, the researcher must measure the two indicators of interest for each member of the sample.

The value of the correlation coefficient is not affected by the units in which the features are expressed. Therefore, any linear transformation of a feature (multiplication by a constant, addition of a constant) does not change the value of the correlation coefficient. An exception is multiplication of one of the features by a negative constant: the correlation coefficient then changes its sign to the opposite.
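
A small sketch (NumPy assumed, data invented) demonstrating this invariance under linear transformations:

```python
import numpy as np

# Invented data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r_original = np.corrcoef(x, y)[0, 1]
r_scaled = np.corrcoef(10 * x + 3, y)[0, 1]   # positive linear transformation: r unchanged
r_negated = np.corrcoef(-2 * x, y)[0, 1]      # negative constant: r changes sign

print(r_original, r_scaled, r_negated)        # r, r, -r
```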

Correlation coefficients have been specifically designed to numerically determine the strength and direction of the relationship between two properties measured on numerical scales (metric or rank). As already mentioned, correlation values of +1 (strict direct, or directly proportional, relationship) and −1 (strict inverse, or inversely proportional, relationship) correspond to the maximum strength of the relationship, while a correlation of zero corresponds to the absence of a relationship. Additional information about the strength of the relationship is given by the coefficient of determination: it is the part of the variance of one variable that can be explained by the influence of the other variable.

9. Parametric methods for data comparison

Parametric comparison methods apply if your variables were measured on a metric scale.

Comparison of the variances of two samples using Fisher's test (F-test).


This method tests the hypothesis that the variances of the two populations from which the compared samples are drawn differ from each other. Limitation of the method: the distribution of the feature in both samples should not differ from normal.

An alternative for comparing variances is Levene's test, which does not require a test for normality. It can be used to check the assumption of equality (homogeneity) of variances before testing the significance of the difference in means with Student's t-test for independent samples of different sizes.
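
A hedged sketch of both comparisons (assuming SciPy; the samples are invented, and the two-sided p-value for the F-ratio is an approximation):

```python
import numpy as np
from scipy import stats

# Invented samples measured on a metric scale
sample_a = np.array([12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 12.3, 13.1])
sample_b = np.array([11.9, 15.6, 10.2, 16.1, 12.4, 14.8, 11.1, 15.3])

# Fisher's F-test: ratio of the larger sample variance to the smaller one
var_a = np.var(sample_a, ddof=1)
var_b = np.var(sample_b, ddof=1)
if var_a >= var_b:
    f_stat, dfn, dfd = var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
else:
    f_stat, dfn, dfd = var_b / var_a, len(sample_b) - 1, len(sample_a) - 1
p_f = 2 * stats.f.sf(f_stat, dfn, dfd)   # approximate two-sided p-value

# Levene's test: does not require normality
levene_stat, p_levene = stats.levene(sample_a, sample_b)

print(f"F = {f_stat:.2f}, p = {p_f:.3f}")
print(f"Levene W = {levene_stat:.2f}, p = {p_levene:.3f}")
```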

In cases where the studied characteristics are measured on an ordinal scale, or the form of the relationship differs from linear, the relationship between two random variables is studied using rank correlation coefficients. Consider Spearman's rank correlation coefficient. To calculate it, the sample values must be ranked (ordered). Ranking is the arrangement of experimental data in a certain order, either ascending or descending.

The ranking operation is carried out according to the following algorithm:

1. The lower value is assigned the lower rank: the lowest value receives rank 1, and the highest value receives a rank equal to the number of ranked values. For example, if n = 7, the highest value receives rank 7, except in the cases covered by the second rule.

2. If several values are equal, they are assigned a rank equal to the average of the ranks they would have received if they were not equal. As an example, consider an ascending sample of 7 elements: 22, 23, 25, 25, 25, 28, 30. The values 22 and 23 each occur once, so their ranks are R22 = 1 and R23 = 2. The value 25 occurs 3 times. If these values did not repeat, their ranks would be 3, 4, 5; therefore their rank R25 equals the arithmetic mean of 3, 4 and 5: (3 + 4 + 5)/3 = 4. The values 28 and 30 do not repeat, so their ranks are R28 = 6 and R30 = 7. Finally, we have the correspondence: 22 → 1, 23 → 2, 25 → 4, 25 → 4, 25 → 4, 28 → 6, 30 → 7.

3. The total sum of the ranks must match the calculated sum, which is determined by the formula: ΣR = n(n + 1)/2,

where n is the total number of ranked values.

A discrepancy between the actual and calculated sums of ranks indicates an error made in the calculation of the ranks or in their summation. In this case, the error must be found and corrected.
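
The ranking rules above, including the averaging of tied ranks and the check sum n(n + 1)/2, can be reproduced, for example, with SciPy's rankdata (a sketch using the sample 22, 23, 25, 25, 25, 28, 30 from the text):

```python
import numpy as np
from scipy.stats import rankdata

values = np.array([22, 23, 25, 25, 25, 28, 30])

# Tied values receive the average of the ranks they would otherwise occupy
ranks = rankdata(values)     # -> [1. 2. 4. 4. 4. 6. 7.]

n = len(values)
assert ranks.sum() == n * (n + 1) / 2   # check: the sum of ranks equals n(n + 1)/2
print(ranks, ranks.sum())
```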

Spearman's rank correlation coefficient is a method that allows you to determine the strength and direction of the relationship between two features or two feature hierarchies. The use of the rank correlation coefficient has a number of limitations:

  • a) The expected correlation should be monotonic.
  • b) The size of each sample must be at least 5. The upper limit of the sample size is determined by the available tables of critical values (Table 3 of the Appendix); the maximum value of n in the table is 40.
  • c) A large number of identical ranks is likely to occur during the analysis. In this case, a correction must be applied. The most favorable case is when both studied samples are two sequences of non-coincident values.

For correlation analysis, the researcher must have two samples that can be ranked, for example:

  • - two signs measured in the same group of subjects;
  • - two individual trait hierarchies identified in two subjects for the same set of traits;
  • - two group hierarchies of features;
  • - individual and group hierarchies of features.

We begin the calculation with ranking the studied indicators separately for each of the signs.

Let us analyze a case with two features measured in the same group of subjects. First, the individual values ​​are ranked according to the first attribute obtained by different subjects, and then the individual values ​​according to the second attribute. If lower ranks of one indicator correspond to lower ranks of another indicator, and higher ranks of one indicator correspond to higher ranks of another indicator, then the two features are positively related. If the higher ranks of one indicator correspond to the lower ranks of another indicator, then the two signs are negatively related. To find rs, we determine the differences between the ranks (d) for each subject. The smaller the difference between the ranks, the closer the rank correlation coefficient rs will be to "+1". If there is no relationship, then there will be no correspondence between them, hence rs will be close to zero. The greater the difference between the ranks of the subjects in two variables, the closer to "-1" will be the value of the coefficient rs. Thus, the Spearman rank correlation coefficient is a measure of any monotonic relationship between the two characteristics under study.

Consider the case with two individual feature hierarchies identified in two subjects for the same set of features. In this situation, the individual values obtained by each of the two subjects on a certain set of features are ranked. The feature with the lowest value is assigned the first rank, the feature with the next higher value the second rank, and so on. Special attention should be paid to ensuring that all features are measured in the same units. For example, indicators cannot be ranked if they are expressed in points of different "value", since it is impossible to determine which of the factors takes first place in severity until all values are brought to a single scale. If features that have low ranks in one of the subjects also have low ranks in the other, and vice versa, then the individual hierarchies are positively related.

In the case of two group hierarchies of features, the average group values ​​obtained in two groups of subjects are ranked according to the same set of features for the studied groups. Next, we follow the algorithm given in the previous cases.

Let us analyze the case with an individual and a group hierarchy of features. The individual values of the subject and the mean group values are ranked separately according to the same set of features; the group means are obtained excluding the subject in question, who does not participate in the mean group hierarchy, since his individual hierarchy will be compared with it. Rank correlation makes it possible to assess the degree of consistency between the individual and group hierarchies of features.

Let us consider how the significance of the correlation coefficient is determined in the cases listed above. In the case of two features, it will be determined by the sample size. In the case of two individual feature hierarchies, the significance depends on the number of features included in the hierarchy. In the last two cases, the significance is determined by the number of traits studied, and not by the size of the groups. Thus, the significance of rs in all cases is determined by the number of ranked values ​​n.

When checking the statistical significance of rs, tables of critical values of the rank correlation coefficient are used, compiled for various numbers of ranked values and different significance levels. If the absolute value of rs reaches or exceeds the critical value, the correlation is significant.

When considering the first option (a case with two features measured in the same group of subjects), the following hypotheses are possible.

H0: The correlation between variables x and y is not different from zero.

H1: The correlation between variables x and y is significantly different from zero.

If we work with any of the three remaining cases, then we need to put forward another pair of hypotheses:

H0: The correlation between the x and y hierarchies does not differ from zero.

H1: The correlation between x and y hierarchies is significantly different from zero.

The sequence of actions in calculating the Spearman rank correlation coefficient rs is as follows.

  • - Determine which two features or two feature hierarchies will participate in the matching as x and y variables.
  • - Rank the values ​​of the variable x, assigning rank 1 to the smallest value, according to the ranking rules. Place the ranks in the first column of the table in order of the numbers of the subjects or signs.
  • - Rank the values ​​of the variable y. Place the ranks in the second column of the table in order of the numbers of the subjects or signs.
  • - Calculate the differences d between the ranks x and y for each row of the table. The results are placed in the next column of the table.
  • - Calculate the squared differences (d²). Place the obtained values in the fourth column of the table.
  • - Calculate the sum of the squared differences Σd².
  • - If identical ranks occur, calculate the corrections:

Tx = Σ(tx³ − tx) / 12,

Ty = Σ(ty³ − ty) / 12,

where tx is the volume of each group of equal ranks in sample x;

ty is the size of each group of equal ranks in sample y.

Calculate the rank correlation coefficient depending on the presence or absence of identical ranks. In the absence of identical ranks, the rank correlation coefficient rs is calculated using the formula:

rs = 1 − 6 · Σd² / (n(n² − 1)).

In the presence of identical ranks, the rank correlation coefficient rs is calculated using the formula:

rs = 1 − 6 · (Σd² + Tx + Ty) / (n(n² − 1)),

where Σd² is the sum of the squared differences between the ranks;

Tx and Ty - corrections for the same ranks;

n is the number of subjects or features that participated in the ranking.

Determine the critical value of rs from Table 3 of the Appendix for the given number of subjects n. The correlation coefficient differs significantly from zero provided that rs is not less than the critical value.
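
As a sketch of the whole procedure (assuming SciPy/NumPy; the function name and the example data are mine, not from the text), the formulas above can be implemented and compared with SciPy's built-in spearmanr, which computes Pearson's r on the ranks and may therefore differ slightly from the simplified tie-corrected formula:

```python
import numpy as np
from scipy import stats

def spearman_with_tie_correction(x, y):
    """Spearman's rs by the formulas above: the Tx/Ty-corrected variant
    (which reduces to the simple formula when there are no ties)."""
    rx = stats.rankdata(x)
    ry = stats.rankdata(y)
    n = len(x)
    d2 = np.sum((rx - ry) ** 2)

    def tie_correction(ranks):
        # sum of (t**3 - t) / 12 over every group of equal ranks
        _, counts = np.unique(ranks, return_counts=True)
        return np.sum(counts ** 3 - counts) / 12.0

    tx, ty = tie_correction(rx), tie_correction(ry)
    return 1 - 6 * (d2 + tx + ty) / (n * (n ** 2 - 1))

# Invented data with one tie in y
x = [10, 12, 15, 18, 20, 22]
y = [3, 5, 5, 8, 9, 11]
print(spearman_with_tie_correction(x, y))   # simplified tie-corrected formula
rho_scipy, _ = stats.spearmanr(x, y)        # Pearson on ranks; close but not identical when ties exist
print(rho_scipy)
```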

When there are two series of values that can be ranked, it is rational to calculate Spearman's rank correlation.

Such series can be:

  • a pair of features determined in the same group of objects under study;
  • a pair of individual hierarchies of features determined in two studied objects for the same set of features;
  • a pair of group hierarchies of features;
  • an individual and a group hierarchy of features.

The method involves ranking the indicators separately for each of the features.

The smallest value has the smallest rank.

This is a non-parametric statistical method designed to establish the existence of a relationship between the studied phenomena by:

  • determining the actual degree of parallelism between two series of quantitative data;
  • assessing, in quantitative terms, the closeness of the identified relationship.

Correlation analysis

A statistical method designed to identify the existence of a relationship between two or more random variables, as well as its strength, is called correlation analysis.

It takes its name from the Latin correlatio, meaning "relationship".

When using it, the following scenarios are possible:

  • the presence of a correlation (positive or negative);
  • no correlation (zero).

When a relationship between variables is established, we speak of their correlation. In other words, when the value of X changes, a systematic change in the value of Y tends to be observed.

Various measures of connection (coefficients) are used as tools.

Their choice is influenced by:

  • the way the random variables are measured;
  • the nature of the relationship between the random variables.

The existence of a correlation can be displayed graphically (graphs) and with a coefficient (numerical display).

Correlation is characterized by the following features:

  • connection strength (with a correlation coefficient from ±0.7 to ±1 - strong; from ±0.3 to ±0.699 - medium; from 0 to ±0.299 - weak);
  • direction of the relationship (direct or inverse).

Goals of correlation analysis

Correlation analysis does not allow establishing a causal relationship between the studied variables.

It is carried out with the aim of:

  • establishment of dependence between variables;
  • obtaining certain information about a variable based on another variable;
  • determining the closeness (strength) of this dependence;
  • determining the direction of the established connection.

Methods of correlation analysis


This analysis can be done using:

  • the method of squares (Pearson's method);
  • the rank method (Spearman's method).

The Pearson method is applicable when an exact determination of the strength of the relationship between variables is required. The features studied with it must be expressed only quantitatively.

The Spearman method, or rank correlation, imposes no strict requirements on how the features are expressed: they can be either quantitative or attributive. This method gives an indicative rather than an exact estimate of the strength of the relationship.

The data series may contain open-ended categories, for example, when work experience is expressed by values such as "up to 1 year" or "more than 5 years".

Correlation coefficient

A statistical value characterizing the nature of the change in two variables is called the correlation coefficient, or pair correlation coefficient. Its value ranges from −1 to +1.

The most commonly used coefficients are:

  • Pearson's, applicable to variables measured on an interval scale;
  • Spearman's, for ordinal-scale variables.

Limitations on the use of the correlation coefficient

Obtaining unreliable data when calculating the correlation coefficient is possible in cases where:

  • there are too few observations for the variables (fewer than the recommended 25-100 pairs of observations);
  • the relationship between the studied variables is, for example, quadratic rather than linear;
  • each case contributes more than one observation to the data;
  • the data contain abnormal values (outliers);
  • the data under study consist of well-defined subgroups of observations;
  • in addition, the presence of a correlation does not allow one to establish which of the variables can be considered the cause and which the effect.

Correlation Significance Test

To evaluate statistical values, the concept of their significance or reliability is used, which characterizes the probability of a random occurrence of a value or its extreme values.

The most common method for determining the significance of a correlation is to determine the Student's t-test.

Its value is compared with the tabulated value for n − 2 degrees of freedom. When the calculated value of the criterion exceeds the tabulated value, the correlation coefficient is considered significant.

In economic calculations, a significance level of 0.05 (95% confidence) or 0.01 (99% confidence) is usually considered sufficient.
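
A minimal sketch of this significance check (SciPy assumed; the values r = 0.62 and n = 30 are hypothetical):

```python
import math
from scipy import stats

r = 0.62   # hypothetical sample correlation coefficient
n = 30     # number of paired observations

# Student's t statistic for a correlation coefficient, with n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
t_critical = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # two-sided test at the 0.05 level

print(f"t = {t:.2f}, critical t = {t_critical:.2f}, significant: {abs(t) > t_critical}")
```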

Spearman ranks

Spearman's rank correlation coefficient makes it possible to establish statistically the presence of a relationship between phenomena. Its calculation involves assigning a serial number, a rank, to each value; ranking can be carried out in ascending or descending order.

The number of features to be ranked can be anything, but manual ranking is rather laborious, which in practice limits their number; difficulties begin at around 20 features.

To calculate the Spearman coefficient, use the formula:

rs = 1 − 6 · Σd² / (n(n² − 1)),

where:

n is the number of ranked features;

d is the difference between the ranks of a value in the two variables;

Σd² is the sum of the squared rank differences.

Application of correlation analysis in psychology

Statistical support of psychological research makes it possible to make them more objective and highly representative. Statistical processing of data obtained in the course of psychological experiments helps to extract the maximum of useful information.

Correlation analysis is one of the most widely used methods for processing such results.

It is appropriate to conduct a correlation analysis of the results obtained during the research:

  • anxiety (according to R. Temml, M. Dorca, V. Amen tests);
  • family relationships (“Analysis of family relationships” (DIA) questionnaire of E.G. Eidemiller, V.V. Yustitskis);
  • the level of internality-externality (questionnaire of E.F. Bazhin, E.A. Golynkina and A.M. Etkind);
  • the level of emotional burnout in teachers (V.V. Boyko's questionnaire);
  • connections between the elements of the verbal intelligence of students in different profiles of education (method of K.M. Gurevich and others);
  • relationship between the level of empathy (method of V.V. Boyko) and satisfaction with marriage (questionnaire of V.V. Stolin, T.L. Romanova, G.P. Butenko);
  • links between the sociometric status of adolescents (test by Jacob L. Moreno) and the characteristics of the style of family education (questionnaire by E.G. Eidemiller, V.V. Yustitskis);
  • the structure of the life goals of adolescents brought up in complete and single-parent families (questionnaire by Edward L. Deci and Richard M. Ryan).

Brief instructions for conducting correlation analysis according to the Spearman criterion

Correlation analysis using the Spearman method is performed according to the following algorithm:

  • paired comparable features are arranged in 2 rows, one of which is indicated by X, and the other by Y;
  • the values ​​of the X series are arranged in ascending or descending order;
  • the sequence of arrangement of the values ​​of the Y series is determined by their correspondence with the values ​​of the X series;
  • for each value in the X series, determine the rank - assign a serial number from the minimum value to the maximum;
  • for each of the values ​​in the Y series, also determine the rank (from minimum to maximum);
  • calculate the difference (D) between the ranks of X and Y for each pair: D = rank(X) − rank(Y);
  • the resulting difference values ​​are squared;
  • sum the squares of the rank differences;
  • perform the calculation using the formula:

rs = 1 − 6 · ΣD² / (n(n² − 1)).

Spearman Correlation Example

It is necessary to establish the presence of a correlation between the length of service and the injury rate in the presence of the following data:

The most appropriate method of analysis is the rank method, because one of the features is presented as open-ended categories: work experience up to 1 year and work experience of 7 years or more.

The solution begins with ranking the data, which is summarized in a worksheet; this can be done manually, since the volume of data is small:

Work experience | Number of injuries | Rank x | Rank y | Rank difference d (x − y) | d²
up to 1 year    | 24 | 1 | 5   | −4   | 16
1-2             | 16 | 2 | 4   | −2   | 4
3-4             | 12 | 3 | 2.5 | +0.5 | 0.25
5-6             | 12 | 4 | 2.5 | +1.5 | 2.25
7 or more       | 6  | 5 | 1   | +4   | 16
Σd² = 38.5

Fractional ranks appear in the column because, when values of the same size occur, the arithmetic mean of their ranks is taken. In this example the injury count 12 occurs twice and would receive ranks 2 and 3; we take the arithmetic mean of these ranks, (2 + 3) / 2 = 2.5, and enter this value in the worksheet for both indicators.
Substituting the obtained values into the working formula and performing simple calculations, we obtain the Spearman coefficient: rs = 1 − 6 · 38.5 / (5 · (5² − 1)) = 1 − 231/120 ≈ −0.92.

The negative value of the coefficient indicates an inverse relationship between the features and suggests that shorter work experience is accompanied by a larger number of injuries. Moreover, the relationship between these indicators is quite strong.
The next stage of the calculation is to determine the reliability of the obtained coefficient: its standard error and Student's t statistic are calculated.
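
The worked example can also be verified programmatically; the sketch below (SciPy/NumPy assumed) reproduces the worksheet ranks, Σd² = 38.5 and the coefficient of about −0.92 obtained with the simplified formula, and shows SciPy's tie-adjusted value for comparison.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Data from the worked example: injury counts for the five experience groups
injuries = np.array([24, 16, 12, 12, 6])
rank_x = np.arange(1, 6)          # experience categories are already ordered: ranks 1..5
rank_y = rankdata(injuries)       # -> [5, 4, 2.5, 2.5, 1], the tied 12s share rank 2.5

d2 = np.sum((rank_x - rank_y) ** 2)
print(d2)                         # 38.5, as in the worksheet

n = len(injuries)
rs = 1 - 6 * d2 / (n * (n ** 2 - 1))   # simplified formula used in the text
print(rs)                         # about -0.925, i.e. the -0.92 from the text

rho, p = spearmanr(rank_x, injuries)   # SciPy applies Pearson to the ranks, so ties shift the value slightly
print(rho)                        # about -0.97
```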

Spearman's rank correlation method allows you to determine the tightness (strength) and direction of the correlation between two features or two profiles (hierarchies) of features.

To calculate the rank correlation, it is necessary to have two series of values that can be ranked. These series of values can be:

1) two signs measured in the same group of subjects;

2) two individual hierarchies of traits identified in two subjects for the same set of traits;

3) two group hierarchies of features,

4) individual and group hierarchies of features.

First, the indicators are ranked separately for each of the features.

As a rule, a lower value of a feature is assigned a lower rank.

In the first case (two features), the individual values ​​for the first feature, obtained by different subjects, are ranked, and then the individual values ​​for the second feature.

If two features are positively related, then subjects with low ranks on one of them will have low ranks on the other, and subjects with high ranks on one of the features will also have high ranks on the other. To calculate rs, it is necessary to determine the difference (d) between the ranks obtained by a given subject on both features. These values of d are then transformed in a certain way and subtracted from 1. The smaller the difference between the ranks, the larger rs will be and the closer it will be to +1.

If there is no correlation, all the ranks will be mixed and there will be no correspondence between them. The formula is designed so that in this case rs will be close to 0.

In the case of a negative correlation, the low ranks of the subjects on one attribute will correspond to high ranks on another attribute, and vice versa. The greater the discrepancy between the ranks of subjects on two variables, the closer rs is to −1.

In the second case (two individual profiles), the individual values obtained by each of the two subjects on a certain set of features (the same for both of them) are ranked. The first rank goes to the feature with the lowest value; the second rank to the feature with the next higher value, and so on. Obviously, all features must be measured in the same units, otherwise ranking is impossible. For example, it is impossible to rank indicators on the Cattell Personality Questionnaire (16PF) if they are expressed in "raw" scores, since the ranges of values for different factors differ: from 0 to 13, from 0 to 20, and from 0 to 26. We cannot say which of the factors takes first place in severity until we bring all the values to a single scale (most often the sten scale).

If the individual hierarchies of two subjects are positively related, then features that have low ranks for one of them will also have low ranks for the other, and vice versa. For example, if for one subject factor E (dominance) has the lowest rank, then for the other subject it should also have a low rank; if for one subject factor C (emotional stability) has the highest rank, then for the other subject this factor must also have a high rank, and so on.

In the third case (two group profiles), the average group values ​​obtained in 2 groups of subjects are ranked according to a certain set of features that is the same for two groups. In what follows, the line of reasoning is the same as in the previous two cases.

In the fourth case (individual and group profiles), the individual values of the subject and the mean group values are ranked separately according to the same set of features; the group means are obtained, as a rule, by excluding the subject in question, who does not participate in the mean group profile with which his individual profile will be compared. Rank correlation allows you to check how consistent the individual and group profiles are.
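
A brief sketch of this fourth case (illustrative scores invented; SciPy/NumPy assumed): the subject's profile is correlated with the group-mean profile computed without that subject.

```python
import numpy as np
from scipy.stats import spearmanr

# Invented scores of 6 subjects on the same 5 features (rows = subjects)
scores = np.array([
    [12, 18, 7, 15, 10],
    [11, 20, 6, 14, 9],
    [13, 17, 8, 16, 11],
    [10, 19, 5, 13, 8],
    [14, 16, 9, 17, 12],
    [12, 21, 7, 15, 10],
])

subject = 0                                  # index of the individual profile to compare
others = np.delete(scores, subject, axis=0)  # the subject is excluded from the group
group_mean = others.mean(axis=0)             # mean group profile over the remaining subjects

# Consistency of the individual and group hierarchies of features
rho, p = spearmanr(scores[subject], group_mean)
print(f"rho = {rho:.2f}, p = {p:.3f}")
```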

In all four cases, the significance of the obtained correlation coefficient is determined by the number of ranked values N. In the first case, this number coincides with the sample size n. In the second case, the number of observations is the number of features that make up the hierarchy. In the third and fourth cases, N is also the number of compared features, not the number of subjects in the groups. Detailed explanations are given in the examples. If the absolute value of rs reaches or exceeds the critical value, the correlation is significant.

Hypotheses.

There are two possible hypotheses. The first refers to case 1, the second to the other three cases.

The first version of hypotheses

H0: The correlation between variables A and B is not different from zero.

H1: The correlation between variables A and B is significantly different from zero.

The second version of the hypotheses

H0: Correlation between hierarchies A and B is not different from zero.

H1: The correlation between hierarchies A and B is significantly different from zero.

Limitations of the rank correlation coefficient

1. At least 5 observations must be submitted for each variable. The upper limit of the sample is determined by the available tables of critical values.

2. Spearman's rank correlation coefficient rs with a large number of identical ranks for one or both compared variables gives coarsened values. Ideally, both correlated series should be two sequences of mismatched values. If this condition is not met, it is necessary to make an adjustment for the same ranks.

Spearman's rank correlation coefficient is calculated by the formula:

rs = 1 − 6 · Σd² / (N(N² − 1)).

If in both compared rank series there are groups of identical ranks, then before calculating the rank correlation coefficient it is necessary to introduce corrections for identical ranks, Ta and Tb:

Ta = Σ(a³ − a) / 12,

Tb = Σ(b³ − b) / 12,

where a is the volume of each group of identical ranks in rank series A, and b is the volume of each group of identical ranks in rank series B.

To calculate the empirical value of rs in the presence of identical ranks, use the formula:

rs = 1 − 6 · (Σd² + Ta + Tb) / (N(N² − 1)).

Calculation of Spearman's rank correlation coefficient rs

1. Determine which two features or two feature hierarchies will participate in the comparison as variables A and B.

2. Rank the values of variable A, assigning rank 1 to the smallest value, in accordance with the ranking rules (see A.2.3). Enter the ranks in the first column of the table in order of the numbers of the subjects or features.

3. Rank the values of variable B in accordance with the same rules. Enter the ranks in the second column of the table in order of the numbers of the subjects or features.

4. Calculate the difference d between the ranks of A and B for each row of the table. Enter the differences in the third column of the table.

5. Square each difference: d². Enter these values in the fourth column of the table.

6. Calculate the sum of the squared differences Σd².

7. If identical ranks occur, calculate the corrections:

Ta = Σ(a³ − a) / 12,

Tb = Σ(b³ − b) / 12,

where a is the volume of each group of identical ranks in rank series A; b is the volume of each group of identical ranks in rank series B.

8. Calculate the rank correlation coefficient rs:

a) in the absence of identical ranks:

rs = 1 − 6 · Σd² / (N(N² − 1));

b) in the presence of identical ranks:

rs = 1 − 6 · (Σd² + Ta + Tb) / (N(N² − 1)),

where Σd² is the sum of the squared differences between ranks; Ta and Tb are the corrections for identical ranks; N is the number of subjects or features that participated in the ranking.

9. Determine from the table (see Appendix 4.3) the critical values of rs for the given N. If rs exceeds the critical value or is at least equal to it, the correlation differs significantly from 0.

Example 4.1. To determine how alcohol consumption affects the oculomotor reaction, data were obtained for the test group before and after drinking alcohol. Does the subject's reaction depend on the state of intoxication?

Experiment results:

Before: 16, 13, 14, 9, 10, 13, 14, 14, 18, 20, 15, 10, 9, 10, 16, 17, 18.
After: 24, 9, 10, 23, 20, 11, 12, 19, 18, 13, 14, 12, 14, 7, 9, 14.

Let us formulate the hypotheses:

H0: The correlation between the reaction before drinking alcohol and after drinking does not differ from zero.

H1: The correlation between the reaction before drinking alcohol and after drinking differs significantly from zero.

Table 4.1. Calculation of d² for the Spearman rank correlation coefficient rs when comparing the parameters of the oculomotor reaction before and after the experiment (N = 17)


Since we have duplicate ranks, in this case we will apply the formula adjusted for the same ranks:

Ta = ((2³ − 2) + (3³ − 3) + (2³ − 2) + (3³ − 3) + (2³ − 2) + (2³ − 2)) / 12 = 6

Tb = ((2³ − 2) + (2³ − 2) + (3³ − 3)) / 12 = 3

Find the empirical value of the Spearman coefficient:

rs = 1 − 6 · (767.75 + 6 + 3) / (17 · (17² − 1)) ≈ 0.05

According to the table (Appendix 4.3) we find the critical values ​​of the correlation coefficient

0.48 (p ≤ 0.05)

0.62 (p ≤ 0.01)

We get rs = 0.05 < rcr(0.05) = 0.48.

Conclusion: hypothesis H1 is rejected and H0 is accepted, i.e., the correlation between the reaction before and after alcohol consumption does not differ from zero.