Critical values of Spearman's rank correlation coefficient. An example of finding the Spearman rank correlation coefficient

Publication date: 09/03/2017 13:01

The term "correlation" is widely used in the humanities and medicine and often appears in the media. Correlations play a key role in psychology. In particular, calculating correlations is an important stage of the empirical research carried out when writing a final qualification work (thesis) in psychology.

Most material about correlation on the web is too academic, and the formulas are hard for a non-specialist to follow. Yet understanding what correlations mean is necessary for marketers, sociologists, physicians, psychologists: everyone who conducts research on people.

In this article we explain in plain language the essence of correlation, the types of correlation, the methods of calculating it, and the specifics of using correlation in psychological research and in psychology theses.

What is correlation

Correlation is a relationship, but not just any relationship. What makes it special? Let's look at an example.

Imagine that you are driving a car. You press the gas pedal - the car goes faster. You ease off the gas - the car slows down. Even a person who is not familiar with how a car works will say: “There is a direct relationship between the gas pedal and the speed of the car: the harder the pedal is pressed, the higher the speed.”

This dependence is functional - the speed is a direct function of the gas pedal. A specialist will explain that the pedal controls the supply of fuel to the cylinders, where the mixture is burned, which increases the power delivered to the shaft, and so on. This connection is rigid and deterministic and allows no exceptions (provided that the car is in working order).

Now imagine that you are the director of a company whose employees sell goods. You decide to increase sales by raising employees' salaries. You raise salaries by 10%, and the company's average sales go up. After a while, you raise them by another 10%, and sales grow again. Then another 5%, and again there is an effect. The conclusion suggests itself: there is a direct relationship between the company's sales and the salaries of its employees - the higher the salaries, the higher the sales of the organization. Is this the same connection as between the gas pedal and the speed of the car? What is the key difference?

That's right, the relationship between salary and sales is not rigid. This means that for some employees sales could even have declined despite the raise, and for some they stayed the same. But on average, sales in the company grew, so we say that there is a relationship between sales and employee salaries, and that it is a correlation.

At the core of a functional connection (gas pedal - speed) is a physical law. A correlation (sales - salary) is based simply on the consistency of changes in two indicators. There is no law (in the physical sense of the word) behind a correlation. There is only a probabilistic (stochastic) regularity.

Numerical expression of correlation dependence

So, the correlation reflects the dependence between phenomena. If these phenomena can be measured, then it receives a numerical expression.

For example, the role of reading in people's lives is being studied. The researchers took a group of 40 people and measured two indicators for each subject: 1) how much time they spend reading per week; 2) to what extent they consider themselves successful (on a scale from 1 to 10). The researchers arranged the data in two columns and used a statistical program to calculate the correlation between reading and well-being. Suppose they got the following result: -0.76. But what does this number mean? How should it be interpreted? Let's figure it out.

The resulting number is called the correlation coefficient. For its correct interpretation, it is important to consider the following:

  1. The sign "+" or "-" reflects the direction of dependence.
  2. The value of the coefficient reflects the strength of the dependence.

Direct and reverse

The plus sign in front of the coefficient indicates that the relationship between phenomena or indicators is direct. That is, the greater one indicator, the greater the other. Higher salary means higher sales. Such a correlation is called direct, or positive.

If the coefficient has a minus sign, then the correlation is inverse, or negative. In this case, the higher one indicator, the lower the other. In the reading and well-being example, we got -0.76, which means that the more people read, the lower their level of well-being.

Strong and weak

Numerically, a correlation is a number in the range from -1 to +1, denoted by the letter "r". The higher the number (ignoring the sign), the stronger the correlation.

The lower the numerical value of the coefficient, the weaker the relationship between the phenomena or indicators.

The maximum possible strength of a dependence is 1 or -1. How should we understand and picture it?

Consider an example. Ten students were taken, and their level of intelligence (IQ) and academic performance for the semester were measured. The data were arranged in two columns.

Test subject | IQ | Academic performance (points)

Look closely at the data in the table. From subject 1 to subject 10, the IQ level increases. But the level of academic performance rises as well. Of any two students, the one with the higher IQ performs better, and there are no exceptions to this rule.

Before us is an example of a complete, 100% coordinated change in two indicators in a group. This is an example of the maximum possible positive relationship. That is, the correlation between intelligence and performance is 1.

Let's consider another example. The same 10 students were surveyed to assess how successful they feel in communicating with the opposite sex (on a scale from 1 to 10).

Test subject | IQ | Success in communicating with the opposite sex (points)

Look closely at the data in the table. From subject 1 to subject 10, the IQ level increases. At the same time, the level of success in communicating with the opposite sex in the last column consistently decreases. Of any two students, the one with the lower IQ is more successful in communicating with the opposite sex, and there are no exceptions to this rule.

This is an example of complete consistency in the change of two indicators in the group - the maximum possible negative relationship. The correlation between IQ and the success of communication with the opposite sex is -1.

How should we understand a correlation of zero (0)? It means that there is no relationship between the indicators. Let's return once more to our students and consider another indicator measured for them: standing jump length.

Test subject | IQ | Standing jump length (m)

There is no consistency between the person-to-person variation in IQ and in jump length. This indicates a lack of correlation. The correlation coefficient between IQ and jump length for these students is 0.

We've looked at extreme cases. In real measurements, the coefficients are rarely exactly 1 or 0. For the values in between, the following scale (applied to the coefficient with its sign ignored) is adopted:

  • if the coefficient is greater than 0.70 - the relationship between the indicators is strong;
  • from 0.30 to 0.70 - the connection is moderate,
  • less than 0.30 - the connection is weak.

If we evaluate on this scale the correlation we obtained above between reading and well-being, the dependence turns out to be strong and negative: -0.76. That is, there is a strong negative relationship between erudition and well-being, which once again confirms the biblical wisdom about the relationship between wisdom and sorrow.
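
As a rough illustration, the scale above can be written as a small Python function. This is only a sketch: the function name `describe_correlation` is made up for this example, and the boundary handling (exactly 0.30 and 0.70 counted as moderate) is an assumption, since the text does not say which side the boundaries belong to.

```python
def describe_correlation(r: float) -> str:
    """Label a correlation coefficient using the rough scale from the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")

    strength = abs(r)  # the sign only encodes direction, not strength
    if strength > 0.70:
        label = "strong"
    elif strength >= 0.30:  # assumption: boundaries count as moderate
        label = "moderate"
    else:
        label = "weak"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no direction"
    return f"{label}, {direction}"


print(describe_correlation(-0.76))  # strong, negative
```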

The given gradation gives very rough estimates and is rarely used in research in this form.

Gradations of coefficients according to significance levels are used more often. In this case, the coefficient actually obtained may be significant or not significant. This can be determined by comparing its value with the critical value of the correlation coefficient taken from a special table. These critical values depend on the size of the sample (the larger the sample, the lower the critical value).

Correlation analysis in psychology

The correlation method is one of the main methods in psychological research. This is no accident, because psychology strives to be an exact science. Does it succeed?

What is special about laws in the exact sciences? For example, the law of gravity in physics operates without exception: the greater the mass of a body, the more strongly it attracts other bodies. This physical law reflects the relationship between a body's mass and the force of gravity.

In psychology, the situation is different. For example, psychologists publish data on the relationship between warm relationships with parents in childhood and the level of creativity in adulthood. Does this mean that every subject who had a very warm relationship with their parents in childhood will have very high creative abilities? The answer is unequivocal: no. There is no law here like a physical one. There is no mechanism by which childhood experience determines adult creativity; that is our own conjecture. The data are consistent (relationships - creativity), but there is no law behind them, only a correlation. Psychologists often call the relationships they identify psychological patterns, emphasizing their probabilistic, non-rigid nature.

The student study example from the previous section illustrates well the use of correlations in psychology:

  1. Analysis of the relationship between psychological indicators. In our example, IQ and the success of communication with the opposite sex are psychological parameters. Identification of the correlation between them expands the understanding of the mental organization of a person, of the relationship between various aspects of his personality - in this case, between the intellect and the sphere of communication.
  2. Analysis of the relationship of IQ with academic performance and jumping is an example of the relationship of a psychological parameter with non-psychological ones. The results obtained reveal the features of the influence of intelligence on educational and sports activities.

Here's what a summary of the results of a fictional study on students could look like:

  1. A significant positive relationship between the intelligence of students and their academic performance was revealed.
  2. There is a negative significant relationship between IQ and successful communication with the opposite sex.
  3. There was no connection between the IQ of students and their standing jump length.

Thus, the level of intelligence of students acts as a positive factor in their academic performance, while at the same time negatively affecting relationships with the opposite sex and having no significant impact on sports success, in particular on standing jump length.

As you can see, the intellect helps students to learn, but prevents them from building relationships with the opposite sex. This does not affect their athletic performance.

The ambiguous influence of intelligence on the personality and activity of students reflects the complexity of this phenomenon in the structure of personality traits and the importance of continuing research in this direction. In particular, it seems important to analyze the relationship between intelligence and psychological characteristics and activities of students, taking into account their gender.

Pearson and Spearman coefficients

Let's consider two calculation methods.

The Pearson coefficient is a special method for calculating the relationship between the numerical values of two indicators within one group. In a very simplified form, it boils down to this:

  1. The values of two parameters in the group of subjects are taken (for example, aggression and perfectionism).
  2. The average value of each parameter in the group is found.
  3. The differences between each subject's values and the average values are found.
  4. These differences are substituted into a special formula to calculate the Pearson coefficient.
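
The four steps above can be sketched in Python as follows. This is a minimal illustration, not the output of any particular statistical package: the function name `pearson` and the sample scores are made up for the example, and the formula used is the standard one, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")

    # Step 2: the average value of each parameter.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Step 3: each subject's deviation from the average.
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]

    # Step 4: substitute the deviations into the formula.
    covariation = sum(a * b for a, b in zip(dx, dy))
    spread_x = sum(a * a for a in dx)
    spread_y = sum(b * b for b in dy)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return covariation / sqrt(spread_x * spread_y)


# Step 1: hypothetical scores for five subjects (invented for illustration).
aggression = [12, 15, 9, 20, 14]
perfectionism = [30, 34, 25, 41, 33]
print(round(pearson(aggression, perfectionism), 3))
```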

Spearman's rank correlation coefficient is calculated in a similar way:

  1. The values of two indicators in the group of subjects are taken.
  2. The rank of each value within the group is found, that is, its place in the list sorted in ascending order.
  3. The rank differences are found, squared and summed.
  4. Next, the sum of squared rank differences is substituted into a special formula to calculate the Spearman coefficient.
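
A minimal Python sketch of these steps is shown below. It assumes there are no tied values (ties need a correction that is discussed separately), and it uses the standard formula r = 1 - 6·Σd² / (n·(n² - 1)). The function names and the sample data are invented for illustration.

```python
def ranks_ascending(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman rank correlation for two tie-free sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    rank_x = ranks_ascending(x)  # step 2 for the first indicator
    rank_y = ranks_ascending(y)  # step 2 for the second indicator
    # Step 3: rank differences, squared and summed.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # Step 4: substitute into the formula.
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical scores for five subjects (invented for illustration).
print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```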

In Pearson's case, the calculation was based on the average value. Therefore, random data outliers (significant difference from the mean), for example, due to processing error or unreliable answers, can significantly distort the result.

In Spearman's case, the absolute values of the data do not play a role, since only their position relative to each other (ranks) is taken into account. That is, outliers or other inaccuracies in the data will not seriously affect the final result.

If the test results are correct, the differences between the Pearson and Spearman coefficients are small, and the Pearson coefficient gives a more precise value for the relationship in the data.
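
The different sensitivity to outliers can be illustrated with a small experiment. The data below are invented: ten perfectly ordered pairs in which the last value is replaced by an extreme one (100 instead of 10). The helper names are hypothetical, and the Spearman coefficient is computed here as the Pearson coefficient of the ranks, which is a standard equivalent definition.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (standard deviation-based formula)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def to_ranks(values):
    """Ranks from 1 to n in ascending order; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman coefficient as the Pearson coefficient of the ranks."""
    return pearson(to_ranks(x), to_ranks(y))


x = list(range(1, 11))
y_outlier = list(range(1, 10)) + [100]  # last value is an extreme outlier

pearson_r = pearson(x, y_outlier)    # drops to roughly 0.59
spearman_r = spearman(x, y_outlier)  # stays at 1: the order is unchanged
print(round(pearson_r, 2), round(spearman_r, 2))
```

The order of the values is the same with and without the outlier, so the ranks, and therefore the Spearman coefficient, do not change, while the Pearson coefficient falls noticeably.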

How to Calculate the Correlation Coefficient

The Pearson and Spearman coefficients can be calculated manually. This may be necessary for an in-depth study of statistical methods.

However, in most cases, when solving applied problems, including in psychology, it is possible to carry out calculations using special programs.

Calculation using Microsoft Excel spreadsheets

Let's go back to the students example and look at the data on their level of intelligence and standing jump length. Let's enter this data (two columns) into an Excel spreadsheet.

After moving the cursor to an empty cell, click "Insert Function" and select "CORREL" from the "Statistical" section.

The format of this function assumes the selection of two data arrays: CORREL(array1; array2). We select the IQ column and the jump length column, respectively.

Excel spreadsheets implement only the formula for the Pearson coefficient.

Calculation with the program STATISTICA

We enter data on intelligence and the length of the jump in the field of initial data. Next, select the option "Nonparametric criteria", "Spearman". Select the parameters for the calculation and get the following result.


As you can see, the calculation gave a result of 0.024, which differs from the Pearson result - 0.038, obtained above using Excel. However, the differences are minor.

Using correlation analysis in psychology theses (example)

Most of the topics of final qualification works in psychology (diplomas, term papers, master's) involve a correlation study (the rest are related to identifying differences in psychological indicators in different groups).

The term "correlation" itself rarely appears in the titles of topics - it is hidden behind wording such as the following:

  • "The relationship between subjective feelings of loneliness and self-actualization in women of mature age";
  • “Peculiarities of the influence of the resilience of managers on the success of their interaction with clients in conflict situations”;
  • "Personal factors of stress resistance of employees of the Ministry of Emergency Situations."

Thus, the words "relationship", "influence" and "factors" are sure signs that the method of data analysis in empirical research should be correlation analysis.

Let us briefly consider the stages of its implementation when writing a thesis in psychology on the topic: "The relationship of personal anxiety and aggressiveness in adolescents."

1. For the calculation, raw data are required, which are usually the test results of the subjects. They are entered into a summary table and placed in the appendix. This table is structured as follows:

  • each row contains data for one subject;
  • each column contains scores on one scale for all subjects.

Subject number | Personal anxiety | Aggressiveness

2. It is necessary to decide which of the two types of coefficient, Pearson or Spearman, will be used. Recall that Pearson gives a more accurate result but is sensitive to outliers in the data. Spearman coefficients can be used with any data (except data on a nominal scale), which is why they are the ones most often used in psychology theses.

3. We enter the table of raw data into the statistical program.

4. Calculate the value.



5. The next step is to determine if the relationship is significant. The statistical program highlighted the results in red, which means that the correlations are statistically significant at a significance level of 0.05 (indicated above).

However, it is useful to know how to determine the significance manually. To do this, you need the table of Spearman's critical values.

Table of critical values of the Spearman coefficients

                           Level of statistical significance
Number of test subjects    p=0.05    p=0.01    p=0.001
5                          0.88      0.96      0.99
6                          0.81      0.92      0.97
7                          0.75      0.88      0.95
8                          0.71      0.83      0.93
9                          0.67      -         -
10                         0.63      0.77      0.87
11                         -         0.74      0.85
12                         0.58      0.71      0.82
13                         0.55      0.68      -
14                         0.53      0.66      0.78
15                         0.51      0.64      0.76

We are interested in the significance level of 0.05 and our sample size of 10 people. At the intersection of this row and column, we find the critical value of the Spearman coefficient: Rcr=0.63.

The rule is this: if the Spearman empirical value obtained is greater than or equal to the critical value, then it is statistically significant. In our case: Remp (0.66) > Rcr (0.63), therefore, the relationship between aggressiveness and anxiety in the adolescent group is statistically significant.
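
The comparison rule can be written as a one-line check. This is only a sketch: the function name `is_significant` is invented, and the absolute value is taken on the assumption that the critical values in the table are given without sign.

```python
def is_significant(r_emp: float, r_crit: float) -> bool:
    """True if the empirical coefficient reaches the critical value.

    The sign is ignored for the comparison (assumption: the table lists
    critical values as absolute values); it matters only for interpretation.
    """
    return abs(r_emp) >= r_crit


# Values from the example: n = 10, significance level 0.05.
print(is_significant(0.66, 0.63))  # True
```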

6. In the text of the thesis, the data should be inserted as a table in Word format rather than as a table copied from the statistical program. Below the table, we describe the result obtained and interpret it.

Table 1

Spearman's coefficients of aggressiveness and anxiety in a group of adolescents

                    Aggressiveness
Personal anxiety    0.665*

* - statistically significant (p ≤ 0.05)

Analysis of the data presented in Table 1 shows that there is a statistically significant positive relationship between the aggressiveness and anxiety of adolescents. This means that the higher the personal anxiety of adolescents, the higher their level of aggressiveness. This result suggests that aggression is one of the ways adolescents relieve anxiety. Experiencing self-doubt and anxiety because of threats to self-esteem, which is especially vulnerable in adolescence, a teenager often resorts to aggressive behavior as an unproductive way of reducing anxiety.

7. Is it possible to talk about influence when interpreting relationships? Can we say that anxiety affects aggressiveness? Strictly speaking, no. We have shown above that the correlation between phenomena is probabilistic in nature and reflects only the consistency of changes in characteristics within a group. We cannot say that this consistency arises because one of the phenomena causes the other or affects it. That is, a correlation between psychological parameters does not give grounds to speak of a causal relationship between them. However, practice shows that the term "influence" is often used when analyzing the results of correlation analysis.

The rank correlation coefficient, proposed by K. Spearman, refers to non-parametric indicators of the relationship between variables measured on a rank scale. When calculating this coefficient, no assumptions are required about the nature of the distribution of features in the general population. This coefficient determines the degree of tightness of the connection of ordinal features, which in this case represent the ranks of the compared values.

The value of Spearman's correlation coefficient also lies in the range from -1 to +1. Like the Pearson coefficient, it can be positive or negative, characterizing the direction of the relationship between two features measured on a rank scale.

In principle, the number of ranked features (qualities, traits, etc.) can be any, but ranking more than 20 features is difficult. Perhaps this is why the table of critical values of the rank correlation coefficient is calculated only up to forty ranked features (n < 40, Table 20 of Appendix 6).

Spearman's rank correlation coefficient is calculated by the formula:

$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$

where n is the number of ranked features (indicators, subjects);

D is the difference between the ranks in two variables for each subject;

$\sum D^2$ is the sum of squared rank differences.

Using the rank correlation coefficient, consider the following example.

Example: A psychologist investigates how individual indicators of school readiness, obtained for 11 first-graders before the start of schooling, are related to their average academic performance at the end of the school year.

To solve this problem, we ranked, first, the school readiness indicators obtained on entering school and, second, the average end-of-year performance indicators of the same students. The results are presented in Table 13.

Table 13

No. of students | Ranks of school readiness indicators | Ranks of average annual performance

We substitute the obtained data into the formula and perform the calculation. We get:

To find the level of significance, we turn to Table 20 of Appendix 6, which gives the critical values for the rank correlation coefficients.

We emphasize that in Table 20 of Appendix 6, as in the table for Pearson's linear correlation, all values of the correlation coefficients are given in absolute value. Therefore, the sign of the correlation coefficient is taken into account only when interpreting it.

The levels of significance in this table are found by the number n, i.e., by the number of subjects. In our case, n = 11. For this number, we find:

0.61 for P ≤ 0.05

0.76 for P ≤ 0.01

We build the corresponding "significance axis":

The resulting correlation coefficient coincided with the critical value for the 1% significance level. Therefore, it can be argued that the indicators of school readiness and the final grades of first-graders are positively correlated: in other words, the higher the indicator of school readiness, the better the first-grader performs. In terms of statistical hypotheses, the psychologist must reject the null hypothesis (that the correlation is zero) and accept the alternative hypothesis, which says that the relationship between school readiness and average performance is non-zero.

Case of identical (tied) ranks

When tied ranks are present, the formula for calculating the Spearman rank correlation coefficient is somewhat different. Two new terms that account for the tied ranks are added to the formula. They are called corrections for tied ranks and are added to the numerator of the calculation formula:

$$D_1 = \frac{n^3 - n}{12}, \qquad D_2 = \frac{k^3 - k}{12}$$

where n is the number of identical ranks in the first column,

k is the number of identical ranks in the second column.

If there are two groups of identical ranks in any column, then the correction formula becomes somewhat more complicated:

$$D = \frac{(n^3 - n) + (k^3 - k)}{12}$$

where n is the number of equal ranks in the first group of the ranked column,

k is the number of equal ranks in the second group of the ranked column. The modification of the formula in the general case is as follows:

$$r_s = 1 - \frac{6\left(\sum D^2 + D_1 + D_2\right)}{N(N^2 - 1)}$$

where N is the number of ranked subjects, $\sum D^2$ is the sum of squared rank differences, and $D_1$ and $D_2$ are the corrections for the two compared columns.

Example: A psychologist uses a test of mental development (SHTUR) to study intelligence in 12 ninth-grade students. At the same time, he asks the teachers of literature and mathematics to rank these same students according to their mental development. The task is to determine how the objective indicators of mental development (SHTUR data) and the teachers' expert assessments are related.

The experimental data for this problem and the additional columns required to calculate the Spearman correlation coefficient are presented in Table 14.

Table 14

No. of students | Ranks of testing with SHTUR | Expert assessments of teachers in mathematics | Expert assessments of teachers in literature | D (second and third columns) | D (second and fourth columns) | D² (second and third columns) | D² (second and fourth columns)

Since tied ranks were used in the ranking, it is necessary to check the correctness of the ranking in the second, third and fourth columns of the table. The summation in each of these columns gives the same sum - 78.

Checking by the calculation formula, with N = 12 students, gives:

$$\sum R = \frac{N(N + 1)}{2} = \frac{12 \cdot 13}{2} = 78$$

The fifth and sixth columns of the table show the differences between each student's rank on the SHTUR test and the teachers' expert assessments in mathematics and literature, respectively. The sum of the rank differences must be equal to zero. The summation of the D values in the fifth and sixth columns gave the desired result. Therefore, the subtraction of ranks was carried out correctly. A similar check must be done every time complex types of ranking are performed.

Before starting the calculation by the formula, it is necessary to calculate the corrections for tied ranks for the second, third and fourth columns of the table.

In our case, there are two identical ranks in the second column of the table, therefore, according to the formula, the D1 correction value will be:

$$D_1 = \frac{2^3 - 2}{12} = 0.5$$

There are three identical ranks in the third column, therefore, according to the formula, the correction value D2 will be:

$$D_2 = \frac{3^3 - 3}{12} = 2$$

In the fourth column of the table, there are two groups of three identical ranks, therefore, according to the formula, the D3 correction value will be:

$$D_3 = \frac{(3^3 - 3) + (3^3 - 3)}{12} = 4$$
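
The correction for tied ranks can also be computed programmatically. The sketch below assumes the usual form of the correction, the sum of (t³ - t)/12 over every group of t identical ranks in a column; the function name `tie_correction` and the sample rank columns are invented for illustration and are not the data of Table 14.

```python
from collections import Counter


def tie_correction(ranks):
    """Sum of (t**3 - t) / 12 over every group of t identical ranks."""
    group_sizes = Counter(ranks).values()
    return sum(t ** 3 - t for t in group_sizes if t > 1) / 12


# Hypothetical rank columns that reproduce the three situations in the text.
one_pair = [1, 2.5, 2.5, 4]          # one group of two identical ranks
one_triple = [2, 2, 2, 4]            # one group of three identical ranks
two_triples = [2, 2, 2, 5, 5, 5]     # two groups of three identical ranks

print(tie_correction(one_pair))      # 0.5
print(tie_correction(one_triple))    # 2.0
print(tie_correction(two_triples))   # 4.0
```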

Before proceeding to solve the problem, recall that the psychologist is investigating two questions: how the ranks on the SHTUR test are related to the expert assessments in mathematics and in literature. That is why the calculation is carried out twice.

We calculate the first rank coefficient according to the formula, taking the corrections into account. We get:

Let's calculate without taking the corrections into account:

As you can see, the difference in the values of the correlation coefficients turned out to be very small.

We calculate the second rank coefficient according to the formula, taking the corrections into account. We get:

Let's calculate without taking the corrections into account:

Again, the differences were very small. Since the number of students in both cases is the same, we use Table 20 of Appendix 6 to find the critical values at n = 12 for both correlation coefficients at once.

0.58 for P ≤ 0.05

0.73 for P ≤ 0.01

Plot the first value on the "significance axis":

In the first case, the obtained rank correlation coefficient is in the zone of significance. Therefore, the psychologist must reject the null hypothesis that the correlation coefficient is equal to zero and accept the alternative hypothesis that the correlation coefficient is significantly different from zero. In other words, the result obtained suggests that the higher the students' scores on the SHTUR test, the higher their expert assessments in mathematics.

Plot the second value on the "significance axis":

In the second case, the rank correlation coefficient is in the zone of uncertainty. Therefore, the psychologist can accept the null hypothesis that the correlation coefficient is equal to zero and reject the alternative hypothesis that the correlation coefficient is significantly different from zero. In this case, the result obtained indicates that the students' scores on the SHTUR test are not related to the expert assessments in literature.

To apply the Spearman correlation coefficient, the following conditions must be met:

1. The variables being compared must be obtained on an ordinal (rank) scale, but can also be measured on a scale of intervals and ratios.

2. The nature of the distribution of correlated values ​​does not matter.

3. The number of varying features in the compared variables X and Y must be the same.

Tables for determining the critical values of the Spearman correlation coefficient (Table 20, Appendix 6) are calculated for numbers of features from n = 5 to n = 40; with a larger number of compared values, the table for the Pearson correlation coefficient should be used (Table 19, Appendix 6). Critical values are found at k = n.

In practice, Spearman's rank correlation coefficient (ρ) is often used to determine the closeness of the relationship between two features. The values of each feature are ranked in ascending order (from 1 to n), then the difference (d) between the ranks corresponding to one observation is determined.

Example #1. The relationship between the volume of industrial production and investments in fixed capital in 10 regions of one of the federal districts of the Russian Federation in 2003 is characterized by the following data.
Calculate the Spearman and Kendall rank correlation coefficients. Check their significance at α=0.05. Formulate a conclusion about the relationship between the volume of industrial production and investments in fixed assets in the regions of the Russian Federation under consideration.

Assign ranks to the feature Y and the factor X. Find the sum of the squared rank differences d².
Using the calculator, we calculate Spearman's rank correlation coefficient:

X       Y       rank X, dx   rank Y, dy   (dx - dy)²
1.3     300     1            2            1
1.8     1335    2            12           100
2.4     250     3            1            4
3.4     946     4            8            16
4.8     670     5            7            4
5.1     400     6            4            4
6.3     380     7            3            16
7.5     450     8            5            9
7.8     500     9            6            9
17.5    1582    10           16           36
18.3    1216    11           9            4
22.5    1435    12           14           4
24.9    1445    13           15           4
25.8    1820    14           19           25
28.5    1246    15           10           25
33.4    1435    16           14           4
42.4    1800    17           18           1
45      1360    18           13           25
50.4    1256    19           11           64
54.8    1700    20           17           9
Total                                     364

The relationship between feature Y and factor X is strong and direct.
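
The calculation for this example can be reproduced with a few lines of Python. The sketch takes the two rank columns exactly as they appear in the table (including the repeated rank 14) and applies the basic formula 1 - 6·Σd² / (n³ - n); the variable names are chosen for this illustration.

```python
# Rank columns copied from the table above.
rank_x = list(range(1, 21))
rank_y = [2, 12, 1, 8, 7, 4, 3, 5, 6, 16,
          9, 14, 15, 19, 10, 14, 18, 13, 11, 17]

n = len(rank_x)
# Sum of squared rank differences (last column of the table).
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
# Basic Spearman formula without a correction for ties.
rho = 1 - 6 * sum_d2 / (n ** 3 - n)

print(sum_d2, round(rho, 3))  # 364 0.726
```

A value of about 0.73 falls in the "strong" band of the scale used in this article, which agrees with the conclusion stated for the example.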

Estimation of Spearman's rank correlation coefficient



According to Student's table, we find Ttable.
Ttable (18; 0.05) = 1.734
Since Tobs > Ttable, we reject the hypothesis that the rank correlation coefficient is equal to zero. In other words, Spearman's rank correlation coefficient is statistically significant.
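
One common way to obtain the observed statistic is t = ρ·√((n - 2)/(1 - ρ²)) with n - 2 degrees of freedom. Whether the calculator used in this example applies exactly this form is an assumption; the sketch below only shows that, with the sum of squared rank differences 364 for n = 20, this statistic clearly exceeds the tabulated value 1.734. The function name is invented for the illustration.

```python
from math import sqrt


def spearman_t_statistic(rho: float, n: int) -> float:
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    if n < 3 or abs(rho) >= 1:
        raise ValueError("need n >= 3 and |rho| < 1")
    return rho * sqrt((n - 2) / (1 - rho ** 2))


n = 20
rho = 1 - 6 * 364 / (n ** 3 - n)  # coefficient from the rank table, about 0.726
t_obs = spearman_t_statistic(rho, n)
t_table = 1.734                   # tabulated value quoted in the text for (18; 0.05)

print(round(t_obs, 2), t_obs > t_table)
```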

Interval estimate for the rank correlation coefficient (confidence interval)
Confidence interval for Spearman's rank correlation coefficient: (0.5431; 0.9095).

Example #2. Initial data.

X    Y
5    4
3    4
1    3
3    1
6    6
2    2

Since the first series contains tied ranks (the same rank number occurs more than once), we re-rank it. Re-ranking does not change the order of the ranks, that is, the corresponding relations (greater than, less than or equal to) between the rank numbers must be preserved. It is also not recommended to set a rank below 1 or above the number of parameters (in this case n = 6). The re-ranking is shown in the table.

Position in ordered series   Value according to the expert's assessment   New rank
1                            1                                            1
2                            2                                            2
3                            3                                            3.5
4                            3                                            3.5
5                            5                                            5
6                            6                                            6

Since the second series also contains tied ranks, we re-rank it as well. The re-ranking is shown in the table.

Position in ordered series   Value according to the expert's assessment   New rank
1                            1                                            1
2                            2                                            2
3                            3                                            3
4                            4                                            4.5
5                            4                                            4.5
6                            6                                            6

Rank matrix.

rank X, dx   rank Y, dy   (dx - dy)²
5            4.5          0.25
3.5          4.5          1
1            3            4
3.5          1            6.25
6            6            0
2            2            0
Total: 21    21           11.5
Since there are several identical values among the values of features x and y, i.e. tied ranks are formed, the Spearman coefficient in this case is calculated with the corrections A and B for tied ranks:

$$A = \frac{1}{12}\sum_j \left(A_j^3 - A_j\right), \qquad B = \frac{1}{12}\sum_k \left(B_k^3 - B_k\right)$$

where
j - numbers of the tie groups, in order, for feature x;
A_j - the number of identical ranks in the j-th tie group for x;
k - numbers of the tie groups, in order, for feature y;
B_k - the number of identical ranks in the k-th tie group for y.
A = (2³ - 2)/12 = 0.5
B = (2³ - 2)/12 = 0.5
D = A + B = 0.5 + 0.5 = 1

The relationship between feature Y and factor X is moderate and direct.
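
The re-ranking and the tie corrections for this example can be checked with a short Python sketch. It assumes the first column of the initial data is X and the second is Y, and it assigns tied values the mean of the positions they occupy; the function names are invented for the illustration. The final coefficient is deliberately not computed here, because the exact form of the formula used in the example is not shown in the text.

```python
from collections import Counter


def average_ranks(values):
    """Ascending ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1  # first position of v in sorted order
        count = ordered.count(v)      # how many times v occurs
        ranks.append(first + (count - 1) / 2)
    return ranks


def tie_correction(ranks):
    """Sum of (t**3 - t) / 12 over every group of t identical ranks."""
    return sum(t ** 3 - t for t in Counter(ranks).values() if t > 1) / 12


x = [5, 3, 1, 3, 6, 2]
y = [4, 4, 3, 1, 6, 2]

rank_x = average_ranks(x)  # [5.0, 3.5, 1.0, 3.5, 6.0, 2.0]
rank_y = average_ranks(y)  # [4.5, 4.5, 3.0, 1.0, 6.0, 2.0]
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))

A = tie_correction(rank_x)
B = tie_correction(rank_y)
print(sum_d2, A, B)  # 11.5 0.5 0.5
```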

When there are two series of values that can be ranked, it is reasonable to calculate Spearman's rank correlation.

Such series can be:

  • a pair of features measured in the same group of subjects;
  • a pair of individual hierarchies of features, identified in two subjects over the same set of features;
  • a pair of group hierarchies of features;
  • an individual and a group hierarchy of features.

The method involves ranking the indicators separately for each of the features.

The smallest value receives the smallest rank.

This is a non-parametric statistical method designed to establish whether a relationship exists between the studied phenomena:

  • determining the actual degree of parallelism between two series of quantitative data;
  • assessing the closeness of the identified relationship, expressed quantitatively.

Correlation analysis

A statistical method designed to detect the existence of a relationship between 2 or more random variables, as well as its strength, is called correlation analysis.

The name comes from the Latin correlatio (ratio).

When using it, the following scenarios are possible:

  • the presence of a correlation (positive or negative);
  • no correlation (zero).

When a relationship between variables is established, we speak of their correlation. In other words, when the value of X changes, the value of Y tends to change in a consistent direction, although, unlike a functional dependence, not necessarily in every single case.

Various measures of connection (coefficients) are used as tools.

Their choice is influenced by:

  • the way the random variables are measured;
  • the nature of the relationship between the random variables.

The existence of a correlation can be shown graphically (with plots) and numerically (with a coefficient).

Correlation is characterized by the following features:

  • strength of the relationship (a correlation coefficient from ±0.7 to ±1 is strong; from ±0.3 to ±0.699 is medium; from 0 to ±0.299 is weak);
  • direction of the relationship (direct or inverse).
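
The two characteristics listed above can be read off a coefficient with a small helper. This is only a sketch: the function name is hypothetical, and treating the boundaries as exactly 0.3 and 0.7 is an assumption, since the text leaves the narrow gaps between 0.299 and 0.3 and between 0.699 and 0.7 unspecified.

```python
def describe_correlation(r):
    """Return (strength, direction) for a correlation coefficient r in [-1, 1]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)
    # Thresholds follow the ranges quoted in the text (assumed cut-offs: 0.3, 0.7).
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "medium"
    else:
        strength = "weak"
    if r > 0:
        direction = "direct"
    elif r < 0:
        direction = "inverse"
    else:
        direction = "none"
    return strength, direction


print(describe_correlation(-0.92))
print(describe_correlation(0.25))
```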

Goals of correlation analysis

Correlation analysis does not allow establishing a causal relationship between the studied variables.

It is carried out in order to:

  • establish whether the variables depend on each other;
  • obtain information about one variable from another variable;
  • determine the closeness (strength) of this dependence;
  • determine the direction of the established relationship.

Methods of correlation analysis


This analysis can be done using:

  • the method of squares (Pearson);
  • the rank method (Spearman).

The Pearson method is suitable for calculations that require an exact measure of the strength of the relationship between variables. The features studied with it must be expressed only quantitatively.

The Spearman method, or rank correlation, imposes no strict requirements on how the features are expressed: they can be both quantitative and attributive. This method yields not an exact measure of the strength of the relationship but an approximate, indicative one.

The variation series may contain open-ended options, for example when work experience is expressed by values such as up to 1 year, more than 5 years, etc.

Correlation coefficient

A statistical value characterizing how two variables change together is called the correlation coefficient, or pair correlation coefficient. It ranges from -1 to +1.

The most common coefficients are:

  • Pearson - applicable to variables measured on an interval scale;
  • Spearman - for variables measured on an ordinal scale.

Limitations on the use of the correlation coefficient

Unreliable results when calculating the correlation coefficient are possible when:

  • the number of values of the variable is insufficient (25-100 pairs of observations are considered sufficient);
  • the relationship between the studied variables is not linear but, for example, quadratic;
  • the data contain more than one observation for each case;
  • the variables contain abnormal values (outliers);
  • the data under study consist of clearly distinct subgroups of observations.

In addition, the presence of a correlation does not make it possible to establish which of the variables is the cause and which is the effect.

Correlation Significance Test

To evaluate statistical values, the concept of their significance or reliability is used; it characterizes the probability that a value, or a more extreme one, occurs by chance.

The most common method for determining the significance of a correlation is Student's t-test.

Its value is compared with the tabular value; the number of degrees of freedom is taken as n - 2. When the calculated value of the criterion is greater than the tabular value, the correlation coefficient is significant.

For economic calculations, a significance level of 0.05 (95% confidence) or 0.01 (99% confidence) is sufficient.
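
The comparison with the tabular value can be sketched in Python. The text does not show how the observed value is computed, so the statistic t = r·√((n − 2)/(1 − r²)) used here is an assumption (it is the usual form for a correlation coefficient); the function names and the input r = 0.7 are illustrative, while 1.734 is the tabular value quoted earlier in the text for (18; 0.05).

```python
import math


def t_statistic(r, n):
    """Student's t for a correlation coefficient r computed from n pairs."""
    if n <= 2:
        raise ValueError("at least 3 pairs are needed (n - 2 degrees of freedom)")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


def is_significant(r, n, t_table):
    """Compare |t| with a tabular value looked up for n - 2 degrees of freedom."""
    return abs(t_statistic(r, n)) > t_table


# Illustrative input: r = 0.7 from n = 20 pairs, i.e. 18 degrees of freedom.
t_obs = t_statistic(0.7, 20)
print(round(t_obs, 2), is_significant(0.7, 20, 1.734))
```

The tabular value is passed in as a parameter so that the sketch does not depend on any particular statistics library.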

Spearman ranks

Spearman's rank correlation coefficient makes it possible to establish statistically whether phenomena are related. Its calculation involves assigning a serial number, a rank, to each value of a feature. The ranks can be assigned in ascending or descending order.

Any number of features can be ranked, but ranking is a rather laborious process, which limits their number in practice. Difficulties begin at about 20 features.

To calculate the Spearman coefficient, use the formula:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where:

n is the number of ranked features;

d is the difference between the ranks in the two variables;

∑d² is the sum of squared rank differences.

Application of correlation analysis in psychology

Statistical support of psychological research makes it possible to make them more objective and highly representative. Statistical processing of data obtained in the course of psychological experiments helps to extract the maximum of useful information.

Correlation analysis has received the widest application in processing their results.

It is appropriate to apply correlation analysis to the results of studies of:

  • anxiety (according to R. Temml, M. Dorca, V. Amen tests);
  • family relationships (“Analysis of family relationships” (DIA) questionnaire of E.G. Eidemiller, V.V. Yustitskis);
  • the level of internality-externality (questionnaire of E.F. Bazhin, E.A. Golynkina and A.M. Etkind);
  • the level of emotional burnout among teachers (questionnaire of V.V. Boyko);
  • connections between the elements of the verbal intelligence of students in different profiles of education (method of K.M. Gurevich and others);
  • the relationship between the level of empathy (method of V.V. Boyko) and satisfaction with marriage (questionnaire of V.V. Stolin, T.L. Romanova, G.P. Butenko);
  • links between the sociometric status of adolescents (test by Jacob L. Moreno) and the characteristics of the style of family education (questionnaire by E.G. Eidemiller, V.V. Yustitskis);
  • the structure of life goals of adolescents brought up in complete and single-parent families (questionnaire by Edward L. Deci and Richard M. Ryan).

Brief instructions for conducting correlation analysis according to the Spearman criterion

Correlation analysis using the Spearman method is performed according to the following algorithm:

  • the paired comparable features are arranged in 2 series, one denoted by X and the other by Y;
  • the values of the X series are arranged in ascending or descending order;
  • the order of the values of the Y series is determined by their correspondence with the values of the X series;
  • for each value in the X series, determine the rank: assign a serial number from the minimum value to the maximum;
  • for each of the values in the Y series, also determine the rank (from minimum to maximum);
  • calculate the difference (D) between the ranks of X and Y, using the formula D = X - Y;
  • square the resulting difference values;
  • sum the squares of the rank differences;
  • calculate the coefficient using the formula $r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$, where n is the number of ranked pairs.
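
The steps of the algorithm above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the coefficient is computed as 1 − 6ΣD²/(n(n² − 1)) without a correction for tied ranks, identical values receive the arithmetic mean of their ranks, and the function names and sample data are hypothetical.

```python
def ranks(values):
    """Assign ranks from the minimum value to the maximum (mean rank for ties)."""
    positions = {}
    for pos, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_simple(x, y):
    """Spearman's coefficient from two paired series, no tie correction."""
    if len(x) != len(y):
        raise ValueError("both series must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two pairs are required")
    rank_x = ranks(x)                       # ranks of the X series
    rank_y = ranks(y)                       # ranks of the Y series
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]  # D = X - Y
    sum_d2 = sum(v * v for v in d)          # sum of squared differences
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical paired observations, used only to exercise the function.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rs = spearman_simple(x, y)
print(round(rs, 3))
```

Because the ranks are computed per value, the pairs do not have to be sorted by X first; sorting in the manual procedure only makes the worksheet easier to read.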

Spearman Correlation Example

It is necessary to establish whether there is a correlation between the length of service and the injury rate, given the data in the worksheet below.

The most appropriate method of analysis is the rank method, because one of the features is presented as open-ended options: work experience up to 1 year and work experience of 7 years or more.

The solution begins with ranking the data, which are summarized in a worksheet; this can be done manually because the volume of data is small:

| Work experience (years) | Number of injuries | Rank X | Rank Y | Rank difference d = x - y | Squared rank difference d² |
|---|---|---|---|---|---|
| up to 1 | 24 | 1 | 5 | -4 | 16 |
| 1-2 | 16 | 2 | 4 | -2 | 4 |
| 3-4 | 12 | 3 | 2.5 | +0.5 | 0.25 |
| 5-6 | 12 | 4 | 2.5 | +1.5 | 2.25 |
| 7 or more | 6 | 5 | 1 | +4 | 16 |

Σd² = 38.5

Fractional ranks appear in the column because, when identical values occur, they are assigned the arithmetic mean of their ranks. In this example the injury count 12 occurs twice and takes ranks 2 and 3; we find the arithmetic mean of these ranks, (2 + 3) / 2 = 2.5, and enter this value in the worksheet for both indicators.
Substituting the obtained values into the working formula and carrying out simple calculations, we obtain a Spearman coefficient of -0.92.
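
Written out, the substitution looks as follows. This assumes the working formula is the simple one without a correction for tied ranks, which is consistent with the stated result; n = 5 is the number of ranked pairs and Σd² = 38.5 is the worksheet total.

```latex
r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}
    = 1 - \frac{6 \cdot 38.5}{5 \cdot (5^2 - 1)}
    = 1 - \frac{231}{120}
    = -0.925 \approx -0.92
```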

The negative value of the coefficient indicates an inverse relationship between the features: short work experience is accompanied by a large number of injuries. Moreover, the relationship between these indicators is quite strong.
The next stage of the calculation is to determine the reliability of the obtained coefficient:
its error and Student's t-test are calculated.

Spearman's rank correlation method allows you to determine the tightness (strength) and direction of the correlation between two features or two profiles (hierarchies) of features.

To calculate the rank correlation, it is necessary to have two series of values that can be ranked. These series of values can be:

1) two features measured in the same group of subjects;

2) two individual hierarchies of features identified in two subjects for the same set of features;

3) two group hierarchies of features;

4) an individual and a group hierarchy of features.

First, the indicators are ranked separately for each of the features.

As a rule, a lower value of a feature is assigned a lower rank.

In the first case (two features), the individual values ​​for the first feature, obtained by different subjects, are ranked, and then the individual values ​​for the second feature.

If two features are positively related, then subjects with low ranks on one of them will have low ranks on the other, and subjects with high ranks on one feature will also have high ranks on the other. To calculate rs, it is necessary to determine the difference (d) between the ranks obtained by a given subject on both features. Then these values d are transformed in a certain way and subtracted from 1. The smaller the differences between the ranks, the larger rs will be and the closer it will be to +1.

If there is no correlation, all ranks will be mixed and there will be no correspondence between them. The formula is designed so that in this case rs will be close to 0.

In the case of a negative correlation, low ranks of the subjects on one feature will correspond to high ranks on the other feature, and vice versa. The greater the discrepancy between the ranks of subjects on the two variables, the closer rs is to -1.
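
The three situations described above can be illustrated with a small Python sketch. It assumes the simple formula 1 − 6Σd²/(N(N² − 1)) for series without tied ranks; the function name and the three rank series are hypothetical and chosen only to show the limiting values.

```python
def spearman_from_ranks(rank_a, rank_b):
    """r_s from two series of ranks without ties: 1 - 6*sum(d^2)/(N*(N^2-1))."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))


ranks_a = [1, 2, 3, 4, 5]

# Identical order of ranks: every difference d is zero.
same_order = spearman_from_ranks(ranks_a, [1, 2, 3, 4, 5])

# Fully reversed order: low ranks on A meet high ranks on B.
reversed_order = spearman_from_ranks(ranks_a, [5, 4, 3, 2, 1])

# Mixed order with no correspondence between the two series.
mixed_order = spearman_from_ranks(ranks_a, [2, 5, 3, 1, 4])

print(same_order, reversed_order, mixed_order)
```

The first value equals +1, the second equals -1, and the third equals 0, matching the behavior of rs described in the text.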

In the second case (two individual profiles), the individual values obtained by each of the 2 subjects on a certain set of features (the same for both) are ranked. The feature with the lowest value receives the first rank, the feature with the next higher value the second rank, and so on. Obviously, all features must be measured in the same units, otherwise ranking is impossible. For example, it is impossible to rank indicators from the Cattell Personality Questionnaire (16PF) if they are expressed in "raw" scores, since the ranges of values for different factors differ: from 0 to 13, from 0 to 20 and from 0 to 26. We cannot say which factor ranks first in expression until all values are brought to a single scale (most often the sten scale).

If the individual hierarchies of two subjects are positively related, then the features that have low ranks for one of them will have low ranks for the other, and vice versa. For example, if factor E (dominance) has the lowest rank for one subject, it should also have a low rank for the other; if factor C (emotional stability) has the highest rank for one subject, it must also have a high rank for the other, and so on.

In the third case (two group profiles), the average group values ​​obtained in 2 groups of subjects are ranked according to a certain set of features that is the same for two groups. In what follows, the line of reasoning is the same as in the previous two cases.

In the fourth case (individual and group profiles), the individual values of the subject and the group mean values are ranked separately on the same set of features. The group means are obtained, as a rule, with this individual subject excluded: he does not take part in the group mean profile with which his individual profile will be compared. Rank correlation makes it possible to check how consistent the individual and group profiles are.

In all four cases, the significance of the obtained correlation coefficient is determined by the number of ranked values N. In the first case, this number coincides with the sample size n. In the second case, the number of observations is the number of features that make up the hierarchy. In the third and fourth cases, N is also the number of compared features, not the number of subjects in the groups. Detailed explanations are given in the examples. If the absolute value of rs reaches or exceeds the critical value, the correlation is significant.

Hypotheses.

Two versions of the hypotheses are possible. The first applies to case 1, the second to the other three cases.

The first version of hypotheses

H0: The correlation between variables A and B is not different from zero.

H1: The correlation between variables A and B is significantly different from zero.

The second version of the hypotheses

H0: The correlation between hierarchies A and B is not different from zero.

H1: The correlation between hierarchies A and B is significantly different from zero.

Limitations of the rank correlation coefficient

1. At least 5 observations must be available for each variable. The upper limit of the sample is determined by the available tables of critical values.

2. With a large number of identical ranks in one or both compared variables, Spearman's rank correlation coefficient rs gives coarsened values. Ideally, both correlated series should be sequences of non-repeating values. If this condition is not met, a correction for identical ranks must be applied.

Spearman's rank correlation coefficient is calculated by the formula:

$$r_s = 1 - 6 \cdot \frac{\sum d^2}{N(N^2 - 1)}$$

If both compared rank series contain groups of identical ranks, the corrections for identical ranks T_a and T_b must be calculated before computing the rank correlation coefficient:

$$T_a = \sum (a^3 - a) / 12,$$

$$T_b = \sum (b^3 - b) / 12,$$

where a is the size of each group of identical ranks in rank series A, and b is the size of each group of identical ranks in rank series B.

To calculate the empirical value of rs, use the formula:

$$r_s = 1 - 6 \cdot \frac{\sum d^2 + T_a + T_b}{N(N^2 - 1)}$$

Calculation of Spearman's rank correlation coefficient rs

1. Determine which two features or two hierarchies of features will take part in the comparison as variables A and B.

2. Rank the values of variable A, assigning rank 1 to the smallest value, in accordance with the ranking rules (see A.2.3). Enter the ranks in the first column of the table in the order of the numbers of the subjects or features.

3. Rank the values of variable B in accordance with the same rules. Enter the ranks in the second column of the table in the order of the numbers of the subjects or features.

4. Calculate the differences d between the ranks of A and B for each row of the table and enter them in the third column.

5. Square each difference: d². Enter these values in the fourth column of the table.

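6. Calculate the sum of the squares Σd².

7. If there are identical ranks, calculate the corrections:

$$T_a = \sum (a^3 - a) / 12,$$

$$T_b = \sum (b^3 - b) / 12,$$

where a is the size of each group of identical ranks in rank series A; b is the size of each group of identical ranks in rank series B.

8. Calculate the rank correlation coefficient rs by the formula:

a) in the absence of identical ranks

$$r_s = 1 - 6 \cdot \frac{\sum d^2}{N(N^2 - 1)}$$

b) in the presence of identical ranks

$$r_s = 1 - 6 \cdot \frac{\sum d^2 + T_a + T_b}{N(N^2 - 1)},$$

where Σd² is the sum of squared differences between ranks; T_a and T_b are the corrections for identical ranks; N is the number of subjects or features that took part in the ranking.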

9. Determine from the Table (see Appendix 4.3) the critical values ​​of rs for a given N. If rs is greater than or at least equal to the critical value, the correlation is significantly different from 0.

Example 4.1. To determine how the oculomotor reaction depends on alcohol consumption, data were obtained in the test group before and after drinking alcohol. Does the subject's reaction depend on the state of intoxication?

Experiment results:

Before: 16, 13, 14, 9, 10, 13, 14, 14, 18, 20, 15, 10, 9, 10, 16, 17, 18.

After: 24, 9, 10, 23, 20, 11, 12, 19, 18, 13, 14, 12, 14, 7, 9, 14.

Let's formulate the hypotheses:

H0: the correlation between the reaction before and after drinking alcohol does not differ from zero.

H1: the correlation between the reaction before and after drinking alcohol is significantly different from zero.

Table 4.1. Calculation of d² for Spearman's rank correlation coefficient rs when comparing oculomotor response parameters before and after the experiment (N=17)

Since we have duplicate ranks, in this case we will apply the formula adjusted for the same ranks:

$$T_a = \frac{(2^3-2)+(3^3-3)+(2^3-2)+(3^3-3)+(2^3-2)+(2^3-2)}{12} = 6$$

$$T_b = \frac{(2^3-2)+(2^3-2)+(3^3-3)}{12} = 3$$

Find the empirical value of the Spearman coefficient:

$$r_s = 1 - 6 \cdot \frac{767.75 + 6 + 3}{17 \cdot (17^2 - 1)} = 0.05$$

From the table (Appendix 4.3) we find the critical values of the correlation coefficient:

0.48 (p ≤ 0.05)

0.62 (p ≤ 0.01)

We get

rs = 0.05 < rcr(0.05) = 0.48

Conclusion: hypothesis H1 is rejected and H0 is accepted, i.e. the correlation between the reaction before and after alcohol consumption does not differ from zero.
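
The arithmetic of Example 4.1 can be checked with a short Python sketch. The "After" series as printed contains only 16 values, so the ranks and Σd² cannot be recomputed from the data; the sketch therefore takes Σd² = 767.75 and T_b = 3 from the text and recomputes only T_a from the "Before" series. The function names are illustrative.

```python
from collections import Counter


def tie_correction(values):
    """Sum of (t^3 - t) / 12 over every group of t identical values."""
    return sum(t**3 - t for t in Counter(values).values()) / 12


def spearman_corrected(sum_d2, t_a, t_b, n):
    """r_s = 1 - 6 * (sum_d2 + T_a + T_b) / (N * (N^2 - 1))."""
    return 1 - 6 * (sum_d2 + t_a + t_b) / (n * (n**2 - 1))


before = [16, 13, 14, 9, 10, 13, 14, 14, 18, 20, 15, 10, 9, 10, 16, 17, 18]

t_a = tie_correction(before)  # recomputed from the "Before" series
t_b = 3                       # taken from the text
sum_d2 = 767.75               # taken from the text
n = len(before)               # N = 17

r_s = spearman_corrected(sum_d2, t_a, t_b, n)
r_critical_05 = 0.48          # critical value quoted for N = 17, p <= 0.05

print(t_a, round(r_s, 2), abs(r_s) >= r_critical_05)
```

The recomputed correction T_a equals 6 and the coefficient rounds to 0.05, below the critical value, which agrees with the conclusion above.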