The term "correlation" is widely used in the humanities and medicine, and it often appears in the media. Correlations play a key role in psychology. In particular, calculating correlations is a key stage of the empirical research when writing a final qualifying work (thesis) in psychology.

Material on correlation on the web tends to be too technical. It is difficult for a non-specialist to understand the formulas. At the same time, understanding the meaning of correlations is necessary for a marketer, sociologist, physician, psychologist - everyone who conducts research on people.

In this article, we will explain in plain language the essence of correlation, the types of correlation, the methods of calculation, and the features of using correlation in psychological research, including when writing theses in psychology.


What is correlation

Correlation is a relationship, but not just any relationship. What makes it special? Let's look at an example.

Imagine that you are driving a car. You press the gas pedal - the car goes faster. You ease off the gas - the car slows down. Even a person who is not familiar with how a car works will say: “There is a direct relationship between the gas pedal and the speed of the car: the harder the pedal is pressed, the higher the speed.”

This dependence is functional - the speed is a direct function of the gas pedal. A specialist will explain that the pedal controls the supply of fuel to the cylinders, where the combustion of the mixture occurs, which leads to an increase in power to the shaft, etc. This connection is rigid and deterministic, allowing no exceptions (provided that the car is in working order).

Now imagine that you are the director of a company whose employees sell goods. You decide to increase sales by raising employees' salaries. You raise salaries by 10%, and the company's average sales go up. After a while, you raise them by another 10%, and sales grow again. Then by another 5%, and again there is an effect. The conclusion suggests itself - there is a direct relationship between the sales of the company and the salary of employees - the higher the salaries, the higher the sales of the organization. Is this the same connection as between the gas pedal and the speed of the car? What is the key difference?

That's right, the relationship between salary and sales is not rigid. This means that for some employees, sales could even have declined despite the increase in salary. For others, they stayed the same. But on average, sales in the company grew, so we say that there is a relationship between sales and employee salaries, and that this relationship is a correlation.

A functional relationship (gas pedal - speed) rests on a physical law. A correlation (sales - salary) rests on a simple consistency of changes in two indicators. There is no law (in the physical sense of the word) behind correlation. There is only a probabilistic (stochastic) regularity.

Numerical expression of correlation dependence

So, a correlation reflects the dependence between phenomena. If these phenomena can be measured, the correlation can be expressed as a number.

For example, the role of reading in people's lives is being studied. The researchers took a group of 40 people and measured two indicators for each subject: 1) how much time the subject spends reading per week; 2) to what extent the subject considers himself or herself successful (on a scale from 1 to 10). The researchers arranged the data in two columns and used a statistical program to calculate the correlation between reading and well-being. Suppose they got the following result: -0.76. But what does this number mean? How should it be interpreted? Let's figure it out.

The resulting number is called the correlation coefficient. For its correct interpretation, it is important to consider the following:

  1. The sign "+" or "-" reflects the direction of dependence.
  2. The value of the coefficient reflects the strength of the dependence.

Direct and inverse

The plus sign in front of the coefficient indicates that the relationship between phenomena or indicators is direct. That is, the greater one indicator, the greater the other. Higher salary means higher sales. Such a correlation is called direct, or positive.

If the coefficient has a minus sign, then the correlation is inverse, or negative. In this case, the higher one indicator, the lower the other. In the reading and well-being example, we got -0.76, which means that the more people read, the lower their level of well-being.

Strong and weak

In numerical terms, a correlation is a number in the range from -1 to +1, denoted by the letter "r". The higher the number (ignoring the sign), the stronger the correlation.

The lower the numerical value of the coefficient, the less the relationship between phenomena and indicators.

The maximum possible strength of dependence is 1 or -1. How can we understand and picture this?

Consider an example. Researchers took 10 students and measured their level of intelligence (IQ) and academic performance for the semester. They arranged these data in two columns.

[Table: test subject | IQ | Academic performance (points)]

Look closely at the data in the table. From subject 1 to subject 10, the IQ level increases. But the level of academic performance rises as well. Of any two students, the one with the higher IQ performs better. And there are no exceptions to this rule.

Before us is an example of a complete, 100% coordinated change in two indicators in a group. And this is an example of the maximum possible positive relationship. That is, the correlation between intelligence and performance is 1.

Let's consider another example. The same 10 students were surveyed on the extent to which they feel successful in communicating with the opposite sex (on a scale from 1 to 10).

[Table: test subject | IQ | Success in communicating with the opposite sex (points)]

Look closely at the data in the table. From subject 1 to subject 10, the IQ level increases. At the same time, the level of success in communicating with the opposite sex consistently decreases in the last column. Of any two students, the one with the lower IQ is more successful in communicating with the opposite sex. And there are no exceptions to this rule.

This is an example of complete consistency in the change of two indicators in the group - the maximum possible negative relationship. The correlation between IQ and the success of communication with the opposite sex is -1.

How should we understand a correlation of zero (0)? It means that there is no relationship between the indicators. Let's return once again to our students and consider another indicator measured for them - the length of a standing jump.

[Table: test subject | IQ | Standing jump length (m)]

There is no consistency between person-to-person variation in IQ and in standing jump length. This indicates a lack of correlation. The correlation coefficient of IQ and jump length for the students is 0.

We've looked at extreme cases. In real measurements, the coefficients are rarely equal to exactly 1 or 0. In this case, the following scale is adopted:

  • if the coefficient is greater than 0.70 - the relationship between the indicators is strong;
  • from 0.30 to 0.70 - the connection is moderate,
  • less than 0.30 - the connection is weak.
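The two reading rules (sign for direction, absolute value for strength) and the rough scale above can be sketched in a few lines of Python. The function name `interpret` and the returned labels are illustrative assumptions, not part of any statistical package; a value of exactly 0.70 is treated here as moderate, since the scale calls a coefficient strong only when it is greater than 0.70.

```python
def interpret(r: float) -> tuple[str, str]:
    """Return (direction, strength) of a correlation coefficient on the rough scale."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")

    # The sign gives the direction of the dependence.
    if r > 0:
        direction = "direct (positive)"
    elif r < 0:
        direction = "inverse (negative)"
    else:
        direction = "none"

    # The strength is read from the value with the sign ignored.
    size = abs(r)
    if size > 0.70:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    return direction, strength


# The reading / well-being example from the text.
print(interpret(-0.76))
```

For the value -0.76 this returns an inverse direction and a strong relationship, which is the reading given in the text.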

If we evaluate on this scale the correlation we obtained above between reading and well-being, it turns out that this dependence is strong and negative -0.76. That is, there is a strong negative relationship between erudition and well-being. Which once again confirms the biblical wisdom about the relationship between wisdom and sorrow.

The given gradation gives very rough estimates and is rarely used in research in this form.

Gradations of coefficients according to significance levels are used more often. In this case, the coefficient actually obtained is either significant or not significant. This is determined by comparing its value with the critical value of the correlation coefficient taken from a special table. These critical values depend on the size of the sample (the larger the sample, the lower the critical value).

Correlation analysis in psychology

The correlation method is one of the main ones in psychological research. And this is not accidental, because psychology strives to be an exact science. Does it work?

What is special about laws in the exact sciences? For example, the law of gravity in physics operates without exception: the greater the mass of a body, the stronger it attracts other bodies. This physical law reflects the relationship between body mass and gravity.

In psychology, the situation is different. For example, psychologists publish data on the relationship between warm relationships with parents in childhood and the level of creativity in adulthood. Does this mean that every subject who had a very warm relationship with their parents in childhood will have very high creative skills? The answer is unequivocal - no. There is no law here like a physical one. No mechanism by which childhood experience influences adult creativity has been established; such a mechanism is only our conjecture. There is consistency in the data (relationships - creativity), but there is no law behind it, only correlation. Psychologists often refer to the identified relationships as psychological patterns, emphasizing their probabilistic, non-rigid nature.

The student study example from the previous section illustrates well the use of correlations in psychology:

  1. Analysis of the relationship between psychological indicators. In our example, IQ and the success of communication with the opposite sex are psychological parameters. Identification of the correlation between them expands the understanding of the mental organization of a person, of the relationship between various aspects of his personality - in this case, between the intellect and the sphere of communication.
  2. Analysis of the relationship of IQ with academic performance and jumping is an example of the relationship of a psychological parameter with non-psychological ones. The results obtained reveal the features of the influence of intelligence on educational and sports activities.

Here's what a summary of the results of a fictional study on students could look like:

  1. A significant positive relationship between the intelligence of students and their academic performance was revealed.
  2. There is a negative significant relationship between IQ and successful communication with the opposite sex.
  3. There was no connection between the IQ of students and the ability to jump from a place.

Thus, the level of intelligence of students acts as a positive factor in their academic performance, while at the same time negatively affecting relationships with the opposite sex and not having a significant impact on sports success, in particular, the ability to jump from a place.

As you can see, the intellect helps students to learn, but prevents them from building relationships with the opposite sex. This does not affect their athletic performance.

The ambiguous influence of intelligence on the personality and activity of students reflects the complexity of this phenomenon in the structure of personality traits and the importance of continuing research in this direction. In particular, it seems important to analyze the relationship between intelligence and psychological characteristics and activities of students, taking into account their gender.

Pearson and Spearman coefficients

Let's consider two calculation methods.

The Pearson coefficient is a method for calculating the relationship between the numerical values of two indicators within one group. Very simplified, it boils down to this:

  1. The values of two parameters in the group of subjects are taken (for example, aggression and perfectionism).
  2. The average value of each parameter in the group is found.
  3. The differences between each subject's values and the average value are found.
  4. These differences are substituted into a special formula to calculate the Pearson coefficient.
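The four steps above can be sketched in plain Python. This is a minimal illustration of the standard Pearson formula (sum of products of deviations divided by the square root of the product of the sums of squared deviations). The scores for aggression and perfectionism are invented for the example and do not come from any study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(x)

    # Step 2: the average value of each parameter.
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Step 3: differences between each subject's value and the average.
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]

    # Step 4: substitute the differences into the formula.
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("one of the parameters is constant; r is undefined")
    return numerator / denominator


# Step 1: hypothetical scores of six subjects (illustrative data only).
aggression = [12, 15, 9, 20, 17, 11]
perfectionism = [30, 34, 25, 41, 39, 27]

r = pearson(aggression, perfectionism)
print(round(r, 3))
```

Because the calculation is built on deviations from the mean, a single extreme value shifts both the mean and the sums, which is why this coefficient reacts strongly to outliers.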

Spearman's rank correlation coefficient is calculated in a similar way:

  1. The values of two indicators in the group of subjects are taken.
  2. The rank of each value in the group is found, that is, its place in the list in ascending order.
  3. The rank differences are found, squared and summed.
  4. Next, the sum of squared rank differences is substituted into a special formula to calculate the Spearman coefficient.
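These steps can be sketched as follows, using the usual formula for data without repeated values, r_s = 1 - 6Σd² / (n(n² - 1)). The helper names and the sample data are assumptions made for illustration; repeated values need a correction for ties, which this sketch deliberately refuses to handle.

```python
def ranks(values):
    """Place of each value in ascending order (1 = smallest); no repeated values allowed."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle repeated (tied) values")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman rank correlation for two lists without ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)                        # step 2
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # step 3
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))                  # step 4


# Step 1: hypothetical data for five subjects (illustrative only).
hours_reading = [2, 7, 4, 10, 1]
success_score = [3, 6, 8, 9, 1]

print(round(spearman(hours_reading, success_score), 3))  # 0.9
```

Only the order of the values enters the calculation: replacing 10 hours with 100 hours in the first list would leave the ranks, and therefore the coefficient, unchanged.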

In Pearson's case, the calculation was based on the average value. Therefore, random data outliers (significant difference from the mean), for example, due to processing error or unreliable answers, can significantly distort the result.

In Spearman's case, the absolute values of the data do not play a role, since only their relative position with respect to each other (ranks) is taken into account. That is, data outliers or other inaccuracies will not seriously affect the final result.
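The difference in sensitivity to outliers can be seen on a small invented data set. In the sketch below, ten perfectly consistent pairs are taken and one score is replaced by 1000, as if extra zeros had been typed during data entry. The Spearman coefficient is computed here as the Pearson coefficient of the ranks, which is an equivalent definition; all names and numbers are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson coefficient computed from deviations around the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def ranks(values):
    """Rank 1 = smallest value; this sketch assumes no repeated values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman coefficient as the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


hours = list(range(1, 11))               # 1, 2, ..., 10
scores = list(range(1, 11))              # perfectly consistent with hours
scores_with_typo = scores[:-1] + [1000]  # last score mistyped

print("clean data:  ", pearson(hours, scores), spearman(hours, scores))
print("with outlier:", pearson(hours, scores_with_typo),
      spearman(hours, scores_with_typo))
```

The mistyped value keeps its place as the largest score, so the ranks and the Spearman coefficient do not change, while the Pearson coefficient drops noticeably.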

If the test results are correct, then the differences between the Pearson and Spearman coefficients are insignificant, while the Pearson coefficient gives a more precise value of the relationship.

How to Calculate the Correlation Coefficient

The Pearson and Spearman coefficients can be calculated manually. This may be necessary for an in-depth study of statistical methods.

However, in most cases, when solving applied problems, including in psychology, it is possible to carry out calculations using special programs.

Calculation using Microsoft Excel spreadsheets

Let's go back to the students example and look at the data on their level of intelligence and the length of the jump from a place. Let's enter this data (two columns) into an Excel spreadsheet.

After moving the cursor to an empty cell, press the "Insert Function" option and select "CORREL" from the "Statistical" section.

The format of this function assumes the selection of two data arrays: CORREL(array1; array2). We select the columns with IQ and jump length, respectively.

In Excel, only the formula for the Pearson coefficient is implemented.

Calculation with the program STATISTICA

We enter data on intelligence and the length of the jump in the field of initial data. Next, select the option "Nonparametric criteria", "Spearman". Select the parameters for the calculation and get the following result.

As you can see, the calculation gave a result of 0.024, which differs from the Pearson result - 0.038, obtained above using Excel. However, the differences are minor.

Using correlation analysis in psychology theses (example)

Most of the topics of final qualification works in psychology (diplomas, term papers, master's) involve a correlation study (the rest are related to identifying differences in psychological indicators in different groups).

The term "correlation" itself rarely appears in the titles of topics - it is hidden behind wordings such as the following:

  • "The relationship between subjective feelings of loneliness and self-actualization in women of mature age";
  • “Peculiarities of the influence of the resilience of managers on the success of their interaction with clients in conflict situations”;
  • "Personal factors of stress resistance of employees of the Ministry of Emergency Situations."

Thus, the words "relationship", "influence" and "factors" are sure signs that the method of data analysis in empirical research should be correlation analysis.

Let us briefly consider the stages of its implementation when writing a thesis in psychology on the topic: "The relationship of personal anxiety and aggressiveness in adolescents."

1. The calculation requires raw data, which are usually the test results of the subjects. They are entered into a summary table and placed in the appendix. This table is structured as follows:

  • each line contains data for one subject;
  • each column contains scores on one scale for all subjects.

[Table: subject number | Personal anxiety | Aggressiveness]


2. It is necessary to decide which of the two types of coefficients - Pearson or Spearman - will be used. Recall that Pearson gives a more accurate result, but it is sensitive to outliers in the data. Spearman coefficients can be used with any data (except data on a nominal scale), which is why they are most often used in psychology diplomas.

3. We enter the table of raw data into the statistical program.

4. Calculate the value.

5. The next step is to determine if the relationship is significant. The statistical program highlighted the results in red, which means that the correlations are statistically significant at a significance level of 0.05 (indicated above).

However, it is useful to know how to determine the significance manually. To do this, you need a table of critical values of the Spearman coefficient.

[Table of critical values of the Spearman coefficient. Rows: number of test subjects; columns: level of statistical significance]

We are interested in the significance level of 0.05 and the size of our sample of 10 people. At the intersection of these data, we find the value of the critical Spearman: Rcr=0.63.

The rule is this: if the Spearman empirical value obtained is greater than or equal to the critical value, then it is statistically significant. In our case: Remp (0.66) > Rcr (0.63), therefore, the relationship between aggressiveness and anxiety in the adolescent group is statistically significant.
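The decision rule can be written as a one-line comparison. In the sketch below, the function name is an assumption, and so is the use of the absolute value, which lets the same rule cover negative coefficients. The critical value 0.63 is the one quoted in the text for 10 subjects at the 0.05 level.

```python
def is_significant(r_emp: float, r_crit: float) -> bool:
    """True if the empirical coefficient reaches the critical value (sign ignored)."""
    return abs(r_emp) >= r_crit


r_emp = 0.66   # empirical Spearman coefficient from the example
r_crit = 0.63  # critical value for n = 10, significance level 0.05 (from the text)

print(is_significant(r_emp, r_crit))  # True
```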

6. In the text of the thesis, the data should be presented in a Word-format table rather than as a table copied from the statistical program. Below the table, we describe the result obtained and interpret it.

Table 1

Spearman's correlation coefficients between aggressiveness and anxiety in a group of adolescents

                  Personal anxiety
Aggressiveness    0.66*

* - statistically significant (p ≤ 0.05)

Analysis of the data presented in Table 1 shows that there is a statistically significant positive relationship between the aggressiveness and anxiety of adolescents. This means that the higher the personal anxiety of adolescents, the higher the level of their aggressiveness. This result suggests that aggression for adolescents is one of the ways to relieve anxiety. Experiencing self-doubt and anxiety due to threats to self-esteem, which is especially vulnerable in adolescence, a teenager often resorts to aggressive behavior as an unproductive way of reducing anxiety.

7. Is it possible to talk about influence when interpreting relationships? Can we say that anxiety affects aggressiveness? Strictly speaking, no. We have shown above that the correlation between phenomena is of a probabilistic nature and reflects only the consistency of changes in characteristics in a group. At the same time, we cannot say that this consistency arises because one of the phenomena is the cause of the other and affects it. That is, the presence of a correlation between psychological parameters does not give grounds to talk about the existence of a causal relationship between them. However, practice shows that the term "influence" is often used when analyzing the results of correlation analysis.

In practice, Spearman's rank correlation coefficient (ρ) is often used to determine the closeness of the relationship between two features. The values of each feature are ranked in ascending order (from 1 to n), then the difference (d) between the ranks corresponding to one observation is determined.

Example #1. The relationship between the volume of industrial production and investments in fixed capital in 10 regions of one of the federal districts of the Russian Federation in 2003 is characterized by the following data.
Calculate the Spearman and Kendall rank correlation coefficients. Check their significance at α=0.05. Formulate a conclusion about the relationship between the volume of industrial production and investments in fixed capital in the regions of the Russian Federation under consideration.

Assign ranks to the feature Y and the factor X. Find the sum of the squared rank differences d².

X       Y       Rank X (dx)   Rank Y (dy)   (dx - dy)²
1.3     300     1             2             1
1.8     1335    2             12            100
2.4     250     3             1             4
3.4     946     4             8             16
4.8     670     5             7             4
5.1     400     6             4             4
6.3     380     7             3             16
7.5     450     8             5             9
7.8     500     9             6             9
17.5    1582    10            16            36
18.3    1216    11            9             4
22.5    1435    12            14            4
24.9    1445    13            15            4
25.8    1820    14            19            25
28.5    1246    15            10            25
33.4    1435    16            14            4
42.4    1800    17            18            1
45      1360    18            13            25
50.4    1256    19            11            64
54.8    1700    20            17            9

Using the calculator, we calculate Spearman's rank correlation coefficient for the n = 20 rows of the table, where the sum of the last column is Σd² = 364:

$\rho = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6 \cdot 364}{20^3 - 20} \approx 0.726$

The relationship between feature Y and factor X is strong and direct.

Estimation of Spearman's rank correlation coefficient

According to the Student's table, we find T_table:
T_table(18; 0.05) = 1.734
Since T_obs > T_table, we reject the hypothesis that the rank correlation coefficient is equal to zero. In other words, Spearman's rank correlation coefficient is statistically significant.

Interval estimate for the rank correlation coefficient (confidence interval)
Confidence interval for Spearman's rank correlation coefficient: ρ ∈ (0.5431; 0.9095).
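The arithmetic of Example #1 can be re-traced from the column of squared rank differences. The formulas for the observed t statistic and for the half-width of the interval did not survive in the text, so the ones used below are assumptions: t_obs = ρ·sqrt((n - 2)/(1 - ρ²)) is the usual test statistic, and the half-width t_table·(1 - ρ²)/sqrt(n) was chosen because it reproduces the interval quoted above.

```python
from math import sqrt

# Squared rank differences, copied from the last column of the table.
d_squared = [1, 100, 4, 16, 4, 4, 16, 9, 9, 36,
             4, 4, 4, 25, 25, 4, 1, 25, 64, 9]

n = len(d_squared)
sum_d2 = sum(d_squared)

# Spearman's coefficient (formula without a correction for ties).
rho = 1 - 6 * sum_d2 / (n ** 3 - n)

# Significance check against the tabulated value quoted in the text.
t_table = 1.734                               # T_table(18; 0.05)
t_obs = rho * sqrt((n - 2) / (1 - rho ** 2))  # assumed form of T_obs

# Interval estimate (assumed form, matches the interval in the text).
half_width = t_table * (1 - rho ** 2) / sqrt(n)
lower, upper = rho - half_width, rho + half_width

print(n, sum_d2, round(rho, 4))            # 20 364 0.7263
print(round(t_obs, 2), t_obs > t_table)    # 4.48 True
print(round(lower, 4), round(upper, 4))    # 0.5431 0.9095
```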

Example #2. Initial data.

X    Y
5    4
3    4
1    3
3    1
6    6
2    2

Since the 1st series (X) contains tied ranks (identical rank numbers), we re-form them. The ranks are re-formed without changing their order, that is, the corresponding relations (greater than, less than or equal to) must be preserved between the rank numbers. It is also not recommended to set a rank below 1 or above the value equal to the number of parameters (in this case n = 6). The re-formed ranks are given in the table.

Position in ordered series   Location of factors according to the expert's assessment   New ranks
1                            1                                                          1
2                            2                                                          2
3                            3                                                          3.5
4                            3                                                          3.5
5                            5                                                          5
6                            6                                                          6

Since the 2nd series (Y) also contains tied ranks, we re-form them as well. The re-formed ranks are given in the table.

Position in ordered series   Location of factors according to the expert's assessment   New ranks
1                            1                                                          1
2                            2                                                          2
3                            3                                                          3
4                            4                                                          4.5
5                            4                                                          4.5
6                            6                                                          6

Rank matrix.

Rank X (dx)   Rank Y (dy)   (dx - dy)²
5             4.5           0.25
3.5           4.5           1
1             3             4
3.5           1             6.25
6             6             0
2             2             0
Total: 21     21            11.5
Since there are several identical values among the values of features x and y, i.e. tied ranks are formed, the Spearman coefficient in this case is calculated with a correction for ties:

$\rho = 1 - \frac{6 \left( \sum d^2 + A + B \right)}{n^3 - n}, \quad A = \frac{1}{12} \sum_j \left( A_j^3 - A_j \right), \quad B = \frac{1}{12} \sum_k \left( B_k^3 - B_k \right)$

where
j is the index of the tie groups for feature x;
A_j is the number of identical ranks in the j-th tie group for x;
k is the index of the tie groups for feature y;
B_k is the number of identical ranks in the k-th tie group for y.
A = (2³ - 2)/12 = 0.5
B = (2³ - 2)/12 = 0.5
D = A + B = 0.5 + 0.5 = 1

The relationship between feature Y and factor X is moderate and direct.
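Example #2 can be re-traced in Python: average ranks for repeated values, corrections A and B, and the coefficient itself. The tie-corrected formula was lost from this example, so the sketch assumes the same form that appears in the worked example at the end of this text, 1 - 6(Σd² + A + B)/(n³ - n). The helper names are illustrative.

```python
from collections import Counter


def average_ranks(values):
    """Ranks in ascending order; repeated values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (t^3 - t) / 12 over groups of t identical values."""
    return sum(t ** 3 - t for t in Counter(values).values()) / 12


x = [5, 3, 1, 3, 6, 2]
y = [4, 4, 3, 1, 6, 2]
n = len(x)

rank_x = average_ranks(x)   # [5.0, 3.5, 1.0, 3.5, 6.0, 2.0]
rank_y = average_ranks(y)   # [4.5, 4.5, 3.0, 1.0, 6.0, 2.0]
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))

A = tie_correction(x)       # one pair of identical values in x
B = tie_correction(y)       # one pair of identical values in y

# Assumed tie-corrected formula (see the lead-in above).
rho = 1 - 6 * (sum_d2 + A + B) / (n ** 3 - n)

print(sum_d2, A, B, round(rho, 3))  # 11.5 0.5 0.5 0.643
```

Under this assumption the coefficient falls between 0.30 and 0.70, in line with the conclusion that the relationship is moderate and direct.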

There are two possible hypotheses. The first refers to case 1, the second to the other three cases.

The first version of hypotheses

H0: The correlation between variables A and B is not different from zero.

H1: The correlation between variables A and B is significantly different from zero.

The second version of the hypotheses

H0: Correlation between hierarchies A and B is not different from zero.

H1: The correlation between hierarchies A and B is significantly different from zero.

Limitations of the rank correlation coefficient

1. At least 5 observations must be available for each variable. The upper limit of the sample is determined by the available tables of critical values.

2. With a large number of identical ranks in one or both of the compared variables, Spearman's rank correlation coefficient rs gives coarsened values. Ideally, both correlated series should be sequences of non-repeating values. If this condition is not met, a correction for identical ranks must be made.

Spearman's rank correlation coefficient is calculated by the formula:

$r_s = 1 - \frac{6 \sum d^2}{N (N^2 - 1)}$

If both compared rank series contain groups of identical ranks, the corrections for identical ranks T_a and T_b must be calculated before the rank correlation coefficient:

$T_a = \sum (a^3 - a) / 12$,

$T_b = \sum (b^3 - b) / 12$,

where a is the size of each group of identical ranks in rank series A, and b is the size of each group of identical ranks in rank series B.

To calculate the empirical value of rs, use the formula:

$r_s = 1 - \frac{6 \left( \sum d^2 + T_a + T_b \right)}{N (N^2 - 1)}$

Calculation of Spearman's rank correlation coefficient rs

1. Determine which two characteristics or two hierarchies of characteristics will be compared as variables A and B.

2. Rank the values of variable A, assigning rank 1 to the smallest value, in accordance with the ranking rules (see A.2.3). Enter the ranks in the first column of the table in order of the numbers of the subjects or features.

3. Rank the values of variable B in accordance with the same rules. Enter the ranks in the second column of the table in order of the numbers of the subjects or features.

4. Calculate the difference d between the ranks of A and B in each row of the table and enter it in the third column.

5. Square each difference: d². Enter these values in the fourth column of the table.

6. Calculate the sum of the squares, Σd².

7. If there are identical ranks, calculate the corrections:

$T_a = \sum (a^3 - a) / 12$,

$T_b = \sum (b^3 - b) / 12$,

where a is the size of each group of identical ranks in rank series A, and b is the size of each group of identical ranks in rank series B.

8. Calculate the rank correlation coefficient rs by the formula:

a) in the absence of identical ranks

$r_s = 1 - 6 \cdot \frac{\sum d^2}{N (N^2 - 1)}$

b) in the presence of identical ranks

$r_s = 1 - 6 \cdot \frac{\sum d^2 + T_a + T_b}{N (N^2 - 1)}$,

where Σd² is the sum of squared differences between ranks; T_a and T_b are the corrections for identical ranks; N is the number of subjects or features that participated in the ranking.

9. Determine from the Table (see Appendix 4.3) the critical values of rs for the given N. If rs is greater than or at least equal to the critical value, the correlation is significantly different from 0.

Example 4.1. To determine how the oculomotor reaction depends on alcohol consumption, data were obtained in the test group before and after drinking alcohol. Does the reaction of the subjects depend on the state of intoxication?

Experiment results:

Before: 16, 13, 14, 9, 10, 13, 14, 14, 18, 20, 15, 10, 9, 10, 16, 17, 18.
After: 24, 9, 10, 23, 20, 11, 12, 19, 18, 13, 14, 12, 14, 7, 9, 14.

Let's formulate the hypotheses:

H0: the correlation between the reaction indicators before and after drinking alcohol does not differ from zero.

H1: the correlation between the reaction indicators before and after drinking alcohol is significantly different from zero.

Table 4.1. Calculation of d² for Spearman's rank correlation coefficient rs when comparing oculomotor reaction parameters before and after the experiment (N=17)



Since there are repeated ranks, we apply the formula adjusted for identical ranks:

$T_a = \left( (2^3-2) + (3^3-3) + (2^3-2) + (3^3-3) + (2^3-2) + (2^3-2) \right) / 12 = 6$

$T_b = \left( (2^3-2) + (2^3-2) + (3^3-3) \right) / 12 = 3$

Find the empirical value of the Spearman coefficient:

$r_s = 1 - 6 \cdot \frac{767.75 + 6 + 3}{17 \cdot (17^2 - 1)} = 0.05$

According to the table (Appendix 4.3), we find the critical values of the correlation coefficient:

0.48 (p ≤ 0.05)

0.62 (p ≤ 0.01)

We get rs emp = 0.05 < rs cr = 0.48 (p ≤ 0.05).

Conclusion: hypothesis H1 is rejected and H0 is accepted, i.e. the correlation between the reaction indicators before and after drinking alcohol does not differ from zero.
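The arithmetic of Example 4.1 can be checked with a short script. It takes the quantities stated in the example as given (Σd² = 767.75, N = 17, the sizes of the groups of identical ranks, the critical value 0.48) and does not recompute the ranks from the raw data. The function and variable names are illustrative.

```python
def tie_correction(group_sizes):
    """Correction for identical ranks: sum of (t^3 - t) over the groups, divided by 12."""
    return sum(t ** 3 - t for t in group_sizes) / 12


N = 17
sum_d2 = 767.75                          # sum of squared rank differences (Table 4.1)

Ta = tie_correction([2, 3, 2, 3, 2, 2])  # groups of identical ranks, series "before"
Tb = tie_correction([2, 2, 3])           # groups of identical ranks, series "after"

rs = 1 - 6 * (sum_d2 + Ta + Tb) / (N * (N ** 2 - 1))

r_crit_05 = 0.48                         # critical value for N = 17, p <= 0.05
print(Ta, Tb, round(rs, 2), abs(rs) >= r_crit_05)  # 6.0 3.0 0.05 False
```

The empirical value stays well below the critical one, so the script reaches the same decision as the text: H0 is kept.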