Chapter 4: Analysing the Data Part II: Descriptive Statistics

Pearson's Coefficient of Correlation (r)

The most common measure of "correlation" or "predictability" is Pearson's coefficient of correlation, although there are certainly many others. Pearson's r, as it is often symbolised, can have a value anywhere between -1 and 1. The larger r is, ignoring sign, the stronger the association between the two variables and the more accurately you can predict one variable from knowledge of the other variable. At one extreme, a correlation of 1 or -1 means that the two variables are perfectly correlated, meaning that you can predict the values of one variable from the values of the other variable with perfect accuracy. At the other extreme, an r of zero implies an absence of a correlation: there is no relationship between the two variables. This implies that knowledge of one variable gives you absolutely no information about what the value of the other variable is likely to be. The sign of the correlation indicates the "direction" of the association. A positive correlation means that relatively high scores on one variable are paired with relatively high scores on the other variable, and low scores are paired with relatively low scores. On the other hand, a negative correlation means that relatively high scores on one variable are paired with relatively low scores on the other variable.
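These properties can be sketched in a few lines of code. The data below are made up purely for illustration (they are not the applicant scores discussed in this chapter); the function simply applies the standard definition of Pearson's r as the covariance divided by the product of the standard deviations:

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product
    of their standard deviations. Always between -1 and 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A perfect positive relationship (y rises in lockstep with x):
print(round(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]), 6))   # 1.0

# A perfect negative relationship (y falls as x rises):
print(round(pearson_r([1, 2, 3, 4], [9, 7, 5, 3]), 6))   # -1.0
```

With real data the value will fall strictly between these two extremes, and the sign tells you the direction of the pairing.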

Output 4.3: A correlation matrix that results from using SPSS Correlate -> Bivariate...

There are many different computationally laborious formulas for computing Pearson's r. However, we will not even try to use them. The SPSS output for computing a Pearson's correlation for the above data is shown in Output 4.3. Notice that the entry in the matrix of correlations where the "CREATIVE" column and "REASON" row meet is the number ".736." This is the Pearson correlation between REASON and CREATIVE. Underneath, p = ".000" is a test of a hypothesis about the "significance" of the correlation. We'll discuss this soon. Underneath again, the "20" is simply the number of "cases" contributing to this correlation: the number of applicants in this study.

From this output you can read Pearson's r = .736. The correlation is positive, meaning that high scores on the creativity test tend to be paired with relatively high scores on the reasoning test, and vice versa. In other words, if a person scored high on the reasoning test, we would predict that the person scored relatively high on the creativity test, and if the person scored low on the reasoning test we would predict he or she also scored low on the creativity test. Of course, this won't be true for every applicant (applicant #7, for example), but it will tend to be true. This implies that you can predict creativity from logical reasoning, at least as the constructs are measured here. However, because of the less than perfect correlation there is a degree of uncertainty in our predictions. We will not be right all the time; there will be a certain amount of prediction error. All we can say is that given a high score on logical reasoning, there is a greater chance that the person will also be more highly creative.

When interpreting the size of a correlation, it is common to square it. When the correlation is squared (r²), we get a measure of how much of the variability in one variable can be "explained by" variation in the other. In this case, 0.736 squared is about 0.54. We would therefore say that about 54% of the variability in creativity can be explained by (or "be accounted for by", or "is attributable to") differences in logical reasoning ability. There are no clear guidelines for determining how much variability explained is a "large" or "important" amount. In my opinion, this isn't really a statistical issue so much as it is a theoretical one. A large squared correlation may be trivial or unimportant in one context but very important or large in another context. It depends on the research question.

While the correlation coefficient, r, is an important measure of association between two variables, a couple of things about it need to be kept in mind. Pearson's r is a measure of linear association between two variables. That is, it is a quantification of how well the association is represented by a straight line (this will become clearer when we talk about regression). Two variables may be highly related to each other, but Pearson's r may be zero. For example, if both low and high scores on X tend to be paired with low scores on Y, but middle scores of X are paired with high scores on Y, the association is said to be curvilinear. In a scatterplot, the configuration of points in such a situation would look like an inverted "U." But the Pearson correlation between the two variables may be quite small because Pearson's r is a measure of how well the points fall on a straight line. For this reason, scatterplots are important supplements to statistical measures of association. If we found a correlation of, say, zero, between two variables, we might be tricked into thinking that the two variables are not associated with or related to each other, or that you can't predict one variable from the other. But a scatterplot might paint a different picture: one where the variables are associated with each other, but in a more complex way.
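A minimal numerical sketch of the inverted-"U" case: the five fabricated points below lie exactly on the curve y = -x², a perfect (but nonlinear) relationship, yet Pearson's r works out to zero because the positive and negative cross-products cancel:

```python
import math

x = [-2, -1, 0, 1, 2]
y = [-xi ** 2 for xi in x]       # inverted "U": y is perfectly predictable from x

mx, my = sum(x) / len(x), sum(y) / len(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
sx = math.sqrt(sum((a - mx) ** 2 for a in x))
sy = math.sqrt(sum((b - my) ** 2 for b in y))
r = cov / (sx * sy)
print(r)                         # 0.0 despite the perfect curvilinear relationship
```

A scatterplot of these points would make the association obvious; the correlation coefficient alone hides it completely.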

Restriction in range. Another important point is that Pearson's r is influenced by "restriction in range". What this means is that Pearson's r can shrink or expand, depending on how variable the scores on the variables tend to be. This is best illustrated by an example of how Pearson's r can shrink by decreasing the range of a variable. Suppose that the investigator decided that if a person scored less than 11 on the logical reasoning test, they were deemed simply too stupid to be in the study or not representative of the types of people who would tend to apply for the job. If you discard these applicants (there are nine of these) and re-compute the correlation, you get a much smaller value, r = .47 (N = 11). So the conclusions about the strength of the relationship between two variables can vary depending on the range of values observed on the variables.

Correlation does not necessarily mean causation. Two variables may be related to each other, but this doesn't mean that one variable causes the other. Just because logical reasoning and creativity as measured here are correlated, that doesn't mean that if we could increase people's logical reasoning ability, we would produce greater creativity. We'd need to conduct an actual experiment to unequivocally demonstrate a causal relationship. But if it is true that influencing someone's logical reasoning ability does influence their creativity, then the two variables must be correlated with each other.

© Copyright 2000 University of New England, Armidale, NSW, 2351. All rights reserved. Maintained by Dr Ian Price. Email: iprice@turing.une.edu.au