Chapter 6: Analysing the Data
Testing the significance of Pearson's r
We have looked at Pearson's r as a useful descriptor of the degree of linear association between two variables, and learned that it has two key properties: magnitude and direction. When r is near zero there is no correlation, but as it approaches -1 or +1 there is a strong negative or positive relationship (respectively) between the variables. But how do we know when a correlation is sufficiently different from zero to assert that a real relationship exists?
What we need is some estimate of how much variation in r we can expect by random chance alone. That is, we need to construct a sampling distribution for r and determine its standard error. All variables are correlated to some extent; rarely will a correlation be exactly zero. What we need is to be able to draw a line such that a correlation above it is considered real, while a correlation below it is considered probably due to chance alone.
From the following illustrations, we can see how, for small samples, r can vary markedly even when the null hypothesis is true (i.e., when chance is a reasonable explanation for the correlation). For larger sample sizes the correlations cluster more tightly around zero, but there is still a considerable range of values. The illustrations that follow show the distribution of the correlation coefficient between an X and a Y variable for 100 random samples simulated on a computer, using different sample sizes. When N = 5, we can see almost the full range from -1 to +1. The spread narrows for N = 10, 20, 30 and so on, until with samples of 100 cases there is little variability around zero.
Figure 6.1 The distribution of correlations between two random variables when sample size = 5.
Figure 6.2 The distribution of correlations between two random variables when sample size = 10.
Figure 6.3 The distribution of correlations between two random variables when sample size = 20.
Figure 6.4 The distribution of correlations between two random variables when sample size = 100.
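The simulation behind these figures can be sketched in a few lines. The original materials do not specify the software used, so the following is a minimal sketch assuming Python with NumPy:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible

def simulate_rs(n, n_samples=100):
    """Pearson r's from n_samples pairs of unrelated variables, each of size n."""
    rs = []
    for _ in range(n_samples):
        x = rng.normal(size=n)
        y = rng.normal(size=n)  # drawn independently of x, so the true correlation is zero
        rs.append(np.corrcoef(x, y)[0, 1])
    return np.array(rs)

# The spread of the 100 sample correlations shrinks as N grows.
for n in (5, 10, 20, 100):
    rs = simulate_rs(n)
    print(f"N = {n:>3}: r ranges from {rs.min():+.2f} to {rs.max():+.2f}")
```

Running this reproduces the pattern in the figures: with N = 5 the sample correlations span most of the range from -1 to +1, while with N = 100 they stay close to zero.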
From the above figures, you can see that as the sample size increases, the correlations cluster more tightly around zero. Because we have simply correlated two random variables, there should be no systematic or real relationship between them. Just by chance, however, some of the correlations will appear real.
If we look more carefully at the above figures for r = -.65, we see that in Figure 6.1, with samples of size 5, 18 of the 100 samples had a correlation equal to or more extreme than -.65 (4 at -.95, 5 at -.85, 2 at -.75, and 7 at -.65). In Figure 6.2, with samples of size 10, only 6 samples had a correlation of -.65 or more extreme. And in Figures 6.3 and 6.4, with samples of size 20 and 100, no samples had a correlation of -.65 or more extreme. So a correlation of -.65 is not an unusual result when samples are small. However, correlations of this size are quite rare with samples of size 20 or more.
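We can put exact probabilities on how unusual r = -.65 is at each sample size. Under the null hypothesis, a sample r can be converted to a t statistic with N - 2 degrees of freedom via t = r√(N - 2)/√(1 - r²). The sketch below (assuming SciPy is available) applies this transformation:

```python
from math import sqrt
from scipy import stats

def p_one_tailed(r, n):
    """One-tailed probability of a sample correlation this negative or more,
    under the null hypothesis of zero true correlation."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return stats.t.cdf(t, df)

for n in (5, 10, 20, 100):
    print(f"N = {n:>3}: P(r <= -.65 under the null) = {p_one_tailed(-0.65, n):.4f}")
```

The probabilities fall sharply with sample size, which matches the simulated counts: a chance correlation of -.65 is quite plausible at N = 5 but very unlikely at N = 20 or 100.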
The following table gives the critical values of Pearson's correlation for different sample sizes.
Table D. Critical values for Pearson r
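The table itself is not reproduced here, but critical values of r can be recovered from the t distribution by inverting the transformation above, giving r_crit = t_crit/√(t_crit² + df). A sketch, again assuming SciPy:

```python
from math import sqrt
from scipy import stats

def critical_r(n, alpha=0.05):
    """Two-tailed critical value of Pearson r for a sample of size n."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # critical t for a two-tailed test
    return t_crit / sqrt(t_crit ** 2 + df)

for n in (5, 10, 20, 100):
    print(f"N = {n:>3}: |r| must exceed {critical_r(n):.3f} at alpha = .05")
```

This reproduces the familiar tabled values (e.g., about .878 for N = 5 and .632 for N = 10 at the .05 level, two-tailed) and makes clear why a given r can be significant with a large sample yet non-significant with a small one.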
© Copyright 2000 University of New England, Armidale, NSW, 2351. All rights reserved
Maintained by Dr Ian Price