Chapter 4: Analysing the Data Part II: Descriptive Statistics

# Correlation and regression

So far we have been discussing a number of descriptive techniques for describing one variable only. However, a very important part of statistics is describing the relationship between two (or more) variables. One of the most fundamental concepts in research, whether in psychological science or some other field, is the concept of correlation. If two variables are correlated, this means that you can use information about one variable to predict the values of the other variable.

Despite the apparently ever-changing view of the environment, when you look a little deeper, there are some things that do not change that much, or that change in a systematic and predictable way (sometimes referred to as invariant relationships). In some areas of science we can predict things very accurately. The movements of the planets, and of the moons around the planets, can be predicted centuries into the future from our current knowledge of orbits and physics. Alas, in the behavioural sciences things are not quite as predictable, and we try to make predictions in a much more uncertain environment. But we can still discover relationships and predict an increased or a decreased likelihood of something happening, given that something else has happened.

We now accept that more smoking goes with more ill-health in later life; that exposure to certain chemicals such as asbestos goes with a greater likelihood of later cancer; that exposure to childhood abuse and neglect goes with a later increased likelihood of crime and substance use; that greater openness and communication goes with a more satisfying and prolonged marriage; that an early ability to delay gratification goes with greater social competence in later life; and that greater social competence goes with an enhanced ability to lie (!). Discoveries such as these always start with the repeated observation that two things in the environment are correlated. For this reason, it is worth getting a firm grasp on the concept of correlation and the related statistical procedure called linear regression.

To illustrate correlation and regression, consider a simple study. Twenty people applying for a job as a graphic designer were given two tests, one measuring the applicant's logical reasoning ability, the other measuring the applicant's creativity. The logical reasoning test is quite simple to administer, is very inexpensive, and can be completed in only 15 minutes. The creativity test, however, is quite lengthy and difficult to administer. It takes two full days to administer and another two days to score. Because someone has to be paid to administer and score the test, it is also quite expensive.

Ultimately, the employer who sponsored this study wants to hire the most creative people he or she can find, but selecting people by using this creativity test is simply too expensive and impractical. Of interest is whether scores on the logical reasoning test can be used to select the most creative people. In other words, the employer wants to know whether you can predict a person's creativity (operationalised as the score on the creativity test) from information about the person's logical reasoning ability (operationalised as the score on the logical reasoning test). If the two variables are correlated with each other, then in the future the employer perhaps need not administer the creativity test to applicants. The applicants will only need to take the logical reasoning test, and those scores can be used to predict what each person's creativity is likely to be. From that information, a hiring decision can be made without spending lots of money assessing the creativity of every applicant. No wonder the employer is willing to sponsor and pay for such a study. The knowledge coming from this study could be very useful to the employer and could possibly save the company big bucks!

The data for the 20 applicants was shown previously in Figure 4.8. The independent variable (X, or "REASON") is the score on the logical reasoning test. The dependent variable (Y, or "CREATIVE") is the score on the creativity test.

Examining the table, you can see that, for example, applicant #8 scored 17 on the reasoning test, and 16 on the creativity test. This pairing of individuals' X and Y data is all we need to measure the association or correlation between the variables, or whether X can be used to predict Y in some way.
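To make the idea of working from paired X and Y scores concrete, here is a minimal Python sketch of how a Pearson correlation coefficient and a least-squares prediction line could be computed from such pairs. The function names `pearson_r` and `least_squares_line` are our own, and the five score pairs are invented purely for illustration; they are not the 20 applicants' data from Figure 4.8.

```python
from math import sqrt

# Hypothetical (REASON, CREATIVE) score pairs, invented for illustration only.
# They are NOT the applicants' data shown in Figure 4.8.
reason = [10, 12, 15, 17, 20]    # X: logical reasoning scores
creative = [9, 13, 14, 16, 19]   # Y: creativity scores


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


r = pearson_r(reason, creative)
slope, intercept = least_squares_line(reason, creative)

# Predicted creativity score for a new applicant with a reasoning score of 17
predicted = intercept + slope * 17

print(f"r = {r:.3f}")
print(f"predicted CREATIVE = {intercept:.2f} + {slope:.2f} * REASON")
print(f"prediction for REASON = 17: {predicted:.2f}")
```

A value of `r` close to +1 or -1 would suggest that reasoning scores carry a lot of information about creativity scores, whereas a value near 0 would suggest that they carry very little.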