Chapter 4: Analysing the Data
To overcome the problem of dealing with squared units, statisticians take the square root of the variance to get the standard deviation.
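The standard deviation (for a sample) is defined symbolically as

$$ s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} $$

where X is an individual score, \bar{X} is the sample mean, and N is the number of scores.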
So if the scores in the data were 5, 7, 6, 1, and 8, the mean is 5.4 and the squared differences from the mean would be 0.16 (from [5-5.4]²), 2.56 (from [7-5.4]²), 0.36 (from [6-5.4]²), 19.36 (from [1-5.4]²), and 6.76 (from [8-5.4]²). The mean of these squared deviations is 5.84 and its square root is 2.42 (if dividing by N), which is the standard deviation of these scores. Informally, the standard deviation is described as the average amount by which scores in a distribution differ from the mean, ignoring the sign of the difference. Sometimes, the standard deviation is described as the average distance between any score in a distribution and the mean of the distribution.
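The arithmetic in this worked example can be checked with a few lines of Python. This is only a sketch of the divide-by-N calculation described above; the variable names are illustrative and not part of the original notes.

```python
import math

# The five example scores from the text.
scores = [5, 7, 6, 1, 8]
n = len(scores)

# Step 1: the mean of the scores.
mean = sum(scores) / n

# Step 2: squared difference of each score from the mean.
squared_devs = [(x - mean) ** 2 for x in scores]

# Step 3: the mean of the squared deviations (the variance, dividing by N).
variance_n = sum(squared_devs) / n

# Step 4: the square root of the variance is the standard deviation.
sd_n = math.sqrt(variance_n)

print("mean:", mean)                                      # 5.4
print("squared deviations:", [round(d, 2) for d in squared_devs])
print("variance (divide by N):", round(variance_n, 2))    # 5.84
print("sd (divide by N):", round(sd_n, 2))                # 2.42
```

Taking the square root in the last step is what returns the result to the original units of the scores.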
The above formula is the definition for a sample standard deviation. To calculate the standard deviation for a population, N is used in the denominator instead of N-1. Suffice it to say that in most contexts, regardless of the purpose of your data analysis, computer programs will print the result from the sample sd. So we will use the sample (N-1) formula as our definitional formula for the standard deviation, even though conceptually dividing by N makes more sense (i.e., dividing by how many scores there are to get the average). When N is fairly large, the difference between the two formulas is trivially small. Using the N-1 version of the formula, we still describe the standard deviation as the average amount by which scores in a distribution differ from the mean, ignoring the sign of the difference, even though this isn't a true average using this formula.
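To see how little the choice of denominator matters as N grows, the sketch below compares the two versions using Python's standard `statistics` module (`pstdev` divides by N, `stdev` divides by N-1). The helper `sample_to_population_ratio` is a hypothetical name introduced here for illustration; it relies on the fact that the two results always differ by a factor of sqrt(N / (N-1)).

```python
import math
import statistics

# The five example scores used earlier in the chapter.
scores = [5, 7, 6, 1, 8]

sd_population = statistics.pstdev(scores)  # denominator N
sd_sample = statistics.stdev(scores)       # denominator N-1

print("sd dividing by N:  ", round(sd_population, 3))
print("sd dividing by N-1:", round(sd_sample, 3))


def sample_to_population_ratio(n):
    """Factor by which the N-1 result exceeds the N result for n scores."""
    return math.sqrt(n / (n - 1))


# The factor shrinks towards 1 as the number of scores increases.
for n in (5, 30, 100, 1000):
    print(n, round(sample_to_population_ratio(n), 4))
```

With only five scores the two answers differ noticeably, but by a few hundred scores the factor is very close to 1.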
The standard deviation in our sexual behaviour data is 2.196, from the SPSS printout in Output 4.2. So the mean number of sexual partners is 1.864 with a standard deviation of 2.196. The units are now the same as the original data. But is this a large standard deviation? It is hard to say. In a normal distribution the mean and standard deviation are independent of each other. That is, either one could be large or small without any influence on the other. However, in real data they are often linked, so that larger means tend to have larger standard deviations. This leads into the area of transformations, which are a way of re-establishing this independence.
A useful measure of a distribution that is sometimes used is the ratio of the standard deviation to the mean, known as the coefficient of variation (Howell p. 48).
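As a quick illustration, the ratio can be computed from the two summary values quoted above for the sexual behaviour data (mean 1.864, standard deviation 2.196). The variable names are illustrative only.

```python
# Summary statistics quoted in the text (from the SPSS printout in Output 4.2).
mean_partners = 1.864
sd_partners = 2.196

# Ratio of the standard deviation to the mean.
ratio_sd_to_mean = sd_partners / mean_partners

print("sd / mean:", round(ratio_sd_to_mean, 2))
```

A ratio above 1 says the standard deviation is larger than the mean itself, which gives some sense of scale that the standard deviation alone does not.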
The standard deviation has one undesirable feature. Like the mean, it is easily influenced by one or two extreme scores. So really atypical scores in a distribution ("outliers") can wildly change the distribution's standard deviation. Here, adding a score of 200 increases the sd from 2.196 to 15.0115, a roughly seven-fold increase! Because both of these descriptive statistics are influenced by extreme cases, it is important to note when extreme values exist in your data and might be influencing your statistics. How to define "extreme," and what to do if you have extreme data points, is a controversial and complex topic beyond the scope of this class.
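The same effect is easy to reproduce on a small scale. The sexual behaviour data are not reproduced in this section, so the sketch below instead reuses the five example scores from earlier (5, 7, 6, 1, 8) and appends a single extreme score of 200. The figures it prints therefore illustrate the principle and will not match the 2.196 and 15.0115 quoted above.

```python
import statistics

# Small illustrative data set (not the sexual behaviour data).
scores = [5, 7, 6, 1, 8]
scores_with_outlier = scores + [200]  # one extreme score added

mean_before = statistics.mean(scores)
sd_before = statistics.stdev(scores)  # sample sd, denominator N-1

mean_after = statistics.mean(scores_with_outlier)
sd_after = statistics.stdev(scores_with_outlier)

print("before: mean =", round(mean_before, 2), " sd =", round(sd_before, 2))
print("after:  mean =", round(mean_after, 2), " sd =", round(sd_after, 2))
print("sd grew by a factor of", round(sd_after / sd_before, 1))
```

Both the mean and the standard deviation shift substantially after a single score is added, which is why it is worth checking for extreme values before interpreting either statistic.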
© Copyright 2000 University of New England, Armidale, NSW, 2351. All rights reserved
Maintained by Dr Ian Price