Chapter 5: Analysing the Data Part II: Inferential Statistics

# Independence of Observations

There are two ways of thinking about "independence of observations". One concerns whether you measure each individual only once (a 'between' or 'independent' groups design) or several times (a 'within' or 'repeated measures' design); the other concerns the independence of observations within any particular group. The first of these forms of independence is usually a deliberate design choice that has several advantages (and some problems, of course).

The second type of independence, however, is more subtle. If you accidentally include the same subject several times, enter the same data values into a computer more than once, or have subjects who tell future subjects something about the experiment, then you have a form of non-independence: each subject's data may not be independent of other subjects' data.
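Two of these accidents, a subject included more than once and a data row entered twice, can be screened for before any analysis. The following is a minimal sketch only; the `(subject_id, score)` record layout, the function names, and the values are all invented for illustration. It cannot detect the third problem (subjects talking to one another), which leaves no trace in the data file.

```python
from collections import Counter

def find_repeated_subjects(records):
    """Return subject IDs that occur more than once in `records`.

    `records` is a list of (subject_id, score) tuples (a hypothetical layout).
    """
    counts = Counter(subject_id for subject_id, _ in records)
    return sorted(sid for sid, n in counts.items() if n > 1)

def find_duplicate_rows(records):
    """Return rows that were entered more than once, value for value."""
    counts = Counter(records)
    return sorted(row for row, n in counts.items() if n > 1)

# Made-up data: subject s02 was typed in twice, and s01 was tested twice.
records = [
    ("s01", 12), ("s02", 15), ("s03", 9),
    ("s02", 15), ("s04", 11), ("s01", 14),
]

repeated = find_repeated_subjects(records)
duplicates = find_duplicate_rows(records)
print(repeated)    # ['s01', 's02']
print(duplicates)  # [('s02', 15)]
```

A flagged subject ID is only a prompt to go back to the research records; whether it reflects a typing error or a genuine second testing session is a question about procedure, not about the numbers.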

Although independence is a requirement of the statistical techniques, the means of securing it are not statistical; instead, the independence requirement places constraints on the conduct of the experiment itself. The remedy for independence problems is therefore a procedural matter and is really a part of an experiment's internal validity.

The best way to secure independence of observations is through careful research technique. Once you understand that each observation on a variable should be independent of every other, reasonable experimental control should ensure that the condition is met. This is not to say, of course, that you can never have more than one observation on each subject in an experiment; you certainly can. You may measure each subject on a large number of variables, but within each variable, each subject contributes only a single observation. And you can measure the same variable repeatedly for each subject; researchers in the field of learning would be in serious trouble if they could not measure, for example, the number of items recalled on each of a series of experimental trials. In both of these examples, the research design is explicitly planned to allow a particular form of non-independence of observations, and the analysis procedures are specifically designed for repeated measures.
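To see what an analysis designed for repeated measures means in practice, consider the simplest case: the same subjects measured on two trials. The sketch below uses invented recall scores and hypothetical function names. It computes a paired t statistic, which works on each subject's difference score, and, for contrast only, the independent-groups t statistic that would be wrong for these data because it treats the two trials as coming from different people.

```python
import statistics
from math import sqrt

def paired_t(first, second):
    """t statistic for two measurements on the same subjects (df = n - 1)."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / sqrt(n))

def independent_t(group_a, group_b):
    """Pooled-variance t statistic for two separate groups (df = n_a + n_b - 2)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / se

# Invented data: items recalled by the same six subjects on two trials.
trial_1 = [5, 7, 6, 8, 4, 6]
trial_2 = [7, 9, 7, 10, 6, 9]

t_paired = paired_t(trial_1, trial_2)
t_independent = independent_t(trial_1, trial_2)
print(f"paired t      = {t_paired:.2f}")       # paired t      = 7.75
print(f"independent t = {t_independent:.2f}")  # independent t = 2.34
```

The two statistics differ because the paired version removes the stable differences between subjects before testing the change across trials. The planned non-independence is handled by the choice of procedure rather than ignored.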

The effect of violating any of the assumptions is a change in the probability of making a Type I or a Type II error, and you won't usually know whether the change has made you more, or less, likely to commit an inferential error. When the problem is in the level of measurement, the effect of using a statistic that isn't meaningful for the measurement is to make any conclusions at least highly suspect, if not completely meaningless.
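A small simulation can make the change in Type I error rate concrete for one particular violation: every data value being entered twice. This is a rough sketch under stated assumptions, not a general result. The sample size, number of simulated experiments, seed, and function names are arbitrary choices for illustration; both groups are drawn from the same normal population, so every rejection is a Type I error; and the critical values 2.024 (df = 38) and 1.991 (df = 78) are the usual two-tailed .05 values from t tables.

```python
import random
import statistics
from math import sqrt

def pooled_t(group_a, group_b):
    """Pooled-variance independent-groups t statistic."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se

def rejection_rates(n_sims=2000, n=20, seed=1):
    """Proportion of true-null experiments rejected at the .05 level.

    Returns (rate with correct data, rate with every value entered twice).
    """
    rng = random.Random(seed)
    crit_correct = 2.024  # two-tailed .05 critical t for df = 38
    crit_doubled = 1.991  # two-tailed .05 critical t for df = 78
    correct = doubled = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pooled_t(a, b)) > crit_correct:
            correct += 1
        # Same data, but each observation counted twice by mistake.
        if abs(pooled_t(a + a, b + b)) > crit_doubled:
            doubled += 1
    return correct / n_sims, doubled / n_sims

correct_rate, doubled_rate = rejection_rates()
print(f"Type I error rate, correct data:  {correct_rate:.3f}")
print(f"Type I error rate, doubled data:  {doubled_rate:.3f}")
```

In this case the direction of the change is predictable, because duplicated values make the sample look larger and more consistent than it is. In a real study the source and extent of the non-independence are rarely known so exactly, which is why the effect on the error rates usually cannot be determined.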

The concluding advice, then, is to use any available descriptive statistics in order to fully understand the data. When it comes to making inferences, both parametric and nonparametric procedures can be applied, with the parametric ones usually given first attention. Nonparametric procedures deserve careful consideration when there is reason to worry about parametric assumptions, or when measurement considerations demand them.
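As an illustration of this advice, the sketch below first describes two small groups and then computes a rank-based statistic, the Mann-Whitney U, as one example of a nonparametric procedure. The scores and function names are invented for illustration, and the code stops at the statistic itself; judging its significance would still require tables or statistical software.

```python
import statistics

def describe(scores):
    """Basic descriptive statistics for one group."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "sd": statistics.stdev(scores),
    }

def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b), counting each tied pair as one half."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Invented scores; the value 40 in group_b is a deliberate outlier.
group_a = [3, 4, 5, 6, 7]
group_b = [6, 8, 9, 10, 40]

summary_a = describe(group_a)
summary_b = describe(group_b)
u_a, u_b = mann_whitney_u(group_a, group_b)

print(summary_a["mean"], summary_a["median"])  # 5 5
print(summary_b["mean"], summary_b["median"])  # 14.6 9
print(u_a, u_b)                                # 1.5 23.5
```

The descriptive step shows that the mean of `group_b` is pulled well above its median by a single extreme score, which is exactly the kind of observation that gives reason to worry about parametric assumptions. The U statistic depends only on the ordering of the scores, so it would be unchanged if the 40 were replaced by 11.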