### Paired Data / Paired Analyses

Gerard E. Dallal, Ph.D.

Introduction

Two measurements are paired when they come from the same observational unit: before and after, twins, husbands and wives, brothers and sisters, matched cases and controls. Pairing is determined by a study's design. It has nothing to do with the actual data values but, rather, with the way the data values are obtained. Observations are paired rather than independent when there is a natural link between an observation in one set of measurements and a particular observation in the other set of measurements, irrespective of their actual values.

The best way to determine whether data are paired is to identify the natural link between the two measurements. (Look for the link!) For example,

• When husbands and wives are studied, there is a natural correspondence between a man and his wife.
• When independent samples of men and women are studied, there's no particular female we associate with a particular male.

When measurements are paired, the pairing must be reflected in the analysis. The data cannot be analyzed as independent samples.

Why pair?

Pairing seeks to reduce variability in order to make more precise comparisons with fewer subjects. When independent samples are used, the difference between treatment means is compared to the variability of individual responses within each treatment group. This variability has two components:

• The larger component is usually the variability between subjects (between-subject variability). It's there because not every subject will respond the same way to a particular treatment. There will be variability between subjects.

• The other component is within-subject variability. This variability is present because even the same subject doesn't give exactly the same response each time s/he is measured. There will be variability within subjects.

When both measurements are made on the same subject, the between-subjects variability is eliminated from the comparison. The difference between treatments is compared to the way the difference changes from subject to subject. If this difference is roughly the same for each subject, small treatment effects can be detected even if different subjects respond quite differently.

If measurements are made on paired or matched samples, the between-subject variability will be reduced according to the effectiveness of the pairings. The pairing or matching need not be perfect. The hope is that it will reduce the between-subject variability enough to justify the effort involved in obtaining paired data. If we are interested in the difference in dairy intake of younger and older women, we could take random samples of young women and older women (independent samples). However, we might interview mother/daughter pairs (paired samples), in the hope of removing some of the lifestyle and socioeconomic differences from the age group comparison. Sometimes pairing turns out to have been a good idea because variability is greatly reduced. Other times it turns out to have been a bad idea, as is often the case with matched samples.

Pairing has no effect on the way the difference between two treatments is estimated. The estimate is the difference between the sample means, whether the data are paired or not. What changes is the uncertainty in the estimate.
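One way to see why only the uncertainty changes is to write out the variance of the difference between the two sample means. The sketch below assumes n observations per treatment, standard deviations σ_A and σ_B, and a correlation ρ between the two measurements in a pair; these symbols are introduced here for illustration and do not appear elsewhere in this note.

```latex
% Independent samples, n observations per treatment:
\operatorname{Var}\left(\bar{X}_A - \bar{X}_B\right) = \frac{\sigma_A^2 + \sigma_B^2}{n}

% Paired samples, n pairs with within-pair correlation \rho:
\operatorname{Var}\left(\bar{X}_A - \bar{X}_B\right) = \frac{\sigma_A^2 + \sigma_B^2 - 2\rho\,\sigma_A\sigma_B}{n}
```

The estimate itself is the same in both cases. When ρ is positive, as it typically is when subjects who score high on one measurement also score high on the other, the paired variance is smaller. When ρ is near zero the pairing gains nothing, and when ρ is negative the paired variance is larger.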

Consider these data from an experiment in which subjects are assigned at random to one of two diets and their cholesterol levels are measured. Do the data suggest a real difference in the effect of the two diets? The values from Diet A look like they might be a bit lower, but this difference must be judged relative to the variability within each sample. One of your first reactions to looking at these data should be, "Wow! Look at how different the values are. There is so much variability in the cholesterol levels that these data don't provide much evidence for a real difference between the diets." And that response would be correct. With P = 0.47 and a 95% CI for A-B of (-21.3, 9.3) mg/dL, we could say only that diet A produces a mean cholesterol level that could be anywhere from 21 mg/dL less than that from diet B to 9 mg/dL more.

However, suppose you are now told that a mistake had been made. The numbers are correct, but the study was performed by having every subject consume both diets. The order of the diets was selected at random for each subject with a suitable washout period between diets. Each subject's cholesterol values are connected by a straight line in the accompanying diagram.

Even though the mean difference is the same (6 mg/dL), we conclude the diets are certainly different because we now compare the mean difference of 6 to how much the individual differences vary. Each subject's cholesterol level on diet A is exactly 6 mg/dL less than on diet B! There is no question that there is an effect and that it is 6 mg/dL!
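The contrast between the two readings of the same numbers can be sketched in a few lines of Python. The cholesterol values below are hypothetical, not the data from the study described above, and the differences are made roughly (not exactly) 6 mg/dL so that the paired standard error is not zero. The helper names `independent_t` and `paired_t` are introduced here for illustration.

```python
import math
import statistics

# Hypothetical cholesterol values (mg/dL); NOT the data from the study above.
# Position i in each list belongs to the same subject.
diet_b = [182, 201, 225, 247, 263, 290]
diet_a = [175, 196, 219, 241, 258, 283]

def independent_t(x, y):
    """Two-sample t statistic with a pooled variance (assumes equal variances)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x) +
                  (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se

def paired_t(x, y):
    """One-sample t statistic computed on the within-subject differences."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# The estimated difference is identical under both analyses;
# only the standard error it is judged against changes.
print("mean difference (A - B):", statistics.mean(diet_a) - statistics.mean(diet_b))
print("independent-samples t:", round(independent_t(diet_a, diet_b), 2))
print("paired t:", round(paired_t(diet_a, diet_b), 2))
```

With these made-up numbers the independent-samples statistic is small because the difference is judged against the large spread between subjects, while the paired statistic is large because the differences barely vary from subject to subject.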

Paired data do not always result in a paired analysis

Paired analyses are required when the outcome variable is measured on the same or matched units. If there is an opportunity for confusion, it is because paired data do not always result in paired outcomes, as the following example illustrates. Suppose an investigator compares the effects of two diets on cholesterol levels by randomizing subjects to one of the two diets and measuring their cholesterol levels at the start and end of the study. The primary outcome will be the change in cholesterol levels. Each subject's before and after measurements are paired because they are made on the same subject. However, the diets will be compared by looking at two independent samples of changes. If, instead, each subject had eaten both diets--that is, if there were two diet periods with a suitable washout between them and the order of diets randomized--a paired analysis would be required because both diets would have been studied on the same people.
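The first design in this paragraph can be sketched as follows: the pairing is used once, to turn each subject's before and after values into a single change, and the two diets are then compared as independent samples of changes. All numbers are made up for illustration, and the helper names `changes` and `welch_t` are assumptions of this sketch. Welch's unequal-variance statistic is used here as one reasonable choice of two-sample comparison; the text does not prescribe a particular one.

```python
import math
import statistics

# Hypothetical (before, after) cholesterol values in mg/dL for subjects
# randomized to one of two diets. Every number here is made up.
diet_a = [(240, 221), (212, 200), (265, 250), (198, 190), (230, 214)]
diet_b = [(236, 231), (219, 210), (251, 249), (205, 199), (244, 238)]

def changes(pairs):
    """Within-subject change (after - before): this step uses the pairing."""
    return [after - before for before, after in pairs]

def welch_t(x, y):
    """Welch two-sample t statistic for two independent samples."""
    se = math.sqrt(statistics.variance(x) / len(x) +
                   statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

change_a = changes(diet_a)
change_b = changes(diet_b)

# Different subjects ate each diet, so the two lists of changes
# are independent samples and are compared as such.
print("mean change, diet A:", statistics.mean(change_a))
print("mean change, diet B:", statistics.mean(change_b))
print("two-sample t on the changes:", round(welch_t(change_a, change_b), 2))
```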

The need for a paired analysis is established by the study design. If an investigator chooses to study husbands and wives rather than random samples of men and women, the data must be analyzed as paired outcomes regardless of whether the pairing was effective. Whenever outcome measures are paired or matched, they cannot be analyzed as independent samples.

Paired analyses comparing two population means are straightforward. Differences are calculated within each observational unit and the single sample of differences is examined. If the sample size is large, normal theory applies and the sample mean difference will be within two standard errors of the population mean difference about 95% of the time. If, by mistake, the data were treated as independent samples, the mean difference will be estimated properly but the amount of uncertainty against which it must be judged will be wrong. The uncertainty will usually be overstated, causing some real differences to be missed. However, although it is unlikely, it is possible for uncertainty to be understated, causing things to appear to be different even though the evidence is inadequate. Thus, criticism of an improper analysis cannot be dismissed by claiming that because an unpaired analysis shows a difference, the paired analysis will show a difference, too.
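As a rough illustration of the calculation described above, the following sketch computes the mean difference, its paired standard error, the standard error that an incorrect independent-samples analysis would use, and the large-sample interval of the mean difference plus or minus two standard errors. The function name `paired_summary` and both data sets are hypothetical. The samples are kept tiny for readability, so the two-standard-error interval is shown only to illustrate the arithmetic; with samples this small a t-based multiplier would be needed in practice.

```python
import math
import statistics

def paired_summary(x, y):
    """Mean difference with the paired and the (incorrect) independent standard errors."""
    n = len(x)
    diffs = [xi - yi for xi, yi in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    se_paired = statistics.stdev(diffs) / math.sqrt(n)
    se_indep = math.sqrt((statistics.variance(x) + statistics.variance(y)) / n)
    # Large-sample rule of thumb from the text: mean difference +/- 2 standard errors.
    interval = (mean_diff - 2 * se_paired, mean_diff + 2 * se_paired)
    return mean_diff, se_paired, se_indep, interval

# Hypothetical data. In the first set the two measurements rise and fall together;
# in the second set a high value in one is linked to a low value in the other.
positively_linked = ([10, 12, 14, 16, 18], [9, 12, 12, 15, 17])
negatively_linked = ([10, 12, 14, 16, 18], [17, 15, 12, 12, 9])

for label, (x, y) in [("positively linked", positively_linked),
                      ("negatively linked", negatively_linked)]:
    mean_diff, se_paired, se_indep, interval = paired_summary(x, y)
    print(label)
    print("  mean difference:", mean_diff)
    print("  paired SE:", round(se_paired, 3), " independent SE:", round(se_indep, 3))
    print("  approx. 95% interval:", tuple(round(v, 2) for v in interval))
```

Both data sets give the same mean difference. In the first, treating the data as independent overstates the uncertainty; in the second, it understates it, which is the less common case the paragraph warns about.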

Pairing is usually optional. In most cases an investigator can choose to design a study that leads to a paired analysis or one that uses independent samples. The choice is a matter of tradeoffs between cost, convenience, and likely benefit. A paired study requires fewer subjects, but the subjects have to experience both treatments, which might prove a major inconvenience. Subjects with partial data usually do not contribute to the analysis. Also, when treatments must be administered in sequence rather than simultaneously, there are questions about whether the first treatment will affect the response to the second treatment (carry-over effect). In most cases, a research question will not require the investigator to take paired samples, but if a paired study is undertaken, a paired analysis must be used. That is, the analysis must always reflect the design that generated the data.

It is possible for pairing to be ineffective, that is, the variability of the difference between sample means can be about the same as what would have been obtained from independent samples. In general, matched studies in human subjects with matching by sex, age, BMI and the like are almost always a disaster. The matching is almost always impossible to achieve in practice (the subjects needed for the last few matches never seem to volunteer) and the efficiencies are rarely better than could be achieved by using statistical adjustment instead.

Examples -- Paired or Independent Analysis?
1. A hypothesis of ongoing clinical interest is that vitamin C prevents the common cold. In a study involving 20 volunteers, 10 are randomly assigned to receive vitamin C capsules and 10 are randomly assigned to receive placebo capsules. The number of colds over a 12 month period is recorded.

2. A topic of current interest in ophthalmology is whether or not spherical refraction is different between the left and right eyes. To examine this issue, refraction is measured in both eyes of 17 people.

3. In order to compare the working environment in offices where smoking is permitted with that in offices where smoking is not permitted, measurements were made at 2 p.m. in 40 work areas where smoking was permitted and 40 work areas where it was not permitted.

4. A question in nutrition research is whether male and female college students undergo different mean weight changes during their freshman year. A data file contains the September 1994 weight (lbs), May 1995 weight (lbs), and sex (1=male/2=female) of students from the class of 1998. The file is set up so that each record contains the data for one student. The first 3 records, for example, might be

 120 126 2
 118 116 2
 160 149 1

5. To determine whether cardiologists and pharmacists are equally knowledgeable about how nutrition and vitamin K affect anticoagulation therapy (to prevent clotting), an investigator has 10 cardiologists and 10 pharmacists complete a questionnaire to measure what they know. She contacts the administrators at 10 hospitals and asks the administrator to select a cardiologist and pharmacist at random from the hospital's staff to complete the questionnaire.

6. To determine whether the meals served on the meal plans of public and private colleges are equally healthful, an investigator chooses 7 public colleges and 7 private colleges at random from a list of all colleges in Massachusetts. On each day of the week, she visits one public college and one private college. She calculates the mean amount of saturated fat in the dinner entrees at each school.