Intention-To-Treat Analysis
Gerard E. Dallal, Ph.D.
Chu-chih [Gutei] Raises One Finger: The Gateless Barrier, Case 3
Whenever Chu-chih was asked a question, he simply raised one finger. One day a visitor asked Chu-chih's attendant what his master preached. The boy raised a finger. Hearing of this, Chu-chih cut off the boy's finger with a knife. As he ran from the room, screaming with pain, Chu-chih called to him. When he turned his head, Chu-chih raised a finger. The boy was suddenly enlightened.
When Chu-chih was about to die, he said to his assembled monks: "I received this one-finger Zen from T'ien-lung. I used it all my life but never used it up." With this he entered his eternal rest.
[If it's a new academic year, then I am almost certainly revising this note about Intention-To-Treat Analysis. This isn't that revision. You don't get to see the "official" 2006 version. This one comes two versions later!
I thought I finally had the issues clear in my mind. I revised this note. I gave my lecture. But thinking about it afterward, I realized I was uncomfortable with much of what I said. I thought I'd become more accepting of ITT, but I realize I'm every bit as skeptical, if not more so, as when I first thought about it. I then revised this note for the second time. I decided that the best thing to do was present the issues in as disinterested a fashion as I could and end with a "My Views" section.
I did that, but something kept gnawing at me. Here's the bottom line:
Intention-To-Treat Analysis is a
FRAUD!
There is no inconsistency here. As a trip to the Wayback Machine clearly demonstrates, there has been one constant throughout this entire series of notes, which I quote from the earliest version captured, back in 2002:
The proper approach is to ignore labels, understand the research question, and perform the proper analysis whatever it's called!
Let's get on with it...]
Let's start with some terminology. In theory, there are two broad types of analyses:
The discussion of Intention-To-Treat is often complicated by the early introduction of issues involving missing data. It's natural to want to talk about missing data because ITT analyses often involve data that are missing because subjects drop out of the study or are lost to followup. However, missing data complicate the discussion and have the potential to confuse the issues. Subjects can be nonadherent without having any of their data missing, so we'll begin by assuming that the data are complete and only adherence is at issue. The topic of Missing Data will be discussed in its own section later.
With that out of the way...
Often, researchers will argue that an analysis should not include subjects who haven't followed the protocol. At first blush, this may seem reasonable, especially in a randomized, controlled trial where the expectation is that randomized controls will ensure the validity of the study. However, there is always a possibility that bias will be introduced due to differential dropouts or, to put it in a less technical way,
Consider two weight loss diets, one of which is effective while the other isn't.
It is now commonplace, if not standard practice, to see study sponsors and funding agencies specify that study data be subjected to an Intention-To-Treat (ITT) analysis with "followup and case ascertainment continued regardless of whether participants continued in the trial". Regardless means
Investigators do everything they can to ensure that the data are complete, especially with regard to the primary outcome measure.
There are four major lines of justification for intention-to-treat analysis.
Dealing with questionable outcomes and guarding against conscious or unconscious introductions of bias
Paul Meier (of Kaplan-Meier fame), then of the University of Chicago, offered an example involving a subject in a heart disease study where there is a question of whether his death should be counted against his treatment or set aside. The subject disappeared after falling off his boat. He had been observed carrying two six-packs of beer on board before setting off alone. Meier argues that most researchers would set this event aside as unrelated to the treatment, while intention-to-treat would require the death be counted against the treatment. But suppose, Meier continues, that the beer is eventually recovered and every can is unopened. Intention-to-treat does the right thing in any case. By treating all events the same way, deaths unrelated to treatment should be equally likely to occur in all groups and the worst that can happen is that the treatment effects will be watered down by the occasional, randomly occurring outcome unrelated to treatment. If we pick and choose which events should count, we risk introducing bias into our estimates of treatment effects.
Guarding against informative dropouts
This was illustrated by the introductory example involving two weight loss diets, where the effective diet looked worse than it really was because the only subjects following the ineffective diet who remained in the study were those losing weight.
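To make the mechanism concrete, here is a minimal sketch in Python with invented numbers. The weight changes and the dropout rule are assumptions chosen for illustration, not data from any study. The effective diet produces a 10-pound average loss and the ineffective diet none, but subjects are assumed to stay on a diet only while they are losing weight.

```python
from statistics import mean

# Hypothetical weight changes in pounds (negative = loss); invented for illustration.
effective = [-12, -10, -9, -11, -8, -10, -12, -9, -10, -9]   # averages -10
ineffective = [-9, -7, 0, 1, 2, 0, 3, 1, 4, 5]               # averages 0

def stayed(change):
    """Assumed dropout rule: a subject stays on the diet only if losing weight."""
    return change < 0

def completers(changes):
    """Keep only the subjects who stayed on their diet."""
    return [c for c in changes if stayed(c)]

# Comparison using every randomized subject.
diff_all = mean(effective) - mean(ineffective)

# Comparison using only those who stayed on their diet.
diff_completers = mean(completers(effective)) - mean(completers(ineffective))

print("all subjects:   ", diff_all)
print("completers only:", diff_completers)
```

With these numbers the 10-pound advantage of the effective diet shrinks to 2 pounds among completers, because the only subjects left on the ineffective diet are the two who happened to lose weight.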
Preserving baseline comparability between treatment groups achieved by randomization
There have been studies where outcome was unrelated to treatment but was related to adherence. That is, success was determined not by the treatment the subject was given, but by how well the subject adhered to instructions, whatever they were. In many cases, potentially nonadherent subjects may be more likely to quit a particular treatment. For example, a nonadherent subject might be more likely to quit when assigned to strenuous exercise than to stretching exercises. In a per protocol or on treatment analysis, the balance in adherence achieved at baseline will be lost and the resulting bias might make one of two equivalent treatments appear to be better than it truly is simply because one group of subjects is, on the whole, more adherent.
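The exercise example can be sketched with a few lines of Python. Everything here is hypothetical: the arm sizes, the idea that each subject is either an "adherent type" or a "quitter type", and the assumption that success depends only on that type and not on the assigned treatment.

```python
def success_rates(n_adherent_type, n_quitter_type, n_quit):
    """Return (all-randomized rate, per-protocol rate) for one arm.

    Assumptions: success occurs exactly for adherent-type subjects, whatever
    the assigned treatment, and only quitter-type subjects leave the treatment.
    """
    n_randomized = n_adherent_type + n_quitter_type
    n_on_treatment = n_randomized - n_quit
    successes = n_adherent_type
    return successes / n_randomized, successes / n_on_treatment

# Randomization balances the two types: 10 of each in both arms (hypothetical).
strenuous = success_rates(10, 10, n_quit=10)   # every quitter-type subject quits
stretching = success_rates(10, 10, n_quit=2)   # only two quit the easier program

print("strenuous  (all, per protocol):", strenuous)
print("stretching (all, per protocol):", stretching)
```

Counting everyone as randomized, both arms succeed half the time, as they should for equivalent treatments. Restricting to subjects still on treatment gives 10 of 10 for strenuous exercise against 10 of 18 for stretching, a difference created entirely by who quit.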
In the spirit of Paul Meier's example, consider a study in which severely ill subjects are randomly assigned to surgery or drug therapy. There will be early deaths in both groups. It would be tempting to exclude the early deaths of those in the surgery group who died before getting the surgery on the grounds that they never got the surgery. However, this has the effect of making the drug therapy group much less healthy on average at baseline.
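A rough numerical version of the surgery example, again with made-up figures: two arms of 100 subjects, treatments assumed to be equally effective, and all of the surgery arm's early deaths assumed to occur before the operation.

```python
def mortality(deaths, subjects):
    """Proportion of subjects who died."""
    return deaths / subjects

# Hypothetical arms of 100 severely ill subjects each. The treatments are
# assumed equally effective, so both arms see 10 early and 10 later deaths.
n, early_deaths, later_deaths = 100, 10, 10

drug = mortality(early_deaths + later_deaths, n)
surgery_as_randomized = mortality(early_deaths + later_deaths, n)

# Dropping surgery-arm subjects who died before their operation removes
# the sickest subjects from one arm only.
surgery_excluding_early = mortality(later_deaths, n - early_deaths)

print("drug:                       ", drug)
print("surgery, as randomized:     ", surgery_as_randomized)
print("surgery, early deaths culled:", round(surgery_excluding_early, 3))
```

As randomized, both arms show 20% mortality. After the exclusion, surgery shows 10 deaths among 90 subjects, about 11%, even though the treatments were constructed to be equivalent.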
Reflecting performance in the population
Intention-to-treat analysis is said to be more realistic because it reflects what might be observed in actual clinical practice. In practice, patients may not adhere, they may change treatments, they may die accidentally. ITT factors this into its analysis. It answers the public health question of what happens when a recommendation is made to the general public and the public decides how to implement it. The results of an intention-to-treat analysis can be quite different from the treatment effect observed when adherence is perfect.
When Richard Peto first introduced Intention-To-Treat analysis, the cause was taken up by many prominent statisticians. Others thought that Peto's suggestion was a sophisticated joke and awaited the followup article, which never came, to reveal the prank. I have always had strong reservations about ITT. The first version of this note was hostile to ITT. Over the years, it has, in some ways, become more accepting, but now I find myself even more hostile than I was initially.
Lest there be any uncertainty about my position, I now state formally what I stated informally at the start of this note:
Intention-To-Treat Analysis is a
FRAUD!
Intention-To-Treat is a fraud because it is smoke-and-mirrors. It is an illusion. It is too often used without any real understanding, just as Chu-chih's assistant (mis)used his master's one-finger Zen. There is often no thought about the underlying research question and the proper way to answer it. The phrase intention-to-treat is often invoked as a magic incantation that will somehow automatically make any issues surrounding adherence, dropouts, and missing values vanish. (One can hear the Great Oz: "There are no problems here! An Intention-To-Treat analysis was performed!")
It is easy to imagine circumstances where researchers might argue that the actual research question demands what would be called an intention-to-treat analysis, not because it is an ITT analysis, but because the research demands it. Whenever I evaluate a study, I don't care one bit what the investigators call the analysis. I invariably examine the study and its goals to assure myself that the particular analysis is appropriate in the context of two sets of questions.
Intention-to-treat analyses answer a certain kind of research question. On treatment (or per protocol) analyses answer a different kind of research question. My own approach is to ignore labels, ask myself "What is the research question?", and perform the proper analysis whatever it's called. Sometimes I perform both an intention-to-treat analysis and an on treatment analysis, using the results from the different analyses to answer different research questions.
I wear two hats.
I find that my attitude toward ITT analyses depends on the hat that I'm wearing because the types of research questions I see differ according to my role.
Let's consider once again the four lines of justification for ITT.
Dealing with questionable outcomes and guarding against conscious or unconscious introductions of bias
This strikes me as a weak argument, unless the reason the outcome is odd is related to treatment. With Meier's example, the decision to exclude could be made by someone blind to treatment. ITT resolves the issue by including everyone and everything so that any noise would affect all treatments equally. Blinded decisions can often do much the same thing. The major difference is that ITT includes the noise while blinded assessment tries to exclude it.
Guarding against informative dropouts
This is true. However, with nutrition studies, almost all missing data are noninformative, that is, unrelated to the outcome. My colleagues study treatments that are easily tolerated (eat this, drink that, take this pill). In addition, our volunteers are extremely health and diet conscious. When there are missing data or dropouts, it is invariably because subjects move or change jobs.
The only effect of an intention-to-treat analysis in these cases is to add noise to the data.
Preserving baseline comparability between treatment groups achieved by randomization
In many cases, baseline comparability can be preserved by using statistical adjustments. However, this presumes we know what to adjust for, which is not always the case.
Sometimes, what appears to be a problem with maintaining baseline comparability is something quite different. Consider the example involving the deaths of subjects randomized to surgery before the surgery occurs. The real issue is recognizing that the treatment is not only what happens during and after the surgical procedure. It includes what happens during the time spent waiting for the procedure to take place!
Reflecting performance in the population
There is some truth to this claim. However, it is more complicated than it first appears. There are two components to how a treatment will behave in the population at large: efficacy and adherence. These are separate issues that cannot always be addressed routinely by a single intention-to-treat analysis. A treatment's efficacy is often of great scientific importance (all exaggeration aside) regardless of adherence issues. Adherence during a trial might be quite different from adherence once a treatment has been proven effective. In such cases, analyses that are influenced by adherence in the manner of ITT will not reflect what will happen in practice.
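The interplay of efficacy and adherence can be sketched with simple arithmetic. The function name, the benefit of 10 units, and the adherence percentages below are all hypothetical, and the model is deliberately crude: nonadherent subjects are assumed to get no benefit at all, and adherence is assumed to be unrelated to how much a subject would benefit.

```python
def average_benefit(efficacy, adherence_pct):
    """Average benefit over everyone assigned the treatment.

    Assumes non-adherent subjects get no benefit and that adherence is
    unrelated to how much a subject would benefit (both are simplifications).
    """
    return efficacy * adherence_pct / 100

efficacy = 10  # hypothetical benefit under perfect adherence, e.g. pounds lost

in_trial = average_benefit(efficacy, adherence_pct=40)     # assumed adherence during the trial
in_practice = average_benefit(efficacy, adherence_pct=80)  # assumed adherence once proven effective

print("effect seen by an ITT-style analysis of the trial:", in_trial)
print("effect in practice with better adherence:         ", in_practice)
```

Under these assumptions the same treatment shows an average benefit of 4 during the trial and 8 afterward. Neither figure equals the efficacy of 10, and the trial figure does not predict the practice figure unless adherence stays the same.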
There are some circumstances that demand an intention-to-treat analysis. If the question is, "What happens once a treatment is started or recommended?" subjects must be followed once a treatment is started or recommended, regardless of what else happens. This is typical of the studies I see as a Scientific Review Committee member. Most involve comparing two medical treatments. The research question is invariably whether subjects starting on one treatment fare better than subjects starting on another treatment. I invariably insist that an ITT analysis be performed because a health care provider needs to know what happens when subjects are prescribed (started on) a particular treatment. ITT is the appropriate form of analysis because it is dictated by the research question!*
There may be cases where an intention-to-treat analysis will truly reflect the way the treatments will behave in practice because adherence during the trial will reflect adherence after the treatment is proven effective. I have been told that this is true in the field of mental health. I wonder, though, whether this is the exception rather than the rule.
David Salsburg once asked what to do about an intention-to-treat analysis if at the end of a trial it was learned that everyone assigned treatment A was given treatment B and vice-versa. I got to live his joke. In a placebo-controlled vitamin E study, the packager delivered the pills just as the trial was scheduled to start. Treatments were given to the first few dozen subjects. As part of the protocol, random samples of the packaged pills were analyzed to ensure the vitamin E did not lose potency during packaging. We discovered the pills were mislabeled--E as placebo and placebo as E. Since this was discovered a few weeks into the trial, no one had received refills, so there was no possibility of having received something different from what was originally dispensed. We relabeled existing stores properly and I switched the assignment codes for those who had already been given pills to reflect what they actually received. How shall I handle the intention-to-treat analysis?
This slip-up aside, this is an interesting study because it argues both for and against an ITT type of analysis. Because the study pill is administered by a nurse along with a subject's medications, it is hard to imagine how adherence might change, even if the results of the trial were overwhelmingly positive. This makes an ITT analysis attractive. However, it is likely that there will be many dropouts unrelated to treatment in any study of a frail population. Should they be allowed to water down any treatment effect?
Some subjects will leave the study because they cannot tolerate taking the pill, irrespective of whether it is active or inactive, or because their physicians decide, after enrollment, that they should not be in a study in which they might receive a vitamin E supplement. If the study had resulted in a recommendation that supplements be given, such subjects would not be able to follow it, so perhaps it is inappropriate to use their data to evaluate vitamin E's efficacy.
The bottom line is that an ITT type of analysis may be appropriate in some cases, but it's not a magic charm. A good analyst does ITT not to do ITT but because the analysis demanded by the research question just happens to be ITT. The analysis would be performed whether or not there were something called ITT. The one good thing about ITT is that it forced some people to think about the issues behind the recommendation, but I believe the good is more than offset by thoughtless use.
ITT is typically used as an umbrella to cover two distinct issues--adherence and missing data. They are distinct because even subjects who drop out may return for final measurements, while even adherent subjects may be missing data.
Common sense says, "The data are missing! How can they possibly be filled in?!" Common sense is right! They can't!
Many imputation methods have been suggested by some very bright people, but there's nothing much that can be done without making lots of critical unverifiable assumptions.
Consider a longitudinal study that some subjects fail to complete. To keep things simple, think of a weight loss study. ITT says that the investigator should do everything possible to persuade the subjects to return for their final weighing. This is in keeping with the "how people will follow the recommendation" function of ITT analyses. In this case, ITT would be correct, not because it's ITT, but rather because it is in keeping with the goals of the study.
What about missing data? Some subjects will have dropped out and refused to have their final measurements taken, while the investigator may have lost contact with others. ITT mandates that we "fill in the blanks". The final values should be replaced with a "best guess". But, how?!
All of the approaches are merely different ways of forecasting what the final measurement might have been, and analyzing the data with those imputed values. Those who believe in imputation say that it is often good practice to try more than one way to "fill in the blanks" to see whether conclusions change with the different methods.
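Trying more than one fill-in rule might look like the sketch below. The data are invented, and the three rules (last observation carried forward, baseline carried forward, and filling in the completers' mean change) are common choices picked here for illustration; they are not drawn from this note.

```python
from statistics import mean

# Hypothetical (baseline, midpoint, final) weights in pounds for one arm.
# None marks a final weighing that was never obtained.
subjects = [
    (200, 195, 190),
    (180, 176, 172),
    (220, 214, 208),
    (190, 184, None),
    (210, 212, None),
]

def observed_changes(rows):
    """Changes for subjects whose final weight was actually measured."""
    return [final - base for base, _, final in rows if final is not None]

def last_observation_carried_forward(rows):
    """Guess a missing final weight to be the midpoint weight."""
    return [(mid if final is None else final) - base for base, mid, final in rows]

def baseline_carried_forward(rows):
    """Guess that subjects with a missing final weight did not change."""
    return [0 if final is None else final - base for base, _, final in rows]

def completer_mean_imputed(rows):
    """Guess that missing subjects changed by the completers' average."""
    fill = mean(observed_changes(rows))
    return [fill if final is None else final - base for base, _, final in rows]

results = {
    "completers only": mean(observed_changes(subjects)),
    "last observation carried forward": mean(last_observation_carried_forward(subjects)),
    "baseline carried forward": mean(baseline_carried_forward(subjects)),
    "completer mean imputed": mean(completer_mean_imputed(subjects)),
}
for name, value in results.items():
    print(f"{name}: {value}")
```

With these made-up numbers the estimated mean change runs from 6 to 10 pounds lost depending on the rule, and nothing in the data says which rule is right.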
I find imputation, like ITT, to be smoke and mirrors. It is easy to show that any imputation technique may hurt rather than help depending on the circumstances, by constructing examples like the diet study. No matter how fancy the acronym or how elegant/confusing the mathematics, the bottom line remains the same: Subjects dropped out! Data are missing! There is no reason for any of these approaches to work other than an assumption that they will, which is no reason at all.
In summary, the Intention-To-Treat approach is a tool. In some circumstances, it may be the right tool, but a slavish devotion to ITT is as bad as a slavish devotion to any other approach or method. One size does not fit all. The proper approach, as I've explained from the first version of this note, is to ignore labels, understand the research question, and perform the proper analysis whatever it's called!
There remains one question begging to be asked, so I'll ask it: If missing data and the elimination of nonadherent subjects bias a per protocol analysis and ITT is not the panacea some make it out to be, what happens when both approaches are suspect? The answer is simple, if unpleasant: Who knows? Not every problem is amenable to a solution that can be summarized in a catch-phrase. If enough data are missing or enough subjects are lost to followup, the results may be suspect whatever one does. Situations like this can only be handled with great care and attention to detail on a case-by-case basis.
As I've stated in the introduction to these notes, they are not meant to be a textbook. Rather, they describe what I do as a statistician. Since they are not carved in stone, I have the opportunity to change them whenever seems appropriate to reflect my current thinking. So, what do I really think about the way I've been practicing statistics with regard to ITT?
I've got no confessions to make because, as I've stated (too?) repeatedly, my basic approach--ignoring labels, understanding the research question, and performing the proper analysis whatever it's called--has never changed.
I am becoming more dissatisfied with the labels per protocol and Intention-to-Treat as time goes on. I'm not sure I even understand what they mean.
I am convinced that it is time to retire those labels. Instead, the analysis section of every protocol should contain three sections:
-----------
However, there is a further complication that I will address in a later note. It is still valid to ask which diet is more effective if adhered to. The problem with a per protocol analysis that includes only those with perfect adherence is that it may not reflect what happens with perfect adherence! That is, adherence may be determined by the outcome rather than the other way 'round. Those who stopped following the ineffective diet would not have performed as well as those who stuck it out. However, while there may be a problem with the per protocol analysis, it is not one that can be solved by ITT!
* This is also why an ITT type of analysis is appropriate for analyzing diet studies: "What happens when subjects start a particular diet?"