
Some useful concepts and terms used in articles about treatment
Peter Szatmari, MD
Editor, Evidence-Based Mental Health


One of the important principles of practising evidence-based mental health is that the results of research studies should be used to influence clinical decisions about a particular patient. The best quality evidence for making decisions about treatment comes from randomised controlled trials or from overviews of several randomised controlled trials, such as a meta-analysis. The reason that a randomised controlled trial provides the best evidence is that, in most circumstances, randomisation avoids any systematic tendency to produce an unequal distribution, between the experimental and control groups, of prognostic factors that could influence the outcome. It is important to remember that not all methods of allocation described as random are truly random, and even with true randomisation there may still be important baseline differences between the groups because of small sample sizes. It is also important that those who assess outcome are blind to whether the patient received the experimental or the control treatment. If there is a statistically significant difference in the rate of a favourable outcome, or in change scores from baseline, in the experimental group compared with the control group, then it is concluded that the treatment is “effective”.

Evidence-Based Mental Health will only abstract treatment studies if the method of allocation is random, if there was adequate follow up of subjects entered into the trial, and if clinically important outcomes were reported. Unfortunately, there may not be a randomised controlled trial for each clinical question. If that is the case, then clinical decisions must be made on the basis of the best available evidence taking all relevant factors into account. Frequent replication of the intervention using different samples and outcome tools can add to the weight of the evidence in non-experimental designs.

Statistical significance versus clinical importance

Given evidence from a randomised controlled trial, statistical significance is not the only criterion for deciding whether to apply the results of a study. Statistical significance depends on the size of the difference between the groups, the amount of variation in outcome within the groups, and on the number of patients. Clinically trivial differences can be statistically significant if the sample size is sufficiently large. Conversely, clinically important differences can be statistically non-significant if the sample size is too small—that is, if the study lacks power. Clinicians need to evaluate statistical significance and clinical importance in interpreting the results of randomised controlled trials and meta-analyses.
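The dependence of statistical significance on sample size can be made concrete with a quick calculation. The sketch below is illustrative only: the 51% v 50% improvement rates are invented, the `z_statistic` helper is not from any statistics package, and a pooled two-proportion z-test (normal approximation) stands in for whatever analysis a real trial would use. A clinically trivial 1% difference is nowhere near significant with 100 patients per group, but is highly significant with 100 000 per group.

```python
import math

# Two-proportion z-test (pooled, normal approximation) for the same
# 1 percentage point difference at two very different sample sizes.
# Rates and sample sizes are hypothetical, for illustration only.

def z_statistic(p1, p2, n_per_group):
    """z statistic for the difference between two proportions,
    assuming equal group sizes and a pooled variance estimate."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    return (p1 - p2) / se

print(round(z_statistic(0.51, 0.50, 100), 2))      # ~0.14: far from significant
print(round(z_statistic(0.51, 0.50, 100_000), 2))  # ~4.47: p<0.001, yet clinically trivial
```

The difference between the groups is identical in both cases; only the sample size has changed, which is exactly why significance alone cannot establish clinical importance.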

Measures of clinical importance

How does one measure clinical importance? The usual estimate of clinical importance is the effect size: the size of the difference between the experimental and control groups. Whether the outcome is measured in a categorical way (eg, the prevention or treatment of “disorders” or the appearance of specific side effects) or in a continuous way (eg, mean symptom scores), the effect size reflects the difference between the experimental and control groups. Effect sizes tend to be smaller in randomised controlled trials than in non-experimental designs, and smaller when there is adequate blinding or concealment of the intervention from the assessors of outcome.1

A common way of expressing effect size for categorical data is the relative risk (RR) or relative benefit (RB), depending on whether one is assessing a negative or a positive outcome. The study by Kendall et al in this issue of Evidence-Based Mental Health (p 43) provides a good illustration of these points. Kendall et al report the results of a randomised controlled trial comparing cognitive behaviour therapy (CBT) with a waiting list control for children with anxiety disorders. 60 children were randomised to CBT and 34 to a waiting list control. After 8 weeks of treatment, 53% (32 of 60) of the children receiving CBT no longer met diagnostic criteria for their primary anxiety disorder compared with 6% (2 of 34) in the control group (p<0.001). This difference is certainly statistically significant, but the p value tells us nothing about its clinical importance. One measure of clinical importance is the RB; that is, the probability of being free of anxiety disorder after 8 weeks of CBT compared with the probability of being free of anxiety disorder in the control group. Using data from the article, we can calculate the RB as (32/60) ÷ (2/34) = 9.1. In other words, anxious children receiving CBT are 9.1 times more likely to be free of anxiety disorder than children on the waiting list after 8 weeks. An alternative but similar statistic is the relative benefit increase (RBI), the proportional increase in rates of a good outcome between the experimental and control patients in the trial. It is calculated as the experimental group event rate (EER) minus the control group event rate (CER), divided by the CER: (EER − CER) ÷ CER. In this case, the RBI is (32/60 − 2/34) ÷ 2/34 = 8.07. In other words, there is roughly an 8 fold increase (800%) in rates of being free of anxiety disorder in the experimental group compared with the control group. Attentive readers will notice that RBI = RB − 1 (the small discrepancy here is due to rounding), a relation that always holds.
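For readers who want to reproduce these figures, the RB and RBI calculations can be sketched in a few lines of Python; the function names are illustrative, not from any statistics package, and the counts are those quoted above from Kendall et al (32/60 v 2/34).

```python
# Relative benefit (RB) and relative benefit increase (RBI) from raw
# trial counts, using the figures quoted in the text (Kendall et al).

def relative_benefit(events_exp, n_exp, events_ctl, n_ctl):
    """RB: ratio of good-outcome rates, experimental over control."""
    eer = events_exp / n_exp   # experimental event rate (EER)
    cer = events_ctl / n_ctl   # control event rate (CER)
    return eer / cer

def relative_benefit_increase(events_exp, n_exp, events_ctl, n_ctl):
    """RBI: proportional increase in good-outcome rate, (EER - CER) / CER."""
    eer = events_exp / n_exp
    cer = events_ctl / n_ctl
    return (eer - cer) / cer

rb = relative_benefit(32, 60, 2, 34)
rbi = relative_benefit_increase(32, 60, 2, 34)
print(round(rb, 1), round(rbi, 2))   # 9.1 8.07, as quoted in the text
```

Computed from the exact fractions, RB − RBI is exactly 1, confirming the relation noted above; the 9.1 − 8.07 = 1.03 discrepancy in the rounded figures is a rounding artefact.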

This is a legitimate and popular way of reporting effect sizes, but it has one serious limitation: it ignores the base rates in a study, which can have a profound influence on the clinical application. Consider a situation in which the rate of improvement with CBT was 9% compared with 1% in the control group. With a large enough sample size, this difference could be statistically significant. The RB still equals 9 and the RBI is still roughly 8 (800%). However, most would agree that the magnitude of the difference between the experimental and control groups is quite trivial, particularly if the treatment is expensive, difficult to deliver, or requires considerable training (as CBT does, for example). In view of these limitations, it has been argued that the RB and the similar RBI are not user friendly and do not provide the most clinically important information.

An alternative to these statistics that does take account of base rates is to consider the absolute benefit increase (ABI) and, from this, the number needed to treat (NNT). The ABI is the absolute arithmetic difference in rates of good outcomes between the experimental and control patients; expressed as a percentage, it gives the number of patients who benefit per 100 treated. It is simply calculated as the rate of a good outcome in the experimental group minus the rate in the control group (in the study by Kendall et al, the ABI is 53% − 6% = 47%). Going one step further, the reciprocal of the ABI is the NNT: the number of patients who need to be treated to achieve 1 additional good outcome. It is calculated as 1/ABI; in the study by Kendall et al, the NNT is 1/0.47 = 2.13, which is conventionally rounded up to 3. In other words, 3 children need to be treated with CBT to achieve 1 additional good outcome over having a patient on a waiting list.
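The same counts also give the ABI and NNT directly, and they expose the base-rate problem that the RB hides. In the sketch below (helper names are illustrative), the Kendall et al rates and the hypothetical 9% v 1% scenario from the previous section have identical relative benefits, yet very different NNTs; the NNT is rounded up, following the convention noted above.

```python
import math

# Absolute benefit increase (ABI) and number needed to treat (NNT)
# from good-outcome rates, using the figures quoted in the text.

def abi(eer, cer):
    """ABI: absolute arithmetic difference in good-outcome rates."""
    return eer - cer

def nnt(eer, cer):
    """NNT: reciprocal of the ABI, conventionally rounded up."""
    return math.ceil(1 / abi(eer, cer))

# Kendall et al: 53% v 6% (32/60 v 2/34); RB is about 9
print(nnt(32 / 60, 2 / 34))   # 3: treat 3 children for 1 extra good outcome

# Hypothetical 9% v 1% scenario: RB is still 9, but far more must be treated
print(nnt(0.09, 0.01))        # 13: a much less impressive treatment effect
```

This is why the ABI and NNT carry clinical information that the RB and RBI do not: a ninefold relative benefit can correspond to an NNT of 3 or of 13, depending on the base rates.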

In Evidence-Based Mental Health (as in Evidence-Based Medicine), we have preferred to use terms such as ABI and NNT to capture the essence of clinically important differences.

What is a clinically important NNT?

The answer depends on the burden of suffering of the disorder as measured by prevalence, morbidity, and outcome; the cost and difficulty of the treatment procedure; and, finally, the cost of not treating the disorder. It is useful to compare the NNT in the study by Kendall et al with values obtained in other areas of medicine and mental health. For example, in a meta-analysis by Hotopf et al,2 42 patients need to be treated with a serotonin specific reuptake inhibitor to prevent 1 additional discontinuation of treatment with a tricyclic antidepressant, presumably due to side effects. Based on the data from Essali et al,3 37 patients need to be treated with clozapine to prevent 1 additional relapse on a typical neuroleptic; however, only 6 patients need to be treated with clozapine to have 1 additional patient experience a “clinically important improvement”. Thus the NNT in the study by Kendall et al is really quite impressive and, if replicated, means that an effective form of psychotherapy is now available for children with anxiety disorders.

So far we have just considered ways of expressing effect size using categorical data. Most treatment studies in mental health report changes in symptoms over time and between patient groups. With continuous data, the issue is more complicated but it is still possible to convert continuous measures into NNT. (More about this in a forthcoming issue of the glossary.)

Uncertainty and confidence intervals

One final point needs to be made. The statistics outlined above to estimate effect sizes are just that: estimates derived from a particular sample. The true value may or may not be exactly the same as the estimated value. There is a degree of uncertainty associated with these estimates, and we can quantify it using confidence intervals. Altman provides a useful definition of a confidence interval as “the range of values within which we can be 95% sure that the population value lies”.4 In the example above from the study by Kendall et al, we can be 95% certain that the true number of children who need to be treated with CBT to produce 1 more child free of anxiety disorder is between 2 and 4. In the study by Hotopf et al,2 the 95% CI is between 24 and 148. Because the degree of uncertainty is such an important variable in comparing results from different studies, we will also provide CIs around estimates of ABI and NNT even when these are not provided in the article itself.
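One common way of obtaining such an interval can be sketched as follows: a normal-approximation 95% CI for the ABI (difference of two independent proportions), whose limits are then inverted to bound the NNT. This is only one of several valid methods and we do not know which one Kendall et al used, so the sketch is an assumption; its helper name is illustrative. Applied to the counts quoted above, it reproduces an interval of roughly 2 to 4 once the limits are rounded up in the usual way.

```python
import math

# 95% CI for the ABI via the normal approximation to the difference of
# two proportions, then inverted to give a CI for the NNT.
# Counts are those quoted in the text for Kendall et al (32/60 v 2/34).

def abi_ci(events_exp, n_exp, events_ctl, n_ctl, z=1.96):
    """Return (lower, upper) 95% confidence limits for the ABI."""
    eer = events_exp / n_exp
    cer = events_ctl / n_ctl
    abi = eer - cer
    se = math.sqrt(eer * (1 - eer) / n_exp + cer * (1 - cer) / n_ctl)
    return abi - z * se, abi + z * se

lo, hi = abi_ci(32, 60, 2, 34)
# The NNT limits are the reciprocals of the ABI limits, in reverse order
print(round(1 / hi, 1), round(1 / lo, 1))   # roughly 1.6 to 3.1
```

Note that inverting the interval only works cleanly when both ABI limits are on the same side of zero; when the CI for the ABI crosses zero, the corresponding NNT interval is unbounded, which is one reason uncertainty must always be reported alongside the point estimate.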

We hope that these tools derived from clinical epidemiology will help clinicians translate the results of treatment interventions into clinical practice. Future issues of the notebook will explain terms used in prognosis studies and in studies of causation and cost effectiveness, among others. We welcome feedback from our readers on these and other topics.