Cluster randomisation trials in mental health research
  1. Allan Donner, PhD
  1. University of Western Ontario, London, Ontario, Canada

The unit of allocation in most clinical trials is the individual patient. However, experimental trials in which the unit of allocation is an intact cluster of participants (eg, families, schools, medical practices, communities) are becoming increasingly widespread in the evaluation of healthcare and educational interventions. For example, Avorn et al describe the results of a health education trial (aimed at staff personnel in nursing homes) that was designed to reduce the use of psychoactive drugs by residents.1 Six matched pairs of facilities (12 nursing homes in all) were included in this trial, with 1 facility in each matched pair randomly assigned to the educational programme and the other facility serving as a control.

Several reasons exist for favouring cluster randomisation in this trial. A principal one would be to avoid the experimental contamination which could occur when the same personnel are asked to give both interventions to different participants and when knowledge of the intervention may influence the responses of participants in the control group. A second reason is that the assignment of a new educational programme to some individuals within a nursing home but not to others might be regarded as unacceptable, or even unethical, by some practitioners. Finally, having administratively set up such a programme within a facility, it would seem much more likely to function effectively from a practical perspective if all staff members, and not just some, were involved.

A notable design feature of this trial is that the 12 nursing homes recruited were pair matched on the basis of size, type of ownership, and level of drug use. The purpose of such matching was to ensure that the facilities in each pair were similar with respect to baseline drug use, but geographically distant enough to minimise the risk of experimental contamination that could arise through the sharing of knowledge. Such matching or stratification by selected baseline risk factors is a common feature of cluster randomisation trials, particularly when the total number of clusters to be randomised is small.

A second example of a recent cluster randomisation trial is given by Kinmonth et al, who assessed the effect of additional training in a patient centred approach directed to practice nurses and general practitioners.2 The main outcome measures for this study included the quality of life and psychological wellbeing of patients with newly diagnosed type 2 diabetes. The trial randomised 41 practices (21 to the intervention group, 20 to the control group) in a health region in southern England, with the number of patients per practice ranging from 250 to 360. Because the intervention was aimed at and delivered by healthcare professionals, the most natural unit of randomisation was the practice. None the less, the effectiveness of the intervention was measured by recording outcomes at the level of the individual patient. This disparity between the unit of randomisation and the unit of analysis is a characteristic feature of cluster randomisation trials.

Other published examples exist in which randomisation at the cluster level is the most natural choice, or even a clear necessity. This was arguably the case in the HIV prevention trial described by Grosskurth et al.3 As the authors note, randomisation was necessary at the community level in this trial because the intervention involved the provision of improved services at designated health facilities, with these services available to the entire population served by each facility.

Many of these reasons would seem to apply in a natural way to trials evaluating diagnostic or therapeutic interventions that are directed to psychiatrists and other mental health professionals. In every instance, however, the rationale for adopting cluster randomisation inevitably must rest on very practical considerations. This is because cluster randomisation designs tend to be less efficient, in a statistical sense, than designs that randomise individuals to intervention groups. The loss of efficiency arises because the responses of individuals in the same cluster tend to be more similar than the responses of individuals in different clusters.

Study design issues in cluster randomisation

The degree of similarity among responses within a cluster is typically measured by a value known as the intracluster (intraclass) correlation coefficient, which usually must be estimated from the sample data. Denoted by ρ, this value may be interpreted as the standard Pearson correlation coefficient between any two responses in the same cluster. If ρ = 0, the responses of participants in the same cluster are independent of each other, while if ρ = 1 (perfect correlation), the information in a cluster is fully summarised by the response of a single cluster member. In most cluster randomisation trials, however, the value of ρ is small and positive, which is equivalent to stating that the variation among observations in different clusters exceeds the variation within clusters. Under these conditions, it is sometimes stated that the design is characterised by “between cluster variation.” The underlying reasons for between cluster variation will differ from trial to trial, but in practice include the following:

  1. Participant selection, where individuals are in a position to choose the cluster to which they belong. For example, in a trial randomising physician practices, the characteristics of patients belonging to a practice could be related to age or sex differences among practitioners. To the extent that these characteristics are also related to patient response, a clustering effect will be induced within practices. In addition, as noted by Rhee et al, the outcomes on 2 or more patients treated by the same physician could share the influence of that physician's style of practice.4

  2. The influence of covariates at the cluster level, where all individuals in a cluster are affected in a similar manner as a result of sharing exposure to a common environment. As discussed by Rice and Leyland, patients attending the same hospital may share several common influences, including, for example, the same pressure to shorten length of stay.5 In community based studies, differences in bylaws between municipalities could influence the success of smoking cessation programmes. In other studies, where intact families or households are randomised, the combined effect of both environmental and genetic factors may contribute to the observed between cluster variation.

  3. The effect of personal interactions among cluster members who receive the same intervention. For example, treatments or educational interventions provided in a group setting could lead to a sharing of information among group members that creates a clustering effect. More generally, as noted by Koepsell, just as infectious agents can be spread from person to person, the transmission of attitudes, norms, and behaviours among people who are in regular contact can result in similar responses.6
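Whatever its source, the intracluster correlation ρ introduced before this list usually has to be estimated from trial data. The following is a minimal sketch of one common approach, the one way analysis of variance estimator, assuming clusters of equal size m. The function name `anova_icc` and the list-of-lists data layout are illustrative assumptions, not something specified in the article.

```python
def anova_icc(clusters):
    """One-way ANOVA estimate of the intracluster correlation rho.

    `clusters` is a list of k lists, each holding the m responses of one
    cluster. Equal cluster sizes are assumed (a simplification).
    """
    k = len(clusters)
    m = len(clusters[0]) if k else 0
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters, all of the same size >= 2")

    cluster_means = [sum(c) / m for c in clusters]
    grand_mean = sum(cluster_means) / k

    # Mean square between clusters (k - 1 degrees of freedom).
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    # Mean square within clusters (k * (m - 1) degrees of freedom).
    msw = sum(
        (y - cm) ** 2 for c, cm in zip(clusters, cluster_means) for y in c
    ) / (k * (m - 1))

    denominator = msb + (m - 1) * msw
    if denominator == 0:
        raise ValueError("all responses are identical; rho is undefined")
    return (msb - msw) / denominator


if __name__ == "__main__":
    # Hypothetical data: 3 clusters of 4 responses each.
    data = [[4.1, 3.9, 4.3, 4.0], [5.2, 5.0, 5.5, 5.1], [3.0, 3.4, 3.1, 2.9]]
    print(anova_icc(data))
```

The estimate can be negative when responses within clusters differ more than the cluster means do. In practice such values are often truncated at zero, although that choice is a convention and not part of the estimator itself.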

Without extensive empirical data, it is usually impossible to distinguish among the potential reasons for between cluster variation. Regardless of the specific cause, however, such variation invariably leads to a reduction in the effective sample size for the trial, where the size of the reduction increases with both the magnitude of ρ and the average cluster size. This in turn leads to a loss of precision in estimating the effect of intervention.
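The reduction in effective sample size can be illustrated with a short calculation. This sketch assumes k clusters of equal size m and divides the nominal sample size km by the variance inflation factor 1 + (m − 1)ρ that the article introduces in the sample size discussion. The function name and the example values are hypothetical.

```python
def effective_sample_size(k, m, rho):
    """Number of independent observations that carry the same information
    as k clusters of m correlated observations each (equal sizes assumed)."""
    if k < 1 or m < 1:
        raise ValueError("k and m must be positive")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch only handles 0 <= rho <= 1")
    return k * m / (1 + (m - 1) * rho)


if __name__ == "__main__":
    # Hypothetical trial arm: 10 clusters of 20 participants.
    for rho in (0.0, 0.01, 0.05, 0.2, 1.0):
        print(rho, round(effective_sample_size(10, 20, rho), 1))
```

With ρ = 0 the effective size equals the nominal size km, and with ρ = 1 it falls to k, the number of clusters, which matches the interpretation of ρ given earlier.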

These effects of clustering can be easily expressed quantitatively. Consider an experimental trial in which k clusters, each consisting of m individuals, are randomly assigned to each of an experimental and a control group. We suppose that the primary aim of the trial is to compare the groups with respect to their mean values on a normally distributed response variable Y having a common but unknown variance σ². Estimates of the population means μ₁ and μ₂ are given by the usual sample means Ȳ₁ and Ȳ₂ for the experimental and control groups respectively. As shown by Donner et al,7 the variance of each of these means is given by

\[
\operatorname{Var}(\bar{Y}_i) = \frac{\sigma^2}{km}\,\bigl[1 + (m - 1)\rho\bigr], \qquad i = 1, 2, \tag{1}
\]

where ρ is the intracluster correlation coefficient. If σ² is replaced by P(1 – P), where P denotes the probability of a success, equation (1) also provides an expression for the variance of a sample proportion under clustering.
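For readers who want the intermediate step, the variance in equation (1) can be recovered with a short derivation. This is a sketch under the stated model only: each response has variance σ², any two responses in the same cluster have correlation ρ, and responses in different clusters are uncorrelated. The indexing (j for clusters, l and l′ for members) is introduced here for illustration.

```latex
\begin{align*}
\operatorname{Var}(\bar{Y})
  &= \frac{1}{(km)^2}\Biggl[\,\sum_{j=1}^{k}\sum_{l=1}^{m}\operatorname{Var}(Y_{jl})
     + \sum_{j=1}^{k}\sum_{l \neq l'}\operatorname{Cov}(Y_{jl}, Y_{jl'})\Biggr] \\
  &= \frac{1}{(km)^2}\Bigl[\,km\,\sigma^2 + km(m-1)\,\rho\,\sigma^2\Bigr] \\
  &= \frac{\sigma^2}{km}\,\bigl[1 + (m-1)\rho\bigr].
\end{align*}
```

The second line uses the fact that each of the k clusters contributes m variance terms and m(m − 1) ordered pairs of distinct members, each pair with covariance ρσ².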

For sample size determination, equation (1) implies that the usual estimate of the required number of individuals in each group should be multiplied by the variance inflation factor (or design effect) IF = 1 + (m – 1)ρ to provide the same statistical power as would be obtained by randomising km individuals to each group when there is no clustering effect. It should also be noted that small values of ρ accompanied by large values of m can considerably inflate the required sample size for a trial. For example, if ρ = 0.01 and medical practices of average size 1000 are randomised to each of 2 intervention groups, then IF = 1 + 999 × 0.01 = 10.99, so the total number of participants required under cluster randomisation is almost 11 times that required under individual randomisation.
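The sample size adjustment can be sketched in a few lines. The code below assumes the standard normal approximation formula for comparing two means under individual randomisation, n = 2(z₁₋α/₂ + z_power)²σ²/δ², which is a textbook formula and not one given in the article. It then multiplies n by the inflation factor and converts the result to a number of clusters per group. All function names, parameter names, and example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def design_effect(m, rho):
    """Variance inflation factor IF = 1 + (m - 1) * rho for clusters of size m."""
    return 1 + (m - 1) * rho


def n_per_group_individual(delta, sigma, alpha=0.05, power=0.80):
    """Individuals per group needed to detect a mean difference `delta`
    under individual randomisation (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2


def clusters_per_group(delta, sigma, m, rho, alpha=0.05, power=0.80):
    """Clusters of size m needed per group after inflating for clustering."""
    n_individual = n_per_group_individual(delta, sigma, alpha, power)
    n_clustered = n_individual * design_effect(m, rho)
    return ceil(n_clustered / m)


if __name__ == "__main__":
    # The article's example: rho = 0.01 with practices of size 1000.
    print(design_effect(1000, 0.01))
    # Hypothetical planning values: detect half a standard deviation
    # with clusters of 20 participants and rho = 0.05.
    print(clusters_per_group(delta=0.5, sigma=1.0, m=20, rho=0.05))
```

Rounding up to whole clusters is a design choice in this sketch. Formulas that allow for unequal cluster sizes or for estimating ρ from pilot data would give somewhat different answers.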

Challenges of applying cluster randomisation

Many of the methodological challenges of cluster randomisation arise because inferences are usually intended to apply at the level of the individual participants, while randomisation is at the cluster level. Applying standard statistical methods, which invariably assume no between cluster variation, will tend to bias observed p values downward, thus risking a spurious claim of statistical significance and producing an artificially precise estimate of the intervention effect. The extent of this bias for clusters of fixed size m is proportional to the magnitude of the inflation factor IF, which is analogous to the bias associated with the application of standard sample size formulas. This problem led to the well known remark by Cornfield in the epidemiological literature that “randomisation by cluster accompanied by an analysis appropriate to randomisation by individual is an exercise in self deception and should be discouraged.”8 As stated in other words by Wood and Freemantle,9 “fitting a model that ignores (between cluster variability) is akin to expecting a free lunch, and there is no such thing as a free lunch.” Methods of analysis for binary, quantitative, and time to event outcomes are discussed in detail by Donner and Klar.10
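The downward bias in p values can be seen in a small simulation. The sketch below generates data with no true intervention effect. It then compares a naive individual level test with a two sample t test on cluster means, which is one simple cluster level analysis and is used here purely as an illustration. The random effects data model, the parameter values, and the hard-coded critical values are all assumptions of this sketch.

```python
import random
from math import sqrt


def simulate_group(k, m, rho, rng):
    """k clusters of m responses, total variance 1, intracluster correlation rho."""
    clusters = []
    for _ in range(k):
        u = rng.gauss(0.0, sqrt(rho))  # effect shared by the whole cluster
        clusters.append([u + rng.gauss(0.0, sqrt(1.0 - rho)) for _ in range(m)])
    return clusters


def mean_and_variance(x):
    n = len(x)
    mu = sum(x) / n
    return mu, sum((v - mu) ** 2 for v in x) / (n - 1)


def two_sample_stat(x, y):
    """Pooled-variance two-sample t statistic for samples of equal length."""
    mean_x, var_x = mean_and_variance(x)
    mean_y, var_y = mean_and_variance(y)
    pooled = (var_x + var_y) / 2.0
    return (mean_x - mean_y) / sqrt(2.0 * pooled / len(x))


def false_positive_rates(k=10, m=20, rho=0.2, n_sims=1000, seed=1,
                         crit_individual=1.96, crit_cluster=2.101):
    """Share of null trials declared significant at the nominal 5% level.

    crit_cluster = 2.101 is the two-sided 5% point of t with 18 degrees of
    freedom, so it is only valid for the default k = 10 clusters per group.
    crit_individual = 1.96 treats the 2*k*m responses as a large sample.
    """
    rng = random.Random(seed)
    naive = cluster_level = 0
    for _ in range(n_sims):
        a = simulate_group(k, m, rho, rng)
        b = simulate_group(k, m, rho, rng)
        # Naive analysis: pool all individuals and ignore the clustering.
        flat_a = [y for c in a for y in c]
        flat_b = [y for c in b for y in c]
        naive += abs(two_sample_stat(flat_a, flat_b)) > crit_individual
        # Cluster-level analysis: one summary value (the mean) per cluster.
        means_a = [sum(c) / m for c in a]
        means_b = [sum(c) / m for c in b]
        cluster_level += abs(two_sample_stat(means_a, means_b)) > crit_cluster
    return naive / n_sims, cluster_level / n_sims


if __name__ == "__main__":
    naive_rate, cluster_rate = false_positive_rates()
    print("naive individual-level analysis:", naive_rate)
    print("analysis of cluster means:      ", cluster_rate)
```

Under these assumed settings the inflation factor is 1 + 19 × 0.2 = 4.8, so the naive analysis should reject far more often than 5% of the time, while the analysis of cluster means should stay close to the nominal level.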

There are also ethical challenges unique to cluster randomisation trials to which attention has been given only recently.11,12 For example, it is often necessary to distinguish 2 distinct levels of informed consent in such trials: (1) informed consent for randomisation (usually provided by a “decision maker,” such as a physician or clinic director), and (2) informed consent from the individual participants, given that randomisation has occurred. By analogy to current ethical requirements for clinical trials, it would be unethical not to obtain informed consent from every cluster member before random assignment. It appears to be an unresolved question as to whether such a strict analogy is required for cluster randomisation trials, particularly when relatively large clusters, such as medical practices or entire communities, are randomised. This issue is likely to be the subject of considerable debate over the coming years as cluster randomisation designs become more widely used in the health research community.

