Introduction
Substantial clinical differences between the studies included in a meta-analysis may compromise the overall applicability of the summary estimate.1 Several approaches have been suggested to account for heterogeneity, such as conducting a random effects meta-analysis and computing predictive intervals, and including several study-level characteristics as predictors of the observed effect size.2–4 If individual participant data are available, associations between outcome and patient-level characteristics can be investigated directly by applying regression techniques. However, at the standard pairwise meta-analysis level, usually only trial-level results are given. This does not always allow researchers to adequately explore the association between predictors and effect sizes, as it is possible that population characteristics do not vary consistently within and/or across trials.
One potentially important source of heterogeneity in a meta-analysis is the baseline severity of study participants with respect to the disease under investigation.5 This is evident in mental health trials; for instance, antidepressants appeared to be more effective when administered to severely depressed populations than to patients with mild depression.6–12
The possible association between baseline severity and relative effects can be explored via meta-regression, a tool used in meta-analysis to explore the impact of moderator variables or predictors on the study effect size.13 This is straightforward when the required data are available (eg, by using the average score of the participants on a rating scale at the beginning of each study as a predictor for the relative effects).
However, particularly for dichotomous outcomes, we usually do not have this type of information and we need to express baseline severity using surrogate predictors. In practice, the underlying or control group risk (defined as the probability of a success in the control arm) is often used as a proxy for a collection of unavailable population and setting characteristics, which might affect the relative treatment effect.14,15 Several meta-regression models have been presented that control the estimated summary effect for the observed variability in underlying risk.14,16–19 These models differ with respect to the assumptions they make on how underlying risks are related across studies. A common problem when investigating the association between the underlying risk and the effect size is that the former is involved in the estimation of the latter and therefore they are inherently interrelated. This problem is also known as the ‘regression towards the mean’ phenomenon.5,20 In addition, the relationship between underlying risk and relative effects may depend on the follow-up of the trials or the effect size we use in the meta-analysis.3 Thus, the Cochrane Handbook does not recommend the use of effect sizes that assume a strong association between the outcome and underlying risk.3
In this paper, we offer a brief overview of the available meta-regression models and the possible assumptions that can be employed to account for differences in baseline characteristics between the studies of a meta-analysis, and we use two examples from the mental health literature to present and discuss the findings from alternative models.
Methods
We employed meta-regression models to investigate how changes in baseline severity or the underlying risk impact on the relative effect, using aggregated data for the trial effect as well as for the predictor (eg, baseline severity).
Accounting for differences in baseline severity
Mental health trials very often evaluate the effectiveness of drugs using standardised rating scales that measure the severity of symptoms with respect to the clinical condition under consideration. It might be the case that the extent of improvement of symptoms (eg, reduction in the mean endpoint score) is partly dependent on the initial (baseline) score of the participants. If this dependence is not consistent in direction and magnitude between the control and experimental groups, it would affect the relative effectiveness of the treatments.
Systematic reviews should set inclusion criteria for study populations that are narrow enough to avoid important discrepancies in baseline severity across the eligible studies when this characteristic is a potential ‘effect modifier’.3 However, researchers sometimes decide to include a wider population in their review to expand the applicability (external validity) of their findings as well as to explore factors that may impact the results.
In that case the average value of the participants’ baseline scores from each study may be used as a predictor in a meta-regression model with the relative effect (eg, mean difference) as the dependent variable. The outcome of such a model would be a constant and a slope (regression coefficient) parameter. The regression coefficient would give information on how much the relative effect is on average increased or decreased for a specific change in the mean baseline score of a study. The constant would represent the estimated summary effect when the explanatory variable equals zero; hence this estimate would correspond to studies with zero average baseline score. However, observing zero baseline severity is not a realistic scenario and we should not make inferences for such extreme values. Thus, it is common to subtract an observed value (usually the mean, minimum or maximum score across studies) from the average score of each trial and use these differences as the explanatory variable. Then, the constant of the meta-regression model would represent the summary effect that corresponds to studies with that specific observed value (eg, the mean baseline severity from all studies).
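The role of centring can be illustrated with a minimal sketch of a weighted least-squares meta-regression on one covariate. This is not the code used for the analyses in this paper: the function name `meta_regression`, the treatment of the between-study variance `tau2` as a known input, and the five studies below are all hypothetical and serve only to show how the constant changes meaning when the covariate is centred.

```python
import numpy as np

def meta_regression(effects, variances, covariate, tau2=0.0, centre=None):
    """Weighted least-squares meta-regression on one centred covariate.

    Weights are 1 / (within-study variance + tau2). Assumption: tau2 is
    supplied by the user rather than estimated, to keep the sketch short.
    Returns (constant, slope, centring value).
    """
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    x = np.asarray(covariate, dtype=float)
    if centre is None:
        centre = x.mean()  # centre at the mean baseline score across studies
    X = np.column_stack([np.ones_like(x), x - centre])
    w = 1.0 / (v + tau2)
    # Solve the weighted normal equations (X'WX) b = X'Wy
    constant, slope = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return constant, slope, centre

# Hypothetical studies: mean baseline score, SMD and within-study variance
baseline = [20, 23, 25, 27, 30]
smd = [0.05, 0.20, 0.30, 0.40, 0.55]
var = [0.02, 0.03, 0.025, 0.04, 0.03]

const_c, slope_c, centre = meta_regression(smd, var, baseline)        # centred at the mean
const_0, slope_0, _ = meta_regression(smd, var, baseline, centre=0.0) # uncentred
```

In this made-up example the slope is the same under both parameterisations, whereas the constant is the summary effect at the mean baseline score in the first call and an extrapolation to a baseline score of zero in the second.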
Accounting for differences in underlying risk
In the absence of information on the baseline severity of participants, researchers may consider using the underlying risk as a proxy. This characteristic is more likely to be reported in trials with dichotomous outcomes; for example, mental health trials often measure the number of responders and non-responders in each study arm by defining a minimum score reduction in a rating scale as the threshold for response (eg, more than a 50% reduction between baseline and end point). In this case the explanatory variable representing underlying risk in the meta-regression model would be the number of responders over the total number of randomised participants in the control arm of each study.
An advantage of using underlying risk as the explanatory variable is that it may reflect several population characteristics on top of baseline severity for which information might not be available. Nevertheless, an important limitation of this approach is the inherent mathematical relationship between underlying risk and relative effects (the regression towards the mean), which may lead to false-positive significant associations.15,19,21 To overcome this problem, transformations of the studies' underlying risks have been suggested. A probably better approach is the use of a statistical (hierarchical) model in which the true rather than the observed underlying risk is associated with the true relative effect.18
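As a minimal sketch of how the predictor and the effect size are obtained from the same arm-level counts, the following function computes the observed underlying risk, the log OR and its approximate variance for one trial. The function name and the example counts are hypothetical, and trials with zero cells are not handled.

```python
import math

def control_risk_and_log_or(resp_trt, n_trt, resp_ctl, n_ctl):
    """Observed control-arm risk, log odds ratio and its approximate variance.

    Assumes no cell of the 2x2 table is zero (no continuity correction).
    """
    risk_ctl = resp_ctl / n_ctl              # responders / randomised, control arm
    a, b = resp_trt, n_trt - resp_trt        # treatment: responders, non-responders
    c, d = resp_ctl, n_ctl - resp_ctl        # control: responders, non-responders
    log_or = math.log((a * d) / (b * c))
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d
    return risk_ctl, log_or, var_log_or

# Hypothetical trial: 60/100 responders on drug, 40/100 on placebo
risk, log_or, var_log_or = control_risk_and_log_or(60, 100, 40, 100)
```

Note that the control-arm counts enter both `risk_ctl` and `log_or`, which is the source of the inherent relationship between the two quantities.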
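The inherent relationship between observed underlying risk and observed relative effect can be illustrated with a small simulation. In this sketch every simulated trial has the same true control risk and the same true OR, so any association between the observed quantities is produced by sampling error alone. All settings (number of trials, arm size, true risk, true OR, the 0.5 continuity correction) are arbitrary choices for illustration and are not taken from the datasets analysed here.

```python
import numpy as np

def simulate_observed_association(n_trials=2000, n_per_arm=50, true_risk=0.4,
                                  true_log_or=np.log(2.0), seed=0):
    """Correlation between observed control risk and observed log OR
    when the true risk and the true log OR are identical in every trial."""
    rng = np.random.default_rng(seed)
    odds_trt = true_risk / (1 - true_risk) * np.exp(true_log_or)
    risk_trt = odds_trt / (1 + odds_trt)
    r_ctl = rng.binomial(n_per_arm, true_risk, size=n_trials)
    r_trt = rng.binomial(n_per_arm, risk_trt, size=n_trials)
    # Add 0.5 to every cell so that the log OR is always defined
    a, b = r_trt + 0.5, n_per_arm - r_trt + 0.5
    c, d = r_ctl + 0.5, n_per_arm - r_ctl + 0.5
    log_or = np.log(a * d / (b * c))
    observed_risk = r_ctl / n_per_arm
    return np.corrcoef(observed_risk, log_or)[0, 1]

corr = simulate_observed_association()
```

Under these settings the correlation is clearly negative even though no true association exists, which is why a regression on the observed risk can produce false-positive findings.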
Several different assumptions have been suggested in the literature to model the relationship of underlying risks between the different studies:14,17–19,22,23

1. The underlying risks across the different studies of a meta-analysis are unrelated;14

2. The study-specific underlying risks share a common normal distribution.18,19,22 This entails the contention that underlying risks are different yet related in the sense that they have a mean value, which is the most probable value we may observe in a trial, and the probability of observing a larger or smaller value thins out symmetrically as we move away from the mean value;

3. The underlying risks come from two different normal distributions.17 Under this assumption we allow the underlying risks to differ more than under the previous scenario (ie, a single common distribution for all studies), although again we impose some level of dependence among them.
An important advantage of the first modelling assumption is that it can be easily fitted in any software. In addition, this approach possibly better ‘preserves the benefits of randomisation’,24 whereas the dependence of underlying risks between different studies (approaches 2 and 3) is a strong and potentially unrealistic assumption.16 On the other hand, assuming a common distribution for the underlying risks can occasionally lead to more precise and less biased estimates of the study-specific underlying risks.16
Other approaches have been suggested, including the use of asymmetrical distributions or distributions with heavier tails across the underlying risks.16,23 However, the appropriateness of each assumption is a matter of debate. Most importantly, clinical insight on the comparability of underlying risk across trials is crucial to adequately inform the choice of a reasonable model.
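The three assumptions can be written schematically as follows. This is one possible formalisation, not necessarily the exact parameterisation of the cited models: the symbols ($\theta_i$ for the true relative effect in study $i$, $\mu_i$ for the true underlying risk on the logit scale, $\beta$ for the regression coefficient, $\tau^2$ for the residual heterogeneity) are introduced here for illustration, the vague prior in the first line is an arbitrary example, and the third assumption is read as a two-component mixture with common variance.

```latex
% Regression of the true relative effect on the true (centred) underlying risk
\theta_i = \theta + \beta\,(\mu_i - \bar{\mu}) + \delta_i,
\qquad \delta_i \sim N(0, \tau^2)

% Assumption 1: unrelated underlying risks (independent vague priors)
\mu_i \sim N(0, 100^2) \quad \text{independently for each study } i

% Assumption 2: a common normal distribution
\mu_i \sim N(m, \sigma^2)

% Assumption 3: two normal distributions with common variance
\mu_i \sim \pi\, N(m_1, \sigma^2) + (1 - \pi)\, N(m_2, \sigma^2)
```

Moving from the first to the second assumption lets the studies borrow strength from each other when estimating $\mu_i$; the third relaxes this by allowing two clusters of studies.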
Results
We applied the meta-regression models discussed in this paper on two datasets of antidepressant trials published in the scientific literature by Kirsch et al10 and Undurraga and Baldessarini.6 The outcome in both meta-analyses was improvement of symptoms for depression measured as a continuous outcome. In line with the original publications, in all analyses we assumed that the different antidepressants were equivalent.
The first example consists of 35 trials comparing four active drugs (fluoxetine, venlafaxine, nefazodone and paroxetine) against placebo with available data on the initial (baseline) score of the participants and the change score from baseline at the end of the follow-up. Using random effects meta-analysis the estimated summary standardised mean difference (SMD) was 0.32 (95% CI: 0.24 to 0.41), which suggested a beneficial effect of antidepressants over placebo. The presence of low-to-moderate heterogeneity was implied by the I² (43% with 95% CI: 15% to 62%) and the heterogeneity SD (τ) was 0.16.
The second data set originally involved 124 trials providing information on the number of responders (as dichotomous outcome) in each study arm. For simplicity, in our example we restricted the data to studies that evaluated the relative effectiveness of the same treatments considered by Kirsch and colleagues. Therefore, we included 38 trials that compared three active drugs (fluoxetine, venlafaxine and paroxetine) with placebo. The summary OR estimated from this subset of studies (under the random-effects model) was 2.05 (95% CI: 1.81 to 2.32), in favour of antidepressants over placebo. The estimated heterogeneity was τ=0.22 with an I² equal to 36% (95% CI: 4% to 57%).
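For readers unfamiliar with how a summary effect, τ and I² are obtained, the following is a minimal sketch of a random effects meta-analysis using the DerSimonian-Laird moment estimator. The choice of estimator is an assumption made for illustration (the text does not state which one was used), and the two-study input is a toy example, not the Kirsch data.

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Random effects summary with the DerSimonian-Laird estimate of tau^2."""
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v
    fixed = np.sum(w * y) / np.sum(w)          # fixed effect summary
    q = np.sum(w * (y - fixed) ** 2)           # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)              # truncated at zero
    w_re = 1.0 / (v + tau2)
    pooled = np.sum(w_re * y) / np.sum(w_re)   # random effects summary
    se = np.sqrt(1.0 / np.sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau": np.sqrt(tau2), "I2": i2,
            "ci": (pooled - 1.96 * se, pooled + 1.96 * se)}

# Toy input: two studies with effects 0 and 1 and equal variances
result = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
```

The same function can be applied to SMDs or to log ORs, since it only requires study-level estimates and their variances.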
The impact of baseline severity
We explored the possible relationship between the (observed) initial severity score of the participants and the relative effectiveness of antidepressants in the Kirsch data by performing meta-regression. To obtain a meaningful interpretation of the estimated constant value, we used the difference between each study's average baseline score and the mean (baseline) score from all studies (mean=25.54) as predictor. This model gave a statistically significant coefficient for the effect of baseline severity (table 1), suggesting that if two studies have a 10-unit difference in participants' mean baseline score, then the SMD for the study with a more severe population would be on average 0.5 units larger; it is interesting that this change equals the criterion for clinical significance.25 Also, accounting for differences in baseline severity across studies explained 25% of the estimated heterogeneity.
Figure 1 shows that this apparent strong association might be largely influenced by one trial26 in which participants had a considerably smaller baseline score compared to the other studies. However, after excluding the Dunlop study and again running the meta-regression model the results did not materially change (table 1).
The impact of underlying risk
To account for differences in underlying risk across trials in the Undurraga dataset, we first performed the meta-regression analysis using the observed risk for response in the placebo arm as predictor (after subtracting the average risk from all studies). This analysis yielded a marginally statistically non-significant coefficient, which implied that a 10% increase in the underlying risk of the participants reduces the logarithm of the OR (lnOR) on average by 0.12 (table 2). In addition, the variability in underlying risk across studies explained 27% of the heterogeneity (τ) that was estimated from the standard random effects meta-analysis (without any predictor).
Similarly to the previous example, a graphical depiction of the studies (figure 2) shows that the trial by Cohn and Wilcox27 is a potential outlier. After excluding the Cohn study from the meta-regression model the coefficient did not change substantially, but there was an important decrease in the heterogeneity (table 2).
To illustrate how the different assumptions for underlying risk across trials may affect the results, we additionally fitted the meta-regression model (without the Cohn study) using the true underlying risk (in a Bayesian environment) as a predictor and assuming both independent and dependent (ie, from a normal distribution) study-specific underlying risks. The results from these approaches are given in table 3. More specifically, when we assumed no relationship between the underlying risks from different studies, the estimated coefficient was smaller in magnitude compared to that derived from the initial meta-regression (which used the observed values as predictor), but marginally statistically significant. The assumption that all underlying risks share a common normal distribution yielded a (non-significant) coefficient close to zero, implying that this characteristic does not seem to affect the relative effect of antidepressants when compared with placebo. The same findings (lack of association) were obtained when we used the ‘weaker’ assumption that the trials' underlying risks come from two different normal distributions (with common variance; table 3).
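The size of such a coefficient is easier to judge on the OR scale. The sketch below converts a change in lnOR into a multiplicative change in the OR, and shows one common way of expressing the proportion of heterogeneity explained by a predictor. The function names are hypothetical, the formula for the explained proportion is an assumption (the text does not state whether it was computed on τ or τ²), and the heterogeneity values passed to it are made up.

```python
import math

def or_multiplier(change_in_lnor):
    """Multiplicative change in the OR implied by an additive change in lnOR."""
    return math.exp(change_in_lnor)

def proportion_explained(tau2_without, tau2_with):
    """Relative reduction in between-study variance after adding a predictor."""
    if tau2_without <= 0:
        return 0.0
    return max(0.0, (tau2_without - tau2_with) / tau2_without)

# A reduction of 0.12 in lnOR corresponds to multiplying the OR by about 0.89
multiplier = or_multiplier(-0.12)

# Made-up variances: heterogeneity drops from 0.04 to 0.03 once the predictor is added
explained = proportion_explained(0.04, 0.03)
```

In other words, a reduction of 0.12 in lnOR means the OR in the higher-risk trial is roughly 11% smaller than in the lower-risk trial.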
Discussion
Ignoring important variability in one or more population characteristics across studies in a meta-analysis may lead to misleading conclusions. When information on such characteristics is available, meta-regression or subgroup analysis should be employed to assess their impact on the results. Baseline severity has been widely considered as an important effect modifier in meta-analyses of mental health trials. In the Kirsch data, we found an important linear association between baseline severity and relative effects, while the differences in the initial severity of participants across trials explained part (25%) of the heterogeneity. Meta-regression models can also be used to disentangle the effect of disease severity and placebo response on the outcome.
In the absence of available data for important baseline characteristics, underlying risk has been suggested as a surrogate variable for baseline severity and for a variety of additional and possibly unmeasured or unknown factors that may act as effect modifiers. An empirical study that investigated the relationship between underlying risk and treatment effects in 115 meta-analyses with dichotomous outcomes found statistically significant associations in 14% of these meta-analyses.18 In our example,6 the effect of underlying risk became marginally statistically non-significant by excluding one study that seemed to be an outlier. Although this should be verified using appropriate statistical methods for detecting outliers, a graphical depiction is always helpful in identifying observations that possibly do not fit the rest of the data and might exaggerate associations via meta-regression. Also, in this collection of studies the different assumptions for the underlying risks across trials affected the estimation of the regression coefficients. Where no prior belief exists for the plausibility of each assumption, the different suggested models may be employed as a sensitivity analysis.
Finally, despite the importance of baseline characteristics in synthesising the studies of a systematic review, it is unclear whether investigators regularly incorporate them in their analyses. Protocols for meta-analyses should carefully consider the targeted population and include meta-regression with baseline severity or underlying risk among their additional analyses when baseline differences are expected between the eligible studies.
References
Footnotes

Funding AC received funding from Greek national funds through the Operational Program ‘Education and Lifelong Learning’ of the National Strategic Reference Framework (NSRF)—Research Funding Program: ARISTEIA. Investing in knowledge society through the European Social Fund.

Competing interests None.