

Accounting for baseline differences in meta-analysis
Anna Chaimani
Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece



Substantial clinical differences between the studies included in a meta-analysis may compromise the applicability of the summary estimate.1 Several approaches have been suggested to account for heterogeneity, such as fitting a random-effects meta-analysis and computing prediction intervals, or including study-level characteristics as predictors of the observed effect size.2–4 If individual participant data are available, associations between the outcome and patient-level characteristics can be investigated straightforwardly with regression techniques. In a standard pairwise meta-analysis, however, usually only trial-level results are reported. This does not always allow researchers to explore adequately the association between predictors and effect sizes, because population characteristics may not vary consistently within and/or across trials.

One potentially important source of heterogeneity in a meta-analysis is the baseline severity of study participants with respect to the disease under investigation.5 This is evident in mental health trials; for instance, antidepressants have appeared to be more effective when administered to severely depressed populations than to patients with mild depression.6–12

The possible association between baseline severity and relative effects can be explored via meta-regression, a tool used in meta-analysis to explore the impact of moderator variables or predictors on the study effect size.13 This is straightforward when the required data are available (eg, by using the average score of the participants on a rating scale at the beginning of each study as a predictor for the relative effects).

However, particularly for dichotomous outcomes, this type of information is usually unavailable, and baseline severity has to be expressed through surrogate predictors. In practice, the underlying or control group risk (defined as the probability of success in the control arm) is often used as a proxy for a collection of unavailable population and setting characteristics that might affect the relative treatment effect.14,15 Several meta-regression models have been presented that adjust the estimated summary effect for the observed variability in underlying risk.14,16–19 These models differ in the assumptions they make about how underlying risks are related across studies. A common problem when investigating the association between the underlying risk and the effect size is that the former is involved in the estimation of the latter, so the two are inherently inter-related. This problem is also known as the ‘regression towards the mean’ phenomenon.5,20 In addition, the relationship between underlying risk and relative effects may depend on the follow-up of the trials or on the effect measure used in the meta-analysis.3 Thus, the Cochrane Handbook does not recommend the use of effect measures that assume a strong association between the outcome and underlying risk.3

In this paper, we offer a brief overview of the available meta-regression models and the possible assumptions that can be employed to account for differences in baseline characteristics between the studies of a meta-analysis, and we use two examples from the mental health literature to present and discuss the findings from alternative models.


We employed meta-regression models to investigate how changes in baseline severity or in the underlying risk affect the relative effect, using aggregated data both for the trial effect and for the predictor (eg, baseline severity).

Accounting for differences in baseline severity

Mental health trials very often evaluate the effectiveness of drugs using standardised rating scales that measure the severity of symptoms of the clinical condition under consideration. The extent of symptom improvement (eg, the reduction in the mean endpoint score) may depend partly on the participants' initial (baseline) score. If this dependence is not consistent in direction and magnitude between the control and experimental groups, it will affect the relative effectiveness of the treatments.

Systematic reviews should set inclusion criteria for study populations that are narrow enough to prevent important discrepancies in baseline severity across the eligible studies when this characteristic is a potential ‘effect modifier’.3 However, researchers sometimes decide to include a wider population in their review to expand the applicability (external validity) of their findings and to explore factors that may affect the results.

In that case, the average baseline score of the participants in each study may be used as a predictor in a meta-regression model with the relative effect (eg, mean difference) as the dependent variable. Such a model estimates a constant (intercept) and a slope (regression coefficient). The regression coefficient indicates by how much the relative effect increases or decreases, on average, per unit change in the mean baseline score of a study. The constant represents the estimated summary effect when the explanatory variable equals zero; that is, it corresponds to studies with an average baseline score of zero. Zero baseline severity, however, is not a realistic scenario, and we should not make inferences for such extreme values. It is therefore common to subtract an observed value (usually the mean, minimum or maximum score across studies) from the average score of each trial and to use these differences as the explanatory variable. The constant of the meta-regression model then represents the summary effect for studies with that specific value (eg, the mean baseline severity across all studies).
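The effect of centring can be illustrated with a minimal weighted least-squares sketch. It assumes the between-study variance `tau2` is already known (in practice it would be estimated, for example by restricted maximum likelihood), and the five studies are hypothetical, not taken from the article. Centring leaves the slope unchanged and moves the constant to the fitted effect at the mean baseline score.

```python
import numpy as np

def meta_regression(y, v, x, tau2):
    """Random-effects meta-regression of effect sizes y on one covariate x.

    y: study effect sizes (eg, mean differences); v: within-study variances;
    tau2: between-study variance, assumed known in this sketch.
    Returns array([constant, slope]).
    """
    y, v, x = (np.asarray(a, dtype=float) for a in (y, v, x))
    w = 1.0 / (v + tau2)                          # random-effects weights
    X = np.column_stack([np.ones_like(x), x])     # design matrix [1, x]
    XtW = X.T * w                                 # X'W with W diagonal
    return np.linalg.solve(XtW @ X, XtW @ y)      # solve (X'WX) b = X'Wy

# Hypothetical data for five studies (not from the article)
y = np.array([0.10, 0.25, 0.40, 0.35, 0.55])          # effect sizes
v = np.array([0.020, 0.030, 0.025, 0.040, 0.030])     # within-study variances
baseline = np.array([20.0, 24.0, 26.0, 27.0, 30.0])   # mean baseline scores

raw = meta_regression(y, v, baseline, tau2=0.01)
centred = meta_regression(y, v, baseline - baseline.mean(), tau2=0.01)

print(raw)      # constant refers to a baseline score of zero
print(centred)  # same slope; constant refers to the mean baseline score
```

Subtracting the minimum or maximum instead of the mean works the same way: only the reference point of the constant changes.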

Accounting for differences in underlying risk

In the absence of information on the baseline severity of participants, researchers may consider using the underlying risk as a proxy. This characteristic is more likely to be reported in trials with dichotomous outcomes; for example, mental health trials often report the number of responders and non-responders in each study arm, defining response as a minimum score reduction on a rating scale (eg, a reduction of more than 50% between baseline and end point). In this case, the explanatory variable representing underlying risk in the meta-regression model is the number of responders divided by the total number of randomised participants in the control arm of each study.
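For a single trial, the predictor and the dependent variable can be computed from the arm-level counts. The sketch below is an illustration, not the article's code: it assumes the log OR is the effect measure, uses the usual Woolf variance, and adds 0.5 to every cell only when a cell is zero (a common convention, assumed here). The counts in the usage line are hypothetical.

```python
import math

def trial_summary(events_t, n_t, events_c, n_c):
    """Return (observed control-arm risk, log OR, variance of log OR).

    events_*: number of responders; n_*: randomised participants.
    Treatment arm = t, control (placebo) arm = c.
    """
    a, b = events_t, n_t - events_t    # treatment: responders, non-responders
    c, d = events_c, n_c - events_c    # control: responders, non-responders
    if 0 in (a, b, c, d):              # continuity correction (assumed convention)
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    risk_c = events_c / n_c                     # observed underlying risk
    log_or = math.log((a * d) / (b * c))        # lnOR, treatment vs control
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d  # Woolf variance of lnOR
    return risk_c, log_or, var_log_or

# Hypothetical trial: 30/60 responders on drug, 20/60 on placebo
risk_c, log_or, var_log_or = trial_summary(30, 60, 20, 60)
print(risk_c, log_or, var_log_or)
```

Note that the same control-arm counts enter both `risk_c` and `log_or`, which is the source of the inherent dependence discussed next.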

An advantage of using underlying risk as the explanatory variable is that it may reflect several population characteristics, on top of baseline severity, for which information might not be available. Nevertheless, an important limitation of this approach is the inherent mathematical relationship between underlying risk and relative effects (regression towards the mean), which may lead to spurious, false-positive associations.15,19,21 To overcome this problem, transformations of the studies' underlying risks have been suggested. A potentially better approach is a statistical (hierarchical) model in which the true, rather than the observed, underlying risk is associated with the true relative effect.18
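The regression-towards-the-mean artefact can be demonstrated with a small simulation. In the sketch below (all settings hypothetical: 2000 trials, 50 patients per arm, an arbitrary seed), the true lnOR is identical in every trial, so there is no real association with underlying risk. Regressing the observed lnOR on the observed control-arm risk nevertheless yields a negative slope, because sampling error in the control arm pushes the two quantities in opposite directions.

```python
import numpy as np

rng = np.random.default_rng(2024)            # arbitrary seed

k, n = 2000, 50                              # hypothetical: trials, patients per arm
true_log_or = 0.7                            # identical in every trial
logit_c = rng.normal(-0.6, 0.5, k)           # true control log odds vary by trial
p_c = 1 / (1 + np.exp(-logit_c))
p_t = 1 / (1 + np.exp(-(logit_c + true_log_or)))

r_c = rng.binomial(n, p_c)                   # observed responders, control
r_t = rng.binomial(n, p_t)                   # observed responders, treatment

# Observed lnOR, with 0.5 added to every cell to avoid log(0)
log_or = (np.log((r_t + 0.5) / (n - r_t + 0.5))
          - np.log((r_c + 0.5) / (n - r_c + 0.5)))
obs_risk_c = r_c / n                         # observed underlying risk

# Unweighted least-squares slope of observed lnOR on centred observed risk
slope, intercept = np.polyfit(obs_risk_c - obs_risk_c.mean(), log_or, 1)
print(f"slope = {slope:.2f}")                # negative despite no true association
```

Replacing `obs_risk_c` with the true `p_c` removes the artefact, which is the intuition behind modelling the true rather than the observed underlying risk.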

Several assumptions have been suggested in the literature to model the relationship of underlying risks across studies:14,17–19,22,23

  1. The underlying risks across the different studies of a meta-analysis are unrelated;14

  2. The study-specific underlying risks share a common normal distribution.18,19,22 This implies that underlying risks are different yet related, in the sense that they have a mean value, which is the most probable value we may observe in a trial, and the probability of observing a larger or smaller value decreases symmetrically as we move away from the mean;

  3. The underlying risks come from two different normal distributions.17 Under this assumption, the underlying risks are allowed to differ more than under the previous scenario (ie, a single common distribution for all studies), although some level of dependence among them is still imposed.
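One way to write the three assumptions formally is sketched below as a Bayesian hierarchical model on the log-odds scale. The notation, the vague N(0, 10²) prior for assumption 1, the fixed centring constant ū, and the mixture form chosen for assumption 3 are illustrative choices, not necessarily the exact specifications of the cited references. The key feature is that the slope β links the true relative effect to the true (not observed) underlying risk.

```latex
% Observed responders in trial i (C = control/placebo, T = treatment)
r_{Ci} \sim \mathrm{Bin}(n_{Ci}, p_{Ci}), \qquad r_{Ti} \sim \mathrm{Bin}(n_{Ti}, p_{Ti})
% True underlying risk (log-odds scale) and true relative effect (lnOR)
\operatorname{logit}(p_{Ci}) = u_i, \qquad \operatorname{logit}(p_{Ti}) = u_i + \delta_i
% Random-effects meta-regression on the centred true underlying risk
\delta_i \sim N\!\left(d + \beta\,(u_i - \bar{u}),\ \tau^2\right)
% Assumption 1: unrelated underlying risks (separate vague priors)
u_i \sim N(0, 10^2) \quad \text{independently for each } i
% Assumption 2: one common normal distribution
u_i \sim N(\mu, \sigma^2)
% Assumption 3 (one reading): two normal distributions with common variance
u_i \sim \pi\, N(\mu_1, \sigma^2) + (1 - \pi)\, N(\mu_2, \sigma^2)
```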

An important advantage of the first modelling assumption is that it can easily be fitted in any software. In addition, this approach arguably better ‘preserves the benefits of randomisation’,24 whereas assuming dependence between the underlying risks of different studies (approaches 2 and 3) is a strong and potentially unrealistic assumption.16 On the other hand, assuming a common distribution for the underlying risks can occasionally lead to more precise and less biased estimates of the studies' underlying risks.16

Other approaches have been suggested (including asymmetrical or heavier-tailed distributions for the underlying risks16,23); however, the appropriateness of each assumption is a matter of debate. Most importantly, clinical insight into the comparability of underlying risks across trials is crucial to inform the choice of a reasonable model.


We applied the meta-regression models discussed in this paper to two datasets of antidepressant trials published by Kirsch et al10 and by Undurraga and Baldessarini.6 In both meta-analyses, the outcome was improvement in depressive symptoms measured on a rating scale; it was analysed as a continuous outcome in the Kirsch data and as response (a dichotomous outcome) in the Undurraga data. Following the original publications, we assumed in all analyses that the different antidepressants were equivalent.

The first example consists of 35 trials comparing four active drugs (fluoxetine, venlafaxine, nefazodone and paroxetine) with placebo, with available data on the participants' initial (baseline) score and the change score from baseline at the end of follow-up. Using a random-effects meta-analysis, the estimated summary standardised mean difference (SMD) was 0.32 (95% CI: 0.24 to 0.41), suggesting a beneficial effect of antidepressants over placebo. The I² (43%, 95% CI: 15% to 62%) indicated low-to-moderate heterogeneity, and the heterogeneity SD (τ) was 0.16.
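The quantities reported above (summary effect, 95% CI, τ and I²) can be obtained from study-level effect sizes and variances. The article does not state which τ² estimator was used; the sketch below assumes the DerSimonian and Laird method-of-moments estimator and a normal-approximation CI, and it does not compute the CI for I².

```python
import numpy as np

def dersimonian_laird(y, v):
    """Random-effects meta-analysis with the DerSimonian-Laird estimator.

    y: study effect sizes (eg, SMDs or lnORs); v: within-study variances.
    Returns (summary effect, (lower, upper) 95% CI, tau, I^2 as a proportion).
    """
    y, v = np.asarray(y, dtype=float), np.asarray(v, dtype=float)
    k = len(y)
    w = 1.0 / v                                   # fixed-effect weights
    mu_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fe) ** 2)              # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)            # truncated at zero
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    mu = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return mu, (mu - 1.96 * se, mu + 1.96 * se), np.sqrt(tau2), i2

# Hypothetical two-study example (not the Kirsch data)
mu, ci, tau, i2 = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
print(mu, ci, tau, i2)
```

With only two studies the estimates are, of course, very imprecise; the example is there to make the formulas easy to check by hand.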

The second dataset originally involved 124 trials providing information on the number of responders (a dichotomous outcome) in each study arm. For simplicity, we restricted the data to studies evaluating treatments also considered by Kirsch and colleagues. We therefore included 38 trials that compared three active drugs (fluoxetine, venlafaxine and paroxetine) with placebo. The summary OR estimated from this subset of studies (under the random-effects model) was 2.05 (95% CI: 1.81 to 2.32), in favour of antidepressants over placebo. The estimated heterogeneity was τ = 0.22, with an I² of 36% (95% CI: 4% to 57%).

The impact of baseline severity

We explored the possible relationship between the (observed) initial severity score of the participants and the relative effectiveness of antidepressants in the Kirsch data using meta-regression. To obtain a meaningful interpretation of the estimated constant, we used as predictor the difference between each study's average baseline score and the mean baseline score across all studies (mean=25.54). This model gave a statistically significant coefficient for baseline severity (table 1), suggesting that if two studies differ by 10 units in participants' mean baseline score, the SMD for the study with the more severe population would on average be 0.5 units larger; interestingly, this change equals the criterion for clinical significance.25 Accounting for differences in baseline severity across studies also explained 25% of the estimated heterogeneity.
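The proportion of heterogeneity explained by a covariate is usually computed by comparing the between-study variance before and after adding it to the model. The sketch below uses the common variance (τ²) scale; percentages computed on the τ scale would differ. The inputs in the usage line are hypothetical, not values from table 1.

```python
def heterogeneity_explained(tau2_total, tau2_residual):
    """Proportion of between-study variance explained by the covariate(s).

    tau2_total: tau^2 from the meta-analysis without predictors;
    tau2_residual: tau^2 remaining in the meta-regression.
    Computed as (tau2_total - tau2_residual) / tau2_total, truncated to [0, 1].
    """
    if tau2_total <= 0:
        return 0.0                     # nothing to explain
    share = (tau2_total - tau2_residual) / tau2_total
    return min(1.0, max(0.0, share))

# Hypothetical: tau^2 falls from 0.04 to 0.03 after adding the predictor
print(heterogeneity_explained(0.04, 0.03))
```

Truncation at zero handles the case where the residual τ² estimate is larger than the original one, which can happen with few studies.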

Table 1

Results from the meta-regression model that accounted for differences in the average baseline severity score between the studies of the Kirsch dataset, with the standardised mean difference (SMD) of antidepressants versus placebo as the dependent variable

Figure 1 shows that this apparently strong association might be largely influenced by one trial26 whose participants had considerably lower baseline scores than those in the other studies. However, after excluding the Dunlop study and rerunning the meta-regression model, the results did not change materially (table 1).
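Excluding a single influential trial can be generalised to a leave-one-out sensitivity analysis, refitting the meta-regression once per omitted study. The sketch below is illustrative: it holds τ² fixed across refits for simplicity (in practice τ² would be re-estimated each time), and the helper names are hypothetical.

```python
import numpy as np

def wls_slope(y, v, x, tau2=0.0):
    """Slope of a weighted meta-regression of y on x, weights 1 / (v + tau2)."""
    w = 1.0 / (v + tau2)
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)[1]

def leave_one_out_slopes(y, v, x, tau2=0.0):
    """Refit the meta-regression k times, omitting one study each time.

    Element i of the result is the slope estimated without study i.
    """
    y, v, x = (np.asarray(a, dtype=float) for a in (y, v, x))
    slopes = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # drop study i
        slopes.append(wls_slope(y[keep], v[keep], x[keep], tau2))
    return np.array(slopes)

# Hypothetical data lying exactly on a line, so every refit recovers 0.05
x = np.array([10.0, 20.0, 25.0, 27.0, 30.0])
y = 0.1 + 0.05 * x
slopes = leave_one_out_slopes(y, np.ones_like(x), x)
print(slopes)
```

With real data, a study whose omission shifts the slope markedly relative to the others is a candidate influential observation.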

Figure 1

Graphical illustration of the relationship between the average baseline severity score of study participants (ie, the average score from all participants on a continuous symptom scale at the beginning of each study) and the relative effect of antidepressants for the Kirsch data.10 The size of the circles is proportional to the precision (inverse of the variance) of the studies. The line represents the meta-regression line (SMD, standardised mean difference).

The impact of underlying risk

To account for differences in underlying risk across trials in the Undurraga dataset, we first performed the meta-regression analysis using the observed risk of response in the placebo arm as predictor (after subtracting the average risk across all studies). This analysis yielded a marginally statistically non-significant coefficient, implying that a 10% increase in underlying risk reduces the logarithm of the OR (lnOR) on average by 0.12 (table 2). In addition, the variability in underlying risk across studies explained 27% of the heterogeneity (τ) estimated from the standard random-effects meta-analysis (without any predictor).
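Because the model is linear on the lnOR scale, the coefficient acts multiplicatively on the OR. The arithmetic below assumes the reported 0.12 reduction refers to a 0.10 change in the centred risk predictor.

```latex
\Delta \ln(\mathrm{OR}) = -0.12
\quad\Longrightarrow\quad
\frac{\mathrm{OR}_{\text{higher risk}}}{\mathrm{OR}_{\text{lower risk}}}
  = e^{-0.12} \approx 0.89
```

That is, a trial whose placebo response is 10% higher would be expected to have an OR about 11% lower.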

Table 2

Results from the meta-regression model that accounted for differences in underlying risk between the studies of the Undurraga example, with the logarithm of the OR (lnOR) of antidepressants versus placebo as the dependent variable

As in the previous example, a graphical depiction of the studies (figure 2) shows that the trial by Cohn and Wilcox27 is a potential outlier. Excluding the Cohn study from the meta-regression model did not substantially change the coefficient, but it considerably reduced the heterogeneity (table 2).

Figure 2

Graphical illustration of the relationship between the underlying risk (ie, the number of responders over the total number of participants in the placebo arm) of the studies and the relative effect of antidepressants for the Undurraga example.6 The size of the circles is proportional to the precision (inverse of the variance) of the studies. The line represents the meta-regression line (ln(OR), logarithm of the OR).

To illustrate how the different assumptions for underlying risk across trials may affect the results, we additionally fitted the meta-regression model (without the Cohn study) within a Bayesian framework, using the true underlying risk as predictor and assuming either independent or dependent (ie, drawn from a normal distribution) study-specific underlying risks. The results of these approaches are given in table 3. When we assumed no relationship between the underlying risks of different studies, the estimated coefficient was smaller in magnitude than that from the initial meta-regression (which used the observed values as predictor) but marginally statistically significant. The assumption that all underlying risks share a common normal distribution yielded a (non-significant) coefficient close to zero, implying that this characteristic does not seem to affect the relative effect of antidepressants versus placebo. The same finding (lack of association) was obtained under the ‘weaker’ assumption that the trials' underlying risks come from two different normal distributions (with common variance; table 3).

Table 3

Results from the three meta-regression models that used the true study underlying risks from the Undurraga example, with the logarithm of the OR (lnOR) of antidepressants versus placebo as the dependent variable


Ignoring important variability in one or more population characteristics across the studies in a meta-analysis may lead to misleading conclusions. When information on such characteristics is available, meta-regression or subgroup analysis should be used to assess their impact on the results. Baseline severity has been widely considered an important effect modifier in meta-analyses of mental health trials. In the Kirsch data, we found a strong linear association between baseline severity and relative effects, and differences in participants' initial severity across trials explained 25% of the estimated heterogeneity. Meta-regression models can also be used to disentangle the effects of disease severity and placebo response on the outcome.

In the absence of data on important baseline characteristics, underlying risk has been suggested as a surrogate variable for baseline severity and for a variety of additional, possibly unmeasured or unknown, factors that may act as effect modifiers. An empirical study that investigated the relationship between underlying risk and treatment effects in 115 meta-analyses with dichotomous outcomes found statistically significant associations in 14% of them.18 In our example,6 the effect of underlying risk became marginally statistically non-significant after excluding one study that appeared to be an outlier. Although this should be verified using appropriate statistical methods for detecting outliers, a graphical depiction is always helpful in identifying observations that may not fit the rest of the data and might exaggerate associations in meta-regression. Also, in this collection of studies, the different assumptions for the underlying risks across trials affected the estimated regression coefficients. When there is no prior reason to favour one assumption, the alternative models can be fitted as a sensitivity analysis.

Finally, despite the importance of baseline characteristics in synthesising the studies of a systematic review, it is unclear whether investigators regularly incorporate them in their analyses. Protocols for meta-analyses should carefully consider the targeted population and include meta-regression with baseline severity or underlying risk among their additional analyses when baseline differences are expected between the eligible studies.



  • Funding AC received funding from Greek national funds through the Operational Program ‘Education and Lifelong Learning’ of the National Strategic Reference Framework (NSRF)—Research Funding Program: ARISTEIA. Investing in knowledge society through the European Social Fund.

  • Competing interests None.
