Systematic reviews and meta-analyses are often considered a reliable source of evidence to inform decisions about the effectiveness and safety of competing interventions.1 The validity of the findings from a meta-analysis depends on several factors, such as the completeness of the systematic review, the plausibility of the assumptions made, the risk of bias in the individual studies and the potential for reporting biases. In this paper we focus on the statistical considerations involved in the meta-analysis process and we analyse an example from mental health in Stata.2 The theoretical and conceptual considerations of the methods we implement have been covered in recently published articles3–5 and we suggest using these papers as companions when reading this manuscript. More specifically, in this paper we present Stata commands:
To conduct a fixed or a random effects meta-analysis. Prior to carrying out the statistical analyses, meta-analysts should consider the appropriate model (either fixed or random effects) for the specific clinical setting and outcomes of interest and then interpret the result in light of the magnitude of the between-studies variability (heterogeneity)3,6;
To account for missing outcome data. Participants with missing outcome data may affect both the precision and the magnitude of the meta-analytic summary effect; the latter can occur when the probability of missingness is related to the effectiveness of the interventions being compared5;
To explore and account for publication bias and small-study effects.4 Publication bias occurs when publication of research results depends on their nature and direction.7 Failure to account for the unpublished studies may lead to biased summary estimates in favour of one of the two competing treatments (ie, usually the active or the newer intervention).8
Methods and Stata routines
In the following sections:
We provide an example of fixed and random effects meta-analysis using the metan command.9
We use the metamiss command10 to explore the impact of different assumptions about the mechanism of missing data on the summary effect.
As a working example, we use a systematic review that comprises 17 trials comparing haloperidol and placebo for the treatment of symptoms in schizophrenia. This dataset has been previously used to evaluate the impact of missing data on clinical outcomes15 and is originally based on a Cochrane review.16 The outcome of interest is clinical improvement and risk ratios (RRs) larger than 1 favour haloperidol over placebo. From each trial we have the following information (table 1):
Number of participants who responded in the placebo arm (variable rp) and in the haloperidol arm (rh);
Number of participants who did not respond in either arm (fp, fh);
Number of participants who dropped out and whose outcomes are missing (mp, mh).
Performing fixed and random effects meta-analysis and measuring heterogeneity
Meta-analysis in Stata can be performed using the metan command.
For dichotomous data, the metan command needs four input variables:
. metan rh fh rp fp
Typing this gives the summary RR of haloperidol versus placebo under the fixed effect model with Mantel-Haenszel weights.17 Inverse-variance weights can be requested via the option fixedi or randomi for a fixed or random effects analysis, respectively. The effect measure can be changed by specifying the option or for the OR and rd for the risk difference. The option by() defines a grouping variable for the included studies and runs a subgroup analysis.
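As a minimal sketch of the options just described, the calls below show the most common variants. They assume that the community-contributed metan package is installed and that the variables rh, fh, rp and fp are in memory; the grouping variable group is hypothetical and only stands in for whatever study-level characteristic is of interest.

```stata
* One-off installation of the community-contributed package from SSC
* ssc install metan

* Default for binary data: risk ratio, fixed effect, Mantel-Haenszel weights
metan rh fh rp fp

* Fixed effect and random effects models with inverse-variance weights
metan rh fh rp fp, fixedi
metan rh fh rp fp, randomi

* Odds ratio or risk difference instead of the risk ratio
metan rh fh rp fp, or
metan rh fh rp fp, rd

* Subgroup analysis; "group" is a hypothetical study-level variable
metan rh fh rp fp, fixedi by(group)
```

Each call prints the study-specific and summary estimates and draws a forest plot, so running the fixed and random effects versions in turn is a quick way to compare the two models.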
For continuous data, six input variables are necessary: the total number of participants in each arm, the mean values and the SD for each arm. The option nostandard switches the estimated effect measure from standardised mean difference to mean difference.
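A sketch of the continuous-outcome syntax is given below. The six variable names are hypothetical (they are not part of the haloperidol dataset) and we assume the usual metan ordering of sample size, mean and SD for the experimental arm followed by the same three quantities for the control arm.

```stata
* Hypothetical variables: n, mean and SD in the experimental arm (_e)
* and in the control arm (_c)

* Standardised mean difference (the default for continuous data)
metan n_e mean_e sd_e n_c mean_c sd_c

* Mean difference on the original measurement scale
metan n_e mean_e sd_e n_c mean_c sd_c, nostandard
```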
The command gives information on the presence and magnitude of statistical heterogeneity via the Q-test, the I² measure and the estimate of the heterogeneity variance τ² (using the ‘method of moments estimator’), all of which appear in the output. Although the estimated I² and τ² are routinely reported as fixed values, they are themselves estimated with uncertainty. The CI for the I² measure can be derived using the command heterogi, which requires the Q-statistic of the meta-analysis and the corresponding degrees of freedom (df, the number of studies minus one):
. heterogi Q df(Q)
To date the metan command does not provide a CI for the magnitude of heterogeneity (τ²). However, it allows the impact of heterogeneity on the summary effect to be assessed via the predictive interval, which is the interval within which the effect of a future study is expected to lie.18 The predictive interval expresses the additional uncertainty induced in the estimates of future studies due to the heterogeneity and can be estimated by adding the rfdist option in metan (under the random effects model).
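The quantities discussed above can be written compactly. The expressions below are the standard definition of the I² measure and a commonly used approximation for the predictive interval; the notation (k studies, summary estimate and heterogeneity variance on the analysis scale, eg, the log RR) is ours, and we assume rather than confirm that rfdist follows exactly this approximation.

```latex
% I^2 from the heterogeneity statistic Q and its degrees of freedom
\[
I^2 = \max\!\left(0,\ \frac{Q - df}{Q}\right) \times 100\%,
\qquad df = k - 1
\]

% Approximate 95% predictive interval for the effect in a future study
\[
\hat{\mu} \ \pm\ t_{k-2,\,0.975}\,
\sqrt{\hat{\tau}^{2} + \mathrm{SE}(\hat{\mu})^{2}}
\]
```

For a ratio measure such as the RR, the interval is computed on the log scale and then back-transformed. As a rough check of the first expression, a meta-analysis of 17 studies (df = 16) with an I² of 41% corresponds to Q ≈ 27.1, since (27.1 − 16)/27.1 ≈ 0.41.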
Many additional options are available (eg, options that control the appearance of the forest plot); these are described in the help file of the command (type ‘help metan’).
Exploring the impact of missing outcome data
Stata has a readily available command called metamiss that enables incorporation of different assumptions for the mechanism of missing outcome data in a meta-analysis. To date the metamiss command can be applied only to dichotomous data but is currently being extended to account for continuous outcomes.19 The syntax is similar to that of the metan command but also requires the number of participants who dropped out from each arm (ie, six input variables are necessary) as well as the desired method of imputing information for the missing data:
. metamiss rh fh mh rp fp mp, imputation_method
In general, we can assume the following scenarios:
An available case analysis (ACA), which ignores the missing data (option aca) and is valid under the missing at random assumption;
The best-case scenario, which imputes all missing participants in the experimental group as successes and in the control group as failures (option icab);
The worst-case scenario, which is the opposite of the best-case scenario (option icaw).
The two previous approaches are naïve imputation methods since they do not account properly for the uncertainty in the imputed missing data. Methods that take into account the uncertainty in the imputed data include:
The Gamble-Hollis analysis,20 which inflates the uncertainty of the studies using the results from the best-case and worst-case analyses (option gamblehollis);
The informative missingness OR (IMOR) model,15,21 which relates within each study arm the results from observed and missing participants (options imor() or logimor()) allowing for uncertainty in the assumed association (sdlogimor()).
Note that the metamiss command always assumes that the outcome is beneficial; hence for a harmful outcome (eg, adverse events) the options icab and icaw will give the worst-case and best-case scenario, respectively. If it is not possible to assume that missing data are missing at random, the IMOR model is the most appropriate method because it takes the uncertainty of imputed data into account.5 This model uses a parameter that relates the odds of the outcome in the missing data to the odds of the outcome in the observed data. If this parameter cannot be informed by expert opinion, it is prudent to conduct a sensitivity analysis assuming various values (eg, if we set the odds of the outcome in the missing data to be twice the odds in the observed data for both the treatment and the control group, we type metamiss rh fh mh rp fp mp, imor(2)).
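The scenarios listed above can be run one after the other as a sensitivity analysis. The sketch below assumes that metamiss is installed and that the six count variables are in memory; the numerical values passed to imor(), logimor() and sdlogimor() are illustrative choices, not recommendations.

```stata
* Available case analysis: missing participants are ignored
metamiss rh fh mh rp fp mp, aca

* Best-case and worst-case imputation (for a beneficial outcome)
metamiss rh fh mh rp fp mp, icab
metamiss rh fh mh rp fp mp, icaw

* Gamble-Hollis analysis: study uncertainty inflated using both extremes
metamiss rh fh mh rp fp mp, gamblehollis

* IMOR fixed at 2 and at 1/2 in both arms (illustrative values)
metamiss rh fh mh rp fp mp, imor(2)
metamiss rh fh mh rp fp mp, imor(0.5)

* IMOR with uncertainty: log IMOR centred at 0 with an SD of 1
* (illustrative values)
metamiss rh fh mh rp fp mp, logimor(0) sdlogimor(1)
```

Comparing the summary estimates across these calls shows how strongly the conclusions depend on the assumed missingness mechanism.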
Assessing the presence of small study effects and the risk of publication bias
The available approaches for assessing the risk of publication bias in a meta-analysis can be broadly classified into two categories: (1) methods that relate effect sizes to their precision and (2) selection models. We focus on the first group of methods, which have been implemented in Stata via the commands metafunnel, confunnel, metatrim and metabias. However, researchers should always remember that this approach gives information about the presence of small-study effects, which might or might not be associated with genuine publication bias.4
The command metafunnel draws the standard funnel plot22 and requires two input variables. Let logRR and selogRR be the two variables containing the observed effect sizes in studies and their SEs. The syntax of the metafunnel command would be:
. metafunnel logRR selogRR
The option by() can be added to display the studies in subgroups (using different shapes and colours) according to a grouping variable. A limitation of the standard funnel plot is that it does not show whether the apparent asymmetry is due to publication bias or to other reasons, such as genuine heterogeneity between small and large studies or differences in the baseline risk of the participants.4 Contour-enhanced funnel plots can be used instead; in these, shaded areas are added to the graph to indicate whether missing studies lie in the areas of statistical significance (eg, p value <0.05).23 If non-significant studies have been published, it is unlikely that the asymmetry is due to publication bias. The command confunnel can be employed to produce this modified funnel plot using the same syntax as the metafunnel command:
. confunnel logRR selogRR
The measure of study precision plotted on the vertical axis can be modified via the option metric() (eg, to show the variance instead of the SE), while the option extraplot() allows the incorporation of additional graphs (such as regression lines, alternative scatterplots, etc) using standard Stata commands.
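Both funnel plot commands need the effect sizes and their SEs as variables. As a sketch under the assumption that only the observed counts are available, the log RR and its SE can be generated first with the usual large-sample formulas; the names logRR and selogRR simply follow the notation used above.

```stata
* Log risk ratio and its standard error from the observed counts.
* A study with zero responders in an arm yields missing values here
* and would need a continuity correction (not shown).
generate double logRR   = ln((rh/(rh + fh)) / (rp/(rp + fp)))
generate double selogRR = sqrt(1/rh - 1/(rh + fh) + 1/rp - 1/(rp + fp))

* Standard and contour-enhanced funnel plots
metafunnel logRR selogRR
confunnel logRR selogRR
```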
Alternatives to the funnel plot have also been implemented in Stata.22 More specifically, regression models that relate the magnitude of effect in a trial to its precision are very popular. The command metabias offers four tests: Egger's test22 (option egger), Harbord's test24 (option harbord), Peters' test25 (option peters) and the rank correlation test by Begg and Mazumdar26 (option begg). For a generic approach where the study effect sizes (effect) are regressed on their standard errors (se), the syntax is:
. metabias effect se, model
or, for dichotomous data:
. metabias rh fh rp fp, model
where model defines one of the four tests described above. Adding the option graph also gives a graphical depiction of the results. Note that the regression line estimated by Egger's test can also be added to the funnel plot by adding the option egger in the metafunnel command.
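For illustration, the calls below replace the placeholder model with specific tests. They assume that metabias is installed, that logRR and selogRR already hold the log RRs and their SEs (as in the metafunnel example), and that the four count variables are in memory.

```stata
* Egger's regression test on effect sizes and SEs, with a plot
metabias logRR selogRR, egger graph

* Rank correlation test by Begg and Mazumdar
metabias logRR selogRR, begg

* Harbord's test, which works on the 2x2 counts
metabias rh fh rp fp, harbord
```

Because all of these tests have low power, a non-significant result should not be read as evidence that small-study effects are absent.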
The trim-and-fill method aims to estimate the summary effect as if the funnel plot were symmetrical assuming that publication bias is the only explanation of asymmetry. The method can be applied using the metatrim command with the following syntax:
. metatrim effect se
Specifying the option funnel in metatrim gives the estimated filled funnel plot that includes both published and unpublished studies.
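A sketch of the two calls one would typically compare is shown below. It assumes that metatrim is installed and that logRR and selogRR hold the log RRs and their SEs; reffect is, to our understanding, the option that bases the procedure on the random effects model, and this should be checked against the help file of the installed version (help metatrim).

```stata
* Trim-and-fill under the fixed effect model, with the filled funnel plot
metatrim logRR selogRR, funnel

* The same analysis under the random effects model
* (reffect assumed from the metatrim help file)
metatrim logRR selogRR, reffect funnel
```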
A Stata script that produces all results described below can be found online at http://missoptima.project.uoi.gr/index.php/our-research-projects.
Fixed and random effects meta-analysis
We fitted fixed effect as well as random effects models for illustration purposes. Using the metan command, we carried out ACAs for both models and produced the forest plot of figure 1. It is generally misleading to focus on the diamond when interpreting the results of a random effects meta-analysis; for example, in the presence of excessive heterogeneity the diamond is often meaningless.
According to figure 1, both models suggested that haloperidol was statistically significantly more effective than placebo in the treatment of schizophrenia and, as expected, the random effects analysis produced a wider CI. Despite this finding, the estimated predictive interval crossed the line of no effect, implying that in a future study placebo might appear to be more effective than the active drug. The study-specific estimates seemed substantially heterogeneous (eg, the CIs of Bechelli 1983 and Beasley 1996 did not overlap); hence the fixed effect assumption might not be plausible for this dataset. This is supported by the Q-test, which suggested the presence of heterogeneity (p=0.038). The point estimate of the I² measure, which quantifies the amount of heterogeneity across studies, suggested the presence of low heterogeneity (41%). Using the heterogi command we estimated the CI for the I², which ranged from 0% to 67%, implying that heterogeneity was potentially null to large, but not excessive.
The two models differed not only in the level of uncertainty but also in the magnitude of the summary effect. This is very common when there is a small-study effect (ie, there is an association between effect size and study size) because the random effects model assigns relatively larger weights to smaller studies.4 Indeed, in our example smaller studies (ie, studies corresponding to smaller squares in figure 1) gave more favourable results for haloperidol, whereas larger studies were closer to the null effect.
Impact of missing outcome data
We first fitted a subgroup analysis (using metan with the by() option) to investigate whether studies with and without missing data (in both arms) produced different results. This analysis was based only on the observed data and thus, in studies with missing data, the sample size was less than the number of randomised participants. A common misconception about subgroup analyses is that results differ between subgroups when the summary effect is statistically significant for one subgroup and not for the other. However, inference on subgroup differences should be based on an interaction test (ie, the test for subgroup differences also implemented in RevMan; http://tech.cochrane.org/revman) that statistically compares the two subgroup means accounting for their uncertainty. Differences between subgroups can also be identified visually by looking at the overlap of the CIs of their summary estimates.17
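A sketch of how such a subgroup analysis could be set up is shown below. The indicator anymiss is hypothetical and uses one possible definition (any missing outcome data in either arm); it assumes that mp and mh contain no Stata missing values. Inverse-variance weights are requested because, to our understanding, metan reports the test for heterogeneity between subgroups under that fixed effect model.

```stata
* Hypothetical indicator: 1 if the study has any missing outcome data
generate byte anymiss = (mp > 0 | mh > 0)

* Fixed effect subgroup analysis on the observed data only;
* the output includes a test for heterogeneity between subgroups
metan rh fh rp fp, fixedi by(anymiss)
```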
In figure 2, trials without missing data gave more favourable results for haloperidol than trials with missing data. This disagreement was also statistically significant since the p value for the overall test for heterogeneity between subgroups (provided in the output of metan under the fixed effect model) was equal to 0.001. Hence, it is likely that missing data in trials substantially affected the results; a possible explanation is that there was a high dropout rate in the placebo arm due to lack of efficacy, which is quite common in trials in psychiatry.
We further explored the impact of missing data by incorporating in the analysis different assumptions about the mechanism of missingness. We presented the results from the random effects model (figure 3) and focused on the differences in the summary effects across the different scenarios. Under all six analyses haloperidol appeared to work better than placebo for schizophrenia. Small differences existed in the point estimates between the IMOR models and the Gamble-Hollis analysis compared with the ACA. Unlike the rest of the methods, these two approaches do not impute data and do not artificially inflate the sample size. The IMOR model increased uncertainty within studies which, in turn, resulted in a slight reduction of heterogeneity. The changes in the summary estimate were negligible. Under the ACA, studies with large missing rates favoured placebo (figure 1). The IMOR models down-weighted these studies and the summary estimate moved slightly in the direction of the active intervention.
Publication bias and small-study effects
Different summary estimates between fixed and random effects models (figure 1) raised concerns that small-study effects possibly operated in our example, questioning the correct interpretation of the overall effect. To explore this apparent association between effect size and study size, we employed a series of graphical approaches and statistical tests (it is important to note that all these methods have low power and at least 10 studies are needed to draw conclusions).17
The funnel plot in figure 4 was rather asymmetric and showed that smaller studies tended to give results emphasising the effectiveness of haloperidol. The contour-enhanced funnel plot (figure 5) helped us distinguish between publication bias and other causes of the asymmetry. It showed that small studies were found not only in the areas of statistical significance (shaded area) but also in areas of non-statistical significance (white area); hence asymmetry might have been caused by several factors and not solely by publication bias. To assess the magnitude and statistical significance of the relationship between observed effect sizes and the size of studies we ran Egger's meta-regression model (table 2). The test suggested that smaller studies tended to give different results from larger trials, as the CI of the intercept did not include zero.
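To make the role of the intercept explicit, Egger's test in its original formulation can be written as the regression below. The notation is ours: y_i denotes the observed effect in study i on the analysis scale (here the log RR) and s_i its SE.

```latex
% Egger's regression: standardised effect on precision
\[
\frac{y_i}{s_i} \;=\; \beta_0 + \beta_1\,\frac{1}{s_i} + \varepsilon_i ,
\qquad H_0:\ \beta_0 = 0
\]
```

In the absence of small-study effects the regression line is expected to pass through the origin, so that the slope estimates the underlying effect and the intercept is close to zero. A CI for the intercept that excludes zero therefore indicates funnel plot asymmetry.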
We finally applied the trim-and-fill method, although the assumption that funnel plot asymmetry was solely caused by publication bias probably did not hold for this dataset. The addition of the nine estimated unpublished studies slightly moved the summary estimate of the fixed effect model closer to 0 (figure 6), while the random effects model (table 3) resulted in a non-significant summary estimate (p=0.083).
Along with the rapid methodological development of meta-analysis, a variety of relevant software options have been made available enabling the application of different models and the exploration of characteristics that may affect the results. Using a working example, in this paper we offered a brief tutorial to researchers and interested clinicians about the use of Stata in meta-analysis, highlighting common pitfalls in the interpretation of results (more information about Stata can be found elsewhere).27 Our findings suggested that the presence of important small-study effects as well as the missing outcome data in some trials made the estimated summary effect unrepresentative of the entire set of studies. Including in the meta-analysis only studies with data for all randomised participants was not the recommended approach, since the bulk of evidence came from trials with missing outcome data.
Clinical insight into the outcomes and treatments of interest is necessary to make reasonable assumptions about the mechanism of missing data and to inform the choice of the appropriate statistical model. The results from the three models that accounted for the uncertainty in the imputed missing data (Gamble-Hollis and the two IMOR models) were similar and are probably the most accurate estimates of the summary RR. However, the fact that small and large trials gave different results needs further exploration. For example, if the size of the studies was associated with differences in population characteristics with respect to some effect modifiers, then there might not be a common RR applicable to all populations.