## Abstract

**Objective** The current practice in meta-analysis of the effects of psychopharmacological interventions ignores the administered dose or restricts the analysis to a dose range. This may introduce unnecessary uncertainty and heterogeneity. Methods have been developed to integrate dose–effect models into meta-analysis.

**Methods** We describe the two-stage and the one-stage models to conduct a dose–effect meta-analysis using common or random effects methods. We illustrate the methods on a dataset of selective serotonin reuptake inhibitor antidepressants. The dataset comprises 60 randomised controlled trials. The dose–effect is measured on an odds ratio scale and is modelled using restricted cubic splines to detect departure from linearity.

**Results** The estimated summary curve indicates that the probability of response increases up to 30 mg/day of fluoxetine-equivalent, at which point the probability of response reaches 50%. Beyond 40 mg/day, no further increase in the response is observed. The one-stage model includes all studies, resulting in slightly less uncertainty than the two-stage model, where only part of the data is analysed.

**Conclusions** The dose–effect meta-analysis enables clinicians to understand how the effect of a drug changes as a function of its dose. Such analysis should be conducted in practice using the one-stage model that incorporates evidence from all available studies.

## Introduction

Synthesis of evidence provided by randomised controlled trials (RCTs) is commonly used to develop clinical guidelines and make reimbursement decisions for pharmacological interventions. While the dose of a drug is of central importance, meta-analyses that examine efficacy and safety often focus on comparing only agents or classes of drugs, ignoring potential variability due to different doses. As different dose schedules may result in considerable heterogeneity in efficacy and safety, one common approach is to restrict the database to a certain dose range (e.g., the therapeutic dose), discard all studies outside that range and then examine the role of dose in a subgroup analysis for the lowest and the highest dose categories.1 This approach fails, however, to synthesise the whole relevant evidence. Alternatively, researchers might opt to perform many meta-analyses, each restricted to studies that examine a particular drug-dose combination. This will inevitably result in many underpowered meta-analyses.

In this paper, we present a recently developed evidence synthesis method, dose–effect meta-analysis (DE-MA), that offers a middle ground between ‘lumping’ all doses together into a single meta-analysis and ‘splitting’ them into many dose-specific meta-analyses. In DE-MA, we model the changes in the drug effect along the range of all studied dosages. There are two common approaches to conduct DE-MA: two-stage and one-stage models. In the two-stage model, the dose–effect curve is estimated within each study and then synthesised across studies.2 3 These two steps are performed simultaneously in the one-stage model.4

We first provide the statistical explanations of the two models, and then illustrate the models by using a collection of RCTs examining the efficacy of selective serotonin reuptake inhibitor (SSRI) antidepressants.5

The analysis is implemented in R6 and is made available along with the dataset and the results on GitHub (https://github.com/htx-r/Dose-effect-MA-EBMH-article-).

## Methods

In this section, we describe the two-stage DE-MA model with summarised data. Then we briefly present the one-stage model. Finally, we discuss other issues related to this topic, namely statistical testing of dose–effect coefficients, assessment of heterogeneity and prediction. The models illustrated here to conduct DE-MA have been implemented in various software packages, for example, the *drmeta* command (in Stata7) and the *dosresmeta* package8 (in R).6

### Dose–effect shape within a study

Let us consider the case of an RCT where several doses are examined (one dose per arm), denoted by $\mathrm{dose}_j$, where the index *j* enumerates the dose levels starting with zero. The outcome is measured in each arm on an additive scale (e.g., a mean, a log-odds). The dose–effect model within a study associates the change in the outcome (i.e., the treatment effect) with the change in the dose. Let us assume a trial like the one presented in table 1 that has a placebo arm and a dichotomous outcome, where the changes in the outcome are measured using the log odds ratio (logOR) of each dose level *j* relative to a reference dose $\mathrm{dose}_0$. Using the placebo arm as a reference (at dose $\mathrm{dose}_0 = 0$), and assuming a linear association between logOR and dose, the dose–effect model is

$$\mathrm{logOR}_j = \beta\,\mathrm{dose}_j$$

The estimated coefficient β shows how much an increase in the dose will impact on the change in logOR.

Typically, the referent dose is assigned to the zero or the minimal dose to make interpretation easier. The doses are centred around the referent dose so the relationship quantifies the change in relative effects. However, this centring induces correlation between the logORs in each study (as they are all estimated relative to the outcome of the same referent dose, $\mathrm{dose}_0$). Such correlations should be estimated and accounted for using the Greenland and Longnecker method.2 9
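As a rough illustration of this within-study step, the sketch below computes logORs against a shared placebo arm, builds their covariance matrix and fits the linear model by generalised least squares. The arm counts are hypothetical (they are not the data of any trial in this paper). The covariance uses the simple crude-count result that two logORs sharing a referent arm have covariance equal to the variance of the referent log-odds; this stands in for, and is not, the full Greenland and Longnecker procedure.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def log_odds(r, n):
    return math.log(r / (n - r))

def var_log_odds(r, n):
    return 1.0 / r + 1.0 / (n - r)

# Hypothetical arms: (dose in mg/day, responders, randomised). First arm is placebo.
arms = [(0, 30, 100), (10, 35, 100), (20, 42, 100), (40, 48, 100), (60, 50, 100)]
(_, r0, n0), active = arms[0], arms[1:]

doses = [d for d, _, _ in active]
log_or = [log_odds(r, n) - log_odds(r0, n0) for _, r, n in active]

# Covariance of the logORs: the shared referent arm contributes its variance
# to every entry; each active arm adds its own variance on the diagonal.
shared = var_log_odds(r0, n0)
cov = [[shared + (var_log_odds(r, n) if j == k else 0.0)
        for k in range(len(active))]
       for j, (_, r, n) in enumerate(active)]

# GLS slope for logOR_j = beta * dose_j (no intercept).
w = solve(cov, doses)                          # cov^{-1} @ doses
info = sum(wi * d for wi, d in zip(w, doses))  # doses' cov^{-1} doses
beta = sum(wi * y for wi, y in zip(w, log_or)) / info
se = math.sqrt(1.0 / info)
print(f"beta = {beta:.4f} per mg/day (SE {se:.4f})")
```

Treating the four logORs as independent would use only the diagonal of `cov`; the off-diagonal terms are what the shared referent arm adds.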

In practice, multiple changes in the dose–effect shape are expected, so the linear model is often not a realistic assumption. More flexible models are needed to account for those changes,10 such as the restricted cubic spline (RCS). An RCS is a piecewise function; the dose spectrum is split into intervals (using some changepoints, called knots) and in each interval a cubic polynomial is fitted.11 Restrictions on the polynomial coefficients are then imposed to ensure that the pieces are connected and form a smooth function which is linear in the two tails. The location and the number of knots determine the shape of the RCS: the locations indicate intervals where changes in the shape might occur, and the number reflects how many such changes are anticipated. In general, setting *k* knots creates an RCS model with *k* − 1 regression coefficients. For identifiability, the minimum number of knots is three, in which case the dose–effect shape is:

$$\mathrm{logOR}_j = \beta_1\,\mathrm{dose}_j + \beta_2\,\mathrm{spl}(\mathrm{dose}_j)$$

where $\mathrm{spl}(\cdot)$ denotes the nonlinear (spline) transformation of the dose. This function is a combination of linear and nonlinear transformations.11

Of note, a two-stage approach requires that the study reports data on at least three dose levels, including the referent level, which enables estimation of the two regression coefficients in the linear and spline (nonlinear, $\mathrm{spl}$) parts of the equation.
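A sketch of how the linear and nonlinear dose terms can be built in practice. It uses the truncated-power form of the restricted cubic spline, scaled by the squared distance between the outer knots. This parameterisation and the function name `rcs_basis` are assumptions made for illustration, since the text does not specify them.

```python
def rcs_basis(dose, knots):
    """Return [dose, nonlinear terms...] for a restricted cubic spline.

    With k knots the result has k - 1 entries: the linear term plus
    k - 2 nonlinear terms that are zero below the first knot and
    linear beyond the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least three knots")

    def cube(u):
        # Truncated cubic: (u)_+^3
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2  # keeps the terms on a scale comparable to the dose
    gap = t[-1] - t[-2]
    terms = [dose]
    for j in range(k - 2):
        term = (cube(dose - t[j])
                - cube(dose - t[-2]) * (t[-1] - t[j]) / gap
                + cube(dose - t[-1]) * (t[-2] - t[j]) / gap)
        terms.append(term / scale)
    return terms

# Three knots give two columns per dose: one linear, one nonlinear.
# The knot values here are the ones reported later in the Results.
for d in (10, 30, 60):
    print(d, rcs_basis(d, [20, 23.6, 44.4]))
```

The two correction terms at the last two knots are what force the cubic and quadratic parts to cancel beyond the final knot, leaving a straight line in the upper tail.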

Any type of function could be used in the dose–effect association. For study indicator *i*, the general form of the dose–effect model can be written:

$$\mathrm{logOR}_{ij} = f(\mathrm{dose}_{ij};\ \beta_{1i}, \ldots, \beta_{pi})$$

The term $\beta_{pi}$ refers to the *p*th dose–effect parameter in study *i* and *f* denotes the dose–effect shape.

### Synthesis of dose–effect shapes across studies

Consider that we have fitted the RCS model in *k* studies and have obtained *k* sets of estimates ($\hat{\beta}_{1i}, \hat{\beta}_{2i}$), $i = 1, \ldots, k$. Each pair of coefficients represents the shape of the dose–effect within a study. We now synthesise the shapes across studies by combining their coefficients. We may set common underlying coefficients for all studies, for example, $\beta_{1i} = \beta_1$ and $\beta_{2i} = \beta_2$ (common-effect model). Alternatively, the underlying study-specific coefficients can be assigned a two-dimensional normal distribution with mean $(\beta_1, \beta_2)$ and a variance–covariance matrix that reflects the heterogeneity across the studies (random-effects model). In the general case, the dose–effect shape *f* involves *p* coefficients $\beta_{1i}, \ldots, \beta_{pi}$, which are similarly synthesised using a multivariate normal distribution.

What we describe above is the two-stage approach: the dose–effect curves are estimated within each study and then synthesised across studies in two separate steps. This requires each study to report at least as many non-referent doses as there are dose–effect coefficients. Otherwise, the coefficients will be non-identifiable and the study must be excluded from the analysis. For example, to estimate a quadratic dose–effect shape or an RCS with three knots, two coefficients need to be estimated and hence each study needs to report at least two logORs (which means at least three dose levels). Studies that report fewer dose levels are excluded from the synthesis.
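The second stage can be sketched as multivariate inverse-variance pooling. The example below covers only the common-effect case; a random-effects version would add an estimated between-study covariance matrix to each study's covariance before weighting, which is not shown. The study estimates and covariance matrices are hypothetical, and `pool_common_effect` is an illustrative name.

```python
def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def pool_common_effect(estimates, covariances):
    """Pool study-specific coefficient pairs, weighting by inverse covariance."""
    precision = [[0.0, 0.0], [0.0, 0.0]]
    weighted = [0.0, 0.0]
    for b, v in zip(estimates, covariances):
        w = inv2(v)  # study weight matrix
        for p in range(2):
            weighted[p] += w[p][0] * b[0] + w[p][1] * b[1]
            for q in range(2):
                precision[p][q] += w[p][q]
    pooled_cov = inv2(precision)
    pooled = [pooled_cov[p][0] * weighted[0] + pooled_cov[p][1] * weighted[1]
              for p in range(2)]
    return pooled, pooled_cov

# Hypothetical first-stage output for three studies:
# (linear, spline) estimates and their 2x2 covariance matrices.
estimates = [[0.020, -0.070], [0.015, -0.050], [0.022, -0.065]]
covariances = [
    [[4.0e-5, -1.0e-4], [-1.0e-4, 6.0e-4]],
    [[9.0e-5, -2.0e-4], [-2.0e-4, 9.0e-4]],
    [[2.5e-5, -0.8e-4], [-0.8e-4, 5.0e-4]],
]

pooled, pooled_cov = pool_common_effect(estimates, covariances)
print("pooled coefficients:", pooled)
print("pooled variances:", pooled_cov[0][0], pooled_cov[1][1])
```

A study with a single logOR cannot supply a 2x2 covariance matrix for two coefficients, which is why it drops out of this approach.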

In the one-stage approach, within-study and across-study estimation of the shape are performed simultaneously.4 This allows for borrowing information across studies, and the study-specific coefficients can be estimated even if the study itself does not report the required number of doses. This means that, with the one-stage approach, we can include in the synthesis studies that report only one logOR (two dose levels) even if we want to estimate an RCS.
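To see why studies with a single logOR can contribute, the sketch below stacks the logORs of all studies and estimates the coefficients in one generalised least squares step. It is a common-effect simplification: the random-effects one-stage model is a mixed model and needs dedicated software. A quadratic shape is used as a simple two-coefficient example, and all study data are hypothetical.

```python
def inverse(m):
    """Invert a 1x1 or 2x2 matrix given as nested lists."""
    if len(m) == 1:
        return [[1.0 / m[0][0]]]
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def design_row(dose):
    # Quadratic shape: logOR = b1 * dose + b2 * dose**2
    return [dose, dose ** 2]

def one_stage_common(studies):
    """Common-effect GLS over the logORs of all studies at once."""
    xtwx = [[0.0, 0.0], [0.0, 0.0]]
    xtwy = [0.0, 0.0]
    for s in studies:
        rows = [design_row(d) for d in s["doses"]]
        w = inverse(s["cov"])  # within-study weight matrix
        for a, row_a in enumerate(rows):
            for b, row_b in enumerate(rows):
                for p in range(2):
                    xtwy[p] += row_a[p] * w[a][b] * s["log_or"][b]
                    for q in range(2):
                        xtwx[p][q] += row_a[p] * w[a][b] * row_b[q]
    cov = inverse(xtwx)
    beta = [cov[p][0] * xtwy[0] + cov[p][1] * xtwy[1] for p in range(2)]
    return beta, cov

# Hypothetical studies: non-referent doses, logORs versus placebo and
# the covariance matrix of those logORs.
studies = [
    {"doses": [20, 40], "log_or": [0.35, 0.50], "cov": [[0.06, 0.03], [0.03, 0.06]]},
    {"doses": [10, 30], "log_or": [0.20, 0.45], "cov": [[0.08, 0.04], [0.04, 0.07]]},
    {"doses": [20], "log_or": [0.30], "cov": [[0.05]]},  # two-arm study
    {"doses": [60], "log_or": [0.40], "cov": [[0.09]]},  # two-arm study
]

beta_all, cov_all = one_stage_common(studies)
beta_multi, cov_multi = one_stage_common(studies[:2])  # multi-arm studies only
print("all studies:     ", beta_all, cov_all[0][0])
print("multi-arm studies:", beta_multi, cov_multi[0][0])
```

Each two-arm study adds a rank-one contribution to the information matrix, so it cannot identify both coefficients alone but still narrows the pooled estimate.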

There are different ways to present the results from DE-MAs. The dose–effect shape as a function of any dose can be presented in graphical or tabular form by plugging the dose values and the estimated coefficients into the assumed function (see figures 1 and 2). Another useful presentation is to show absolute estimates of the outcome, such as estimates of the probability of efficacy at any given dose (see figure 3). This can be done in two simple steps. First, we estimate the absolute probability of the response at the reference dose (e.g., zero). Then we combine this with the estimated relative treatment effect at each dose (e.g., with the estimated logOR) to obtain the absolute outcome (e.g., the probability to respond at an active dose level).

### Statistical testing of the dose–effect shape

The hypothesis of no dose–effect association, that is, $H_0\colon \mathbf{B} = \mathbf{0}$, where $\mathbf{B}$ is the vector of all regression coefficients, can be tested by computing a Wald statistic based on the estimated regression coefficients (and their estimated variances/covariances). A p-value is then derived with reference to a chi-squared distribution with degrees of freedom equal to the number of regression coefficients involved in the null hypothesis.

Alternatively, we may compute the z statistic to test each dose–effect coefficient separately, under the hypothesis $H_0\colon \beta = 0$. Testing the coefficient of the spline term will indicate whether a linear function is sufficient to describe the data.
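Both tests can be sketched with the standard library alone. The example applies the z test to the one-stage estimates and confidence limits reported in the Results, back-calculating standard errors on the assumption that the intervals are normal-based. The joint Wald test needs the covariance between the two coefficients, which the text does not report, so the correlation used below is purely hypothetical.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Back-calculate a standard error from a normal-based 95% CI."""
    return (upper - lower) / (2 * z_crit)

def z_test(estimate, se):
    """Two-sided z test of H0: coefficient = 0."""
    z = estimate / se
    return z, math.erfc(abs(z) / math.sqrt(2))

def wald_test_2df(beta, cov):
    """Joint Wald test of H0: both coefficients = 0 (chi-squared, 2 df)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # beta' cov^{-1} beta written out for the 2x2 case
    w = (d * beta[0] ** 2 - (b + c) * beta[0] * beta[1] + a * beta[1] ** 2) / det
    return w, math.exp(-w / 2)  # chi-squared survival function for 2 df

beta = [0.0189, -0.0621]  # linear and spline coefficients (one-stage)
se = [se_from_ci(0.0146, 0.0232), se_from_ci(-0.0814, -0.0428)]

z_linear, p_linear = z_test(beta[0], se[0])
z_spline, p_spline = z_test(beta[1], se[1])
print(f"linear: z = {z_linear:.2f}, p = {p_linear:.2g}")
print(f"spline: z = {z_spline:.2f}, p = {p_spline:.2g}")

rho = -0.8  # HYPOTHETICAL correlation between the two estimates
cov = [[se[0] ** 2, rho * se[0] * se[1]],
       [rho * se[0] * se[1], se[1] ** 2]]
w, p_joint = wald_test_2df(beta, cov)
print(f"joint Wald: W = {w:.1f}, p = {p_joint:.2g}")
```

The closed form `exp(-w / 2)` is specific to two degrees of freedom; other shapes with more coefficients need a general chi-squared survival function.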

As with every statistical test, results should be interpreted with caution, bearing in mind common fallacies and misinterpretations of the p-value.12 Furthermore, confidence limits, typically 95%, for the unknown summary dose–effect shape can be estimated from the model for any sensible value of the dose.

### Heterogeneity

Heterogeneity in the study-specific coefficients $(\beta_{1i}, \beta_{2i})$ introduces heterogeneity in the relative treatment effects, which is what we will call heterogeneity from now on. It is a function of the dose and can be measured by the variance partition coefficient (VPC).4 The VPC is a study-specific and dose-specific quantity that shows the percentage of the total variability in a study that is due to heterogeneity. The VPC can be computed for each non-referent dose in each study. An average of the study-specific VPCs by dose level could be seen as a dose-specific $I^2$. It is useful to plot the study-specific VPCs (as %) against the dose levels to gauge the level of heterogeneity.
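A simplified reading of the VPC can be sketched as the between-study variance of a logOR divided by the sum of its between-study and within-study variances. The between-study covariance matrix `psi`, the design rows and the within-study variances below are all hypothetical, and the cited definition may differ in detail, for instance in how the correlation between logORs within a study is handled.

```python
def vpc(design_row, psi, within_var):
    """Share of the variance of one logOR that is due to heterogeneity.

    design_row : dose terms for this arm, e.g. [dose, spline_term]
    psi        : between-study covariance matrix of the coefficients
    within_var : sampling variance of the observed logOR
    """
    n = len(design_row)
    between = sum(design_row[p] * psi[p][q] * design_row[q]
                  for p in range(n) for q in range(n))
    return between / (between + within_var)

# Hypothetical between-study covariance of (linear, spline) coefficients.
psi = [[1.0e-5, -2.0e-5],
       [-2.0e-5, 1.0e-4]]

# Hypothetical logORs at 20 mg/day in three studies; the spline term is
# taken as zero here, as it would be at or below the first knot.
within_variances = [0.05, 0.02, 0.10]
values = [vpc([20, 0.0], psi, s2) for s2 in within_variances]

for s2, v in zip(within_variances, values):
    print(f"within variance {s2:.2f}: VPC = {100 * v:.1f}%")
print(f"dose-specific average: {100 * sum(values) / len(values):.1f}%")
```

Because the within-study variance sits in the denominator, larger and more precise studies show a higher VPC at the same dose.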

## Results

We illustrate the models by re-analysing a dataset about the role of dose in the efficacy of SSRIs. Drug-specific doses are converted into fluoxetine-equivalents (mg/day) using a validated formula.5 The outcome is response to treatment, defined as a 50% reduction in symptoms. The data include 60 RCTs, which recruited 15 174 participants in 145 different dose arms (see online supplemental appendix figures 1 and 2 and table 1).

### Dose–effect model within a study

To exemplify the process, we consider the study by Feighner *et al*.13 Table 1 presents the data at the five examined dose arms. The four logORs are estimated as the odds of each non-referent category (10, 20, 40, 60 mg/day) relative to the odds in the referent dose (Placebo, 0 mg/day). The study-specific estimated logORs and their SEs can be used to fit a linear dose–effect model.

A log-linear trend is then estimated based on the aggregate data presented by Feighner *et al* (figure 1).13 The Greenland and Longnecker method is used to back-calculate the covariance of the four empirical logORs, which are used as the dependent variable of the linear dose–effect model.

The linear dose–effect coefficient is estimated at 0.0156 (95% CI 0.0083 to 0.0230) on the log scale. The OR at dose 10 is therefore exp(0.0156 × 10) = 1.17, which means that the odds of response increase by 17% for a 10-unit increase in dose.
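The conversion from the log scale to an odds ratio per dose increment is a one-line calculation. The sketch below applies it to the coefficient and confidence limits quoted above; the 10 mg/day step is the increment used in the text.

```python
import math

beta, lower, upper = 0.0156, 0.0083, 0.0230  # logOR per mg/day, with 95% CI
step = 10                                    # dose increment in mg/day

# Under a linear model, the logOR for a `step` increase is beta * step,
# and the confidence limits scale in the same way.
or_step = math.exp(beta * step)
or_lower = math.exp(lower * step)
or_upper = math.exp(upper * step)

print(f"OR per {step} mg/day: {or_step:.2f} (95% CI {or_lower:.2f} to {or_upper:.2f})")
```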

Biologically, it is quite unrealistic to assume a constant effect of fluoxetine-equivalents on the relative odds of the outcome. We expect the shape to increase up to a certain dose level and then flatten out. The exact dose at which the dose–effect curve levels out is unknown, so the dose–effect model should be able to capture this plausible mechanism.

For this reason, we use an RCS function, rather than a linear function, for fluoxetine-equivalents. RCSs are generated using three knots at dose levels 20, 23.6 and 44.4, which represent the 10th, 50th and 90th percentiles of the observed non-zero dose distribution. A Wald test indicates large incompatibility between this study and the hypothesis of a linear function (p=0.033). Figure 1 indicates a large positive dose–effect up to 30 mg/day of fluoxetine-equivalents and no increase in the effect beyond that value.

The fact that the shape is estimated from just a single study results in a large uncertainty around the RCS curve.

### Synthesis of dose–effect shapes across studies

We first synthesise the dose–effect coefficients from all studies assuming a random-effects two-stage model. For RCS in the two-stage model, only 17 studies can be synthesised (those with at least three dose levels). The results are depicted in figure 2. The estimated linear coefficient is 0.0186 (95% CI 0.0118 to 0.0253) and the spline coefficient is −0.0628 (95% CI −0.0876 to −0.0379).

The random-effects one-stage model can include all 60 studies. The estimated linear and spline coefficients are very close to those from the two-stage model (0.0189 (95% CI 0.0146 to 0.0232) and −0.0621 (95% CI −0.0814 to −0.0428)), which is also shown in the agreement of the two shapes in figure 2. The important difference between the results from the two approaches is that the confidence bands from the one-stage model are tighter, because it includes more than three times as many studies as the two-stage approach (60 vs 17).

In figure 3, we show the probability of response as a function of the dose as estimated from the meta-analysis. After meta-analysing all placebo arms, the probability of response to placebo is estimated at 37.7% (dashed line in figure 3). Increasing the dose up to 30 mg/day of fluoxetine-equivalent results in a 50% probability to respond. Beyond 40 mg/day, the probability of response flattens out.
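The two-step conversion to absolute probabilities can be sketched with the numbers reported in this section. The example assumes that the reported one-stage coefficients refer to the truncated-power form of the three-knot restricted cubic spline, scaled by the squared distance between the outer knots, and that the knots 20, 23.6 and 44.4 apply to the pooled analysis. Neither assumption is confirmed by the text. Under them, the value at 30 mg/day comes out close to the 50% quoted above.

```python
import math

KNOTS = (20.0, 23.6, 44.4)            # knots reported in the text
B_LINEAR, B_SPLINE = 0.0189, -0.0621  # one-stage estimates reported in the text
P_PLACEBO = 0.377                     # pooled placebo response reported in the text

def spline_term(dose, knots=KNOTS):
    """Nonlinear term of a three-knot restricted cubic spline (assumed form)."""
    t1, t2, t3 = knots

    def cube(u):
        return max(u, 0.0) ** 3

    value = (cube(dose - t1)
             - cube(dose - t2) * (t3 - t1) / (t3 - t2)
             + cube(dose - t3) * (t2 - t1) / (t3 - t2))
    return value / (t3 - t1) ** 2

def response_probability(dose):
    """Combine the placebo odds with the logOR at the given dose."""
    log_or = B_LINEAR * dose + B_SPLINE * spline_term(dose)
    odds = P_PLACEBO / (1 - P_PLACEBO) * math.exp(log_or)
    return odds / (1 + odds)

for dose in (0, 10, 20, 30, 40, 60):
    print(f"{dose:>2} mg/day: {100 * response_probability(dose):.1f}%")
```

This gives point estimates only; the confidence band in figure 3 additionally needs the covariance of the coefficients and the uncertainty in the placebo response.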

For the two-stage and the one-stage models, the null hypothesis that a coefficient is zero can be rejected, with estimated p-values less than 0.001 for both the linear and spline coefficients. This can be seen as statistical evidence against the linear model, so the RCS with both the linear and the spline part is preferable. The hypothesis of no dose–effect association is also rejected (p-value<0.001).

Figure 4 shows the variance partition coefficient (VPC) against the observed doses. At dose 20 mg/day, the share of the total variability that is attributed solely to heterogeneity ranges between 4% and 40%, which is considered to be moderate. Overall, the majority of VPC values do not exceed 60%.

## Discussion

Researchers can conduct a DE-MA by following two steps. The first step is to estimate a dose–effect curve within each study. The second step is to synthesise those curves across studies. These two steps can be performed either separately (two-stage model)2 3 or simultaneously (one-stage model).4 In this article, we detail these two models, alongside considerations for statistical testing of the dose–effect parameters, estimation of heterogeneity and presentation of the results. We use the presented models to re-analyse RCT data comparing various SSRIs in terms of response.

We describe the models for a dichotomous outcome, with the odds ratio as the effect size. However, the models can be adapted easily to other measures such as the risk ratio and the hazard ratio. Likewise, the models can be employed with other data types such as continuous outcomes with (standardised) mean differences.14

Recently, two extensions of the presented models have been introduced in the literature. The one-stage and two-stage models have been extended to a Bayesian setting15 to take advantage of its great flexibility. One of these advantages is the ability to use the exact binomial distribution for binary data, instead of the approximate normal distribution for the relative treatment effect used in the frequentist setting. The assumption of a normal distribution can be hard to meet when the sample size is small, as shown in recent simulations.15 The dose–effect model has also been extended to network meta-analysis, which allows the dose–effect relationship to be modelled simultaneously for more than two agents.16 17

Researchers should be careful when they report the findings of DE-MA and follow the existing reporting guidelines. Xu *et al* proposed a checklist with 33 reporting items for such analysis.18 The majority of these items (27) come from the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement after some modifications.19 The other six items are added from the Meta-analyses Of Observational Studies in Epidemiology checklist to cover key considerations of observational studies.20 They used the proposed checklist to assess the quality of reporting in published DE-MAs. They found that while reporting in the introduction and results was on average good, further improvements are required in the reporting of methods. Xu and colleagues also studied the association between reporting quality and study characteristics. They observed that studies including more authors or methodologists have better reporting quality. They conclude that while the quality of reporting has improved over the years, further refinement in the reporting checklists is required.

The main challenge in DE-MA is how to define the dose–effect shape. The shape selection can be guided by previous studies (such as dose-finding studies), clinical experience and biological plausibility informed by pharmacodynamic and pharmacokinetic studies. Additional evidence could be provided by considering goodness-of-fit measures for various shapes21 or via graphical inspection of the data. Yet, the RCS model has sufficient flexibility to capture different shapes. In our case study, using only three knots was sufficient to capture the expected behaviour of SSRIs while requiring only three dose levels to be reported in at least one study. This makes RCS an attractive choice for the majority of analyses.18 However, the number and location of knots should be chosen carefully based on the anticipated drug behaviour and clinical knowledge.

Researchers may encounter additional challenges if observational studies are synthesised instead of RCTs, which were the focus of this paper. First, defining the dependent and independent variables in observational studies could be difficult. For example, if we want to evaluate the association between alcohol consumption and the use of tobacco, the shape will depend on whether alcohol is set as the dependent or the independent variable. Second, categorisation of non-pharmacological exposures (such as environmental exposure, diet and so on), which are often the focus of observational studies, is often difficult. There might be open-ended categories to which assignment of a specific dose is not obvious (e.g., smoking two packs per day and above), and exposure categories might be defined differently across studies.22 23 These challenges could induce additional uncertainty in the analysis. In such cases, sensitivity analysis is recommended to investigate the robustness of the DE-MA results.

In conclusion, the DE-MA enables clinicians to understand how the effect of a drug changes as a function of its dose. Such analysis should be conducted in practice using the one-stage model that incorporates evidence from all available studies.

## Footnotes

Twitter @Toshi_FRKW, @And_Cipriani

Contributors TAF and AC designed the study and collected the data. TH performed the statistical analysis and NO helped in a partial revision of the analysis. TH and GS drafted the manuscript. All authors performed a critical revision of the manuscript and approved the submitted version.

Funding This work was supported by the HTx project for TH and GS. The HTx project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825162. AC is supported by the National Institute for Health Research (NIHR) Oxford Cognitive Health Clinical Research Facility, by an NIHR Research Professorship (grant RP-2017–08-ST2-006), by the NIHR Oxford and Thames Valley Applied Research Collaboration and by the NIHR Oxford Health Biomedical Research Centre (grant BRC-1215–20005). The views expressed are those of the authors and not necessarily those of the UK National Health Service, the NIHR or the UK Department of Health.

Competing interests TAF reports grants and personal fees from Mitsubishi-Tanabe, personal fees from MSD, grants and personal fees from Shionogi, outside the submitted work. In addition, TAF has a patent 2020-548587 concerning smartphone CBT apps pending, and intellectual properties for Kokoro-app licensed to Mitsubishi-Tanabe. AC has received research and consultancy fees from INCiPiT (Italian Network for Paediatric Trials), CARIPLO Foundation and Angelini Pharma.

Provenance and peer review Not commissioned; externally peer reviewed.
