Side effect profile and comparative tolerability of 21 antidepressants in the acute treatment of major depression in adults: protocol for a network meta-analysis
  1. Anneka Tomlinson1,
  2. Orestis Efthimiou2,
  3. Katharine Boaden3,
  4. Emma New3,
  5. Sarah Mather3,
  6. Georgia Salanti2,
  7. Hissei Imai4,
  8. Yusuke Ogawa5,
  9. Aran Tajika6,
  10. Sanae Kishimoto4,
  11. Sino Kikuchi4,
  12. Astrid Chevance7,8,
  13. Toshi A Furukawa4,
  14. Andrea Cipriani1,3
  1. 1 Department of Psychiatry, University of Oxford, Oxford, UK
  2. 2 Institute of Social and Preventive Medicine, Bern, Switzerland
  3. 3 Oxford Health NHS Foundation Trust, Warneford Hospital, Oxford
  4. 4 Department of Health Promotion and Human Behavior, Kyoto University Graduate School of Medicine/School of Public Health, Kyoto, Japan
  5. 5 Department of Healthcare Epidemiology, Kyoto University Graduate School of Medicine, Kyoto, Japan
  6. 6 Department of Psychiatry, Kyoto University Hospital, Kyoto, Japan
  7. 7 Paris Descartes University, Paris, France
  8. 8 METHODS Team, Center for Research in Epidemiology and Statistics, Sorbonne Paris Cité, Paris, France
  1. Correspondence to Dr Anneka Tomlinson, Department of Psychiatry, Warneford Hospital, University of Oxford, Oxford OX3 7JX, UK; anneka.tomlinson{at}


Introduction We have recently compared all second-generation as well as selected first-generation antidepressants in terms of efficacy and acceptability in the acute treatment of major depression. Here we present a protocol for a network meta-analysis aimed at extending these results, updating the evidence base and comparing all second-generation as well as selected first-generation antidepressants in terms of specific adverse events and tolerability in the acute treatment of major depression in adults.

Methods and analysis We will include all double-blind randomised controlled trials comparing one active drug with another or with placebo in the acute treatment of major depression in adults. We will compare the following active agents: agomelatine, amitriptyline, bupropion, citalopram, clomipramine, desvenlafaxine, duloxetine, escitalopram, fluoxetine, fluvoxamine, levomilnacipran, milnacipran, mirtazapine, nefazodone, paroxetine, reboxetine, sertraline, trazodone, venlafaxine, vilazodone and vortioxetine. The main outcomes will include the total number of patients experiencing specific adverse events; experiencing serious adverse events; and experiencing at least one adverse event. Published and unpublished studies will be retrieved through relevant database searches, trial registries and websites; reference selection and data extraction will be completed by at least two independent reviewers. For each outcome we will undertake a network meta-analysis to synthesise all evidence. We will use local and global methods to evaluate consistency. We will perform all analyses in R. We will assess the quality of evidence contributing to network estimates with the Confidence in Network Meta-Analysis web application.

Discussion This work will provide an in-depth analysis and an insight into the specific adverse events of individual antidepressants.

Ethics and dissemination This review does not require ethical approval.

PROSPERO registration number CRD42019128141.

  • adult psychiatry


Depression affects 350 million people worldwide and it is the second leading cause of global disease burden.1 The high direct and indirect costs of major depression are substantially due to significant deficits in treatment provision. There are a number of efficacious pharmacological and non-pharmacological interventions for depression; however, a significant proportion of patients with major depression remain inadequately treated. Antidepressants are widely prescribed across the world in both primary and secondary care; however, poor adherence and premature discontinuation of antidepressant medication contribute to suboptimal clinical outcomes. Up to one-third of patients discontinue antidepressants due to adverse effects and this is a major barrier to antidepressant treatment.

Our recent Group of Researchers Investigating Specific Efficacy of Individual Drugs for Acute Depression (GRISELDA) project reported that the acceptability of antidepressants and dropouts due to adverse events vary between drugs and that withdrawal rates tend to be higher than with placebo.2 This current network meta-analysis (NMA) is the completion of the GRISELDA project and is based on the same protocol (which has the same PROSPERO registration number, CRD42019128141).3 We have designed this NMA to investigate the profile of specific adverse events for each antidepressant. This will contribute to a better understanding of how to use antidepressants in the treatment of depression in adults.4

The objective of this NMA is to compare the specific side effects and the overall tolerability of all second-generation antidepressants and selected first-generation antidepressants in the acute treatment of major depressive disorder in adults. The project is called Meta-Analysis of Relative Tolerability and Harms of Antidepressants.

Methods and analysis

Types of studies

We will include double-blind randomised controlled trials (RCT) comparing one active drug with another or with placebo, as monotherapy, in the acute phase treatment of major depression. Cross-over and cluster randomised trials will be included, while quasirandomised trials will be excluded. For cross-over studies, to address concerns around possible ‘carry over’ effects, we will use data from the precross-over phase.3

Types of participants

Patients aged 18 years or older, of both sexes, with a primary diagnosis of unipolar major depression according to standard operationalised diagnostic criteria, such as Feighner criteria, Research Diagnostic Criteria, Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III), DSM-III-R, DSM-IV, DSM-5, International Classification of Disease, 10th Revision (ICD-10) and ICD-11, will be included. Studies in which 20% or more of the participants may be suffering from bipolar or psychotic depression will be excluded. A concurrent secondary diagnosis of another psychiatric disorder will not be considered as an exclusion criterion, but RCTs in which all participants have a concurrent primary diagnosis of another mental disorder or concomitant medical disorder will be excluded. Antidepressant trials in depressive patients with a serious concomitant medical illness, postpartum or treatment-resistant depression will be excluded.

Types of interventions

We will include the following antidepressants: agomelatine, amitriptyline, bupropion, citalopram, clomipramine, desvenlafaxine, duloxetine, escitalopram, fluoxetine, fluvoxamine, levomilnacipran, milnacipran, mirtazapine, nefazodone, paroxetine, reboxetine, sertraline, trazodone, venlafaxine, vilazodone and vortioxetine (see GRISELDA protocol for more details).3 Rescue medications will be allowed if equally provided among the randomised arms. We will include only studies randomising patients to drugs within their licensed dose range.3 5 We anticipate that any patient who meets all inclusion criteria could, in principle, be randomised to receive any of the interventions in the synthesis comparator set (assumption of transitivity).

Outcome measures and categorisation of adverse events

Tolerability will be evaluated using the following outcome measures:

  1. Total number of patients experiencing one specific adverse event.

  2. Total number of patients experiencing serious adverse events.

  3. Total number of patients experiencing at least one adverse event.

Two independent researchers will extract all adverse effects reported in the trials (paying careful attention not to double-count events) and will then use preferred terms from MedDRA to categorise each adverse event. MedDRA has been developed by the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use to provide a single standardised international medical terminology which can be used for regulatory communication and evaluation of data pertaining to medicinal products for human use. As a result, MedDRA is designed for use in the registration, documentation and safety monitoring of medicinal products through all phases of the development cycle (ie, from clinical trials to postmarketing surveillance).

There are five levels to the MedDRA hierarchy, arranged from very specific to very general. At the most specific level, called ‘Lowest Level Terms’ (LLT), there are more than 70 000 terms which parallel how information is communicated. These LLTs reflect how an observation might be reported in practice (ie, in a specific study). This level directly supports assigning MedDRA terms within a user database. Each member of the next level, ‘Preferred Terms’ (PT), is a distinct descriptor (single medical concept) for a symptom, sign or disease diagnosis. Each LLT is linked to only one PT and each PT has at least one LLT (itself) as well as synonyms and lexical variants (eg, abbreviations, different word order). If we find different MedDRA terms to identify similar adverse events, these synonyms will be merged using clinical judgement into broader categories (as applicable) and validated by another clinician. Any discrepancies will be solved by consensus within the review team.

To define serious adverse events, we will use the classification employed by the US Food and Drug Administration, under which an adverse event is serious if it:

  • Results in death.

  • Life threatening.

  • Requires inpatient hospitalisation or causes prolongation of existing hospitalisation.

  • Results in persistent or significant disability/incapacity.

  • May have caused a congenital anomaly/birth defect.

  • Requires intervention to prevent permanent impairment or damage.

All serious adverse effects will be included in the meta-analysis.

Common and very common adverse events

We will also identify common and very common non-serious adverse events using the approved definition of frequency of adverse event issued by the Council of International Organizations of Medical Sciences:

Type of adverse event    Frequency (%)
Very common              ≥10
Common                   ≥1 and <10
Uncommon                 ≥0.1 and <1
Rare or very rare        <0.1
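
As an illustration only, the frequency bands in the table above can be expressed as a small classification function. This is a minimal Python sketch, not part of the planned R analysis; the function name and the choice of denominator (all participants exposed in the arm or arms of interest) are assumptions, since the protocol does not state how the observed percentage will be computed.

```python
def cioms_frequency_category(events: int, participants: int) -> str:
    """Map an observed adverse event count to the frequency band in the table.

    The denominator is an assumption: here it is simply the number of
    participants supplied by the caller.
    """
    if participants <= 0:
        raise ValueError("participants must be positive")
    if not 0 <= events <= participants:
        raise ValueError("events must lie between 0 and participants")

    percentage = 100.0 * events / participants
    if percentage >= 10:
        return "very common"
    if percentage >= 1:
        return "common"
    if percentage >= 0.1:
        return "uncommon"
    return "rare or very rare"
```

Under this sketch, only events falling in the first two bands would be candidates for the analyses of common and very common adverse events.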


If the number of common and very common adverse effects is over 20, a survey including patients’ and clinicians’ perspectives will be carried out to select which adverse effects should be considered for use in the statistical analyses.


We will include two types of participants:

  • Patients: any individual over 18 years old with a current/previous episode of unipolar depression and current/previous use of antidepressants.

  • Prescribing clinicians: any healthcare professional (psychiatrist, general practitioner, prescribing nurse or prescribing pharmacist) with personal experience in prescribing and monitoring antidepressants in depression.

The questionnaire will be in English, French and German. We aim to recruit at least 200 patients and 100 physicians from multiple countries to increase the external validity of the findings.

The survey will collect data about the following aspects:

Sociodemographic characteristics and health status of participants

For patients: sex, age, country of residency, number of years of education, income, Patient Health Questionnaire-9, diagnosis, setting of care, number and names of antidepressants ever taken, duration of treatment, suicidal behaviour, length of current episode, total length of exposure to antidepressants.

For prescribing clinicians: sex, age, country of practice, profession, duration of clinical experience, workplace, personal experience of depression/antidepressants.

Ranking of the adverse events

Each participant will be asked to rank the adverse events according to their personal preference. The list presented to patients will contain only clinical adverse events that can be understood by laypeople, whereas the list of adverse events for clinicians will also include biological measures (for instance, liver function or glucose blood levels). For patients, we will use the specific ‘patient-friendly’ wording of MedDRA, and for clinicians the MedDRA terminology.

A modified Q-sort method will be used to rank the adverse events and the final list of adverse events will include all of the serious adverse events plus the 20 most important non-serious adverse events. If appropriate, the researchers will review the lists of adverse events generated from the survey and include any further adverse events considered to be clinically relevant.

Search strategy and study selection

We will use the same search strategy that we used before for GRISELDA3 and perform an update of the search. The reference selection process will be done by two researchers independently. Any disagreements will be resolved via discussion with a third member of the review team.

Data extraction

Two reviewers will independently extract from the included studies the relevant information about specific adverse events using a predefined structured template. Any discrepancies will be discussed between the two reviewers and any unresolved discrepancies will be resolved by a third senior reviewer. When different values are provided in the published and unpublished studies, the unpublished data will be prioritised and extracted. Two review authors will ascertain that the data are entered correctly into the final data set.

Length of trial

We will consider the number of participants with adverse events in each treatment arm at 8 weeks.5 If information at 8 weeks is not available, we will use data ranging from 4 to 12 weeks (we will give preference to the time point closest to 8 weeks; if equidistant, we will take the longer outcome). Longer term studies will be included in the systematic review but excluded from the statistical synthesis of data if they do not provide data for the period of 4–12 weeks.
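
The time-point rule above can be sketched in a few lines of Python. The function name and its arguments are hypothetical; the logic follows the stated rule (target of 8 weeks, eligible window of 4 to 12 weeks, closest time point preferred, longer follow-up preferred when equidistant).

```python
from typing import Iterable, Optional


def select_time_point(
    weeks_available: Iterable[float],
    target: float = 8.0,
    lower: float = 4.0,
    upper: float = 12.0,
) -> Optional[float]:
    """Pick the reporting time point (in weeks) to use for a study.

    Returns None when no time point falls inside the eligible window, in
    which case the study would be excluded from the statistical synthesis.
    """
    eligible = [w for w in weeks_available if lower <= w <= upper]
    if not eligible:
        return None
    # Smallest distance from the target wins; on a tie the longer
    # follow-up wins because -w sorts larger values first.
    return min(eligible, key=lambda w: (abs(w - target), -w))
```

For example, a study reporting at 6 and 10 weeks would contribute its 10-week data, and a study reporting only at 2 and 24 weeks would contribute none.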

Comparability of dosages

We will include only study arms randomising patients to drugs within the licensed dose. Both fixed dose and flexible dose designs will be allowed.

Risk of bias assessment

We will assess risk of bias in the included studies using the tool described in the Cochrane Collaboration Handbook as a reference guide. The assessment will be performed by two independent raters. If the raters disagree, the final rating will be made by consensus with the involvement (if necessary) of another member of the review group.

Statistical synthesis of study data

We will generate descriptive statistics for the trial and study population characteristics across all eligible trials, describing the types of comparisons and some important variables, either clinical or methodological (such as year of publication, age, severity of illness, sponsorship and clinical setting). We will draw the network diagram to graphically present the available evidence.

Pairwise meta-analyses

For each pairwise comparison in the data set that is informed by 10 studies or more, we will synthesise data using a random-effects meta-analysis model, to obtain ORs and 95% CIs. This model assumes that true underlying treatment effects are similar, but not identical across the different study settings, and allows us to estimate heterogeneity.
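
To make the random-effects model concrete, the following Python sketch pools study-level log ORs with inverse-variance weights. It is illustrative only: the planned analyses will use the meta package in R, and the DerSimonian-Laird estimator of the between-study variance used here is an assumption, as the protocol does not name an estimator. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(
    log_ors: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, float]:
    """Random-effects pooling of log ORs.

    Returns (pooled log OR, standard error, tau^2), where tau^2 is the
    DerSimonian-Laird estimate of the between-study variance.
    """
    if len(log_ors) != len(variances) or len(log_ors) < 2:
        raise ValueError("need at least two studies with matching inputs")

    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_ors)) / sum_w

    # Cochran's Q measures the spread of study effects around the fixed estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, log_ors)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2


def odds_ratio_with_ci(pooled: float, se: float, z: float = 1.96):
    """Back-transform a pooled log OR to an OR with a 95% CI."""
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)
```

When the studies agree exactly, tau^2 is estimated as zero and the result coincides with the fixed-effect estimate.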

One complication we expect to face is that for many specific adverse events we may have low or very low event rates in our data set. When the outcome is rare, that is, when there are studies with zero events in one or both treatment arms, the inverse variance method for meta-analysis might lead to biased results. In such cases we will also use the Mantel-Haenszel method to synthesise the evidence.6 This method avoids the use of the so-called ‘continuity correction’, which artificially imputes data and might bias the results. The model assumes a common (fixed) treatment effect, that is, it does not include heterogeneity. This is a limitation of the approach, but, as the Cochrane Handbook suggests, incorporation of heterogeneity should be a secondary consideration when attempting to estimate treatment effects from sparse data. In order to decide which method to use as our primary analysis for rare outcomes, we will fit a fixed-effects inverse variance model and a Mantel-Haenszel model, and compare results. If the two approaches provide similar results, we will conclude that the continuity corrections have a minimal effect on the results of the inverse variance method. In that case, we will employ a random-effects inverse variance model as our primary analysis. If there are important discrepancies between the two approaches, we will only use the Mantel-Haenszel method.
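
The comparison described above can be illustrated with a short Python sketch (the planned analyses will be run in R). Each study is represented as a hypothetical tuple (a, b, c, d): events and non-events in the treatment arm, then events and non-events in the control arm. The 0.5 continuity correction applied only to studies with a zero cell is an assumption about how the inverse variance model would be fitted.

```python
import math
from typing import Sequence, Tuple

Table = Tuple[float, float, float, float]  # (a, b, c, d) as described above


def mantel_haenszel_or(tables: Sequence[Table]) -> float:
    """Common-effect Mantel-Haenszel OR; no continuity correction is needed."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    if denominator == 0:
        raise ValueError("Mantel-Haenszel OR is undefined for these data")
    return numerator / denominator


def inverse_variance_or(tables: Sequence[Table], correction: float = 0.5) -> float:
    """Fixed-effect inverse-variance OR with a continuity correction."""
    sum_w = 0.0
    sum_wy = 0.0
    for a, b, c, d in tables:
        # Studies with no events (or only events) in both arms carry no
        # information about the OR and are skipped.
        if (a == 0 and c == 0) or (b == 0 and d == 0):
            continue
        if 0 in (a, b, c, d):
            a, b, c, d = (x + correction for x in (a, b, c, d))
        log_or = math.log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d
        sum_w += 1 / variance
        sum_wy += log_or / variance
    if sum_w == 0:
        raise ValueError("no informative studies")
    return math.exp(sum_wy / sum_w)
```

A large gap between the two returned ORs on the same data would indicate that the continuity correction is driving the inverse variance result.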

Furthermore, when data are rare, the choice of model becomes important,7 and different models might give substantially different results. Thus, for the five most important rare outcomes according to our ranking, we will employ additional models (Peto OR, a Bayesian meta-analysis model with informative prior distributions for heterogeneity8 and a beta-binomial model, as seen fit according to the assumptions of the different models).7 This will allow us to assess the robustness of our findings under different model choices. If different models lead to substantially different results, we will present all results on equal grounds and we will refrain from drawing firm conclusions regarding relative treatment effects.
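
Of the additional models listed above, the Peto OR is the simplest to write down. The following Python sketch is illustrative only and uses the same hypothetical (a, b, c, d) layout: events and non-events in the treatment arm, then events and non-events in the control arm.

```python
import math
from typing import Sequence, Tuple

Table = Tuple[int, int, int, int]


def peto_odds_ratio(tables: Sequence[Table]) -> Tuple[float, float]:
    """Return (pooled log OR, standard error) using the Peto one-step method."""
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue
        # Observed minus expected events in the treatment arm.
        expected = (a + b) * (a + c) / n
        # Hypergeometric variance of the treatment-arm event count.
        variance = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        sum_o_minus_e += a - expected
        sum_v += variance
    if sum_v == 0:
        raise ValueError("no informative studies (all have zero variance)")
    return sum_o_minus_e / sum_v, 1.0 / math.sqrt(sum_v)
```

Studies with zero events in both arms have zero variance and therefore contribute nothing to the pooled estimate, without any continuity correction being applied.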

For all pairwise meta-analyses we will present forest plots. We will use a 0.5 continuity correction, in order to present in the plot studies with zero events in one of their arms. For studies with zero events in both arms we will not show any relative effects.

We will visually inspect the forest plots to identify any particularly heterogeneous comparisons. For the analyses where a random-effects model is used, we will compare the estimated SD of random effects with the corresponding empirical distribution.8 We will also report the I² statistic and its 95% CI, as an additional measure of heterogeneity in the pairwise meta-analyses.
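
For reference, the point estimate of I² follows directly from Cochran's Q. The Python sketch below is illustrative (the function name is hypothetical) and does not compute the 95% CI, which the planned R analysis would report.

```python
from typing import Sequence, Tuple


def i_squared(
    effects: Sequence[float], variances: Sequence[float]
) -> Tuple[float, int, float]:
    """Return (Cochran's Q, degrees of freedom, I^2 as a percentage)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")

    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I^2 is the share of total variability attributed to heterogeneity,
    # truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2
```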

Assessment of the transitivity assumption of NMA

The key underlying assumption of NMA is the assumption of transitivity.9 10 In order to assess the validity of this assumption, we will investigate whether study-level characteristics that may impact on the relative treatment effects (ie, effect modifiers) are similarly distributed across treatment comparisons. Potential effect modifiers include clinical and demographic characteristics, such as age, gender, dose and severity of symptoms. We will group studies by treatment comparisons and obtain descriptive statistics regarding these important covariates. In case we find significant discrepancies in the corresponding distributions, we will limit our NMAs to studies that are sufficiently similar.

The clinical features, which have been demonstrated to date to moderate efficacy of antidepressants, include bipolarity,11 psychotic features12 and subthreshold depression.13 We have assured transitivity in our network with regard to these variables by limiting our samples to participants with non-psychotic unipolar major depression. Other clinical or methodological variables that may influence our primary outcomes of antidepressant efficacy or acceptability include age,14 depressive severity at baseline15 and the dosing schedule.16 We will investigate if these variables are similarly distributed across studies grouped by comparison.

Network meta-analyses

If we find no evidence against the transitivity assumption, we will synthesise the evidence using NMA.10 For non-rare outcomes we will use a random-effects NMA model17 fit in a frequentist setting, assuming a common heterogeneity parameter across all treatment comparisons. We will present the ‘league-table’ of results, that is, a table with all estimated treatment effects and the corresponding 95% CIs. For each outcome, in order to assess the extent of heterogeneity, we will compare the estimated value for the heterogeneity SD with the corresponding empirical distributions.8 In addition, we will present the prediction intervals for each drug versus placebo; this will allow us to gauge the effect of heterogeneity in the true underlying treatment effects of a future study. We will rank the various treatments for each outcome using the surface under the cumulative ranking curve.18
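
The surface under the cumulative ranking curve (SUCRA) summarises a treatment's rank probabilities in a single number between 0 and 1. The Python sketch below is illustrative and assumes the rank probabilities have already been obtained from the fitted NMA; how they are produced in the planned frequentist analysis is not specified here, and the function name is hypothetical.

```python
from typing import Sequence


def sucra(rank_probabilities: Sequence[float]) -> float:
    """SUCRA for one treatment.

    rank_probabilities[j] is the probability that the treatment takes rank
    j + 1, with rank 1 the most favourable (assumed here to mean the fewest
    adverse events). The probabilities are expected to sum to 1.
    """
    n_treatments = len(rank_probabilities)
    if n_treatments < 2:
        raise ValueError("need at least two treatments")

    cumulative = 0.0
    total = 0.0
    # Average the cumulative probabilities over the first n - 1 ranks.
    for p in rank_probabilities[:-1]:
        cumulative += p
        total += cumulative
    return total / (n_treatments - 1)
```

A treatment that is certain to rank first has a SUCRA of 1, one that is certain to rank last has a SUCRA of 0, and one whose rank is completely uncertain has a SUCRA of 0.5.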

For rare outcomes, that is, when there are studies with zero events in some of their treatment arms, we will perform an NMA using a fixed-effects Mantel-Haenszel NMA approach19 and compare results with the fixed-effects inverse variance NMA model. If results agree, we will use the random-effects NMA model as our primary analysis.17 If we find important discrepancies we will only present results from the Mantel-Haenszel NMA approach. In sensitivity analyses we will also employ an NMA model with a non-central hypergeometric (NCH) likelihood.20 Both Mantel-Haenszel and NCH NMA can handle studies with zero events in one (but not all) of their treatment arms; in simulations we have shown that these two models perform well under sparse data settings. Notably, both models exclude studies with zero events in all treatment arms. Thus, for very rare outcomes, we expect that the network might become disconnected. In that case, we will perform NMAs in each of the corresponding subnetworks that include enough data to be meaningfully synthesised.

For some specific outcomes (ie, gastrointestinal side effects, neurological symptoms, and so on) we assume that the relative treatment effects of the various drugs versus placebo are exchangeable, that is, they follow an underlying common distribution. This is based on the assumption that the different drugs might have similar pathways to the outcome. For these outcomes, we will employ a Bayesian, multilevel hierarchical NMA model that assumes exchangeability of the treatment effects against a common comparator. This model has been shown both theoretically and in simulations to lead to an increase of the statistical power to detect treatment effects of drugs versus placebo, while automatically controlling for the possibility of multiple testing issues.21

Assessment of inconsistency

Inconsistency corresponds to the (statistical) disagreement between the different sources of evidence in a network.10 Assessment of inconsistency is an important part of NMA. It offers an additional quantitative method of exploring the validity of transitivity assumption.9 Large inconsistency implies a breach of transitivity, which in turn suggests that synthesising data in an NMA should be avoided.

We will use two different methods for assessing inconsistency in the network. The first one is a ‘global’ method, the design-by-treatment test.22 This is a test against the null hypothesis of overall consistency in the network. Subsequently, we will employ a ‘local’ method, ‘Separate Indirect from Direct Design Evidence’ (SIDDE).19 Using SIDDE, we group the studies by design (ie, according to the group of treatments they compare). Then, for each treatment comparison in each design, we estimate the direct evidence (from studies of this particular design) and indirect evidence (from the rest of the network). We then compare the two estimates; important differences will point to hot spots of inconsistency in the network.
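
The idea of contrasting direct and indirect evidence can be illustrated in the simplest possible case, a single loop of three treatments A, B and C. The Python sketch below is not SIDDE and is not the planned analysis: it is a simplified adjusted indirect comparison, in which the indirect A versus B estimate is formed through the common comparator C and compared with the direct estimate using a z-test. All names are hypothetical.

```python
import math
from typing import Tuple

Estimate = Tuple[float, float]  # (log OR, variance)


def direct_vs_indirect(
    direct_ab: Estimate, direct_ac: Estimate, direct_bc: Estimate
) -> Tuple[float, float, float]:
    """Compare direct and indirect evidence for A versus B.

    Returns (difference in log OR, z statistic, two-sided p value).
    """
    d_ab, v_ab = direct_ab
    d_ac, v_ac = direct_ac
    d_bc, v_bc = direct_bc

    # Under consistency, the A vs B effect equals (A vs C) minus (B vs C).
    indirect = d_ac - d_bc
    v_indirect = v_ac + v_bc

    difference = d_ab - indirect
    z = difference / math.sqrt(v_ab + v_indirect)
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return difference, z, p_value
```

A small p value would flag the loop as a possible hot spot of inconsistency, subject to the low power of such tests when outcomes are rare.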

If these methods suggest the presence of important inconsistency in the network, we will first try to scan our data for extraction errors. If none is found, we will revisit the studies to assess again the plausibility of the transitivity assumption, especially if some hot spots of inconsistency are identified using the SIDDE approach. If we identify possible reasons for this inconsistency we will account for it by performing subgroup analyses. If we cannot identify the cause of inconsistency, we will refrain from performing an NMA.

All methods for inconsistency, however, are expected to have low power in detecting breaches of the transitivity assumption. Especially for the case of rare outcomes (which we expect to have in our analyses), all tests for inconsistency are expected to be extremely low powered. In addition, absence of a statistically significant result in tests for inconsistency does not offer proof of transitivity. Thus, we aim to perform a thorough assessment of transitivity even in the absence of any proof of inconsistency.

Exploring heterogeneity and inconsistency and sensitivity analyses

We expect small amounts of heterogeneity and inconsistency to be present given the variety of study settings we plan to include. For the most common adverse events, we will explore whether treatment effects are robust in subgroup analyses and network meta-regression using the following characteristics: (1) study year; (2) sponsorship; (3) depressive severity at baseline; (4) dosing schedule; (5) head-to-head versus placebo-controlled studies; (6) single-centre versus multicentre studies.1 The sensitivity of our conclusions will be evaluated by analysing (1) only studies with balanced doses in all arms (ie, we will exclude studies with unfair dose comparisons); (2) only studies with unpublished data (ie, we will exclude studies providing published data only); (3) only studies with low risk of bias; and (4) only head-to-head studies.

Assessing small study effects, publication bias and reporting bias

It has been empirically shown that safety outcomes are at high risk of reporting bias,23 and that trials tend to systematically understate adverse events.24 This phenomenon might be more pronounced in placebo-controlled trials.2 In order to assess the existence of small study effects and publication biases, we will use funnel plots and contour-enhanced funnel plots.25 This will allow us to check whether the precision of the studies (which is directly related to sample size) correlates with the effect size. We will use the Harbord test26 to formally test for asymmetries in the funnel plots. We will follow this procedure for pairwise comparisons between antidepressant and placebo. If we identify an important association of the reported effect with the trials’ precision, we will try to adjust for it in a sensitivity analysis, by performing a network meta-regression with the trial precision as a study-level covariate. If there is strong evidence of small study effects or publication bias, we will clearly report it and interpret all results with caution.
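
The regression underlying the Harbord test can be sketched as follows. For each study, the efficient score Z and its variance V are computed from the 2×2 table, and Z/√V is regressed on √V; an intercept far from zero suggests funnel plot asymmetry. This Python sketch is illustrative only: it returns the intercept, its standard error and the t statistic (on k − 2 degrees of freedom for k studies) but no p value, and the (a, b, c, d) layout (events and non-events in the treatment arm, then in the control arm) is a hypothetical convention.

```python
import math
from typing import Sequence, Tuple

Table = Tuple[int, int, int, int]


def score_and_variance(table: Table) -> Tuple[float, float]:
    """Efficient score Z and its variance V for the log OR of one study."""
    a, b, c, d = table
    n = a + b + c + d
    z = a - (a + b) * (a + c) / n
    v = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return z, v


def harbord_regression(tables: Sequence[Table]) -> Tuple[float, float, float]:
    """Return (intercept, standard error, t statistic) of the regression."""
    x, y = [], []
    for table in tables:
        z, v = score_and_variance(table)
        if v > 0:  # studies with zero variance carry no information
            x.append(math.sqrt(v))
            y.append(z / math.sqrt(v))

    k = len(x)
    if k < 3:
        raise ValueError("need at least three informative studies")

    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same precision")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    # Ordinary least squares standard error of the intercept.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (k - 2) * (1 / k + x_bar ** 2 / sxx))
    if se == 0:
        raise ValueError("perfect fit; t statistic is undefined")
    return intercept, se, intercept / se
```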

Model implementation

We will fit all models in R. We will fit the pairwise meta-analysis models using the meta package.27 All frequentist NMAs will be fit using the netmeta package.28 We will perform all Bayesian analyses using the R2jags package.29 For all Bayesian models we will assume a binomial likelihood for the number of events per treatment arm. We will employ uninformative prior distributions, for example, N(0, 100²) for all location parameters such as the log ORs of relative treatment effects. For the heterogeneity parameter we will employ the empirical distributions described elsewhere.8 We will run multiple chains and assess convergence and mixing of the chains using the Brooks-Gelman-Rubin diagnostic criterion.
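
As a reminder of what the convergence diagnostic measures, the Python sketch below computes the basic potential scale reduction factor for one parameter from several chains of equal length. It is illustrative only and is a simplified form of the diagnostic: the refinements of the Brooks-Gelman version, and the implementation available to R users through JAGS-related packages, are not reproduced here. The function name is hypothetical.

```python
import math
import statistics
from typing import Sequence


def potential_scale_reduction(chains: Sequence[Sequence[float]]) -> float:
    """Basic Gelman-Rubin statistic for one parameter.

    chains holds the post-burn-in draws of each chain; all chains must have
    the same length. Values close to 1 suggest the chains have mixed.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("need at least two chains of equal length >= 2")

    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean(statistics.variance(chain) for chain in chains)
    between = n * statistics.variance(chain_means)
    if within == 0:
        raise ValueError("chains have zero within-chain variance")

    pooled_variance = (n - 1) / n * within + between / n
    return math.sqrt(pooled_variance / within)
```

Chains stuck in different regions of the parameter space give a value well above 1, which would prompt longer runs or a change in the sampler settings.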

Assessing the confidence of evidence of NMA

The quality of evidence obtained by the synthesis of the evidence for each outcome will be separately evaluated using the framework described in Salanti et al’s study30 and implemented using the Confidence in Network Meta-Analysis31 web application. This will allow us to grade the confidence in the results as high, moderate, low or very low.


The adverse effects of antidepressants and their perceived marginal efficacy are major factors contributing to the unsatisfactory duration of antidepressant treatment. These factors are exacerbated by our current inability to predict which drug will cause the fewest adverse effects for a specific patient, and which will work most effectively. This work will provide an in-depth analysis and an insight into the specific adverse events of individual antidepressants. This NMA is a key step in retrieving and understanding all of the information needed to guide the shared decision-making process between patients, carers and clinicians. It has been widely reported and recognised in the scientific literature.32 By helping to match patients to individual antidepressants, this work will enable clinicians to customise treatment more precisely to patients’ needs and thus improve their outcomes.


AC is supported by an NIHR Research Professorship (grant RP-2017-08-ST2-006).



  • Contributors AC, AT, OE, GS and TAF drafted the manuscript. All other authors revised, edited and approved the final version of the protocol.

  • Funding This research was funded by the National Institute for Health Research (NIHR) Oxford Health Biomedical Research Centre (grant BRC-1215-20005).

  • Disclaimer The views expressed are those of the authors and not necessarily those of the UK National Health Service, the National Institute for Health Research, or the UK Department of Health. The funders had no role in the design and conduct of the study, the approval of the manuscript or the decision to submit the manuscript for publication.

  • Competing interests SK reports grants from the Mental Health Okamoto Memorial Foundation, Pfizer Health Research Foundation and KDDI Foundation outside the submitted work. IH reports lecture fees from Mitsubishi-Tanabe and Yoshitomi. TAF reports personal fees from Meiji, Mitsubishi-Tanabe, MSD and Pfizer and a grant from Mitsubishi-Tanabe, outside the submitted work; TAF has a pending patent 2018-177688. All other authors report no conflict of interest.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Patient consent for publication Not required.
