Article Text

Download PDFPDF

Digital mental health
Psychiatric comorbid disorders of cognition: a machine learning approach using 1175 UK Biobank participants
  1. Chenlu Li1,
  2. Delia A Gheorghe2,
  3. John E Gallacher2,
  4. Sarah Bauermeister2
  1. 1Department Risk Advisory, Deloitte LLP, London, UK
  2. 2Department of Psychiatry, University of Oxford, Oxford, Oxfordshire, UK
  1. Correspondence to Dr Sarah Bauermeister, Department of Psychiatry, University of Oxford, Oxford OX3 7JX, UK; sarah.bauermeister{at}


Background Conceptualising comorbidity is complex and the term is used variously. Here, it is the coexistence of two or more diagnoses which might be defined as ‘chronic’ and, although they may be pathologically related, they may also act independently. Of interest here is the comorbidity of common psychiatric disorders and impaired cognition.

Objectives To examine whether anxiety and/or depression are/is important longitudinal predictors of cognitive change.

Methods UK Biobank participants used at three time points (n=502 664): baseline, first follow-up (n=20 257) and first imaging study (n=40 199). Participants with no missing data were 1175 participants aged 40–70 years, 41% women. Machine learning was applied and the main outcome measure of reaction time intraindividual variability (cognition) was used.

Findings Using the area under the receiver operating characteristic curve, the anxiety model achieves the best performance with an area under the curve (AUC) of 0.68, followed by the depression model with an AUC of 0.63. The cardiovascular and diabetes model, and the covariates model have weaker performance in predicting cognition, with an AUC of 0.60 and 0.56, respectively.

Conclusions Outcomes suggest that psychiatric disorders are more important comorbidities of long-term cognitive change than diabetes and cardiovascular disease, and demographic factors. Findings suggest that psychiatric disorders (anxiety and depression) may have a deleterious effect on long-term cognition and should be considered as an important comorbid disorder of cognitive decline.

Clinical implications Important predictive effects of poor mental health on longitudinal cognitive decline should be considered in secondary and also primary care.

  • psychiatry
  • anxiety disorders
  • depression & mood disorders

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:

View Full Text

Statistics from


Conceptualising comorbidity is complex and the term is used variously. Here, we refer to it as the coexistence of two or more diagnoses which might be defined as ‘chronic’ and, although they may be pathologically related to each other, they may also act independently.1 Of interest here is the comorbidity of common psychiatric disorders and impaired cognition.

Psychiatric disorders such as anxiety and depression (poor mental health) are associated with cognitive deficits,2 with higher baseline levels of depressive symptoms predicting a steeper decline in delayed recall and global cognition at follow-up,3 and longitudinal slowing in processing speed.4 Furthermore, at least one in four older adults experience a gradual decline in affective state5 with increasing age, suggesting that along with age-related cognitive decline, there could be a comorbid relationship of unclear temporality.6 The challenge is to measure this relationship in the most effective way. A cognitive measure which is associated with both longitudinal cognitive decline and poor mental health is reaction time intraindividual variability (RT IIV).2 7 RT IIV can be defined as trial-to-trial within-person variability in reaction time across individual trials within a single task for an individual. It is indicative of neurobiological disturbance, an increase of which is associated older age, mild cognitive impairment, dementia and mortality.8

Machine learning (ML) approaches have certain advantages in identifying patterns of information useful for the prediction of an outcome, even when complex high-dimensional interactions exist.9 Unlike hypothesis-driven models, ML approaches are refrained from assumptions on data distributions, especially one of normally distributed independence of attributes or linear data behaviour which are often failed to be met by large registries incorporating biological data. In addition, ML approaches have shown certain advantages in examining potential predictors simultaneously in an unbiased manner. Although ML methods have been used successfully to develop risk-prediction schemes in other areas of medicine, applications to psychiatric comorbid disorders have so far relied on small samples and thin predictor sets, failing to realise the full potential of the methods. With this in consideration, this study applies an ML technique on UK Biobank data to longitudinally examine whether anxiety/depression is important predictors of cognitive change.

The UK Biobank is a large population-based prospective cohort study of 502 664 participants. Invitations to participate in the UK Biobank study were sent to 9.2 million community-dwelling persons in the UK who were registered with the UK National Health Service aged between 37 and 73 years. A response rate of 5.5% was recorded. Assessments took place at 22 centres across the UK where participants completed undertook comprehensive mental health, cognitive, lifestyle, biomedical and physical assessments. Mental health and cognitive assessments were completed on a touchscreen computer.10 Here, we use an ML approach to investigate whether the presence of anxiety and depressive symptoms in UK Biobank leads to greater cognitive change measured from baseline to follow-up.


To examine whether anxiety and/or depression are/is important longitudinal predictors of cognitive change.



All analyses were conducted using data from UK Biobank application 15008.10

Population sample

The study considered for inclusion, the baseline sample of 502 664 UK Biobank participants aged between 37 and 73 years. For the analysis, participants with mental health and cognitive data at three time periods (recruitment, 5-year follow-up and the first imaging substudy) were used. For the purposes of this analysis, the imaging substudy data represent approximately a 10-year follow-up period from recruitment. Participants who chose to respond, ‘I don’t know’ or ‘I do not wish to answer’ to items within either of the scales (depression and neuroticism) items at the three time periods, were excluded, respectively (leaving 373 210 participants at baseline, 129 468 at first assessment and 3501 at imaging substudy). Casewise deletion was applied to all demographic variables (2326 were excluded). All analyses were conducted listwise and the number of participants with no missing data across all measurement variables of interest was n=1175.


Details of the UK Biobank assessment procedures may be found elsewhere.10 Cognitive performance was assessed using a ‘stop–go’ task where RT IIV was used as a sensitive measure of cognitive change with greater RT IIV over time indicating decline in performance.7 Mental health was assessed using two items from the Patient Health Questionnaire nine-item scale (PHQ-9)11 to measure depression (PHQ-2: ‘Over the last 2 weeks, how often have you been bothered by little interest or pleasure in doing things?’; ‘Over the last 2 weeks, how often have you been bothered by feeling down, depressed, or hopeless?’)12 and the 12-item Eysenck Personality Questionnaire-Revised Neuroticism scale as a proxy measure of anxiety.13 14 The other measures relevant to this analysis included self-reported cardiovascular disease (CVD: including heart attack, angina and stroke) and diabetes, and observed body mass index (BMI) as indicators of wider comorbidity.15 16 Self-reported smoking, alcohol consumption and fruit and vegetable consumption were used as indicators of lifestyle. Self-reported education, employment, ethnicity and household income were used as socioeconomic indicators.

Analytic strategy

Data were modelled as follows. Both age and RT IIV were modelled as continuous variables. The RT IIV was calculated as the SD of each participant’s RTs over the test trials.17 Participants with only one valid score at baseline or follow-up were omitted. The IIVs showed log–normal distributions and were natural log transformed. Depressed mood items were modelled as categorical variables (Not at all as 1, Several days as 2, More than half the days as 3 and Nearly every day as 4), and anxiety items were modelled as binary variables (yes/no). CVD and diabetes were modelled as binary indicators (present/absent). Gender was modelled as binary indicator (male/female). BMI was treated as a three-level factor (normal (<25), overweight (25–29.9), obese (>29.9)). Lifestyle data were largely considered as binary indicators (smoker/non-smoker, <5 portions of fruit and vegetable per day/5 or more portions per day); however, alcohol was treated as a continuous variable (g/day). Socioeconomic data were considered as binary indicators (employed/not employed, college degree/no degree, white ethnicity/other) excepting income which was modelled as a five-level factor (<£18 000, £18 000–30 999, £31 000–51 999; £52 000–100 000, >£100 000).

ML model

A recurrent neural network (RNN) is a class of artificial neural networks that is often used as an effective prediction tool for longitudinal biomedical data. By exhibiting temporal dynamic behaviour for time sequences, it allows for temporal dependencies between measurements. As a sequence learning-based method, it is able to offer non-parametric joint modelling of longitudinal data.18 19 Long short-term memory (LSTM) is an RNN architecture developed to deal with the exploding and vanishing gradient problem during backpropagation through time. It is able to effectively capture long-term temporal dependencies by employing a constant error carousel (memory cell) which maintains its state over arbitrary time intervals, and three nonlinear gating units (input, forget and output) which regulate the information flow into and out of the cell.18 20–23 This architecture is applied to longitudinally model the prediction abilities of psychiatric comorbidity disorders on cognition via sequence-to-sequence learning.

The dataset was partitioned into two non-overlapping subsets: 75% of the within-class subjects for training and 25% for testing. Data were rescaled (0 and 1) to meet the default hyperbolic tangent activation function of the LSTM model. The Adam version of stochastic gradient descent algorithm was applied to tune the weights of the network. This optimisation algorithm combines the advantages of Adaptive Gradient Algorithm and Root Mean Square Propagation, thus being effective in handling sparse gradients on non-stationary problems. Furthermore, it compares favourably to other stochastic optimisation methods in practice.24 Hyperparameters are optimised on the training set, for example, the model was set to fit 1000 training epochs with a batch size set to the number of available training subjects. Mean absolute error and area under the receiver operating characteristic (ROC) curve (AUC) were used to assess model performance, respectively. Permutation tests were applied to obtain p values of AUC. To evaluate whether depression and anxiety are important comorbid disorders of cognition and to compare the prediction abilities of diabetes and CVD, the model was run systematically to include the following as features:

  1. Covariates only as reference model (age; BMI; lifestyle factors; gender; socioeconomic factors).

  2. Neuroticism scale +reference model.

  3. Depression scale +reference model.

  4. Diabetes and CVD +reference model.

Exposure (outcome) was cognition (RT IIV). Given that depression mood disorder and anxiety have previously been considered to be comorbid disorders of impaired cognition, we expected predictions based on comorbid mental health disorders to outperform predictions based only on covariate variables.

A complete data analysis approach was used and all analyses were conducted on the Dementias Platform UK Data Portal25 in Python V.3.6.7 and MATLAB R2019.


Complete data were available for 1175 participants (table 1). The sample was aged between 40 and 70 years at baseline, 41% women and relatively healthy with few participants showing comorbidities and few reporting unhealthy behaviour such as smoking (5%).

Table 1

Descriptive statistics

The outcome of the RNN analysis found that anxiety and depression were stronger predictors of change IIV over time than either CVD and diabetes or the covariate variables, as measured by the ROC curve (figure 1 and table 2). The AUC curve was 0.68 (p<0.0001) for anxiety, followed by the depression model with an AUC of 0.63 (p<0.0001). The cardiovascular and diabetes model and the demographics model had weaker performance with an AUC of 0.60 (p=0.0036) and 0.56 (p=0.1126), respectively. The anxiety model prospectively identified 63.24% of patients who eventually reached high IIV (ie, sensitivity), and 53.35% of patients who maintained low IIV (ie, specificity). Correspondingly, the anxiety model had a positive predictive value of 52.72%, and a negative predictive value of 63.68%. The depression model achieved slightly smaller yet still significant (p<0.0001) predictive performance in comparison with the other models. The RNNs results suggest that the anxiety and depression model outperformed the reference model (model I—covariates only). In general, the neuroticism and depression scales helped improve the accuracy in capturing the pattern of IIV, and therefore may be more important predictors of cognitive change over CVD and diabetes. The results also suggest increased risk of multiple mental health disorders occurring together, suggesting that depression and anxiety are potential comorbid disorders of cognition.

Figure 1

Receiver operating characteristic (ROC) for recurrent neural network models I–IV.

Table 2

Model performance for comorbid longitudinal predictors of IIV


A longitudinal analysis of the comorbid predictors of mental health disorders on cognitive change was conducted using ML in a relatively healthy population sample. Evidence was found suggesting that anxiety and depression are important comorbid disorders of cognition but when CVD, diabetes and other covariate predictors were included in the model these were not as important on long-term effect. This suggests that poor mental health has a significant deleterious effect on long-term cognition, and may be considered an important comorbid disorder of cognitive decline. The contribution of the present investigation is thereby threefold.

First, this study used the community-based sample from UK Biobank with a general population large enough for identification of valid effects which might be implied across clinical and research settings. Clinical-based samples have been criticised for inconsistency in diagnoses when patients are treated over time in a variety of facilities or at different times in the same facility, or for failure to report all the comorbid diagnoses of a patient.26 In comparison, community-based samples are based on a system in which a practical number of criteria are rated, or collected and assessed on every patient, providing unbiased prevalence or incidence rates of comorbidity, or unbiased estimates of risk factors for comorbidity.26

Second, this study investigates the psychiatric comorbid disorders of cognition in a comprehensive ML design which was applied to examine if psychiatric disorders as measured by self-report scales are important predictors of impaired cognition. Unlike hypothesis-driven statistical techniques, the sequence learning-based method is able to offer non-parametric joint modelling, allow for multiplicity of factors and provide prediction models that are more robust and accurate for longitudinal data.18

Third, this study is conducted longitudinally. Disease progression modelling using longitudinal data may provide clinicians with improved tools for diagnosis and monitoring of diseases. We apply the widely used LSTM network to capture the long-term temporal dependencies among measurements without making parametric assumptions about cognitive trajectories. To our knowledge, there are no studies which have employed the ML techniques presented here to investigate the comorbidity of mental health disorders and longitudinal cognitive decline using CVD and diabetes as comorbidity covariates. Using ML methodologies for clinical data such as psychiatric outcomes may provide additional information which is not possible with traditional statistical methodologies. In future research, approaches used here may include imputation on missing data in other longitudinal cohorts. For instance, by preprocessing data interpolation using mean or forward imputation, or using possible correlations between missing values’ patterns and the target.27

There are general limitations to our work which should be noted. Participants in UK Biobank are selected from a general healthy population and outcomes may not reflect implications for those with clinically diagnosed psychiatric disorders of anxiety and depression. Here, also, measures of psychological state at point of assessment have been used to measure poor mental health (anxiety and depression). A further limitation is that only one cognitive measure has been used but findings would benefit from application across measures of wider cognition, for example, memory and executive function.

Clinical implications

The implication of this work is that the important predictive effect of mental health on longitudinal cognition should be noted,28 and its comorbid relationship with other conditions such as CVD likewise to be considered in primary care and other clinical settings.

What is already known about this subject?

  • Poor mental health is associated with cognitive deficits.

  • One in four older adults experience a decline in affective state with increasing age.

  • ML approaches have certain advantages in identifying patterns of information useful for the prediction of an outcome.

What are the new findings?

  • Psychiatric disorders are important comorbid disorders of long-term cognitive change.

  • Machine-learning methods such as sequence learning based methods are able to offer non-parametric joint modelling, allow for multiplicity of factors and provide prediction models that are more robust and accurate for longitudinal data

  • The outcome of the RNN analysis found that anxiety and depression were stronger predictors of change IIV over time than either cardiovascular disease and diabetes or the covariate variables.

How might it impact on clinical practice in the foreseeable future?

  • The important predictive effect of mental health on longitudinal cognition should be noted and, its comorbidity relationship with other conditions such as cardiovascular disease likewise to be considered in primary care and other clinical settings


View Abstract


  • Twitter @S_Bauermeister

  • Contributors CL, JEG and SDB conceptualised the idea. CL analysed the data and wrote the first draft of the manuscript. All authors interpreted the data. DG and SB commented, edited and proofread the manuscript. All authors read and approved the final manuscript.

  • Funding This is a DPUK supported project, all analyses were conducted on the DPUK Data Portal, constituting part 1 of DPUK Application 0132. The Medical Research Council supports DPUK through grant MR/L023784/2. CL was funded by DPUK to complete this analysis.

  • Competing interests No, there are no competing interests.

  • Patient consent for publication Not required.

  • Ethics approval Ethical approval was granted to Biobank from the Research Ethics Committee (REC) reference 11/NW/0382.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data may be obtained from a third party and are not publicly available. Data were uploaded onto the DPUK Data Portal All analyses were conducted on the Data Portal. Access to the data may occur through UK Biobank but all scripts are available to view on the Data Portal. UK Biobank application 15008 was used for this paper.

Request Permissions

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.