Natural language processing for structuring clinical text data on depression using UK-CRIS
Nemanja Vaci¹, Qiang Liu¹, Andrey Kormilitzin¹, Franco De Crescenzo¹,², Ayse Kurtulmus¹,³, Jade Harvey², Bessie O'Dell¹, Simeon Innocent¹, Anneka Tomlinson¹, Andrea Cipriani¹,², Alejo Nevado-Holgado¹,⁴,⁵

¹ Department of Psychiatry, University of Oxford, Oxford, Oxfordshire, UK
² Research and Development, Oxford Health NHS Foundation Trust, Oxford, Oxfordshire, UK
³ Department of Psychiatry, Istanbul Medeniyet University Goztepe Research and Training Hospital, Istanbul, Turkey
⁴ Big Data Institute, University of Oxford, Oxford, United Kingdom
⁵ Artificial intelligence, Akrivia Health, Oxford, United Kingdom

Correspondence to Dr Nemanja Vaci, Department of Psychiatry, University of Oxford, Oxford, Oxfordshire OX3 7JX, UK; nemanja.vaci{at}


Background Utilisation of routinely collected electronic health records from secondary care offers unprecedented possibilities for medical science research but can also present difficulties. One key issue is that medical information is presented as free-form text and, therefore, requires time commitment from clinicians to manually extract salient information. Natural language processing (NLP) methods can be used to automatically extract clinically relevant information.

Objective Our aim is to use NLP to capture real-world data on individuals with depression from Clinical Record Interactive Search (CRIS) clinical text, in order to foster the use of electronic healthcare data in mental health research.

Methods We used a combination of methods to extract salient information from electronic health records. First, clinical experts defined the information of interest and subsequently built the training and testing corpora for statistical models. Second, we built and fine-tuned the statistical models using active learning procedures.
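
As an illustration of the active learning step, the following is a minimal sketch of one common strategy, uncertainty sampling, in which the examples the current model is least sure about are sent to an annotator first. The abstract does not state which active learning strategy was used, so the strategy itself and the names `train`, `predict_proba` and `oracle` are assumptions made for illustration only.

```python
import math


def entropy(p):
    """Binary entropy of a predicted probability; largest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_for_annotation(probabilities, batch_size):
    """Return indices of the batch_size most uncertain unlabelled examples."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: entropy(probabilities[i]),
        reverse=True,
    )
    return ranked[:batch_size]


def active_learning(train, predict_proba, oracle, labelled, pool,
                    rounds=3, batch_size=2):
    """Hypothetical loop: retrain, pick uncertain texts, ask for labels.

    train(labelled) -> model                 (assumed callable)
    predict_proba(model, text) -> float      (assumed callable)
    oracle(text) -> label                    (stands in for a clinician)
    """
    labelled = list(labelled)
    pool = list(pool)
    model = train(labelled)
    for _ in range(rounds):
        if not pool:
            break
        probs = [predict_proba(model, text) for text in pool]
        chosen = set(select_for_annotation(probs, batch_size))
        # Move the chosen texts from the pool to the labelled set.
        labelled += [(pool[i], oracle(pool[i])) for i in sorted(chosen)]
        pool = [text for i, text in enumerate(pool) if i not in chosen]
        model = train(labelled)
    return model, labelled, pool
```

In this sketch the annotator only sees the texts on which the model is least confident, which is why fewer labels may be needed than with random selection.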

Findings Results show a high degree of accuracy in the extraction of drug-related information. By contrast, accuracy is much lower for auxiliary variables. In combination with state-of-the-art active learning paradigms, the performance of the model increases considerably.

Conclusions This study illustrates the feasibility of using NLP models and proposes a research pipeline for accurately extracting information from electronic health records.

Clinical implications Real-world, individual patient data are an invaluable source of information, which can be used to better personalise treatment.

  • adult psychiatry


  • Twitter @And_Cipriani

  • Contributors NV, QL, AK and AN-H: designed the study and analysed the data. AyK, JH, SI: prepared input data. BO, FDC, AT, AC: advised on the scope of clinical variables in the study. NV, BO, FDC, AT, AC and AN-H: wrote the manuscript. All authors edited the manuscript.

  • Funding This project was funded by the MRC Pathfinder Grant (MC-PC-17215), by the National Institute for Health Research (NIHR) Oxford Health Biomedical Research Centre (BRC-1215-20005) and EPSRC-NIHR HTC Partnership Award 'Plus': NewMind - Partnership with the MindTech HTC (EP/N026977/1). This work was supported by the UK Clinical Record Interactive Search (UK-CRIS) system data and systems of the NIHR Oxford Health Biomedical Research Centre (BRC-1215-20005). AC is supported by the NIHR Oxford Cognitive Health Clinical Research Facility, by an NIHR Research Professorship (grant RP-2017-08-ST2-006) and by the NIHR Oxford Health Biomedical Research Centre (grant BRC-1215-20005).

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data may be obtained from a third party and are not publicly available. The electronic health records record patient identifiable information and therefore cannot be shared publicly. The data can be used and re-used by applying through UK-CRIS Oxford NHS Trust (
