Background We report a study of machine learning applied to the phenotyping of psychiatric diagnosis for research recruitment in youth depression, conducted with 861 labelled electronic medical record (EMR) documents. A model was built that could accurately identify individuals who were suitable candidates for a study on youth depression.
Objective Our objective was to build a model that identifies individuals who meet the inclusion criteria, as well as unsuitable patients who require exclusion.
Methods We applied a system that coded the EMR documents by removing personally identifying information. Two psychiatrists labelled a set of EMR documents, from which the 861 were drawn. We used a brute force search and trained a deep neural network for this task.
Findings In a cross-validation evaluation, one model had a specificity of 97% and a sensitivity of 45%, and a second model had a specificity of 53% and a sensitivity of 89%. We combined these two models into a third (sensitivity 93.5%; specificity 68%; positive predictive value (precision) 77%) to generate a list of the most suitable candidates in support of research recruitment.
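For readers less familiar with the metrics reported above, the following is a minimal sketch of how sensitivity, specificity and positive predictive value are computed from confusion-matrix counts. The class name `ConfusionCounts`, the function names and the example counts are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of classifier outcomes against the reference labels."""
    tp: int  # suitable candidates correctly flagged
    fp: int  # unsuitable patients incorrectly flagged
    tn: int  # unsuitable patients correctly excluded
    fn: int  # suitable candidates missed


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of truly suitable candidates that the model flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of truly unsuitable patients that the model excludes."""
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Fraction of flagged patients who are truly suitable (precision)."""
    return c.tp / (c.tp + c.fp)


# Illustrative counts only (assumption, not taken from the study).
example = ConfusionCounts(tp=90, fp=30, tn=70, fn=10)

if __name__ == "__main__":
    print(f"sensitivity: {sensitivity(example):.3f}")
    print(f"specificity: {specificity(example):.3f}")
    print(f"PPV:         {positive_predictive_value(example):.3f}")
```

A high-specificity model rarely flags unsuitable patients but misses many suitable ones, while a high-sensitivity model does the opposite, which is why the two reported models trade off against each other.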
Conclusion Our efforts are meant to demonstrate the potential of this type of approach for patient recruitment, but a larger sample size is required to build a truly reliable recommendation system.
Clinical implications Future efforts will employ alternative neural network algorithms and other machine learning methods.
- neural network
- deep learning
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Funding University of Toronto McLaughlin Centre, grant number: MC 2014-18. This work was supported by a McLaughlin Accelerator Grant in Genomic Medicine (PW, JS).
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.