Background While there are numerous mental health apps on the market today, less is known about their safety and quality. This study aims to offer a longitudinal perspective on the nature of high visibility apps for common mental health and physical health conditions.
Methods In July 2019, we selected the 10 top search-returned apps in the Apple App Store and Android Google Play Store using six keyword terms: depression, anxiety, schizophrenia, addiction, high blood pressure and diabetes. Each app was downloaded by two authors, reviewed by a clinician, and coded for features, functionality, claims, app store properties and other properties.
Results Compared with 1 year prior, there were few statistically significant changes in app privacy policies, evidence and features. However, there was a high rate of turnover: only 34 (57%) of the apps from the Apple App Store and 28 (47%) from the Google Play Store that appeared in the top 10 search results in 2018 remained there in 2019.
Discussion Although there was a high turnover of top search-returned apps between 2018 and 2019, we found that there were few significant changes in features, privacy, medical claims and other properties. This suggests that, although the highly visible and available apps are changing, there were no significant improvements in app quality or safety.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
The rapid growth in availability of mental health apps presents a plethora of easily accessible tools directly to patients and clinicians. In 2018, there were reported to be over 300 000 mobile health apps, at least 10 000 of which were related to mental health.1 Despite research that has characterised most of these apps to be of questionable quality,2 there is a lack of data on longitudinal trends and changes in the mental health app space because most studies to date have been cross-sectional. Previous research has quantified the rate of turnover of mental health apps on app stores,3 but changes in the apps themselves have not been examined. In the past year, health apps have come under increased scrutiny and attention from scientists and society alike, but have individual apps been updated in response to both this scientific and public pressure? As the public becomes more concerned about app privacy, new research about mental health apps is published and a greater focus on human factors and usability emerges, we assess whether such efforts are reflected in current app offerings.
Since the 2018 coding of apps, awareness about the risks of health apps and the need for further evidence has grown. In November 2018, for example, the New York Times highlighted how easily apps can capture and market people’s location via smartphone GPS.5 Furthermore, the Federal Trade Commission held national hearings and issued its largest penalties to date for violations around digital privacy6 and, in April 2019, WHO released guidelines for using digital tools like apps in patient care.7 A mental health advocacy group highlighted broad concerns on their online blog in October 2018, asking ‘who owns the data collected…’, ‘who has access to the data…’ and ‘how does the tech programme actually work’.8 While it is unreasonable to expect the entire health app landscape to dramatically improve in 1 year, as there is a lag between intent to change and actual change, the often-touted advantage of digital health tools is their ability to readily adapt and evolve to meet the needs of patients.
In this review, we aim to expand our team’s 2018 review4 and explore what features, protections, evidence, and markers of quality are present in top apps for depression, addiction, anxiety disorder, schizophrenia, hypertension, and diabetes. We hypothesise that in 2019 there will be improvements reflected in more apps having privacy policies and supporting evidence along with fewer being flagged as concerning. As with last year, we hypothesise that there will not be a simple relationship between these app features and attributes and overall app quality.
To offer an overall assessment of the app, we applied the same 3-point scale as the 2018 study as follows: 0 represented ‘serious concerns regarding safety’, 1 represented ‘likely acceptable app’ and 2 represented ‘a potentially more useful app’.4 Acknowledging this scale is itself subjective given that the utility of any app depends on the patient at hand, clinical needs and treatment goals, our analysis focused on apps that were rated with ‘serious concerns regarding safety’ because the clear safety concerns are far less subjective than other ratings. For example, an app that provided incorrect medical information would be scored as a 0 (a safety concern) in our ratings. Assessments of apps and scores from the 2018 study were then compared with those from the current 2019 study using t-tests.
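The year-over-year comparison described above reduces to a two-sample t-test on the 0/1/2 ratings. A minimal sketch, using illustrative placeholder scores rather than the study’s data:

```python
from scipy.stats import ttest_ind

# Illustrative 0/1/2 overall ratings for one condition's top apps,
# as coded in 2018 and 2019 (placeholder values, not the study's data).
scores_2018 = [0, 1, 1, 2, 0, 1, 1, 2, 1, 0]
scores_2019 = [1, 1, 2, 2, 0, 1, 1, 1, 2, 1]

# Two-sided t-test comparing mean ratings across the two years.
t_stat, p_value = ttest_ind(scores_2018, scores_2019)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

A non-significant p value here would mirror the paper’s finding that overall ratings changed little between years.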
The methods of assessing the relationship between app features and reviewer quality flags were similar to those used in the 2018 paper.4 Specifically, we used variable selection using the Lasso method to obtain models that relied on fewer app metrics. In applying regression to all metrics within each disease state, we applied a penalty using the number of ratings as weights with a ceiling of 1000 for apps with >1000 ratings. Tuning parameters were chosen by fivefold cross-validation and we repeated the process 100 times to account for our relatively small sample size.
We coded a total of 120 apps, with 20 for each condition (10 iOS and 10 Android). On both the Apple App Store and the Google Play store, three apps appeared in both the depression and anxiety searches (on Apple App Store: Moodpath: Depression & Anxiety, AntiStress Anxiety Relief Game and Pacifica for Stress & Anxiety; on the Google Play store: Moodpath: Depression & Anxiety, Youper—Emotional Health and Wysa: stress, depression & anxiety therapy chatbot). Compared with 1 year prior, the top 10 apps across each of the 6 conditions were largely different. Only 34 (57%) of the apps from the Apple App Store and 28 (47%) of the apps from the Google Play Store in our 2018 search were still in the top 10 search-returned apps in the 2019 search.
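The retention percentages above follow from the 60 apps coded per store (6 conditions × 10 apps); a quick check of the arithmetic:

```python
# 6 search terms x 10 top apps per store = 60 apps coded per store in 2018.
apps_per_store = 6 * 10

ios_retained = 34      # still in the top 10 results in 2019 (Apple App Store)
android_retained = 28  # still in the top 10 results in 2019 (Google Play Store)

print(round(100 * ios_retained / apps_per_store))      # 57
print(round(100 * android_retained / apps_per_store))  # 47
```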
Compared with the apps identified in 2018, more apps made medical claims in every disease state except addiction, where the count decreased from one to none. However, the overall proportion of apps making medical claims in 2019 (50%) was not statistically significantly higher than in 2018 (30.8%). Likewise, the proportion of apps offering privacy policies increased: 70% of the apps coded in 2018 contained privacy policies compared with 87.5% in 2019, a change that was not statistically significant. While we did not evaluate the privacy policies themselves, we did assess the ability to delete data, which actually decreased for depression, anxiety and diabetes apps. Other results are shown in table 1.
We found overall few changes in app features for information/data collection and interventions provided in 2019 compared with 2018. Pop-up messages offering information or returning summarised/analysed data (such as average steps taken per day) remained the most common intervention mode. Looking at disease-specific apps, there were few significant changes over the 12-month period covered by this study. More apps included privacy policies and fewer apps offered the ability to delete data, although neither change was statistically significant. Apps associated with schizophrenia had the fewest features, the lowest star ratings in the stores and the highest number of days since last update.
Using Lasso regression, we replicated the prior study’s finding that apps which had not been updated in over 180 days were associated with our rating for serious concerns regarding safety (two-sided t-test, p<0.01). Results did not change when we weighted apps to account for number of reviews. The schizophrenia apps, which had not been updated for a mean of 514 days, had a mean flag value of 0.55, and the hypertension apps, not updated for a mean of 321 days, had a mean flag value of 0.75. We did not find any other clear association between the individual app metrics recorded and quality, a finding also in line with the results of the 2018 study.
Our review found a high degree of turnover among top search-returned apps across diabetes, hypertension, depression, anxiety, addiction and schizophrenia, but overall little evidence of change in their privacy, safety, features and functions. The rate of stability of apps in the top 10 (57% and 47% of iOS and Android apps, respectively, remaining after 12 months) appears lower than previously reported (95.8% and 82.4% of the top 25 iOS and Android apps, respectively, remaining after 9 months in 2015).3 This may indicate that turnover is more frequent among the top search-returned apps than previously reported, or that the app store marketplace is now even more dynamic and volatile than it was 4 years ago.
While public debate on digital privacy and new research on evidence-based interventions evolves, it appears that those apps most accessible to consumers are not evolving as quickly. Our finding that there was not a clear association between any measure except for time since last update >180 days with app quality suggests that there is no simple formula to assess the clinical safety and potential of these digital tools.
Our results also highlight the divide between the potential of apps and their current offerings available to the public. As shown in table 2, there were few significant changes between 2018 and 2019 in the types of data these top apps collected or the means they used to return data or offer interventions to users. The majority of top apps still capture data via surveys or diaries and return that data via pop-up messages in a summarised format or with a disease-related fact. This model of use fails to take advantage of novel means to capture clinical state like digital phenotyping or smartphone features,11 which would enable greater understanding of the context and environments surrounding the person to deliver individualised care. For some diseases like schizophrenia, top apps continue to offer mainly reference information, and much of that is out of date and of concerning quality.
Apps for physical health conditions (diabetes and hypertension) showed little observable difference in quality measures from mental health apps. Overall, more diabetes and hypertension apps included medical claims than mental health apps (65% vs 42.5%); however, this difference is largely due to the lack of medical claims among the addiction apps. Physical health apps also more commonly included step counters as well as integrations with other health data and smart devices. Given the importance of physical activity for both cardiovascular and mental health, this could reflect an opportunity for improved tracking of physical activity in the context of mental health apps.
Just as in the 2018 study, we did not identify a strong association between coded app attributes and our quality flag. Acknowledging that our quality flag metric is itself subjective, we did observe statistically significant decreases in the mean flag rating of anxiety and addiction apps (table 2) and an increase in the mean flag rating of schizophrenia apps between 2018 and 2019. In attempting to build models to predict app quality from the features and attributes in tables 1 and 2, we only sought to predict low-quality flag metrics (<1), as it is easier to determine what makes a poor-quality app than a high-quality one given the numerous use cases and personal preferences related to app engagement. Our result that apps not updated in over 180 days are likely of poor quality is helpful in screening out concerning apps, but does not offer a simple formula or feature set with which to identify high-quality apps. These results call into question app rating and curation efforts that attempt to rank apps partly on various app features and attributes, which may not be able to keep current with frequent app turnover12 or account for the myriad ways people use apps. Prior research on existing app evaluation systems has also questioned the validity of the metrics used to calculate recommendation scores.13 Without well-validated metrics to guide app evaluation, we suggest that a more holistic and informed approach to picking the right app for the right patient may make more sense today.
Our results align well with other recent studies examining mobile app features. A recent study evaluating the data security and privacy policies of mobile apps for depression found that most policies lacked information about the ability to edit and delete personal information.14 Despite such gaps within the policies themselves, other studies similarly showed an increase in the number of privacy policies among apps evaluated a year later.15 Another recent study determined that 64% of the mental health apps evaluated made claims of effectiveness,16 which corresponds with the growing number of apps making medical claims in our study.
The digital health app space and scope continues to rapidly evolve,17–19 with many new apps appearing as others disappear. As a group, however, the quality, features and attributes of the top apps we examined does not appear to be changing as quickly. Ensuring that the current international efforts around digital health privacy and recent research findings are rapidly disseminated into available apps represents a challenge that the digital health field must now embrace if it is to fulfil its potential of offering safe and effective tools.
Contributors All authors contributed equally.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement No data are available. The goal of this project is not to call out any single app but rather note larger trends. It is possible to recreate these data for current apps with materials offered in the paper.