It is easy to forget how much of an impact the smartphone has had on our daily lives and our society. About 91% of Americans own a smartphone, which means most of us are walking around with a device in our pockets that is a computer, communicator, camera, and GPS locator, with access to virtually all of the world’s store of human knowledge.
The ubiquity and power of these devices also create the opportunity to gather a massive amount of data, which (from a scientific point of view) is a double-edged sword. It creates the opportunity to do some interesting and powerful scientific studies, but it also makes it easy to do a lot of bad science.
I had this in mind when I was recently asked to look into an app called Visible, which purports to help users track symptoms of long COVID and myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). The way the app works is that you take an assessment in the morning, which involves placing your finger on the camera so that it can record your heart rate and calculate your heart rate variability. Then at the end of the day you record your symptoms. The app then trains on your data to learn how to predict end-of-day crashes from morning biometric data.
These kinds of claims send up red flags for me. Heart rate and heart rate variability are very noisy sets of data. You need some fancy statistical analysis to pull trends or correlations out of such data. That does not make it wrong, but it means there are tons of opportunities for various types of p-hacking.
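To see why noisy data invite p-hacking, here is a minimal Python sketch that uses nothing but simulated noise. The 90 days, 40 candidate features, and random seed are arbitrary assumptions, not anything from Visible or its study. None of the "biometric" features is related to the symptom score, yet roughly one in twenty will look significant at p < 0.05, so an analyst who tries enough features, time windows, or subgroups will usually find something to report.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def p_value(r, n):
    """Approximate two-sided p-value for a correlation r using the Fisher z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def count_false_hits(n_days=90, n_features=40, alpha=0.05, seed=1):
    """Test many pure-noise 'biometric' features against a pure-noise symptom
    score and count how many come out 'significant' anyway."""
    rng = random.Random(seed)
    symptoms = [rng.gauss(0, 1) for _ in range(n_days)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_days)]
        if p_value(correlation(feature, symptoms), n_days) < alpha:
            hits += 1
    return hits

print(count_false_hits())
```

With alpha = 0.05, about 5% of the noise features (around 2 of 40 on average) are expected to pass by chance alone; the exact count depends on the seed.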
I am also a bit concerned about the target population – long COVID and ME/CFS. These are difficult to diagnose, partly because the symptoms tend to be nonspecific. Anxiety and depression likely play a significant role in these conditions, and inviting people to obsess daily about their symptoms may have unintended consequences.
As an aside, the app is initially free, with an option to allow the company to track your data. As the saying goes, if you are not paying for an app, then in some way you are the product. For apps like this, that can mean the company needs lots of data to train its models, so early adopters of the free app are the training data. It can also mean that the early app (whether or not it is called a beta) likely hasn’t been fully trained or validated. With Visible, there were no published studies (not even internal ones) validating its claims.
In March of 2026, however, they did publish a study of the app. I was not impressed, but let’s go over the details. Subjects were asked to perform a daily routine – morning biometric measures, go about your normal day, evening recording of symptoms. The biometrics were heart rate and heart rate variability, measured with the phone’s camera. This is interesting technology, using a technique called photoplethysmography (PPG). The camera can detect the changes in blood volume in the fingertip as the heart beats. But the signal is also highly dependent on factors such as finger pressure and movement. Subjects could also wear an LED armband using the same PPG technology.
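PPG detects pulse peaks, and heart rate and heart rate variability are then derived from the intervals between those peaks. The post doesn’t say which HRV metric Visible computes, so the sketch below assumes RMSSD, one common choice, and uses made-up interval values to show how a single movement artifact (one pulse peak the camera fails to see) distorts both numbers.

```python
import math

def heart_rate_bpm(ibis_ms):
    """Mean heart rate (beats per minute) from inter-beat intervals in milliseconds."""
    return 60000 / (sum(ibis_ms) / len(ibis_ms))

def rmssd(ibis_ms):
    """RMSSD: root mean square of successive interval differences, in ms."""
    diffs = [b - a for a, b in zip(ibis_ms, ibis_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical clean recording: intervals between detected pulse peaks (ms).
clean = [800, 810, 790, 805, 795, 800, 810, 790]

# Same recording, but a finger shift hides one pulse peak, so two
# intervals (805 + 795) are detected as a single 1600 ms interval.
artifact = clean[:3] + [clean[3] + clean[4]] + clean[5:]

print(f"clean:    {heart_rate_bpm(clean):.1f} bpm, RMSSD {rmssd(clean):.1f} ms")
print(f"artifact: {heart_rate_bpm(artifact):.1f} bpm, RMSSD {rmssd(artifact):.1f} ms")
```

In this toy example, one hidden beat drops the apparent heart rate from 75 to about 66 bpm and inflates RMSSD more than thirtyfold, which is why finger pressure and movement matter so much.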
The study compared three groups – covariates only, covariates + prior day’s symptoms, covariates + prior day’s symptoms + biometrics. The covariates were simply the user’s age and sex, the time of day of the biometric measurement, and the sensor modality; this group was included to make sure these incidental variables did not bias the outcome.
The main finding of the study was that the prior day’s symptoms were a strong predictor of the current day’s symptoms. Here are the area under the curve (AUC) measures for the primary outcomes (with 0.5 being no better than chance and 1 being perfect prediction):
Crash
Symptoms-only: AUC = 0.78
Symptoms + biometrics: AUC = 0.81
Fatigue
Symptoms-only: AUC = 0.73
Symptoms + biometrics: AUC = 0.74
Brain fog
Symptoms-only: AUC = 0.83
Symptoms + biometrics: AUC = 0.85
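The AUC values listed above have a concrete meaning: the AUC is the probability that a randomly chosen day with the outcome (say, a crash) gets a higher predicted score than a randomly chosen day without it, counting ties as half. A minimal, quadratic-time Python sketch of that definition (the scores and labels in the usage lines are made up) shows why 0.5 means chance and 1 means perfect ranking. It also shows that an AUC of 0.78 is a statement about how well days are ranked, not a 78% chance of crashing.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive (label 1) outscores
    a random negative (label 0); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0: every positive outranks every negative
print(auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # 0.5: constant score carries no information
print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75: 3 of 4 positive/negative pairs ranked correctly
```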
As the numbers show, the prior day’s symptoms do all the heavy lifting here. Adding the biometric data produced only a tiny improvement. That improvement was highly statistically significant because they had tons of data, over 500,000 observations. With that much data, tiny effects, even artifactual ones, can reach significance, so you have to interpret significant findings carefully when dealing with such large data sets.
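Why does a huge sample make tiny differences significant? The standard error shrinks with the square root of the sample size, so a fixed, small gap eventually produces an enormous test statistic. The sketch below uses a generic two-proportion z-test with made-up proportions (80% vs. 79%); it is not the test the authors applied to their AUCs, just an illustration of how significance scales with sample size.

```python
import math

def two_proportion_p_value(p1, p2, n):
    """Two-sided p-value (z-test) for a difference between two proportions,
    each estimated from n independent observations."""
    pooled = (p1 + p2) / 2  # pooled proportion when both groups have size n
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))

# The same one-percentage-point difference at increasing sample sizes.
for n in (1_000, 10_000, 500_000):
    print(f"n = {n:>7,}: p = {two_proportion_p_value(0.80, 0.79, n):.3g}")
```

The same one-point gap goes from nowhere near significant at n = 1,000 (p above 0.5) to astronomically significant at n = 500,000, even though its practical size never changes.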
They did not publish results for the biometric data alone, which I doubt was an accidental oversight. Given how little the biometrics added on top of symptoms, those results were likely unimpressive.
To me, what this study shows is that the app is essentially worthless. Did you crash yesterday? Then you are likely to crash today. Were you fatigued at the end of the day yesterday? Then you will probably be fatigued today. You don’t need an app to know how you felt the day before, and adding the biometrics was clinically insignificant.
The study has other weaknesses. The symptoms were all subjective reports, and the accuracy of the biometrics was not confirmed with any independent validation, such as comparison with ECG measurements. The authors note that this was not necessary because the data are all compared within each user, not between users. That argument only applies to the predictive power of the app, which, again, was not meaningfully better than just remembering how you felt the day before.
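The within-user argument does have some merit: a sensor that reads consistently high or low for a given person cancels out when each user is compared only with themselves. It does not cancel random day-to-day error, though, and quantifying that error is what an independent reference such as ECG would do. A minimal sketch with simulated, hypothetical HRV values (the 40 ms mean, 15 ms offset, and 12 ms noise level are invented for illustration; `statistics.fmean` needs Python 3.8+):

```python
import random
import statistics

def centered(values):
    """Express each value relative to the user's own mean (a within-user comparison)."""
    m = statistics.fmean(values)
    return [v - m for v in values]

rng = random.Random(7)
true_hrv = [rng.gauss(40, 8) for _ in range(60)]         # hypothetical daily HRV, ms
offset_sensor = [v + 15 for v in true_hrv]               # consistently reads 15 ms high
noisy_sensor = [v + rng.gauss(0, 12) for v in true_hrv]  # unbiased, but random error each day

# Largest day-level discrepancy for the offset sensor after centering both series.
offset_error = max(abs(a - b) for a, b in zip(centered(true_hrv), centered(offset_sensor)))
# Average day-level discrepancy for the noisy sensor after the same centering.
noise_error = statistics.fmean(abs(a - b) for a, b in zip(centered(true_hrv), centered(noisy_sensor)))

print(f"offset sensor error after centering: {offset_error:.6f} ms")
print(f"noisy sensor error after centering:  {noise_error:.1f} ms")
```

The constant offset vanishes after centering, while the noisy sensor is still off by several milliseconds on a typical day, and nothing inside the app’s own data can reveal how large that error is.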
I would also like to see, however, some objective data on the net effect of using the app on some meaningful outcome. Does using the app help people manage their lives better in some way, or reduce anxiety, or improve sleep or some other outcome? It’s quite possible that using the app can make certain outcomes worse, by increasing focus on one’s symptoms.
While I think using smartphone apps for health monitoring has great potential, there is also tremendous potential for apps that are doing something but provide dubious net benefit. It is easy to dazzle users by gathering lots of data, measuring things, and then spitting out lots of information. But is that information useful? Is it just distracting or anxiety-provoking? Does it divert from more useful methods? Does it facilitate or interfere with actual medical care?
Studies like this also make it easy to create the impression of utility where there is little to none. The authors concluded:
“Walk-forward cross-validation showed a statistically significant improvement in model performance when morning biometrics were added to prior-day symptom reports (AUC = 0.82–0.85 vs. 0.73–0.83). These findings represent the prospective utility of mobile health tools for precision monitoring and prediction of real-time symptom exacerbations in complex chronic illness.”
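Walk-forward cross-validation, mentioned in the quote, evaluates a time-series model the way it would actually be used: train on the past, predict the next day, then roll forward one day and repeat. It keeps future data from leaking into training, but it says nothing about whether an improvement is clinically meaningful. A minimal sketch, assuming an expanding training window and one-day-ahead predictions (the study’s actual window settings aren’t given here); `walk_forward_splits` is a hypothetical helper, not the authors’ code.

```python
def walk_forward_splits(n_days, min_train):
    """Yield (train_days, test_day) pairs with an expanding training window:
    each model is fit only on days strictly before the day it predicts."""
    for test_day in range(min_train, n_days):
        yield list(range(test_day)), test_day

for train, test in walk_forward_splits(n_days=6, min_train=3):
    print(f"train on days {train}, predict day {test}")
```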
Use of the term “precision monitoring” is interesting. Precise does not mean accurate, and we don’t know how accurate the data is because we don’t have a gold-standard comparison. Precision also does not equate to useful – you can precisely measure something which is of limited or no value.
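The distinction between precision and accuracy is easy to show with two simulated devices measuring a known heart rate. Note that computing the bias requires a reference value, which is exactly the gold-standard comparison the study lacks. The 70 bpm reference and both devices’ error figures are invented for illustration.

```python
import random
import statistics

TRUE_HR = 70.0  # hypothetical gold-standard heart rate, e.g. from an ECG

rng = random.Random(0)
precise_but_biased = [rng.gauss(80, 0.5) for _ in range(1000)]  # repeatable, but 10 bpm high
accurate_but_noisy = [rng.gauss(70, 5) for _ in range(1000)]    # centered on truth, but scattered

for name, readings in (("precise", precise_but_biased), ("accurate", accurate_but_noisy)):
    bias = statistics.fmean(readings) - TRUE_HR  # accuracy: average offset from the truth
    spread = statistics.stdev(readings)          # precision: how tightly readings cluster
    print(f"{name}: bias {bias:+.1f} bpm, spread {spread:.1f} bpm")
```

Without `TRUE_HR`, only the spread could be computed, and the tightly clustered but wrong device would look like the better one.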
My conclusion – this app is, for all practical purposes, useless, and we have no idea what its net effect is.
