One of the challenges in delivering efficient medical care is identifying people who are at risk of a negative outcome, so we can focus our efforts on screening and treating those at elevated risk. We do this in individual face-to-face encounters through clinical, diagnostic processes: taking a patient’s history, performing a physical examination, recording signs and symptoms. Across populations, we do it by using data collected in these encounters over time to develop algorithms and predictive statistical models. For me personally, these risk stratification, prediction, and adjustment models are some of the most interesting tools used in health services research.
Over the past decade or more of transitioning to electronic health records (EHRs) in the US, one of the biggest promises for research has been the idea of using the rich, clinical detail available from EHRs to enhance the standard claims and administrative data we’ve traditionally used to build risk models. In fact, we were lucky enough to be able to construct a linked EHR-claims database for one of my dissertation papers (co-authored with my advisor, Arlene Ash, and another member of my committee), published this year, in which we predicted emergency department visits.
And that brings us to a new article, published in the August issue of Medical Care: “Comparing Population-based Risk-stratification Model Performance Using Demographic, Diagnosis and Medication Data Extracted from Outpatient Electronic Health Records versus Administrative Claims.”
In this paper, a team from Johns Hopkins (first author: Hadi Kharrazi, MD, PhD) evaluates the possibility of using EHR data in addition to (or instead of) administrative claims for risk stratification. They sought to predict two different outcomes: hospitalization (excluding childbirth-related stays) and being in the top 1% of costs. They studied a sample of 85,581 individuals (all under age 65) who were continuously enrolled in both 2011 and 2012 and who visited a primary care clinic associated with HealthPartners (an integrated delivery network based in Bloomington, MN) at least once during the study period. The authors used the Johns Hopkins Adjusted Clinical Group (ACG) system, which has been validated for risk stratification.
They noted that about 46% of diagnoses were listed only in claims, while about 7% were listed only in the EHR. For reported chronic conditions, the overlap between claims and EHR data was about 58%. Combining EHR and claims data:
- increased identification of cancer by 12%
- increased identification of diabetes by 10%
- increased identification of hypertension by 3%
- increased identification of depression by 3%
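To make these figures concrete, here is a minimal Python sketch of how source overlap and identification gains could be computed from two sets of diagnosis records. It is not the authors' method: the records are invented, the function names are hypothetical, and it assumes that shares are measured against the union of both sources and that gains are measured relative to claims alone. The paper may define its denominators differently.

```python
def source_breakdown(claims, ehr):
    """Share of (patient_id, condition) pairs found in claims only, EHR only, or both.

    Assumption: the denominator is the union of the two sources.
    """
    total = len(claims | ehr)
    return {
        "claims_only": len(claims - ehr) / total,
        "ehr_only": len(ehr - claims) / total,
        "both": len(claims & ehr) / total,
    }


def identification_gain(claims, ehr, condition):
    """Relative increase in patients flagged with a condition when EHR data are added.

    Assumption: the baseline is the number of patients flagged by claims alone.
    """
    in_claims = {pid for pid, cond in claims if cond == condition}
    if not in_claims:
        raise ValueError("no patients with this condition in claims")
    in_either = in_claims | {pid for pid, cond in ehr if cond == condition}
    return (len(in_either) - len(in_claims)) / len(in_claims)


# Invented toy records: (patient_id, condition)
claims = {(1, "diabetes"), (2, "diabetes"), (3, "hypertension"), (4, "cancer")}
ehr = {(2, "diabetes"), (5, "diabetes"), (3, "hypertension")}

shares = source_breakdown(claims, ehr)
gain = identification_gain(claims, ehr, "diabetes")
print(shares)
print(f"diabetes identification gain: {gain:.0%}")
```

In this toy example, patient 5 has diabetes recorded only in the EHR, so adding the EHR raises the number of identified diabetes patients from two to three.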
Turning to the accuracy of the models, assessed using R² and area under the receiver operating characteristic curve (AUC):
- When predicting cost, adding EHR to claims data actually lowered the R² by a small amount.
- For concurrent outcomes (this year’s outcomes predicted using this year’s data), using both EHR and claims data slightly increased the AUC for both cost and hospitalization.
- For prospective outcomes (next year’s outcomes predicted using this year’s data), the AUC either did not change or decreased.
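For readers less familiar with these two metrics, the sketch below shows one way to compute them in plain Python. The data are invented and the function names are my own. It illustrates the general definitions (AUC as the probability that a randomly chosen case with the outcome receives a higher risk score than a randomly chosen case without it, and R² as one minus the ratio of residual to total sum of squares), not the ACG system or the authors' code.

```python
def auc(labels, scores):
    """AUC via pairwise comparison; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case with and one without the outcome")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def r_squared(observed, predicted):
    """R^2 = 1 - SS_residual / SS_total."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Invented example: 1 = hospitalized, paired with a model's risk scores
hospitalized = [0, 0, 1, 0, 1, 1]
risk_score = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
hosp_auc = auc(hospitalized, risk_score)

# Invented example: observed vs. predicted annual cost in dollars
observed_cost = [100, 200, 300, 400]
predicted_cost = [150, 150, 350, 350]
cost_r2 = r_squared(observed_cost, predicted_cost)

print(f"AUC = {hosp_auc:.3f}, R^2 = {cost_r2:.2f}")
```

The same functions apply to concurrent and prospective evaluation. The only difference is whether the outcomes paired with the scores come from the same year as the predictors or from the following year.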
The authors conclude that, while risk stratification using EHR data was feasible, it was less accurate than claims-based models in predicting hospitalization and high costs, although it increased the ability to identify some important conditions.
Their findings suggest that EHRs, at least in this study, contain more outdated and/or inaccurate data than do claims. Additionally, even in an integrated delivery network, the available claims data could have been incomplete (especially for mental health utilization). The two sources also measure different things for medications: EHR-derived data represent prescriptions, while claims represent actual medication fills. Mismatches could be related to nonadherence or to individuals paying out of pocket (for example, using online pharmacies) instead of paying for medications with their insurance.
As usual, more research is needed to understand how best to deliver on the promise of EHR data’s usefulness.