Artificially intelligent social risk adjustment

By Lisa M. Lines | December 10, 2021

What accounts for large differences in life expectancy from one neighborhood to another? This post explains what our team has discovered so far using an “artificially intelligent” approach to understanding social risk at the local level.

Where you live affects how long you live

In 2018, when the National Center for Health Statistics released the first-ever national dataset with small-area life expectancy estimates, it made headlines all over the US, such as:

What your neighborhood says about your life expectancy – Marketplace, Nov. 7, 2018

And the project’s data are still generating headlines, now informed by the current pandemic-associated disparities:

Life expectancy dramatically lower in Houston’s Black, brown communities, study says – ABC 13, Nov. 12, 2020

Many experts have thought about why people in some neighborhoods have shorter life expectancies. Perhaps the food deserts, the crime, the persistent, concentrated poverty, and the lack of sidewalks or parks where people can safely exercise play a role. Lots of factors may contribute to a neighborhood’s life expectancy.

Unfortunately, existing measures of social determinants (or social risk factors) only include a small set of factors and may miss some important characteristics. For example:

  • The Area Deprivation Index (ADI) includes 17 measures in 4 categories:
    • income
    • education
    • employment
    • housing quality
  • The Social Deprivation Index (SDI) is based on 7 variables collected in the American Community Survey:
    • poverty rate
    • adults without a high school diploma
    • single-parent households
    • living in rented housing unit
    • living in overcrowded housing unit
    • households without a car
    • unemployment
  • The Social Vulnerability Index (SVI) incorporates 15 Census variables, including poverty, lack of access to transportation, and crowded housing, within 4 domains:
    • socioeconomic status
    • household composition
    • race/ethnicity/language
    • housing/transportation

Health and medical practitioners and researchers have called for a better way to measure, predict, and adjust for social risk factors (SRFs) in healthcare and population health.

Payment reform, meet social determinants

At the same time as we’re learning just how much our local area affects our lifespan, the healthcare sector in the US is undergoing a revolution. As readers of this blog are likely aware, we spend far more than other countries on healthcare but have worse health outcomes.

Payers are starting to want more value for their spending. This includes both private payers, like Blue Cross Blue Shield, and public payers, like Medicare. With economists having provided ample evidence of misaligned incentives in healthcare over the past decades, payment reform is finally having its moment in the sun.

Rather than paying for every service, which encourages more services, payers are now basing reimbursements on episodes of care for specific patients or on a year of care for an entire population. For instance, an integrated network of hospitals and doctors might sign up to get paid a lump sum for all the care their Medicare patients need in a year. This transition toward paying for value carries a risk: providers may not be adequately reimbursed for caring for needier patients and may instead try to avoid them.

Inadequate risk adjustment?

The formulas we rely on to determine the adequacy of payments don’t usually adjust for all the things that influence outcomes. In part, this may be because no one has the data needed to measure certain things. For example, while ICD-10-CM contains Z-codes for noting social risks, providers rarely use them. Providers have little incentive to collect such information. If the Z-codes are not determining payment, why bother?

Risk adjustment models also focus on total costs of care. Using spending as the outcome ignores existing issues with access to care that disproportionately affect racial and ethnic minorities.

Inadequate adjustment for social risk factors can lead to several unintended consequences:

But what about ZIP codes?

Lacking person-level information, can ZIP-code-level information be informative? While person-level data is best, we have ample evidence of the population-level ties between neighborhood characteristics and health outcomes going back to at least the 1980s. We are standing on the shoulders of giants such as Nancy Krieger and Dolores Acevedo-Garcia in approaching our analyses. None of this is new. What is new is the ability to get more granular data into our models.

Enter the machines

Data science seems to be everywhere these days. With computing power getting better and better every day, we now have access to approaches and methods that our predecessors could only dream of. Sometimes, though, machine learning has gotten a bad rap. People say that artificial intelligence algorithms are a “black box”—not transparent enough. And indeed, some data scientists have been blind to their biases and made things worse instead of better.

Figure. The conceptual model includes 10 domains: healthcare access, coverage, costs, and quality; educational attainment and quality; community health, wellbeing, and healthy behaviors; bias, stress, and trauma; justice, crime, and incarceration; food security and access to healthy food; poverty, inequality, and employment; housing adequacy, crowding, and structural health; environmental quality; and transportation access, infrastructure, and safety.

With this project, we tried to address potential biases in several ways:

  • We worked from a conceptual model of social determinants of health that guided which risk factors to include. Our conceptual model builds on the CDC’s Healthy People 2020 framework for social determinants (see Figure).
  • We explicitly called out bias, stress, and trauma by including measures of racial segregation and inequality in the model.
  • Our algorithm predicts Census tract (CT)-level life expectancy, rather than spending.
  • We involved people in the development process from different backgrounds and disciplines. Our team includes people with direct experiences with the social risks (such as food insecurity and poverty) we measure.

With the idea that more granular data could give us better predictions, we developed an approach designed to capture as much as we could. However, we thought carefully about each variable we included. Our conceptual framework guided us through the process.

To arrive at our social risk measure, we used a random forest approach to predict neighborhood life expectancy. The unit of analysis is the CT, which was designed to be a relatively homogeneous area with an average of about 4,000 people. Random forests have several big advantages over traditional regression-based methods. One is that you can include hundreds of variables. Our current model has 147 (carefully selected) variables! Another is that we have no convergence issues or problems with outliers, as we might with traditional regression.
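For readers who have not met random forests before, here is a toy sketch of the idea in plain Python. It is not our production model: the three features, the formula that generates life expectancy, and all function names are made up for illustration, and in practice one would use an established library (for example, scikit-learn's RandomForestRegressor). The sketch grows each tree on a bootstrap sample of tracts, considers a random subset of features at each split, and averages the trees' predictions.

```python
import random

def best_split(rows, targets, feature_ids):
    """Find the (feature, threshold) pair that minimizes squared error."""
    best, best_sse = None, float("inf")
    for f in feature_ids:
        # Skipping the smallest value keeps both sides of the split non-empty.
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, targets) if r[f] < t]
            right = [y for r, y in zip(rows, targets) if r[f] >= t]
            sse = 0.0
            for side in (left, right):
                m = sum(side) / len(side)
                sse += sum((y - m) ** 2 for y in side)
            if sse < best_sse:
                best, best_sse = (f, t), sse
    return best

def grow_tree(rows, targets, n_try, rng, depth=0, max_depth=5, min_rows=10):
    """Grow one regression tree; a leaf is a float, a node is a tuple."""
    leaf = sum(targets) / len(targets)
    if depth >= max_depth or len(rows) < min_rows:
        return leaf
    # Each split looks at only a random subset of the features.
    split = best_split(rows, targets, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return leaf
    f, t = split
    lo = [i for i, r in enumerate(rows) if r[f] < t]
    hi = [i for i, r in enumerate(rows) if r[f] >= t]

    def grow(idx):
        return grow_tree([rows[i] for i in idx], [targets[i] for i in idx],
                         n_try, rng, depth + 1, max_depth, min_rows)

    return (f, t, grow(lo), grow(hi))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node

def grow_forest(rows, targets, n_trees=20, n_try=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(grow_tree([rows[i] for i in idx],
                                [targets[i] for i in idx], n_try, rng))
    return forest

def predict_forest(forest, row):
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

# Synthetic "tracts": every number below is invented, not project data.
data_rng = random.Random(42)

def make_tract():
    poverty = data_rng.uniform(0.0, 0.5)
    no_diploma = data_rng.uniform(0.0, 0.4)
    unrelated = data_rng.random()
    life_exp = 85 - 25 * poverty - 10 * no_diploma + data_rng.gauss(0, 1)
    return [poverty, no_diploma, unrelated], life_exp

data = [make_tract() for _ in range(160)]
train, test = data[:120], data[120:]
forest = grow_forest([x for x, _ in train], [y for _, y in train])
preds = [predict_forest(forest, x) for x, _ in test]
actual = [y for _, y in test]
mean_y = sum(actual) / len(actual)
r2 = 1 - (sum((a - p) ** 2 for a, p in zip(actual, preds))
          / sum((a - mean_y) ** 2 for a in actual))
print(f"held-out R^2 on synthetic tracts: {r2:.2f}")
```

Because every prediction is an average of training-set life expectancies, a forest never extrapolates beyond the range it has seen, which is part of why outliers cause less trouble than in a regression.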

A sneak peek at our results

We can’t share our full results just yet, and this isn’t the venue for that anyway. But we do want to share a few things we’ve learned in our pilot phase, which used data in just one state: Ohio.

Ohio has a statewide average life expectancy of 76.6 years. But there’s a 29-year gap between the Census tract (CT) with the shortest life expectancy (60 years) and the longest (89.2 years). How much could social risk factors explain that gap?

For a benchmark, we first looked at how well the three existing measures, described above, could explain the gap. We found that in Ohio:

  • The SVI explains 50% of the variance in life expectancy at the CT level
  • The SDI explains 58%
  • The ADI explains 63%

So how well did we do? Our measure explains 73% of the variance in life expectancy at the CT level. This means that publicly available data can explain nearly three-fourths of the disparity between the Franklinton neighborhood in Columbus and the suburban Stow area of Akron.
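To make "explains X% of the variance" concrete, here is a minimal sketch of how such a figure could be computed for a single index. It assumes the benchmark is the R² from a one-predictor least-squares fit of tract life expectancy on the index score, which equals the squared Pearson correlation. The function name and the five data points are hypothetical and are not values from our analysis.

```python
def variance_explained(index_scores, life_expectancy):
    """R^2 of a one-predictor least-squares fit (squared Pearson correlation)."""
    n = len(index_scores)
    mean_x = sum(index_scores) / n
    mean_y = sum(life_expectancy) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(index_scores, life_expectancy))
    sxx = sum((x - mean_x) ** 2 for x in index_scores)
    syy = sum((y - mean_y) ** 2 for y in life_expectancy)
    return sxy ** 2 / (sxx * syy)

# Made-up deprivation scores and life expectancies for five tracts.
index_scores = [10, 35, 50, 70, 95]
life_expectancy = [84.0, 80.5, 78.0, 75.5, 70.0]
share = variance_explained(index_scores, life_expectancy)
print(f"share of variance explained: {share:.1%}")
```

A value of 0.63 from this kind of calculation would correspond to "explains 63%" in the list above. A multi-variable model is scored the same way, except that its predictions replace the fitted line.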

The top 10 most important predictors of the life expectancy gap

Here are the top 10 variables that explain the life expectancy gap in Ohio (from our set of 147 variables across 10 domains):

*Measures from the Opportunity Atlas
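This post does not spell out how predictor importance was calculated, so the following is only an illustration of one common approach, permutation importance: shuffle one variable at a time and see how much the model's R² drops. The stand-in model, the feature names, and the data are all hypothetical.

```python
import random

def r_squared(actual, predicted):
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Average drop in R^2 after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = r_squared(targets, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:]
                        for r, v in zip(rows, shuffled)]
            score = r_squared(targets, [predict(r) for r in permuted])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical stand-in for a fitted model; it ignores the third feature."""
    poverty, no_diploma, _unused = row
    return 85 - 25 * poverty - 5 * no_diploma

data_rng = random.Random(1)
rows = [[data_rng.uniform(0, 0.5), data_rng.uniform(0, 0.4), data_rng.random()]
        for _ in range(200)]
targets = [toy_model(r) + data_rng.gauss(0, 1) for r in rows]
names = ["poverty_rate", "pct_no_diploma", "ignored_feature"]

scores = permutation_importance(toy_model, rows, targets)
ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

A feature the model never uses gets an importance of exactly zero, while features that drive predictions show larger drops. Importance computed this way describes the model, not causation.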

Closing thoughts

As data geeks, we are excited to be working on something that could be useful in many settings—from risk adjustment to evaluation to case management. Our local social risk scores will be used in conjunction with individual patient data to understand which factors are most important to address through policies and population-level interventions in specific neighborhoods. Understanding what risk factors matter most to health outcome improvement is a big deal.

We especially look forward to seeing widespread adoption of measures of social risk into payment formulas. Fair payment requires it. Better adjustment for social risk could be an important contributor to improved health equity for all.

Note: For more about this project, check out the Health Datapalooza and National Health Policy Conference session “Innovation and Analytics to Inform Policy and Health System Choices Rapid Fire Session: Social Determinants of Health,” on Feb. 16, 2021, 2:30-3:30pm ET.

Lisa M. Lines

Senior health services researcher at RTI International
Lisa M. Lines, PhD, MPH is a senior health services researcher at RTI International, an independent, non-profit research institute. She is also an Assistant Professor in Population and Quantitative Health Sciences at the University of Massachusetts Chan Medical School. Her research focuses on social drivers of health, quality of care, care experiences, and health outcomes, particularly among people with chronic or serious illnesses. She is co-editor of TheMedicalCareBlog.com and serves on the Medical Care Editorial Board. She served as chair of the APHA Medical Care Section's Health Equity Committee from 2014 to 2023. Views expressed are the author's and do not necessarily reflect those of RTI or UMass Chan Medical School.