
Ethical Data in Social Determinants of Health

In healthcare, data impacts the quality of care patients receive and the ability of many sectors of the health ecosystem to improve health outcomes. While sharing and integrating clinical data has been an area of intense focus over the last decade, social data has received less attention.

Since social determinants of health (SDOH) can drive as much as 80 percent of health outcomes, using social data is essential to addressing the root causes of poor health outcomes. But just using the data isn't enough; we must make sure we're collecting and using it ethically to avoid doing more harm than good.

The promise of social data is tremendous. Knowing that a patient is likely unable to access certain kinds of care because they can't get to an appointment or a pharmacy lets the care team focus on the right problems first. At the macro level, accurate population-level social data is essential to tackling disparities through policy change, funding, and advocacy. But this information goes well beyond what healthcare providers normally collect. Our navigators screen the entire population we work with for social needs, giving us a much fuller picture of the challenges facing your patients.

But social data goes beyond just a social needs survey. Public records like title information or court records, ZIP-code- or block-level geospatial data, and how often a patient has changed addresses in the last year can all help give a fuller picture of a patient's situation and how it impacts their access to healthcare. The range of public and commercial data available on nearly everyone in the US is staggering. Combining all of this data would leave many uncomfortable, and that's before adding healthcare records into the mix. It's absolutely critical that anyone using social data has specific use cases in mind and doesn't collect data just for the sake of getting a fuller picture.

Once you have a list of the kinds of data you need to make good decisions about a patient, structural barriers can still get in the way of accessing it. FHIR has been a huge advance in data sharing on the clinical side, but its support for social data is extremely limited. Additionally, coding support for social needs is relatively immature: the Z codes and their sub-codes offer less specificity than other ICD-10 codes, which matters because payers may consider these diagnostic codes for reimbursement and for inclusion in risk-adjusted payment models.

There are many ways to code a broken arm, but a wide variety of social needs may fall under the same Z code. Z59.4 is "Lack of adequate food" and Z59.82 is "Transportation insecurity," but those are far less specific than S42.301A, "Unspecified fracture of shaft of humerus, right arm, initial encounter for closed fracture." However, data interchange and coding both presuppose that clinicians are asking about social needs. While CMS data shows year-over-year progress in the use of Z codes, a recent study suggests that providers may feel limited in what they can do or need more guidance on how best to assist patients with non-medical needs. Absent a program like Path Assist, payers who want social data to improve quality scores, utilization, health confidence, and access will struggle.
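To make the coding gap concrete, here is a minimal sketch (expressed as a plain Python dictionary) of how a transportation-insecurity finding might be represented as a FHIR R4 Condition resource using ICD-10-CM code Z59.82; the patient reference and status values are placeholders for illustration, not part of any particular system.

```python
# A minimal, illustrative sketch (not a complete or validated resource) of how
# a transportation-insecurity finding could be coded as a FHIR R4 Condition
# using ICD-10-CM Z59.82. The patient reference below is a placeholder.
condition = {
    "resourceType": "Condition",
    "clinicalStatus": {
        "coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
            "code": "active",
        }]
    },
    "code": {
        "coding": [{
            "system": "http://hl7.org/fhir/sid/icd-10-cm",
            "code": "Z59.82",
            "display": "Transportation insecurity",
        }],
        "text": "Transportation insecurity",
    },
    "subject": {"reference": "Patient/example-id"},
}
```

Notice how little of what a navigator actually learns about why transportation is a problem survives in that single code.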

Once you do have social data, it's important to handle it appropriately. While social data is not healthcare data, if you've collected it on people you also hold healthcare data on, you need to treat it with the same care and concern that HIPAA requires for protected health information. However, because the scope of the information can be significantly larger, the risk of inappropriate access or breach is correspondingly higher. One way we're managing that risk here at Activate Care is to limit the use of identified data for analytics and reporting purposes.

For many operational use cases, we need to use identified data, but there are very few use cases where we want to share the full spectrum of social data. If we can summarize that data or compute scores to present to our clinical team, we derive nearly all of the benefit of understanding, for example, transportation risk without sharing how many close family members hold titles to cars. On the reporting side, replacing PII with cryptographic tokens and ensuring the data is statistically de-identified can yield a pseudonymized data set that is accurate at the population level but prevents linking back to an individual. These practices, combined with tight controls over how and by whom data from each silo is accessed, along with audits, training, and keeping current with security best practices, limit the risk of the collected data being used inappropriately.
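As a rough illustration of the tokenization step, the sketch below replaces a direct identifier with a keyed HMAC-SHA256 token before a record leaves the operational environment; the key handling, field names, and record layout are simplified assumptions, and statistical de-identification still requires its own review on top of this.

```python
import hmac
import hashlib

# Hypothetical key; in practice it would come from a key management service,
# never from source code.
SECRET_KEY = b"replace-with-a-managed-secret"

def tokenize(identifier: str) -> str:
    """Map a direct identifier (e.g. an MRN) to a stable pseudonymous token.

    The same input always yields the same token, so population-level joins
    still work, but the token cannot be reversed without the key.
    """
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"mrn": "12345678", "zip3": "021", "transportation_risk_score": 0.72}

deidentified = dict(record)
deidentified["mrn"] = tokenize(record["mrn"])  # the raw MRN never enters the reporting set
print(deidentified)
```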

No discussion of the ethical use of data is complete without covering how data can become biased and how that bias can impact your analyses. When working with data from a number of sources, how the data is collected, transformed, cleaned, and structured can (and does) introduce bias. It's impossible for a downstream user to fix all of these problems, but some corrections are possible.

Taking one specific example, consider the complexities around linking records between data sources (sometimes called mastering or matching). Women change their full names more often than men, so linking their records is more difficult, leading to fewer complete records. People with common names may be over-linked, making their health look much worse on average than that of people with less common names. People with uncommon names may frequently have them misspelled, leading to under-linked data. The way to address these problems is to introduce other data points (like address) or make the matching more sophisticated ("sounds like" algorithms for names), but these approaches, too, introduce bias.
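To see how those choices pile up, here is a minimal sketch of a score-based matcher that combines a fuzzy name comparison with an exact address comparison; the weights, threshold, and field names are arbitrary assumptions for illustration, not a production linkage algorithm.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Rough string similarity between 0 and 1; a stand-in for the
    phonetic ('sounds like') comparisons real matchers use."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted score across fields. Every choice here -- the weights,
    the exact-match address rule, the fuzzy name rule -- can favor some
    groups over others."""
    score = 0.6 * name_similarity(rec_a["name"], rec_b["name"])
    score += 0.4 * (1.0 if rec_a["address"] == rec_b["address"] else 0.0)
    return score

a = {"name": "Jonathon Smith", "address": "12 Elm St"}
b = {"name": "Jonathan Smith", "address": "12 Elm St"}

# Treat pairs above an (arbitrary) threshold as the same person.
print(match_score(a, b) > 0.85)
```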

People who move more often will be less likely to match on address, and people who live together and share a name may over-match. Different cultural and racial groups may have different patterns of behavior that affect matching accuracy, and it is difficult, if not impossible, to build a completely unbiased matching process. Assigning staff to investigate a random subset of your matching decisions can give you an indication of the ways your matcher is biased, which you can use to correct your analyses.
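One way to operationalize that audit, sketched below under the assumption that each reviewed pair carries a group label and a reviewer verdict, is to sample matched pairs at random and tally the false-match rate per group.

```python
import random

def sample_for_audit(matched_pairs: list, sample_size: int = 100) -> list:
    """Pull a random subset of matching decisions for manual review."""
    return random.sample(matched_pairs, min(sample_size, len(matched_pairs)))

def false_match_rate_by_group(reviewed: list) -> dict:
    """Estimate the false-match rate per group (e.g. by recorded race or
    ethnicity) from reviewer verdicts, to show where the matcher errs most."""
    totals, errors = {}, {}
    for pair in reviewed:
        group = pair["group"]
        totals[group] = totals.get(group, 0) + 1
        if not pair["reviewer_says_same_person"]:
            errors[group] = errors.get(group, 0) + 1
    return {group: errors.get(group, 0) / totals[group] for group in totals}
```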

Other problems can come from legacy systems. Defaulting gender to male or failing to differentiate between sex and gender was a common pattern in software systems in years past, and not every data provider is running the most current software. Every initial analysis of a new dataset should include quantifying bias across important dimensions, followed by continuous monitoring for changes along those dimensions. If you measure and alert on significant data anomalies, you can detect issues like a data provider sending only women in a given delivery, or a dataset from a specific geography significantly under-representing a particular race or ethnicity.
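A minimal sketch of that kind of monitoring, assuming each delivery arrives as a list of records with the demographic field of interest and that a 20-point swing from the baseline proportion is worth an alert:

```python
def proportion(records: list, field: str, value: str) -> float:
    """Share of records where `field` equals `value`."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) == value) / len(records)

def check_delivery(baseline: list, delivery: list, field: str, value: str,
                   max_shift: float = 0.20) -> None:
    """Alert when a new delivery drifts far from the baseline distribution.

    A delivery containing only women, or one that sharply under-represents a
    race or ethnicity present in the baseline, would trip this check.
    """
    shift = abs(proportion(delivery, field, value) - proportion(baseline, field, value))
    if shift > max_shift:
        print(f"ALERT: {field}={value!r} shifted by {shift:.0%} in this delivery")

# Hypothetical usage once baseline_records and new_records are loaded:
# check_delivery(baseline_records, new_records, "sex", "female")
```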

As we collectively work toward better outcomes for the communities we serve, data is key, and how we collect and use it determines our success. Data must be as representative and unbiased as we can make it so that it benefits everyone, especially those who have historically had unequal access. Most importantly, we must protect the privacy and earned trust of individuals so that everyone has a chance at better health.