Consent, data-driven inequities, and the risks of sharing administrative data

Last month, we wrote about why we are excited to be working on a new policy project on administrative data. Administrative data refers to the records that governments and social services keep on the people they serve: information collected for operational purposes. Sharing and linking this data between government ministries, nonprofits, and academic researchers could open up new opportunities for conducting social research, providing collaborative care in the social service system, and supporting evidence-based policy—but such sharing is currently happening only to a limited extent in Canada. Our project involves co-creating a Canadian policy agenda around increasing social impact through administrative data-sharing.

Breaking down data silos will be no simple feat, especially because sharing this kind of person-level data raises critical questions around consent and privacy. It also prompts a much-needed conversation about how data-driven tools can amplify societal inequities, posing greater dangers to communities at the margins. In this blog post, we will explore a few of the risks attached to administrative data-sharing. We will also emphasize the need for an inclusive process in designing a policy agenda around administrative data use.

When data is shared for purposes not consented to

Some alarming news stories have emerged about technology corporations collecting user data and putting it to use in nefarious ways—ways that users never consented to. An obvious example is the Cambridge Analytica scandal, in which data from millions of Facebook profiles was used to influence voter behaviour in the 2016 US election. Another is the unsettling episode in which Uber engineers used trip location and time data to determine which users had one-night stands.

Should we have similar concerns around consent when it comes to administrative data-sharing? After all, administrative data-sharing involves putting data originally collected by the government for operational needs to new uses. Just because a person consents to having their data collected in order to access a specific service (say, a health service) does not mean they consent to that data being shared with other institutions. What are the implications of using someone’s government records beyond the purposes they originally consented to?

Here’s one example. In the UK, frontline outreach workers collect nationality, mental health, and gender data on homeless people for the Greater London Authority, in order to help policymakers identify the needs of the homeless population. In 2017, it was discovered that Home Office immigration officials had been secretly using this nationality data to locate undocumented immigrants sleeping on the streets and to deport EU nationals.

It’s clear that data in the wrong hands, or under a politically hostile administration, can pose threats to privacy—and in some cases, cause serious harm. In exploring a policy agenda around administrative data-sharing, there are critical questions to address concerning ownership, control, and consent: how our data is collected, shared, and used.

When data amplifies societal inequities

There is also a growing conversation about how data-driven tools, such as facial recognition, predictive policing, and even search engines like Google, reflect the discriminatory biases of those who engineer them and societal inequities more broadly. For example, some US states use algorithmic risk assessment scores, calculated from personal data, to inform sentencing decisions. Despite being touted as “neutral”, these recidivism risk scores are racially biased: they over-predict the risk of reoffending for black defendants while under-predicting it for white defendants.

The risk of exacerbating societal inequities through data policy and tools extends to government administrative data-sharing, too. In her book Automating Inequality (2018), welfare rights organizer Virginia Eubanks points out that governments typically hold greater amounts of information on marginalized communities by virtue of their accessing public services at higher rates. She writes: “Marginalized groups face higher levels of data collection when they access public benefits, walk through highly policed neighbourhoods, enter the healthcare system, or cross national borders. That data acts to reinforce their marginality when it is used to target them for suspicion and extra scrutiny.”

Eubanks highlights how administrative data-sharing has already facilitated a new form of “automated inequality”. She points to the Allegheny Family Screening Tool (AFST) as a case study in how data-driven tools can further profile poor communities and communities of colour. The AFST is a tool meant to help child welfare staff identify and prioritize the most “at risk” children in Allegheny County, Pennsylvania. It links data from twenty-nine different administrative sources held by the county’s Department of Human Services (DHS), including data on whether families have accessed or interacted with mental health services, child protective services, correctional systems, drug and alcohol services, and more. This linked administrative data is fed into an algorithm that flags which cases need intervention from General Protective Services—which often means separating a child from their family.

Unfortunately, many of the variables used to predict abuse in the model are simply proxies for poverty (e.g. use of the SNAP nutrition assistance program) or reflections of systems that disproportionately affect poor and racialized communities (e.g. juvenile probation). The DHS also holds less data on affluent families—who are afforded more privacy simply because the mental health and drug treatment programs they access are private rather than public. Eubanks also points out the frustrating and heartbreaking paradox that the algorithm treats parents as greater risks to their children when they access public services to try to improve their situation.

Despite its problems, the AFST is often held up as an exciting example of administrative data-sharing in action. This underscores the need for an expanded and more inclusive conversation about the risks administrative data-sharing poses to marginalized communities.

How, then, to design a policy agenda on administrative data-sharing?

There will inevitably be a great deal of complexity tied to any policy agenda that aims to increase administrative data-sharing. This complexity does not mean we should shrink away from designing one. On the contrary, good data policy can act as a mechanism to ensure that the data rights of individuals and groups are protected. The General Data Protection Regulation (GDPR), which recently came into effect in the EU, is a great example of how principles of consent, access, and control over our personal data can be enforced through law. Some have even framed the GDPR as a feminist regulation.

The risks attached to any administrative data-sharing policy are critical to explore, and critical to explore collectively. Issues of privacy, consent, and security around our personal data concern everyone. However, as demonstrated above, the negative effects of data-sharing policies often fall hardest on communities at the margins. These same communities are usually excluded from the process of developing digital infrastructure.

That’s why we are doing our best to engage in an inclusive coalition-building process to co-create a policy agenda on administrative data-sharing in Canada. In addition to participating foundations and nonprofit service providers, the coalition includes a number of advocacy groups with an intimate understanding of the challenges facing communities experiencing systemic marginalization, and with insight into how changes to data policy could affect communities on the ground. Moreover, recognizing the additional layer of complexity that Indigenous data governance presents, we are excited to share that Dr Janet Smylie of Well Living House is involved in the project as a co-convening partner and will share approaches to Indigenous data governance and management.

In the coming weeks, we’ll write more about how we are building this civil society policy coalition. For the time being, you can download our issue brief to learn more about administrative data-sharing, and our process brief to read more about our coalition-building and research plan for developing a Canadian policy agenda around this issue.

Are you interested in getting involved with this work—or do you have feedback or questions? We would love to hear from you! Please get in touch at info@poweredbydata.org.

Powered by Data is grateful to the Ontario Trillium Foundation for their ongoing support and partnership in developing this initiative. Powered by Data is working in partnership with four co-convening partners to design this coalition: Philanthropic Foundations Canada, the Ontario Nonprofit Network, Colour of Poverty - Colour of Change, and Dr Janet Smylie of Well Living House.