Calling for a New Approach to Data Science
If you haven’t read it yet, go immediately and find Catherine D’Ignazio and Lauren Klein’s timely work, Data Feminism. It is published by the MIT Press as part of the Strong Ideas series, and you can download an Open Access copy to read on screen, check out a physical copy from WashU Libraries, or purchase one from a local bookstore.
First of all, I hope we can recognize that feminism is about achieving what’s good for everybody, not simply women. In that spirit, the authors provide a broad explanation of how and why everyone is affected by the systemic oppression embedded in data science.
They begin by defining some key terms. Feminism indicates the “diverse and wide-ranging projects that name and challenge sexism and other forces, as well as those which seek to create more just, equitable, and livable futures.” Intersectionality, a term coined by Kimberlé Crenshaw in the 1980s, describes discrimination based on multiple, overlapping minority identities.
Data science is described as a “commitment to systematic methods of observation and experiment.” Co-liberation, a major theme, is explained as “the idea that oppressive systems of power harm all of us, that they undermine the quality and validity of our work, and they hinder us from creating true and lasting impact with data science.”
The book is organized around seven principles that focus on understanding and analyzing the systemic problems created by a power imbalance in data science. The authors discuss taking action to rebalance that power by designing with the goal of co-liberation for mutual benefit.
In an approach near and dear to my heart, they talk about the importance of putting people before data. According to the authors, data comes from many sources, all related to “living, feeling bodies in the world.” Readers are encouraged to think about who is counting what about whom and how it is being classified, and to question the findings.
Because data impacts us all, everyone should have a hand in the domain of data, thereby breaking down the idea of data superheroes and inviting strangers into the datasets. The people represented by the data should retain power over data collection and the creation of derivatives. Their involvement provides context for the generated data, and data without context is likely to be misunderstood and misused. That lack of context is also a problem with the drive to reap benefits from “big data.”
Working from the well-established concept of invisible labor, often performed by minorities, the authors unpack the ways in which data science leverages unseen labor harvested from social media content, data entry, digitization, and data processing. Crediting and celebrating all contributions will drive home the fact that no data product is the work of a single contributor.
A work like this provides a framework for all of us to begin to understand the limitations and biases of data science, as well as the people and contexts behind it. Take, for example, the “god trick,” which misleadingly presents data as an absolute authority, free of uncertainty. Current best practice suggests that showing uncertainty provides a more realistic, if less comfortable, view of the data.
The book offers many examples of how data can mislead: data projects that have erased their contributors (e.g., the Google Books project), people left out of data collection (e.g., trans people excluded by binary categories), and people who have been systematically oppressed on account of the data collected about them (e.g., redlining in Black communities).
Data Feminism focuses on such systemic failures, similar to what we see in other works such as Weapons of Math Destruction by Cathy O’Neil and Algorithms of Oppression by Safiya Noble. These books provide a lens for us to reimagine data science and technology to be not only more just and equitable, but also richer and more accurate.
The authors of Data Feminism articulate their vision: “Oppression is the problem, equity is the path, and co-liberation is the goal.”
This book is essential reading for everyone working in data science, helping us recognize and address the oppression embedded in our systems. It’s also essential reading for everyone else who wants to understand bias and oppression in data science today so that we can rebuild toward co-liberation.