Our latest data science reading list

Exploring data in action and how we can use data for public good

“Data” is seen as a four-letter word—literally and, oftentimes, figuratively. But used properly, data can be a tool for the masses. We’ve compiled some of our latest books on data science to explore the possibilities (and dangers) of big data. Read on to delve deeper into data, and sign up for our newsletter to hear more updates from the Press.


Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making by Bin Yu and Rebecca L. Barter

Most textbooks present data science as a linear analytic process involving a set of statistical and computational techniques without accounting for the challenges intrinsic to real-world applications. Veridical Data Science, by contrast, embraces the reality that most projects begin with an ambiguous domain question and messy data; it acknowledges that datasets are mere approximations of reality while analyses are mental constructs. Requiring little background knowledge, this lucid, self-contained textbook provides a solid foundation and principled framework for future study of advanced methods in machine learning, statistics, and data science.


Counting Feminicide: Data Feminism in Action by Catherine D’Ignazio

What isn’t counted doesn’t count. And mainstream institutions systematically fail to account for feminicide, the gender-related killing of women and girls, including cisgender and transgender women. Against this failure, Counting Feminicide brings to the fore the work of data activists across the Americas who are documenting such murders—and challenging the reigning logic of data science by centering care, memory, and justice in their work. Drawing on Data Against Feminicide, a large-scale collaborative research project, Catherine D’Ignazio describes the creative, intellectual, and emotional labor of feminicide data activists who are at the forefront of a data ethics that rigorously and consistently takes power and people into account.


We, the Data: Human Rights in the Digital Age by Wendy H. Wong

Our data-intensive world is here to stay, but does that come at the cost of our humanity in terms of autonomy, community, dignity, and equality? In We, the Data, Wendy H. Wong argues that we cannot allow that to happen. Exploring the pervasiveness of data collection and tracking, Wong reminds us that we are all stakeholders in this digital world, who are currently being left out of the most pressing conversations around technology, ethics, and policy. This book clarifies the nature of datafication and calls for an extension of human rights to recognize how data complicate what it means to safeguard and encourage human potential.


Data Feminism by Catherine D’Ignazio and Lauren F. Klein

Today, data science is a form of power. It has been used to expose injustice, improve health outcomes, and topple governments. But it has also been used to discriminate, police, and surveil. This potential for good, on the one hand, and harm, on the other, makes it essential to ask: Data science by whom? Data science for whom? Data science with whose interests in mind? The narratives around big data and data science are overwhelmingly white, male, and techno-heroic. In Data Feminism, Catherine D’Ignazio and Lauren Klein present a new way of thinking about data science and data ethics—one that is informed by intersectional feminist thought.


Data Action: Using Data for Public Good by Sarah Williams

Big data can be used for good, from tracking disease to exposing human rights violations, and for bad, implementing surveillance and control. Data inevitably represents the ideologies of those who control its use; data analytics and algorithms too often exclude women, the poor, and ethnic groups. In Data Action, Sarah Williams provides a guide for working with data in more ethical and responsible ways. Williams outlines a method that emphasizes collaboration among data scientists, policy experts, data designers, and the public. The approach generates policy debates, influences civic decisions, and informs design to help ensure that the voices of people represented in the data are neither marginalized nor left unheard.


Child Data Citizen: How Tech Companies Are Profiling Us from before Birth by Veronica Barassi

Our families are being turned into data, as the digital traces we leave are shared, sold, and commodified. Children are datafied even before birth, with pregnancy apps and social media postings, and then tracked through childhood with learning apps, smart home devices, and medical records. In Child Data Citizen, Veronica Barassi examines the construction of children into data subjects, describing how their personal information is collected, archived, sold, and aggregated into unique profiles that can follow them across a lifetime. Children today are the very first generation of citizens to be datafied from before birth, and Barassi points to critical implications for our democratic futures.


Beyond Data: Reclaiming Human Rights at the Dawn of the Metaverse by Elizabeth M. Renieris

Ever-pervasive technology poses a clear and present danger to human dignity and autonomy, as many have pointed out. And yet, for the past fifty years, we have been so busy protecting data that we have failed to protect people. In Beyond Data, Elizabeth Renieris argues that laws focused on data protection, data privacy, data security, and data ownership have unintentionally failed to protect core human values, including privacy. And, as our collective obsession with data has grown, we have, to our peril, lost sight of what’s truly at stake in relation to technological development: our dignity and autonomy as people.


Democratizing Our Data: A Manifesto by Julia Lane

Public data are foundational to our democratic system. People need consistently high-quality information from trustworthy sources. In the new economy, wealth is generated by access to data; government’s job is to democratize the data playing field. Yet data produced by the American government are getting worse and costing more. In Democratizing Our Data, Julia Lane argues that good data are essential for democracy. Her book is a wake-up call to America to fix its broken public data system.


Harvard Data Science Review

As an open access platform of the Harvard Data Science Initiative, Harvard Data Science Review (HDSR) features foundational thinking, research milestones, educational innovations, and major applications, with a primary emphasis on reproducibility, replicability, and readability. By uniting the strengths of a premier research journal, a cutting-edge educational publication, and a popular magazine, HDSR provides a crossroads at which fundamental data science research and education intersect directly with societally-important applications from industry, governments, NGOs, and others. By disseminating inspiring, informative, and intriguing articles and media materials, HDSR aspires to be a global forum on everything data science and data science for everyone.

