Humane Ingenuity 37: Data and the Humanities

If there’s one thing we’ve learned about the many datasets we’ve wrestled with this year, it’s that all the data — every single point — is the result of human decision-making.

These essential words are the lede of a great reflection by Erin Kissane, a co-founder of the COVID Tracking Project and CTP’s managing editor. The project is a terrific case study in humane ingenuity, because what seemed like a straightforward data and technology project — tracking COVID cases across the United States — was in fact primarily animated by skills from the humanities and deeply imbued with a humane spirit.

As the number of humanities majors has sharply declined over the last decade, a thousand verbose defenses of the humanities have been published. But as all writers should know, it’s better to show than to tell, and CTP did a damn good job embodying key methods and ethical choices from the humanities. The project also made the implicit point that data work doesn’t belong, by default or fiat, to STEM fields.

CTP put those values into practice by foregrounding uncertainty, context, and care. Although the project compiled reams of numbers, its staff and volunteers refused to let those numbers drift off into pure quantitative metrics, and they always noted the potential fallibility of each digit. They measured and analyzed human error, whether at the state or local level or within the project itself, along with the peculiarities of health reports and highly variable definitions. The data was not just accumulated into a spreadsheet; it was tightly coupled with careful interpretation, glosses, and a close reading of primary sources.

This lack of clarity was present in most of the metrics we collected, and meant that we spent hundreds, maybe thousands, of person-hours reading footnotes in obscure state PDFs and watching press conferences to try to catch any turns of phrase that would tell us what — and who — was really represented in a given figure. Definitional problems substantial enough to shape whole narratives about the pandemic haunted our work all year, and we tried to communicate both the answers we found and the uncertainty we encountered.

Kissane’s conclusion points toward an alternative digital world that has the humanities at its core:

I suspect that a disciplined commitment to messy truths over smooth narratives would also breathe life into technology, journalism, and public health efforts that too frequently paper over the complex, many-voiced nature of the world.

“Generative Unfoldings” is a new exhibit of fourteen software artworks that adopt a humane perspective — sometimes serious, sometimes humorous.

Philipp Schmitt’s “Curse of Dimensionality” generates, on the fly in your web browser, an abstract idea for a figure in a science or philosophy journal, and then illustrates it with random but plausible bits of visualization:

Many of the images Schmitt’s code produces remind me of Chad Hagen’s “Nonsensical Infographics,” a similar kind of critique through design:

Sprawling. Fast-moving. Ephemeral. The US Post operated a gossamer network, capable of rapidly spinning out new tendrils to distant places and then melting away at a moment’s notice.

My former colleague Cameron Blevins has a new book out from Oxford University Press, Paper Trails: The US Post and the Making of the American West. As with the COVID Tracking Project, Paper Trails shows the potency of uncovering or producing a trustworthy, unique dataset. In this case, the data comes from a set of tables in old print volumes, which ended up on a CD-ROM and were then ported to Dataverse/GitHub. The migration of this data into a modern format is a great story in itself, as Blevins details in a “data biography”:

Richard W. Helbock, a postal historian and philatelist, published United States Post Offices, an 8-volume series aimed at fellow stamp collectors as “the first attempt to publish a complete listing of all the United States post offices which have ever operated in the nation.” I discovered Helbock’s work in 2013, two years after he passed away. Thankfully, Catherine Clark was still selling her late husband’s work online and I was able to purchase a CD-ROM of the data.

And the outcome is comprehensive and compelling:

US Post Offices is a spatial-historical dataset containing records for 166,140 post offices that operated in the United States between 1639 and 2000. The dataset provides a year-by-year snapshot of the national postal system over multiple centuries, making it one of the most fine-grained and expansive datasets currently available for studying the historical geography of the United States.

Blevins’ data highlights, perhaps better than any other evidence, how strongly the westward expansion of the United States was tied to state power rather than to individual or local activity by European settlers. It was the Post Office infrastructure (linked, of course, to the military and other levers of the state) that enabled the kind of communication network and support lines that eventually led to the seizing of native lands. (Just look at those tendrils shooting west from the Mississippi.)

Blevins created a great companion website for the book with Yan Wu and Steven Braun, who was our data visualization specialist at the Northeastern University Library.

There are many things you can do with this data, all of which is now downloadable.
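
For readers who want to poke at it, here is a minimal Python sketch of the most basic thing the dataset makes possible: a year-by-year count of operating post offices. Everything specific in it is my own assumption for illustration, not the dataset's actual schema. The field names (`established`, `discontinued`), the three made-up offices, and the choice to treat the closing year as exclusive are all hypothetical, so check Blevins' documentation before relying on any of it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostOffice:
    # Hypothetical record layout; the real dataset's columns may differ.
    name: str
    state: str
    established: int             # first year of operation
    discontinued: Optional[int]  # closing year, or None if never closed


# Invented sample rows, for illustration only (not real post offices).
SAMPLE = [
    PostOffice("Example Gulch", "CO", 1861, 1875),
    PostOffice("Sample Springs", "NV", 1870, None),
    PostOffice("Placeholder Flat", "CA", 1852, 1860),
]


def operating_in(offices, year):
    """Return the offices open during `year`.

    Assumption: an office counts as open from its established year up to,
    but not including, its discontinued year.
    """
    return [
        o for o in offices
        if o.established <= year
        and (o.discontinued is None or year < o.discontinued)
    ]


def yearly_counts(offices, start, end):
    """Map each year in [start, end] to the number of operating offices."""
    return {year: len(operating_in(offices, year))
            for year in range(start, end + 1)}


if __name__ == "__main__":
    for year, count in yearly_counts(SAMPLE, 1859, 1862).items():
        print(year, count)
```

Even this toy version surfaces the kind of definitional decision Kissane describes: whether an office that closed in 1860 "operated" in 1860 is a human choice, and the counts change depending on the answer.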

Related: Justin Gage’s Native American Networks:

A video recording of the panel I mentioned in HI36, on a new platform for digitizing archives and what it might mean for researchers, libraries, and archivists, present and future, is now available. I enjoyed participating in this lively discussion.

Finally, something for newsletter readers who are in the middle of the Venn diagram of cats and information design: Ziyi Zhao, “Understanding Cat Behavior: Using Notational Systems to Represent the Relationship of Cats’ Postures and Facial Expression.”

(via my library colleague Sarah Sweeney)

April 7, 2021
