Humane Ingenuity 5: Libraries Contain Multitudes

More on the Use of AI/ML in Cultural Heritage Institutions

The piece by Clifford Lynch that I mentioned previously in HI3: AI in the Archives has now been published: “Machine Learning, Archives and Special Collections: A high level view.” Excerpt:

Some applications where machine learning has led to breakthroughs that are highly relevant to memory organizations include translation from one language to another; transcription from printed or handwritten text to computer representation (sometimes called optical character recognition); conversion of spoken words to text; classification of images by their content (for example, finding images containing dogs, or enumerating all the objects that the software can recognize within an image); and, as a specific and important special case of image identification, human facial recognition. Advances in all of these areas are being driven and guided by the government or commercial sectors, which are infinitely better funded than cultural memory; for example, many nation-states and major corporations are intensively interested in facial recognition. The key strategy for the cultural memory sector will be to exploit these advances, adapting and tuning the technologies around the margins for its own needs.

That last bit feels a bit more haunted this week with what’s going on in Hong Kong. Do read Clifford’s piece, and think about how we can return basic research in areas like facial recognition to entities that have missions divergent from those of governments and large companies.

Northern Illinois University has an incredible collection of 55,000 dime novels, that cheap and popular form of fiction that flourished in the United States in the late nineteenth century. As disposable forms of literature, many dime novels didn’t survive, and those that did are poorly catalogued, since it would require a librarian to read through each of these novels from cover to cover to grasp their full content and subject matter.

NIU is exploring using text mining and machine learning to generate good-enough subject headings and search tools for their collection, and an article from earlier this year outlines the process. (Alas, the article is gated; for those outside of academia, you can try Unpaywall to locate an open access version.) Matthew Short’s “Text Mining and Subject Analysis for Fiction; or, Using Machine Learning and Information Extraction to Assign Subject Headings to Dime Novels” is written in a laudably plainspoken way, and reaches some conclusions about a middle way between automated processes and human expertise:

The middle ground between fully-automated keyword extraction and full-level subject analysis might simply be to supply catalogers with a list of keywords to aid them in their work. From such a list, catalogers may be able to infer what those words suggest about the novel.

Yes! There’s a lot of work to be done on this kind of machine learning + human expert collaboration. Matthew has some good examples of how to aggregate unusual keywords into different top-level dime-novel genres, like seafaring, Westerns, and romance.
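
To make the idea concrete, here is a minimal sketch of what handing catalogers a keyword list might look like. It is not the method from Matthew's article: the TF-IDF-style scoring, the function names, and the tiny genre vocabularies are all my own assumptions, invented purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical genre vocabularies, invented for illustration only.
GENRE_TERMS = {
    "seafaring": {"ship", "captain", "harbor", "sail"},
    "western": {"ranch", "sheriff", "saddle", "canyon"},
    "romance": {"heart", "bride", "kiss", "suitor"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(docs, index, k=5):
    """Rank the words of docs[index] by a simple TF-IDF score.

    Words that appear in every document score zero, so the words that are
    unusual for this particular text rise to the top. Ties are broken
    alphabetically.
    """
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    counts = Counter(tokenized[index])
    n = len(docs)
    scores = {w: c * math.log(n / doc_freq[w]) for w, c in counts.items()}
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


def suggest_genres(keywords):
    """Return genres whose vocabulary overlaps the keywords, best match first."""
    found = set(keywords)
    hits = {genre: len(terms & found) for genre, terms in GENRE_TERMS.items()}
    ordered = sorted(hits.items(), key=lambda item: -item[1])
    return [genre for genre, count in ordered if count > 0]
```

The output is deliberately just a suggestion list: the cataloger still decides which keywords matter and which subject headings to assign.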

Next week I’ll be at the Digital Library Federation’s annual forum and will try to newsletter from there. There’s a session on “Implementing Machine-Aided Indexing in a Large Digital Library System” that should provide further grist for this mill.


The Cleveland Museum of Art recently launched a new site for digitized works of art from their collection, with some of them in 3D, including this wonderful 500-year-old piggy bank from Java:


Libraries Contain Multitudes

Several HI subscribers pointed me to Alia Wong’s piece in The Atlantic “College Students Just Want Normal Libraries,” on how students want “normal” things like printed books rather than new tech (or “glitz,” in Alia’s more loaded term) in college libraries, and how it seems to contradict my piece earlier this year in The Atlantic, “The Books of College Libraries Are Turning Into Wallpaper.”

As those same correspondents also discerned, some of the apparent contradiction seems to be due to the disconnect between student self-reporting and actual library indicators; much of what Alia points to for evidence are surveys of what students say they want, while I tried to highlight the unsettling hard data that book circulations in research libraries are declining precipitously and ceaselessly, with students and faculty checking out far fewer books than they used to. (There may not even be that much of a disconnect on books, as you can see in one of the surveys Alia highlights from 2015.)

Anyway, Alia’s piece is worth the read and I do not include it here for extended criticism. She makes many good points, and the allocation of space within libraries is a complicated issue that all librarians have been wrestling with, as I tried to note in my own piece. Alia and I actually agree on much, including, as Alia writes, the significant need for “a quiet place to study or collaborate on a group project” and that “many students say they like relying on librarians to help them track down hard-to-find texts or navigate scholarly journal databases.” Yes!

Where I do want to lodge an objection, however, is with the notion that I’ve been pushing back against in this newsletter: the too frequent, and easy to fall into, trope of a binary opposition between traditional forms of knowledge and contemporary technology. Or as Roy Rosenzweig and I put it in Digital History, the stark polarization of technoskepticism versus cyberenthusiasm is extremely unhelpful, and we should instead seek a middle way in which we maximize the advantages of technology and minimize its drawbacks. This requires a commingling of old and new that is less about glitz and more about how the old and new can best contribute, together, to our human understanding and expression.

Because so many of us care so much about the library as an institution, it has become an especially convenient space to project, in a binary way, the “normal” or “traditional” versus the “futuristic.” Most universities aren’t building glitzy new libraries, but are instead trying as best they can to allocate limited space for multiple purposes. The solutions to those complex equations will vary by community, and even in self-reported student surveys of what students want out of their library (and our library surveys thousands of students every two years to assess the needs and desires Alia covers), there’s a wide diversity of opinion.

Let’s not fall into the trap of thinking that all students want roughly the same thing, or define “normal” for all libraries; some students want and in fact need tech, while others want quiet space for reading, and many of them move from quiet spaces to tech spaces during the course of a single day. Our library has a room for 3D printers and an AR/VR lab; combined, that “glitz” takes up about 1000 square feet in a library that has well over 100,000 square feet of study space.

The library can and should accommodate multiple forms of knowledge-seeking—and better yet, and most critically for the continued vibrancy of the institution, forge connections between the old and new.

(More on this theme: Last week I was on The Agenda on TV Ontario to talk about my piece in The Atlantic and to discuss those complicated questions about the state of reading and the use of books. Christine McWebb and Randy Boyagoda joined me on the program and had many good comments about how and when to encourage students to engage with books. Watch: “The Lost Art of Reading.”)


On this week’s What’s New podcast from the Northeastern University Library, my guest is Nada Sanders, Distinguished Professor of Supply Chain Management at the D’Amore-McKim School of Business, and author of the recently published book The Humachine: Humankind, Machines, and the Future of Enterprise. The conversation covers the impact of automation and AI on factories and businesses, and how greater efficiency from those increasingly computer-driven enterprises is causing huge problems for workers and small businesses. Tune in.