Category: Libraries

On the Response to My Atlantic Essay on the Decline in the Use of Print Books in Universities

I was not expecting—but was gratified to see—an enormous response to my latest piece in The Atlantic, “The Books of College Libraries Are Turning Into Wallpaper,” on the seemingly inexorable decline in the circulation of print books on campus. I’m not sure that I’ve ever written anything that has generated as much feedback, commentary, and hand-wringing. I’ve gotten dozens of emails and hundreds of social media messages, and The Atlantic posted (and I responded in turn to) some passionate letters to the editor. Going viral was certainly not my intent: I simply wanted to lay out an important and under-discussed trend in the use of print books in the libraries of colleges and universities, and to outline why I thought it was happening. I also wanted to approach the issue both as the dean of a library and as a historian whose own research practices have changed over time.

I think the piece generated such a large response because it exposed a significant transition in the way that research, learning, and scholarship happen, and what that might imply for the status of books and the nature of libraries. These topics often touch a raw nerve, especially at a time when popular works extol libraries (I believe correctly) as essential civic infrastructure.

But those works focus mostly on public libraries, and this essay focused entirely on research libraries. People are thankfully still going to and extensively using libraries, both research and public (there were over a billion visits to public libraries in the U.S. last year), but they are doing so in increasingly diversified ways.

The key lines of my essay were these:

“The decline in the use of print books at universities relates to the kinds of books we read for scholarly pursuits rather than pure pleasure…A positive way of looking at these changes is that we are witnessing a Great Sorting within the [research] library, a matching of different kinds of scholarly uses with the right media, formats, and locations.”

Although I highlighted statistics from Yale and the University of Virginia (which, alas, was probably not very kind to my friends at those institutions, although I also used stats from my own library at Northeastern University), the trend I identified seems to be very widespread. Although I only mentioned specific U.S. research libraries, my investigations showed that the same decline in the use of print collections is happening globally, albeit not necessarily universally. In most of the libraries I examined, or from data that was sent to me by colleagues at scores of universities, the circulation of print books within research libraries is declining at about 5-10% per year per student (or FTE).

For example, in the U.K. and Ireland, over the three years between the 2013-14 school year and the 2016-17 school year, the circulation of print books per student declined by 27%, according to the Society of College, National and University Libraries (SCONUL), which represents all university libraries in the U.K. and Ireland. Meanwhile, SCONUL reports that visits to these libraries have actually increased during this period. (SCONUL’s other core metric, print circulations per student visit to the library, has thus declined even more, by 33% over three years.) Similarly, the Canadian Association of Research Libraries (CARL), which maintains the statistics for university libraries in Canada, notes that during these same three years, the average yearly print circulation at their member libraries dropped from 200,000 to 150,000 books, and their per-student circulation number also dropped by 25%.
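To compare these three-year figures with the yearly rate mentioned earlier, one can convert a cumulative decline into a constant annual rate. The following is a minimal sketch, assuming the decline compounds evenly from year to year (the source data does not say that it does); the function name `annualized_decline` is my own illustrative choice.

```python
def annualized_decline(total_decline: float, years: float) -> float:
    """Return the constant yearly rate of decline that compounds
    to `total_decline` over `years` (both expressed as fractions)."""
    if not 0.0 <= total_decline < 1.0:
        raise ValueError("total_decline must be in [0, 1)")
    if years <= 0:
        raise ValueError("years must be positive")
    remaining = 1.0 - total_decline          # share of circulation left
    return 1.0 - remaining ** (1.0 / years)  # even yearly rate (assumption)


# Three-year declines quoted above: 27% (SCONUL) and 25% (CARL).
sconul_rate = annualized_decline(0.27, 3)
carl_rate = annualized_decline(0.25, 3)

print(f"SCONUL: {sconul_rate:.1%} per year")
print(f"CARL:   {carl_rate:.1%} per year")
```

Under that even-compounding assumption, both figures land near the top of the 5-10% per year range.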

Again, this is just over three recent years. The decline becomes even more severe as one goes further back in time. In the 2005-6 school year, the average Canadian research library circulated 30 books per student, which slid to 25 in 2008-9; by 2016-17 that number was just 5. Readers of my article were shocked that UVA students had only checked out 60,000 books last year, compared to 238,000 a decade ago, but had I gone all the way back in the UVA statistics to two decades ago, the comparison would have been even more stark. The total circulation of books in the UVA library system was 1,085,000 in 1999-2000 and 207,000 in 2016-17. Here’s the overall graph of print circulation (in “initial circs,” which do not include renewals) from the Association of Research Libraries (U.S.), showing a 58% decline between 1991 and 2015, but an even larger decline since Peak Book and an even larger decline on a per student basis, since during this same period the student body at these universities increased 40%.
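The per-student point can be made concrete with a little arithmetic. This is a rough sketch using only the percentages and totals quoted above; the helper name `per_student_decline` is hypothetical, and it assumes the 58% decline and the 40% enrollment growth cover the same set of libraries and years.

```python
def per_student_decline(total_decline: float, enrollment_growth: float) -> float:
    """Decline in circulation per student, given the decline in total
    circulation and the growth in the student body (as fractions)."""
    remaining_total = 1.0 - total_decline
    remaining_per_student = remaining_total / (1.0 + enrollment_growth)
    return 1.0 - remaining_per_student


# ARL figures quoted above: 58% fewer initial circs, 40% more students.
arl_per_student = per_student_decline(0.58, 0.40)

# UVA totals quoted above: 1,085,000 (1999-2000) vs. 207,000 (2016-17).
uva_total_decline = 1.0 - 207_000 / 1_085_000

print(f"ARL per-student decline: {arl_per_student:.0%}")
print(f"UVA total decline:       {uva_total_decline:.0%}")
```

On those assumptions, a 58% drop in total circulation works out to roughly a 70% drop per student.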

These longer time frames underline how this is an ongoing, multi-decade shift in the ways that students and faculty interact with and use the research library. All research libraries are experiencing such forces and pressing additional demands—the need for new kinds of services and spaces as well as the surging use of digital resources and data—while at the same time continuing to value physical artifacts (archives and special collections) and printed works. It’s a very complicated, heterogeneous environment for learning and scholarship. Puzzling through the correct approach to these shifts, rather than ignoring them and sticking more or less with the status quo, was what I was trying to prod everyone to think about in the essay, and if I was at all successful, that’s hopefully all to the good.

June 6, 2019
When a Presidential Library Is Digital

I’ve got a new piece over at The Atlantic on Barack Obama’s prospective presidential library, which will be digital rather than physical. This has caused some consternation. We need to realize, however, that the Obama library is already largely digital:

The vast majority of the record his presidency left behind consists not of evocative handwritten notes, printed cable transmissions, and black-and-white photographs, but email, Word docs, and JPEGs. The question now is how to leverage its digital nature to make it maximally useful and used.

This almost-entirely digital collection, and its unwieldy scale and multiple formats, should sound familiar to all of us. Over the past two decades, we have each become unwitting archivists for our own supersized collections, as we have adopted forms of communication that are prolific and easy to create, and that accumulate over time into numbers that dwarf our printed record and can easily mount into a pile of digital files that borders on shameful hoarding. I have over 300,000 email messages going back to my first email address in the 1990s (including an eye-watering 75,000 that I have sent), and 30,000 digital photos. This is what happens when work life meets Microsoft Office and our smartphone cameras meet kids and pets.

Will we have lost something in this transition? Of course. Keeping a dedicated archival staff in close proximity to a bounded paper-based collection yields real benefits. Having a researcher who is on site discover a key note on the back of a typescript page is also special.
However, although the analog world can foster great serendipity, it does not have a monopoly on such fortunate discoveries. Digital collections have a serendipity all their own.

Please do read the whole article for my thoughts about how we should approach the design of this digital library, and the possibilities it will enable, including broad access and new forms of research.

April 9, 2019
Presidential Libraries and the Digitization of Our Lives

Buried in the recent debates (New York Times, Chicago Tribune, The Public Historian) about the nature, objectives, and location of the Obama Presidential Center is the inexorable move toward a world in which virtually all of the documentation about our lives is digital.

To make this decades-long shift—now almost complete—clear, I made the following infographic comparing three representative presidential libraries, each a generation apart: LBJ’s, Bill Clinton’s, and Barack Obama’s. Each square represents the relative overall size of these presidential archives—roughly 46 million pages for LBJ, 100 million for Clinton, and 360 million for Obama—as well as the basic categories of archival material: paper documents, photographs and audiovisual media, and, starting with Clinton, email.

LBJ Presidential Library

Clinton Presidential Library

Obama Presidential Library

The LBJ Presidential Library has 45 million pages of paper documents and a million photographs, recordings, and other media. The Clinton Presidential Library contains 78 million pages of documents, 20 million emails, 2 million photographs, and 12,500 videotapes. (Note that contrary to all of the recent coverage of Obama as “the first digital president,” Clinton really should hold that title, given his administration’s rapid adoption of email in the 1990s, as I’ve discussed elsewhere.)

We are still in the process of assessing all that will go into the Obama Presidential Library (other libraries have added considerable new caches of documents over time), but the rough initial count from the U.S. National Archives and Records Administration is that there are about 300 million emails from Obama’s eight years in the White House, and about 30 million pages of paper documents. The chart above would be even more email-centric for Obama’s library if I used NARA’s calculation of a few paper pages per email, which would equal over a billion pages in printed form. In other words, using a more rigorous comparison, at best only 3% of the Obama record is print rather than digital.
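Here is the arithmetic behind that 3% figure as a short sketch. It counts only paper pages and email, and it takes one billion printed pages as a stand-in for "over a billion" (my assumption; a larger figure would push the paper share lower still). The constant and function names are illustrative.

```python
PAPER_PAGES = 30_000_000               # paper documents, per NARA's rough count
EMAIL_MESSAGES = 300_000_000           # emails, per NARA's rough count
EMAIL_PAGES_PRINTED = 1_000_000_000    # assumption: lower bound of "over a billion"


def paper_share(paper: int, digital: int) -> float:
    """Fraction of the combined record that is paper."""
    return paper / (paper + digital)


# Counting each email as one item, as the infographic does.
share_by_item = paper_share(PAPER_PAGES, EMAIL_MESSAGES)

# Counting each email as the pages it would fill if printed.
share_by_page = paper_share(PAPER_PAGES, EMAIL_PAGES_PRINTED)

print(f"Paper share, one email = one page:  {share_by_item:.1%}")
print(f"Paper share, emails as printed pages: {share_by_page:.1%}")
```

The first figure is the one the chart implies; the second is the "more rigorous comparison" that gives a paper share of about 3% at most.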

More vaguely estimated above are the millions of “pages” associated with the many other digital forms the Obama administration used, including websites, apps, and social media (you can already download the entirety of the latter as .zip files here). Most of the photos (many of which were uploaded to Flickr) and videos were of course also born digital. (Update, 3/11/19: The Obama Foundation came out with a new fact sheet that says that “an estimated 95 percent of the Obama Presidential Records were created digitally and have no paper equivalents.” It also says that there are roughly 1.5 billion pages in the collection, including everything I’ve detailed here.)

It’s unfortunate that it’s still relatively expensive and time-consuming to digitize analog materials. Nearly two decades on, the Clinton Presidential Library has only digitized about 1% of its paper holdings (about 700,000 pages). The Reagan Presidential Library charges $0.80 to digitize one page of his archives. The Obama Presidential Center’s commitment to funding the complete digitization of those 30 million paper pages, apparently on a faster timetable and with open access to the public, seems rather laudable in this context.

Ultimately, I suppose it’s best to say that Obama was “the first almost fully digital president,” and with the digitization of the remaining paper record, will become “the first fully machine-readable and -indexed president.” (Part of the debate in academic and library circles about this shift in the Obama Presidential Center/Library has to do with the role of archivists and historians in creating good metadata for, and more thorough searches through, administration documents, but with a billion+ pages, I don’t see how this can be done without serious computational means.)

Meanwhile, all of us have more quietly followed the same path, with only a very small percentage of our overall record now existing in physical formats rather than bits. How we will preserve this heterogeneous and perhaps ephemeral digital record when we don’t have our own presidential libraries and the resources of NARA is a different and more worrisome story.

March 10, 2019
What We Learned from Studying the News Consumption Habits of College Students

Over the last year, I was fortunate to help guide a study of the news consumption habits of college students, and coordinate Northeastern University Library’s services for the study, including great work by our data visualization specialist Steven Braun and necessary infrastructure from our digital team, including Sarah Sweeney and Hillary Corbett. “How Students Engage with News,” out today as both a long article and accompanying datasets and media, provides a full snapshot of how college students navigate our complex and high-velocity media environment.

This is a topic that should be of urgent interest to everyone since the themes of the report, although heightened due to the more active digital practices of young people, capture how we all find and digest news today, and also point to where such consumption is heading. On a personal level, I was thrilled to be a part of this study as a librarian who wants students to develop good habits of truth-seeking, and as an intellectual historian, who has studied changing approaches to truth-seeking over time.

You should first read the entire report, or at least the executive summary, now available on a special site at Project Information Literacy, with data hosted at Northeastern University Library’s Digital Repository System (where the study will also have its long-term, preserved form). It’s been great to work with, and think along with, the lead study members, including Alison Head, John Wihbey, Takis Metaxas, and Margy MacMillan.

“How Students Engage with News” details how college students are overwhelmed by the flood of information they see every day on multiple websites and in numerous apps, an outcome of their extraordinarily frequent attention to smartphones and social media. Students are interested in news, and want to know what’s going on, but given the sheer scale and sources of news, they find themselves somewhat paralyzed. As humans naturally do in such situations, students often satisfice in terms of news sources—accepting “good enough,” proximate (from friends or media) descriptions rather than seeking out multiple perspectives or going to “canonical” sources of news, like newspapers. Furthermore, much of what they consume is visual rather than textual—internet genres like memes, gifs, and short videos play an outsized role in their digestion of the day’s events. (Side note: After recently seeing Yale Art Gallery’s show “Seriously Funny: Caricature Through the Centuries,” I think there’s a good article to be written about the historical parallels between today’s visual memes and political cartoons from the past.) Of course, the entire population faces the same issues around our media ecology, but students are an extreme case.

And perhaps also a cautionary tale. I think this study’s analysis and large survey size (nearly 6,000 students from a wide variety of institutions) should be a wake-up call for those of us who care about the future of the news and the truth. What will happen to the careful ways we pursue an accurate understanding of what is happening in the world by weighing information sources and developing methods for verifying what one hears, sees, and reads? Librarians, for instance, used to be much more of a go-to source for students to find reliable sources of the truth, but the study shows that only 7% of students today have consulted their friendly local librarian.

It is incumbent upon us to change this. A purely technological approach—for instance, “improving” social media feeds through “better” algorithms—will not truly solve the major issues identified in the news consumption study, since students will still be overwhelmed by the volume, context, and heterogeneity of news sources. A more active stance by librarians, journalists, educators, and others who convey truth-seeking habits is essential. Along these lines, for example, we’ve greatly increased the number of workshops on digital research, information literacy, and related topics at Northeastern University Library, and students are eager attendees at these workshops. We will continue to find other ways to get out from behind our desks and connect more with students where they are.

Finally, I have used the word “habit” very consciously throughout this post, since inculcating and developing healthier habits around news consumption will also be critical. Alan Jacobs’ notion of cultivating “temporal bandwidth” is similar to what I imagine will have to happen in this generation: habits and social norms that push against the constant now of social media, and stretch and temper our understanding of events beyond our unhealthily caffeinated present.

October 16, 2018
Haunted by the Past

Top: The Scarif Archive in Rogue One / Bottom: Robotic storage facility in the Mansueto Library at the University of Chicago

Ever since Jyn Erso and Cassian Andor extracted the Death Star plans from a digital repository on the planet Scarif in Rogue One, libraries, archives, and museums have played an important role in tentpole science fiction films. From Luke Skywalker’s library of Jedi wisdom books in The Last Jedi, to Blade Runner 2049’s multiple storage media for DNA sequences, to a fateful scene in an ethnographic museum in Black Panther, the imposing and evocative halls of cultural heritage organizations have been in the foreground of the imagined future.

There have been scattered instances of cultural memory institutions in such films in the past (my colleagues in the library will recall, with some eye-rolling, the librarian Jocasta Nu in Star Wars, Episode II: Attack of the Clones), but the appearance of these institutions in recent speculative fiction on the screen seems especially relevant and rich, and central to their plots.

Which raises the question: Why are today’s science fiction films obsessed with libraries, archives, and museums?

The answer of course is rooted in how science fiction has always pursued a heightened understanding of our very real present. At the same time that these movies portray an imagined future, they are also exploring our current anxiety about the past and how it is stored; how we simultaneously wish to leave the past behind, and how it may also be impossible to shake it. They indicate that we live in an age that has an extremely strained relationship with history itself. These films are processing that anxiety on Hollywood’s big screen at a time when our small screens, social media, and browser histories document and preserve so much of what we do and say.

Luke Skywalker’s collection of rare books in The Last Jedi neatly captures the tension inherent in these movies. In an egg-shaped stone hut reminiscent of (and indeed filmed in) the rural parts of western Ireland where Christian monasteries were established in the Middle Ages, Luke’s archive of Jedi books represents a profound bond to the traditional wisdom of the Jedi cult. Yet as the movie proceeds, it is clear that these volumes are also a strong link in the chain that holds Luke back. Ultimately his little library is not a source of knowledge, but one of angst. It makes him surly and disassociated from present possibilities, and he must ultimately sever himself from the past that is encapsulated in paper. Burning the books becomes a necessary precursor to his taking action, and to moving to the metaphysical (and more real) plane of the Jedi.

Black Panther uses two characters, rather than one, to embody the tense dynamic between setting history aside and being unable to let it go: the dueling figures of T’Challa (Black Panther) and N’Jadaka (Erik Killmonger). T’Challa understands that black people have been abused and enslaved, globally, for centuries. And yet he imagines a day when Wakanda steps beyond this past, and integrates their society and advanced technology with the outside world that has done so much wrong to them. He is a forward-looking optimist.

N’Jadaka, on the other hand, seethes with anger about the past, and how it is so vividly documented in the halls of cultural heritage institutions. Before he declines into a more monochromatic villain, he experiences frankly justifiable rage at what whites have done with black culture—namely, stolen and stored it like an alien, and lesser, culture, in glass-cased museums. A pivotal scene in one such museum reflects the troubled genesis of institutions such as the Pitt Rivers Museum, which collected artifacts of non-white culture from the British Empire to be viewed and dissected by professors in Oxford.

In one of the most memorable lines of Public Enemy’s It Takes a Nation of Millions to Hold Us Back, the seminal rap album that documents what happened to African slaves and their descendants in the United States, Flavor Flav shouts “I got a right to be hostile!” given this terrible history. A poster of that album is on the wall of N’Jadaka’s father’s apartment in Oakland, and it frames, like the glass case in the museum, the young man’s views of the world in which his ancestors have been constantly subjugated.

Blade Runner 2049 is even more unrelentingly pessimistic about the future and its connection to the past. In the movie’s opening, we are told that the documentary evidence of that past has been wiped out in a catastrophic electronic pulse that destroyed digital photographs and electronic records. As we learn, however, not all archives are lost. While personal images and documents that were never printed are gone forever, some plutocratic corporations maintain archival records, and we see several of them in the film: digital media as well as formats encased in glass spheres and more recognizable microfilm. Nevertheless, these archives are imperfect, like so much in the film. Even a leather-bound handwritten book of records in a wasteland orphanage has critical pages ripped out.

Because it is based on the work of Philip K. Dick, who was obsessed with libraries as part of a larger obsession with memory and reality, Blade Runner 2049 ultimately binds not only the past and present together, but the archival and the alive. Humans and replicants, the movie seems to argue, are simply incarnations of archival records, fleshy beings made up of the synthetic or parental DNA that forms their core information architecture and the libraries of memories that are either fabricated or lived. This uneasy fusion is at the dark core of the film and its philosophical examination of the permeable boundary between the real and the artificial.

For all of these films, the past constantly threatens to come back to haunt the present. (Just ask those on the Death Star.) In turn, these big-screen portrayals of imagined libraries, archives, and museums should make us reconsider how what we preserve and make accessible reflects—and perhaps determines—who we really are.

May 14, 2018
The Digital Public Library of America, Me, and You

Twenty years ago Roy Rosenzweig imagined a compelling mission for a new institution: “To use digital media and computer technology to democratize history—to incorporate multiple voices, reach diverse audiences, and encourage popular participation in presenting and preserving the past.” I’ve been incredibly lucky to be a part of that mission for over twelve years, at what became the Roy Rosenzweig Center for History and New Media, with the last five and a half years as director.

Today I am announcing that I will be leaving the center, and my professorship at George Mason University, the home of RRCHNM, but I am not leaving Roy’s powerful vision behind. Instead, I will be extending his vision—one now shared by so many—on a new national initiative, the Digital Public Library of America. I will be the founding executive director of the DPLA.

The DPLA, which you will be hearing much more about in the coming months, will be connecting the riches of America’s libraries, archives, and museums so that the public can access all of those collections in one place; providing a platform, with an API, for others to build creative and transformative applications upon; and advocating strongly for a public option for reading and research in the twenty-first century. The DPLA will in no way replace the thousands of public libraries that are at the heart of so many communities across this country, but instead will extend their commitment to the public sphere, and provide them with an extraordinary digital attic and the technical infrastructure and services to deliver local cultural heritage materials everywhere in the nation and the world. The DPLA has been in the planning stages for the last few years, but is about to spin out of Harvard’s Berkman Center for Internet and Society and move from vision to reality. It will officially launch, as an independent nonprofit, on April 18 at the Boston Public Library. I will move to Boston with my family this summer to lead the organization, which will be based there. It is such a great honor to have this opportunity.

Until then I will be transitioning from my role as director of RRCHNM, and my academic life at Mason. Everything at the center will be in great hands, of course; as anyone who visits the center immediately grasps, it is a highly collaborative and nonhierarchical place with an amazing staff and an especially experienced and innovative senior staff. They will continue to shape “the future of the past,” as Roy liked to put it. I will miss my good friends at the center, but I still expect to work closely with them, since so many critical software initiatives, educational projects, and digital collections are based at RRCHNM. A search for a new director will begin shortly. I will also greatly miss my colleagues in Mason’s wonderful Department of History and Art History.

At the same time, I look forward to collaborating with new friends, both in the Boston office of the DPLA and across the United States. The DPLA is a unique, special idea—you don’t get to build a massive new library every day. It is apt that the DPLA will launch at the Boston Public Library’s McKim Building, with those potent words carved into stone above its entrance: “Free to all.” The architect Charles Follen McKim rightly called it “a palace for the people,” where anyone could enter to learn, create, and be entertained by the wonders of books and other forms of human expression.

We now have the chance to build something like this for the twenty-first century—a rare, joyous possibility in our too-often cynical age. I hope you will join me in this effort, with your ideas, your contributions, your energy, and your public spirit.

Let’s build the Digital Public Library of America together.

March 5, 2013
Visualizing the Uniqueness, and Conformity, of Libraries

Tucked away in a presentation on the HathiTrust Digital Library are some fascinating visualizations of libraries by John Wilkin, the Executive Director of HathiTrust and an Associate University Librarian at the University of Michigan. Although I’ve been following the progress of HathiTrust closely, I missed these charts, and I want to highlight them as a novel method for revealing a library fingerprint or signature using shared metadata.

With access to the catalogs of HathiTrust member libraries, Wilkin ran some comparisons of book holdings. His ingenious idea was not only to count how many libraries held each particular work, but to create a visualization of each member library based on how widely each book in its collection is held by other libraries.

In Wilkin’s graphs for each library, the X axis is the number of libraries containing a book (including the library the visualization represents), and the Y axis is the number of books. That is, it contains columns of books from 1 (the member library is the only one with a particular book) to 41 (every library in HathiTrust has a physical copy of a book). Let’s look at an example:

Reading the chart from left to right, the University of Illinois at Urbana-Champaign library has a small number of books that it alone holds (~1,000), around 25,000 that only one other library has (the “2” column), 36,000 that two other libraries have, etc.
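For readers who want to try this on their own data, here is a minimal sketch of how such a graph could be computed. It is not Wilkin's actual code or the HathiTrust data model; it simply assumes a dictionary mapping each library's name to the set of book identifiers it holds. The function name `wilkin_profile` and the toy libraries A, B, and C are hypothetical.

```python
from collections import Counter


def wilkin_profile(library: str, holdings: dict[str, set[str]]) -> dict[int, int]:
    """For one library, count its books by how many libraries hold each one.

    Returns {n: number of this library's books held by exactly n libraries},
    where n includes the library itself (so n == 1 means uniquely held).
    """
    # How many libraries hold each book across the whole group.
    held_by = Counter()
    for books in holdings.values():
        held_by.update(books)

    # Bucket this library's books by that count: X axis -> Y axis.
    profile = Counter(held_by[book] for book in holdings[library])
    return dict(sorted(profile.items()))


# Toy example with three libraries (illustrative data only).
holdings = {
    "A": {"b1", "b2", "b3", "b4"},
    "B": {"b2", "b3", "b5"},
    "C": {"b3"},
}

for name in holdings:
    print(name, wilkin_profile(name, holdings))
```

In the toy data, library A has two books that it alone holds, one shared with one other library, and one that all three hold, so its columns at positions 1, 2, and 3 are 2, 1, and 1.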

What’s fascinating is that the overall curvature of a graph tells us a great deal about a particular library.

There are three basic types of libraries we can speak of using this visualization technique. First, there are left-leaning libraries, which have a high number of books that do not exist in many other libraries. These libraries have spent considerable effort and resources acquiring rare volumes. For example, Harvard, which has hundreds of thousands of books that only a handful of other libraries also have:

On the other side, there are right-leaning libraries, which consist mostly of books that are nearly universally held by other libraries. These libraries generally carry only the most circulated volumes, books that are expected to be found in any academic research library. For instance, Lafayette College:

Finally, there are rounded libraries, which don’t have many popular books or many rare books, but mostly works that an average number of similar libraries have. These libraries roughly echo their cohort (in this case, large university research libraries in the United States). They could be called—my apologies—well-rounded in their collecting, likely acquiring many scholarly monographs while still remaining selective rather than comprehensive. For instance, Northwestern University:

Of course, the library curve is often highly correlated with the host institution’s age, since older universities are more likely to have rare old books or unusual (e.g., local or regional) books. This correlation is apparent in this sequence of graphs of the University of California schools, from oldest to newest:

Beyond the three basic types, there are interesting anomalies as well. The University of Virginia is, unsurprisingly, a left-leaning library, but not quite as left-leaning as I would have expected:

Cornell is also left-leaning, but also clearly has a large, idiosyncratic collection containing works that no other library has—note the spike at position “1”:

Moreover, one could imagine using Wilkin Graphs (I’m going to go ahead and name it that to give John full credit) to analyze the relative composition of other kinds of libraries. For instance, LibraryThing has a project called Legacy Libraries, containing the records of personal libraries of famous historical figures such as Thomas Jefferson. A researcher could create Wilkin Graphs for Jefferson and other American founders (in relation to each other), or among intellectuals from the Enlightenment.

Update: Sherman Dorn suggests Wilkin Profile rather than Wilkin Graph. Sure, rolls off the tongue better: Prospective college student on a campus visit asks the tour guide, “So what’s your library’s Wilkin Profile?” According to Constance Malpas, OCLC has created such profiles for 160 libraries. These graphs can be created with the WorldCat Collection Analysis service (which, alas, is not openly available).

Clarification: John Wilkin comments below that the reason for the spike in position 1 in the Cornell Wilkin Profile is that Cornell had a digitization program that added many unique materials to HathiTrust. This made me realize, with some help from Stanford Library’s Chris Bourg and Penn State’s Mike Furlough, that the numbers here are only for the shared HathiTrust collection (although that collection is very large, with millions of items). Nevertheless, the general profile shapes should hold for more comprehensive datasets, although likely with occasional left and right shifts for certain libraries depending on additional unique book collections that have not been digitized. (That may explain the University of Virginia Wilkin Profile.) Note also that Google influenced the numbers here, since many of the scanned books come from the Google Books (née Google Library) project, introducing some selection bias which is only now being corrected (or worsened?) by individual institutional digitization initiatives, like Cornell’s.

December 13, 2012
DPLA Audience & Participation Workshop and Hackfest at the Center for History and New Media

On December 6, 2012, the Digital Public Library of America will have two concurrent and interwoven events at the Roy Rosenzweig Center for History and New Media at George Mason University in Fairfax, VA. The Audience and Participation workstream will be holding a meeting that will be livestreamed, and next door those interested in fleshing out what might be done with the DPLA will hold a hackfest, which follows on a similar, successful event last month in Chattanooga, TN. (Here are some of the apps that were built.)

Anyone who is interested in experimenting with the DPLA—from creating apps that use the library’s metadata to thinking about novel designs to bringing the collection into classrooms—is welcome to attend or participate from afar. The hackfest is not limited to those with programming skills, and we welcome all those with ideas, notions, or the energy to collaborate in envisioning novel uses for the DPLA.

The Center for History and New Media will provide spaces for a group as large as 30 in the main hacking space, with couches, tables, whiteboards, and unlimited coffee. There will also be breakout areas for smaller groups of designers and developers to brainstorm and work. We ask that anyone who would like to attend the hackfest please register in advance via this registration form.

We anticipate that the Audience and Participation workstream and the hackfest will interact throughout the day, which will begin at 10am and conclude at 5pm EST. Breakfast will be provided at 9am, and lunch at midday.

The Center for History and New Media is on the fourth floor of Research Hall on the Fairfax campus of George Mason University. There is parking across the street in the Shenandoah Parking Garage. (Here are directions and a campus map.)

November 21, 2012
The Digital Public Library of America: Coming Together

I’m just back from the Digital Public Library of America meeting in Chicago, and like many others I found the experience inspirational. Just two years ago a small group convened at the Radcliffe Institute and came up with a one-sentence sketch for this new library:

An open, distributed network of comprehensive online resources that would draw on the nation’s living heritage from libraries, universities, archives and museums in order to educate, inform and empower everyone in the current and future generations.

In a word: ambitious. Just two short years later, out of the efforts of that steering committee, the workstream members (I’m a convening member of the Audience and Participation workstream), over a thousand people who participated in online discussions and at three national meetings, the tireless efforts of the secretariat, and the critical leadership of Maura Marx and John Palfrey, the DPLA has gone from the drawing board to an impending beta launch in April 2013.

As I was tweeting from the Chicago meeting, distant respondents asked what the DPLA is actually going to be. What follows is what I see as some of its key initial elements, though it will undoubtedly grow substantially. (One worry expressed by many in Chicago was that the website launch in April will be seen as the totality of the DPLA, rather than a promising starting point.)

The primary theme in Chicago was the double-entendre subtitle of this post: coming together. It was clear to everyone at the meeting that the project was reaching fruition, garnering essential support from public funders such as the National Endowment for the Humanities and the Institute of Museum and Library Services, and private foundations such as Sloan, Arcadia, and (most recently) Knight. Just as clear was the idea that what distinguishes the DPLA from—and means it will be complementary to—other libraries (online and off) is its potent combination of local and national efforts, and digital and physical footprints.

Ponds->Lakes->Ocean

The foundation of the DPLA will be a huge store of metadata (and potentially thumbnails), culled from hundreds of sources across America. A large part of the initial collection will come from recently freed metadata about books, videos, audio recordings, images, manuscripts, and maps from large institutions like Harvard, provided under the couldn’t-be-more-permissive CC0 license. Wisely, in my estimation (perhaps colored by the fact that I’m a historian), the DPLA has sought out local archival content that has been digitized but is languishing in places that cannot solicit a large audience, and that do not have the know-how to enable modern web services such as APIs.

As I put it on Twitter, one can think of this initial set of materials (beyond the millions of metadata records from universities) as content from local ponds—small libraries, archives, museums, and historic sites—sent through streams to lakes—state digital libraries, which already exist in 40 states (a surprise to many, I suspect)—and then through rivers to the ocean—the DPLA. The DPLA will run a sophisticated technical infrastructure that will support manifold uses of this aggregation of aggregations.
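The ponds-to-lakes-to-ocean flow can be sketched in a few lines. This is only an illustration of the "aggregation of aggregations" idea, not the DPLA's actual infrastructure: the `Record` fields, the `aggregate` helper, and the sample records are all invented here, and the rule of keeping the first record seen for each identifier is my own simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str  # hypothetical unique id for a digitized item
    title: str
    source: str      # the "pond" the record came from


def aggregate(*collections):
    """Merge collections, keeping the first record seen per identifier."""
    merged = {}
    for collection in collections:
        for record in collection:
            merged.setdefault(record.identifier, record)
    return list(merged.values())


# Ponds: small local institutions (made-up sample records).
historical_society = [
    Record("hs-1", "Main Street, 1905", "Town Historical Society"),
]
town_library = [
    Record("tl-1", "Family letters", "Town Library"),
    # The same photograph, also contributed by the library.
    Record("hs-1", "Main Street, 1905", "Town Historical Society"),
]
county_museum = [
    Record("cm-1", "Mill ledger", "County Museum"),
]

# Lakes: state digital libraries aggregate their local ponds.
state_one = aggregate(historical_society, town_library)
state_two = aggregate(county_museum)

# Ocean: the national aggregation of the state aggregations.
national = aggregate(state_one, state_two)
national_ids = [record.identifier for record in national]
```

Because the same helper runs at every level, a state hub and the national layer differ only in what they are handed, which is part of what makes the model attractive.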

Plan Nationally, Scan Locally

Since the Roy Rosenzweig Center for History and New Media has worked with many local archives, museums, and historic sites, especially through our Omeka project (which has been selected as the software to run online exhibits for the DPLA), I was aware of the great cultural heritage materials that are out there in this country. The DPLA is right: much of this incredible content is effectively invisible, failing to reach national and international audiences. The DPLA will bring huge new traffic to local scanning efforts. Funding agencies such as the Institute of Museum and Library Services have already provided the resources to scan numerous items at the local level; as IMLS Director Susan Hildreth pointed out, their grant to the DPLA meant that they could bring that already-scanned content to the world—a multiplier effect.

In Chicago we discussed ways of gathering additional local content. My thought was that local libraries can brand a designated computer workstation with the blue DPLA banner, with a scanner and a nice screen showing the cultural riches of the community in slideshow mode. Directions and help will be available to scan in new documents from personal or community collections.

[My very quick mockup of a public library DPLA workstation; underlying Creative Commons photo by Flickr user JennieB]

Others envisioned “Antiques Roadshow”-type events, and Emily Gore, Director of Content at the DPLA, who coined the great term Scannebagos, spoke of mobile scanning units that could digitize content across the country.

The DPLA is not alone in sensing this great unmet need for public libraries and similar institutions to assist communities in the digital preservation of personal and local history. For instance, Bill LeFurgy, who works at the Library of Congress with the National Digital Information Infrastructure and Preservation Program (NDIIPP), recently wrote:

Cultural heritage organizations have a great opportunity to fulfill their mission through what I loosely refer to as personal digital archiving…Cultural heritage institutions, as preserving entities with a public service orientation, are well-positioned to help people deal with their growing–and fragile–personal digital archives. This is a way for institutions to connect with their communities in a new way, and to thrive.

I couldn’t agree more, and although Bill focused mostly on the born-digital materials that we all have in abundance today, this mission of digital preservation can easily extend back to analog artifacts from our past. As the University of Wisconsin’s Dorothea Salo has put it, let’s turn collection development inside out, from centralized organizations to a distributed model.

When Roy and I wrote Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web, we debated the merits of “preservation through digitization.” While it may be problematic for certain kinds of rare materials, there is no doubt that local and personal collections could use this pathway. Given recent (and likely forthcoming) cuts to local archives, this seems even more meritorious.

The Best of the Digital and the Physical

The core strength, and unique feature, of the DPLA is thus that it will bring together the power and reach of the digital realm with the local community and trust in the thousands of American public libraries, museums, and historical sites—an extremely compelling combination. We are going through a difficult transition from print to digital reading, in which people are buying ebooks they cannot share or pass down to their children. The ephemerality of the digital is likely to become increasingly worrisome in this transition. At the same time people are demanding of their local libraries a greater digital engagement.

Ideally the DPLA can help public libraries and vice versa. With a stable, open DPLA combined with on-the-ground libraries, we can begin to articulate a model that protects and makes accessible our cultural heritage through and beyond the digital transition. For the foreseeable future public libraries will continue to house physical materials—the continued wonders of the codex—as well as provide access to the internet for the still significant minority without such access. And the DPLA can serve as a digital attic and distribution center for those libraries.

The key point, made by DPLA board member Laura DeBonis, is that with this physical footprint in communities the DPLA can do things that Google and other dotcoms cannot. She did not mean this as a criticism of Google Books (a project she was involved with when she worked at Google), which has done impressive work in scanning over 20 million books. But the DPLA has an incredible potential local network it can take advantage of to reach out to millions of people and have them share their history—in general, to democratize the access to knowledge.

It is critical to underline this point: the DPLA will be much more than its technical infrastructure. It will succeed or fail not on its web services but on its ability to connect with localities across the United States and have them use, and contribute to, the DPLA.

A Community-Oriented Platform

Having said that, the technical infrastructure is looking solid. But here, too, the Technical Aspects workstream is keeping community uses foremost in mind. As workstream member David Weinberger has written, we can imagine a future library as a platform, one that serves communities:

In many instances, those communities will be defined geographically, whether it’s a town’s local library or a university community; in some instances, the community will be defined by interest, not by geography. In either case, serving a defined community has two advantages. First, it enables libraries to accomplish the mission they’ve been funded to accomplish. Second, user networks depend upon and assume local knowledge, interests, and norms. While a local library platform should interoperate with the rest of the world’s library platforms, it may do best if it is distinctively local…

Just as each project created by a developer makes it easier for the next developer to create the next app, each interaction by users ought to make the library platform a little smarter, a little wiser, a little more tuned to its users interests. Further, the visible presence of neighbors and the availability of their work will not only make the library an ever more essential piece of the locality’s infrastructure, it can make the local community itself more coherent and humane.

Conceiving of the library as a platform not only opens a range of new services and provides for a continuous increase in the library’s value, it also does something libraries urgently need to do: it changes the criteria of success. A library platform should be measured less on the circulation of its works than in the circulation of the ideas and passions these works spark — from how many works are checked out to the community’s engagement with its own grappling with those works. This is not only a metric that libraries-as-platforms can excel at, it is in fact a measure of what has always been the truest value of libraries.

In that sense, by becoming a platform the library can better fulfill the abiding mission it set itself: to be a civic institution essential to democracy.

Nicely put.

New Uses for Local History

It’s not hard to imagine many apps and sites incorporating the DPLA’s aggregation of local historical content. It struck me that an easy first step is incorporation of the DPLA into existing public library apps. Here in Fairfax, Virginia, our county has an app that is fairly rudimentary but quickly becoming popular because it replaces that library card you can never find. (The app also can alert you to available holds and new titles, and search the catalog.)

I fired up the Fairfax Library app on my phone at the Chicago meeting, and although the county doesn’t know it yet, there’s already a slot for the DPLA in the app. That “local” tab at the bottom can sense where you are and direct you to nearby physical collections; through the DPLA API it will be trivial to also show people digitized items from their community or current locale.

Granted, Fairfax County is affluent and has a well-capitalized public library system that can afford a smartphone app. But my guess is the app is fairly simple and was probably built from a framework other libraries use (indeed, it may be part of Fairfax County’s ILS vendor package), so DPLA integration could happen with many public libraries in this way. For libraries without such resources, I can imagine local hackfests lending a hand, perhaps working from a base app that can be customized for different public libraries easily.

Long-time readers of this blog can identify dozens of other apps that will be hungry for DPLA content. The idea of marrying geolocation with historical materials has flourished in the last two years, with apps like HistoryPin showing how people can find out about the history around them.

Even Google has gotten into the act of location + history with its recently launched Field Trip app. I suspect countless similar projects will be enhanced by, or based on, the DPLA API.
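To make the location + history idea concrete, here is a rough sketch of the filtering step such an app might perform once it has item records with coordinates. It does not call the DPLA API, whose request and response formats I am not assuming anything about; the `items_near` helper, the record layout, and the sample items are hypothetical. The distance calculation is the standard haversine formula.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def items_near(items, lat, lon, radius_km):
    """Return the items within radius_km of the user, nearest first."""
    with_distance = [
        (haversine_km(lat, lon, item["lat"], item["lon"]), item)
        for item in items
    ]
    nearby = [pair for pair in with_distance if pair[0] <= radius_km]
    return [item for _, item in sorted(nearby, key=lambda pair: pair[0])]


# Made-up records; coordinates are approximate city centers.
sample_items = [
    {"title": "Courthouse photograph", "lat": 38.846, "lon": -77.306},  # Fairfax, VA
    {"title": "Lakefront postcard", "lat": 41.878, "lon": -87.630},     # Chicago, IL
]

# A user standing in Fairfax sees only the local item.
local_titles = [
    item["title"] for item in items_near(sample_items, 38.85, -77.30, 50)
]
```

A real app would more likely hand the coordinates to the service and let it do the search, but the sketch shows how little logic the "local" tab needs once geotagged records are available.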

Moreover, geolocating historical documents is but one way to use the technical infrastructure of the DPLA. As the technical working group has wisely noted, the platform exists for unintended uses as well as obvious ones. To explore the many possibilities, there will next be an “Appfest” at the Chattanooga Public Library on November 8-9, 2012. And I’m planning a DPLA hacking session here at the Roy Rosenzweig Center for History and New Media for December 6, 2012, concurrent with an Audience and Participation workstream meeting. Stay tuned for details.

The Speculative

Only hinted at in Chicago, but worthy of greater thought, is what else we might do with the combination of thousands of public libraries and the DPLA. This area is more speculative, for reasons ranging from legal considerations to the changing nature of reading. The strong fair use arguments that won the day in the Authors Guild v. HathiTrust case (the ruling was handed down the day before DPLA Midwest) may—may—enable new kinds of sharing of digital materials within geofenced areas such as public libraries. (Chicago did not have a report from DPLA’s legal workstream, so we await their understanding of the shifting copyright and fair use landscape in the wake of landmark positive rulings in the HathiTrust and Georgia State cases.)

Perhaps the public library can achieve, in the medium term, some kind of hybrid physical-digital browsability as imagined in this video of a French bookstore from the near future, in which a simple scan of a book using a tablet transfers an e-text to the tablet. The video gets at the ongoing need for in-person reading advice and the superior browsability of physical bookshelves.

I’ve been tracking a number of these speculative exercises, such as the student projects in Harvard Graduate School of Design’s Library Test Kitchen, which experiments with media transformations of libraries. I suspect that bookfuturists will think of other potential physical/digital hybrids.

But we need not get fancy. More obvious benefits abound. The DPLA will be widely used by teachers and students, with scans being placed into syllabi and contextualized by scholars. Judging by the traffic RRCHNM’s educational sites and digital archives get, I expect a huge waiting audience for this. I can also anticipate local groups of readers and historical enthusiasts gathering in person to discuss works from the DPLA.

Momentum, but Much Left to Do

To be sure, many tough challenges still await the DPLA. Largely absent from the discussion in Chicago, with its focus on local history, is the need to see what the digital library can do with books. After all, the majority of circulations from public libraries are popular, in-copyright works, and despite great unique local content the public may expect that P in DPLA to provide a bit more of what they are used to from their local library. Finding ways to have big publishers share at least some books through the system—or perhaps start with smaller publishers willing to experiment with new models of distribution—will be an important piece of the puzzle.

As I noted at the start, the DPLA now has funding from public and private sources, but it will have to raise much, much more, not easy in these austere times. It needs a staff with the energy to match the ambition of the project, and the chops to execute a large digital project that also has in-person connections in 50 states.

A big challenge, indeed. But who wouldn’t like a public, open, digital library that draws from across the United States “to educate, inform and empower everyone”?

October 16, 2012
The Digital Public Library of America: First Things First

Today and tomorrow I’m at the Digital Public Library of America meeting in Washington, DC. I’m a “convener” (I’m hoping that means “judge, jury, and executioner”) of the “Audience and Participation Workstream,” which is trying to assess who will use the DPLA and why. Others are working on technical, legal, financial, and content questions. Questions at today’s small meeting of conveners loomed large in all of those areas: the DPLA may or may not have in-copyright materials, it may or may not be a meta-platform or a centralized resource, it may focus on popular content or the long tail. Obviously these are all questions that will have to be resolved over the next 18 months.

But at today’s meeting I kept coming back to a more basic question, a question faced by any new website or digital project: Why would anyone use it? For something as ambitious (and potentially as expensive) as the DPLA, there is the further question: Why would anyone choose to visit the DPLA first, rather than, say, commercial providers like Google or Amazon, or non-profit entities such as the Internet Archive’s Open Library or OCLC’s Worldcat? Or as Ed Summers more succinctly put it last spring: In what way will the DPLA be better than the web?

Because of these critical root questions, I believe the DPLA faces a huge uphill battle upon launch. Today, I started a list of elements that could help draw an audience to the DPLA—in the same way that public libraries continue to attract huge numbers of patrons. This list represents a shift of my views about the DPLA from the meeting at Harvard in the spring, where I advocated for advanced research modes. (For this reason, I think some of the data-mining DPLA “beta sprint” prototypes are headed in the wrong direction, at least for this initial phase.) I now think that, at least at first, we have to focus on the P in DPLA.

So what are the characteristics of public libraries that we can leverage for the DPLA?

1) Trust. Why would your average reader or researcher go to dp.la rather than google.com? Because people trust their public library enormously; they understand that the library isn’t out to profit from them, but to serve them. The DPLA should capitalize on this, and posters for the DPLA should end up in the entryway of every public library in America.

2) Local and relevant. Just as people visit the local library or historical society to learn more about their town or neighborhood, they should see, when visiting mytown.dp.la, digital collections of local content (old photographs, genealogies, etc.) in addition to lists of books, videos, and other global content. Google or Worldcat may direct you to your local library for a copy of a book, but they don’t curate and present true local content.

3) Fully open and hopefully fully free (at least to the reader), or at least less expensive for popular materials. If by some miracle the legal workstream is able to acquire digital copies of popular books from large publishers, in a way that works better than the maddening Overdrive (where the one digital copy of a book you want is always checked out), then that would be a major extension of a traditional advantage of the public library into the digital age.

4) Easier. Starting research on most topics on the web is still maddening. Bing’s launch marketing campaign against Google (“you can’t find anything”) was onto something. Can the web presence for the DPLA somehow replicate (or act as a middleman for) the experience of asking a trusted, knowledgeable librarian for help, and direct students, curious people, and serious researchers to an array of materials that help them better than a Google search?

I’m likely missing other initial “magnets,” and am happy to take other suggestions in the comments below. But in short, it seems to me that for the DPLA to be the first choice on the web, it has to take maximal advantage of trust, relevance, and ease versus the general (and mostly commercial) web.

October 20, 2011