
Zotero and the Internet Archive Join Forces

I’m pleased to announce a major alliance between the Zotero project at the Center for History and New Media and the Internet Archive. It’s really a match made in heaven: a project to provide free and open source software and services for scholars joining together with the leading open library. The vision and support of the Andrew W. Mellon Foundation have made this possible, as they have made possible the major expansion of the Zotero project over the last year.

You will hear much more about this alliance in the coming months on this blog, but I wanted to outline five key elements of the project.

1. Exposing and Sharing the “Hidden Archive”

The Zotero-IA alliance will create a “Zotero Commons” into which scholarly materials can be added simply via the Zotero client. Almost every scholar and researcher has documents that they have scanned (some of which are in the public domain), finding aids they have created, or bibliographies on topics of interest. Currently there is no easy way to share these; giving them a central home at the Internet Archive will archive them permanently (before they are lost on personal hard drives) and make them broadly available to others.

We understand that not everyone will be willing to share everything (some may not be willing to share anything, even though almost every university commencement reminds graduates that they are joining a “community of scholars”), but we believe that the Commons will provide a good place for shareable materials to reside. The architectural historian with hundreds of photographs of buildings, the researcher who has scanned in old newspapers, and scholars who wish to publish materials in an open access environment will find this a helpful addition to Zotero and the Internet Archive. Some researchers may of course deposit materials only after finishing, say, a book project; what I have called “secondary scholarly materials” (e.g., bibliographies) will perhaps be more readily shared.

But we hope the second part of the project will further entice scholars to contribute important research materials to the Commons.

2. Searching the Personal Library

Most scholars have not yet figured out how to take full advantage of the digitized riches suddenly available on their computers. Indeed, the abundance of digital documents has actually exacerbated the problems of some researchers, who now find themselves overwhelmed by the sheer quantity of available material. Moreover, the major advantage of digital research—the ability to scan large masses of text quickly—is often unavailable to scholars who have done their own scanning or copying of texts.

A critical second part of this alliance of IA and Zotero is to bring robust and seamless Optical Character Recognition (OCR) to the vast majority of scholars who lack the means or do not know how to convert their scans into searchable text. In addition, this process will let others search through such newly digitized texts. After a submission to the Commons, the Internet Archive will return an OCRed version of each donated document to enable searchability. This text will be incorporated into the donor’s local index (on the Zotero client) and thus made searchable in Zotero’s powerful quick search and advanced search panes. In short, this process will provide a tremendous incentive for scholars to donate to the Commons, since it will help them with their own research.

3. Enabling Networked References and Annotations

One of the pillars of scholarship is the ability for distributed scholars to be sure they are referencing the same text or evidence. As noted in #1, one of the great advantages of the Zotero Commons at IA will be the transport of scholarly materials currently residing on personal hard drives to a public space with stable, rather than local, addresses. These addresses will become critical as scholars begin to use, refer to, and cite items in the Commons.

Yet the IA/Zotero partnership has another benefit: as scholars begin to use not only traditional primary sources that have been digitized but also “born digital” materials on the web (blogs, online essays, documents transcribed into HTML), the possibility arises for Zotero users to leverage the resources of IA to ensure a more reliable form of scholarly communication. One of the Internet Archive’s great strengths is that it has not only archived the web but also given each page a permanent URI that includes a time and date stamp in addition to the URL.
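To make the idea of a time-stamped permanent URI concrete, here is a minimal sketch in Python. It assumes the address pattern the Wayback Machine uses today (a 14-digit UTC timestamp followed by the original URL); the function names, the example page, and the capture date are made up for illustration and are not part of Zotero or of any IA library.

```python
from datetime import datetime, timezone

# Assumed address pattern: <prefix>/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_uri(original_url: str, captured_at: datetime) -> str:
    """Combine an original URL and a timezone-aware capture time into one URI."""
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


def split_wayback_uri(uri: str) -> tuple[str, datetime]:
    """Recover the original URL and the capture time from a time-stamped URI."""
    prefix = WAYBACK_PREFIX + "/"
    if not uri.startswith(prefix):
        raise ValueError("not a time-stamped archive URI")
    # The timestamp ends at the first slash; everything after it is the original URL.
    stamp, _, original_url = uri[len(prefix):].partition("/")
    captured_at = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return original_url, captured_at


# Hypothetical example: a page and a capture time chosen only for illustration.
when = datetime(2007, 12, 12, 15, 30, 0, tzinfo=timezone.utc)
uri = wayback_uri("http://www.zotero.org/", when)
print(uri)
print(split_wayback_uri(uri))
```

Because the original address sits inside the archived one, a citation can carry both the place where the scholar found the page and the exact copy that was read.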

Currently when a scholar using Zotero wishes to save a web page for their research they simply store a local copy. For some, perhaps many, purposes this is fine. But for web documents that a scholar believes will be important to share, cite, or collaboratively annotate (e.g., among a group of coauthors of an article or book) we will provide a second option in the Zotero web save function to grab a permanent copy and URI from IA’s web archive. A scholar who shares this item in their library can then be sure that all others who choose to use it will be referring to the exact same document.

Moreover, unlike those in most research software, the sophisticated annotation tools built into Zotero (the ability to highlight passages, add virtual Post-It notes, and attach regular notes on the overall document) maintain these annotations separately from the underlying document. This presents the exciting possibility for collaborative scholarly annotation of web pages.

4. Simplifying Collaborative Sharing

Groups of scholars also have the need to create more private “commons,” e.g., for documents that they would like to share in a limited way. In addition to the fully open Zotero Commons we will establish a mechanism for such restricted sharing. Via the Zotero Server, a user will be able to create a special collection with a distinct icon that shows up in the client interface (left column) for every member of the group.

Files added to these collections will be stored on the Internet Archive but will have restricted access. We believe that having these files reside on the IA server will encourage the donation of documents at the end of a collaborative project. The administrator of a shared collection will be able to move its contents into the fully open Zotero Commons via a single click in the administrative interface on the Zotero Server.

5. Facilitating Scholarly Discovery

The multiple libraries of content created by Zotero users and the multi-petabyte digital collections of the Internet Archive are resources that can potentially be of great use to the scholarly community. Neither has yet experienced the level of exploration and usage we believe is possible through further development and collaboration.

The combined digital collections present opportunities for scholars to find primary research materials, to discover one another’s work, to identify materials that are already available in digital form and therefore do not need to be located and scanned, to find other scholars with similar interests and to share their own insights broadly. We plan to leverage the combined strengths of the Zotero project and the Internet Archive to work on better discovery tools.

55 replies on “Zotero and the Internet Archive Join Forces”

Nice work Dan!

A question, though: exactly how will these URIs work in the context of Zotero? It seems the IA tracks the URI for the document proper, and then uses that in the context of its own URIs for the time-stamped copies.

So am I to understand that Zotero will keep a) the URI as the identifier, and b) a link to the proper time-stamped version archived at the IA?

@Bruce: yes, I believe we will keep the URI (which is where the user is on the web when they ask Zotero to save a copy) but then (if the user specifies) link to a permanent copy at IA (of course, as you note, the IA URI includes the original address). The user will have a choice on this; if a group wants to, say, collaboratively annotate a document, they obviously will all need to be able to point to a stable cache and address. There are some details that need to be worked out on this, but the idea is to enable better citations and collaborative annotations, which are impossible if scholars are pointing to different versions of a web page.

OK, good Dan. That approach gives the best of both worlds probably: integration with the distributed web, and the certainty and stability that can come from centralized archives.

It sounds like it could be a great resource, though I am a bit skeptical. Based on the stated workflow and infrastructure, it appears as if there is the potential for widespread copyright infringement. What are the Commons’ plans for preventing copyright infringement?

This is very good and very amazing news. I’m doing research on European imperialism in China in the 19th century and here in Taiwan not all the material I need is readily available. But thanks to the Internet Archive I now have access to hundreds of amazing original titles. Thanks to Zotero they are all neatly organized, retrievable and searchable. It’s more than time saving and efficient, it’s funky and fun. It’s like the real era of research has just begun. Everything else before was just warming up exercises.

An article in The New York Times for December 23 describes the problems being created in the movie industry by the shift from film to digital recording. Whereas film can be stored (apparently with long time limits) in climate-controlled caves, storing digital movies can cost about 12 times as much, and adding the associated materials (like annotated scripts) can add about 400 times the costs of storing the same materials when they are associated with a conventional film. Added to the problem of costs are those of degradation when digital movies are transferred to film for storage, of superseded (and therefore unavailable) playback devices, and rapid deterioration. This situation in a global industry raises questions about the smaller, but we hope equally important, industry of converting humanities materials to digital formats.

It is some time since we seem to have concerned ourselves about long-term storage, and that discussion seems to have centered on the viability of CDs. Since the major drive to digitize the contents of whole libraries, there has not (to my awareness) been a similar expression of concern about whether the digital versions will outlive the books they are replacing.

Does anyone have information about this serious problem? Is it being considered by the major corporations that are engaged in the digital library initiatives? Where does one learn more about this basic concern of computer-using humanists?

I have used the Internet Archive extensively for a few years, as well as Zotero (albeit only more recently), and I must say that this partnership has amazing potential. The combined power of these research tools is awesome, and I will definitely be making use of this whenever I can.
P.S. reCAPTCHA is a nice project too – glad to see you’re using it… will the OCR scans from Zotero Commons be used for reCAPTCHA too, or will correcting documents be the prerogative of the user?

@Jeff: it’s conceivable that texts donated to the Zotero Commons will in turn end up in Open Library and thus be queued up for use in reCAPTCHA. But initially users will get uncorrected OCR from IA’s OCR servers.

Hi Dan,

While I am not a techie per se (but a budding scholar practitioner) and a community-builder with WikiEducator, I would like to speak with you about what might be possible in terms of our many WikiEducators using Zotero as a resource and collaboratively sharing resources and references (as they create educational resources and materials) and, of course, the potential for strategic collaboration on areas of mutual interest.

– Randy

Dan, Zotero is terrific. Rita Tehan and I have been giving training in the Congressional Research Service, and my work would be so much less efficient without Zotero.

About the Zotero/Internet Archive project: This would be a godsend for genealogists, if it could be made workable for them. I’ve been doing some organizing, digitizing, etc. of my grandfather’s files (and files and files) and am stymied by what to do with all these primary resources that would be so valuable to other genealogists but who would not be served by putting them into a library somewhere. I’ve also found that most resources are available only by paying through commercial genealogical “services” (not much but software), whereas many, many researchers are doing this as a hobby. If you ever want to chat about the problems and how Zotero might help, contact me. (I haven’t thought this through….)

Hi Dan
I think Zotero is really great and enjoy using it.
Is the “Zotero Commons” operational already? If not, when do you expect it to be ready?
