CC0 (+BY)

Those who have heard me talk about the Digital Public Library of America over the past six months know that I’m fond of saying that DPLA is as much a social project as a technical project. Much of what we do focuses on collaboration and coordination, which involves looking not just at technical—or legal—elements, but social ones.

It’s much easier to think of an issue solely as a technical problem (we just need to figure out how to code that properly), or as a legal problem (we just need to bind everyone under a contractual arrangement to achieve the desired outcome), than as a social issue, since the latter requires attention to more amorphous aspects such as ethics and politics. But being more nuanced about the mix of the social, technical, and legal can pay dividends.

Take DPLA’s metadata. (Please. Take our metadata. It’s all freely available on our site.) One of the questions I frequently get is why the Digital Public Library of America requires the metadata for items in our collection to be donated under a CC0 license. That license is maximally permissive; as its longer name implies, CC0 is in fact a Public Domain Dedication.

Metadata obviously has elements of the technical and legal. Without a stringent technical standard into which we normalize data from over a thousand institutions, and a serious digital infrastructure to transform that metadata into interfaces such as maps and timelines, we couldn’t work much magic. And since we are conscious of the legal realm that many cultural heritage materials exist in, we do ask for a contract that specifies CC0 for the metadata. (However, many would argue that even a CC0 license is unnecessary and should not be demanded, since a purely descriptive set of metadata should not, by its very nature, be copyrightable under U.S. law. But this is a discussion for another day.)

But why not ask for the most modest of additional restrictions, such as a license where attribution is required—a license with a -BY attached to the right? If we wish to tip our hat to those who created or donated the metadata, why not legally mandate it?

Those who use, reuse, and commingle data know the complex issues that arise with even simple additional requirements such as this. Data that flows from many sources will pick up, like fallen branches in the stream, a variety of ensnaring reeds, adding significant friction and complexity to some applications. But well-meaning people still want to provide attribution, and individuals and institutions might have social expectations of receiving credit. What to do?

Move the attribution from the legal realm into the social or ethical realm by pairing a permissive license with a strong moral entreaty.

For instance, the Tate recently released metadata for 70,000 works of art and 3,500 artists. The license they put on the data was CC0. But right next to that license is this block on “Usage Guidelines”:

These usage guidelines are based on goodwill. They are not a legal contract but Tate requests that you follow these guidelines if you use Metadata from our Collection dataset.

The Metadata published by Tate is available free of restrictions under the Creative Commons Zero Public Domain Dedication.

This means that you can use it for any purpose without having to give attribution. However, Tate requests that you actively acknowledge and give attribution to Tate wherever possible. Attribution supports future efforts to release other data. It also reduces the amount of ‘orphaned data’, helping retain links to authoritative sources.

As with many other things, our friends from Europeana were out in front on this, as Tate acknowledges on their GitHub page. Here’s Europeana’s metadata use page:

These usage guidelines are based on goodwill, they are not a legal contract but Europeana requests that you follow these guidelines if you use metadata from Europeana.

All metadata published by Europeana are available free of restriction under the Creative Commons CC0 1.0 Universal Public Domain Dedication. However, Europeana requests that you actively acknowledge and give attribution to all metadata sources, such as the data providers (being a specific cultural heritage institution) and any data aggregators, including Europeana.

Give credit where credit is due.

DPLA does the same thing with our Data Best Use Practices page.

I have been calling this implied or ethical attribution. Or, if you like short and snappy symbols, think of it as CC0 (+BY) rather than CC-BY (or ODC-BY).

The cynics, of course, will say that bad actors will do bad things with all that open data. But here’s the thing about the open web: bad actors will do bad things, regardless. They will ignore whatever license you have asserted, or use technical means to circumvent your technical lock. And yes, with CC0, commercial entities might also come and take all of that metadata. But that data includes pointers back to items and scans at libraries, archives, and museums, which are (or should be) in the business of disseminating knowledge as widely as possible. By being free with our metadata, we do not devalue those nonprofit institutions, but rather emphasize more broadly the incredible contents they hold.

The flip side of worries about bad actors is that we underestimate the number of good actors doing the right thing. In our experience, the many software developers (including commercial ones) who have used our data across the web and in DPLA-powered apps have all maintained proper attribution, even though the CC0 license theoretically means that they can do whatever they want with the data.

I think CC0 (+BY) is the best of both worlds: the data in a free-flowing environment that enables creativity and reuse, with attribution still maintained by the vast majority of people who consider themselves part of a social contract.

Comments

Eric Hellman says:

I am one of those who “use, reuse, and commingle data”; I’ve been doing it for quite a while. I very well “know the complex issues that arise with even simple additional requirements such as this”.

But I’ve seen for myself that lack of provenance is the black death of metadata. The problems with CC BY for data are real, but they’re not caused by legal or moral shortcomings, they’re caused by flimsy and cumbersome attribution technology. I think we’ll see the emergence over the next few years of the sort of convenient, strong attribution/provenance technology for data that we are starting to see effectively deployed by GitHub for code.

So think of your metadata as kittens.
Today our licensing options are:
CC0: “Free kittens. We don’t care if you let them die.”
CC BY: “Free kittens. You’d better take care of them.”
CC0 (+BY): “Free kittens. We’d be sad if you let them die”

Please see discussions about our project on British Library manuscripts from the early 1400s on our G+ Community called France 1400.

[…] been released into the public domain, we would like to direct you to a post by Dan Cohen titled “CC0 (+BY)” [26 November 2013]. There is no obligation for you to attribute anything to us, but we’d […]

[…] while they have been released into the public domain, we would like to direct you to a post by Dan Cohen titled “CC0 (+BY)” There is no obligation for you to attribute anything to us, but we’d appreciate it. The […]

[…] Copyright has long expired on this material, so it is all public domain. And while there is no obligation to attribute the source of public domain material, the Library says it would be nice if people do. […]

[…] views. If you’re interested in the idea of using portals to connect up collections, Dan Cohen, executive Director of the Digital Public Library of America has written a great piece on […]

[…] information that can be appropriately licensed by CC, digital humanists Bethany Nowviskie and Dan Cohen have both written convincing arguments for licensing your work using CC0, aka CC Zero1. Their […]

[…] inhibits re-use (one should always assume that people are intrinsically lazy). There is a post advocating ccZero+ by Dan Cohen. However, impact tracking may mean that the BY clause becomes a default for academic […]

[…] is interesting to note here that Papers Past’s copyright guide is in line with Dan Cohen’s recent suggestion to “move the attribution from the legal realm into the social or ethical realm by pairing a […]

[…] a broad user community for grant applications to fund project development. But we are convinced by the argument that requests for attribution that are encoded in the legal license are both ineffective and […]

Eric, I think your analogy is close, but I feel that it is more like:

CC BY: “Free kittens. We don’t care if you let them die, but please cite our shop where you bought them.”

In other news, at some point in time I saw an add-on document, independent from the license, that covers the BY part. As a scholar, I don’t care, because it is a faux-pas to not cite the origin. However, even academics seem to require reminder of that, and then such a BY clause is useful. Sadly, I can no longer find the text of that add-on 🙁

[…] a reminder, everything in this blog is CC0+BY, so feel free to reuse any or all of it in your own projects! And I want to thank BIDS, and […]
