Categories
Books Google Text Mining

Initial Thoughts on the Google Books Ngram Viewer and Datasets

First and foremost, you have to be the most jaded or cynical scholar not to be excited by the release of the Google Books Ngram Viewer and (perhaps even more exciting for the geeks among us) the associated datasets. In the same way that the main Google Books site has introduced many scholars to the potential of digital collections on the web, Google Ngrams will introduce many scholars to the possibilities of digital research. There are precious few easy-to-use tools that allow one to explore text-mining patterns and anomalies; perhaps only Wordle has the same dead-simple, addictive quality as Google Ngrams. Digital humanities needs gateway drugs. Kudos to the pushers on the Google Books team.

Second, on the concurrent launch of “Culturomics”: Naming new fields is always contentious, as is declaring precedence. Yes, it was slightly annoying to have the Harvard/MIT scholars behind this coinage and the article that launched it, Michel et al., stake out supposedly new ground without making sufficient reference to prior work and even (ahem) some vaguely familiar, if simpler, graphs and intellectual justifications. Yes, “Culturomics” sounds like an 80s new wave band. If we’re going to coin neologisms, let’s at least go with Sean Gillies’ satirical alternative: Freakumanities. No, there were no humanities scholars in sight in the Culturomics article. But I’m also sure that longtime “humanities computing” scholars consider advocates of “digital humanities” like me Johnnies-come-lately. Luckily, digital humanities is nice, and so let us all welcome Michel et al. to the fold, applaud their work, and do what we can to learn from their clever formulations. (But c’mon, Cantabs, at least return the favor by following some people on Twitter.)

Third, on the quality and utility of the data: To be sure, there are issues. Some big ones. Mark Davies makes some excellent points about why his Corpus of Historical American English (COHA) might be a better choice for researchers, including more nuanced search options and better variety and normalization of the data. Natalie Binder asks some tough questions about Google’s OCR. On Twitter many of us were finding serious problems with the long “s” before 1800 (Danny Sullivan got straight to the naughty point with his discourse on the history of the f-bomb). But the Freakumanities, er, Culturomics guys themselves talk about this problem in their caveats, as does Google.

Moreover, the data will improve. The Google n-grams are already over a year old, and the plan is to release new data as soon as it can be compiled. In addition, unlike text-mining tools like COHA, Google Ngrams is multilingual. For the first time, historians working on Chinese, French, German, and Spanish sources can do what many of us have been doing for some time. Professors love to look a gift horse in the mouth. But let’s also ride the horse and see where it takes us.

So where does it take us? My initial tests on the viewer and examination of the datasets—which, unlike the public site, allow you to count words not only by overall instances but, critically, by number of pages those instances appear on and number of works they appear in—hint at much work to be done:

1) The best possibilities for deeper humanities research are likely in the longer n-grams, not in the unigrams. While everyone obsesses about individual words (guilty here too of unigramism) or about proper names (which are generally bigrams), more elaborate and interesting interpretations are likelier in the 4- and 5-grams since they begin to provide some context. For instance, if you want to look at the history of marriage, charting the word itself is far less interesting than seeing if it co-occurs with words like “loving” or “arranged.” (This is something we learned in working on our NEH-funded grant on text mining for historians.)

2) We should remember that some of the best uses of Google’s n-grams will come from using this data along with other data. My gripe with the “Culturomics” name was that it implied (from “genomics”) that some single massive dataset, like the human genome, will be the be-all and end-all for cultural research. But much of the best digital humanities work has come from mashing up data from different domains. Creative scholars will find ways to use the Google n-grams in concert with other datasets from cultural heritage collections.

3) Despite my occasional griping about the Culturomists, they did some rather clever things with statistics in the latter part of their article to tease out cultural trends. We historians and humanists should be looking carefully at the more complex formulations of Michel et al., when they move beyond linguistics and unigram patterns to investigate in shrewd ways topics like how fleeting fame is and whether the suppression of authors by totalitarian regimes works. Good stuff.

4) For me, the biggest problem with the viewer and the data is that you cannot seamlessly move from distant reading to close reading, from the bird’s eye view to the actual texts. Historical trends often need to be investigated in detail (another lesson from our NEH grant), and it’s not entirely clear if you move from Ngram Viewer to the main Google Books interface that you’ll get the book scans the data represents. That’s why I have my students use Mark Davies’ Time Magazine Corpus when we begin to study historical text mining—they can easily look at specific magazine articles when they need to.
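
To make point 1 a bit more concrete, here is a minimal Python sketch of the “marriage” example. It assumes the downloadable n-gram files are tab-separated rows of n-gram, year, match count, page count, and volume count; check the dataset documentation for the exact layout before relying on it. The function name `cooccurrence_by_year` and the sample rows are invented for illustration and are not real Google data.

```python
from collections import defaultdict

# Assumed column positions in each tab-separated row (hypothetical layout):
# ngram, year, match_count, page_count, volume_count
MEASURES = {"match": 2, "page": 3, "volume": 4}


def cooccurrence_by_year(lines, target, companions, measure="match"):
    """Sum a count per year for n-grams that contain `target` and a companion word."""
    column = MEASURES[measure]
    target = target.lower()
    companions = {word.lower() for word in companions}
    totals = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip rows that do not match the assumed layout
        words = {word.lower() for word in fields[0].split()}
        if target not in words:
            continue
        year = int(fields[1])
        for companion in companions & words:
            totals[companion][year] += int(fields[column])
    return {word: dict(years) for word, years in totals.items()}


# Made-up 5-gram rows, only to show the shape of the input.
sample = [
    "an arranged marriage between the\t1850\t12\t11\t9",
    "a loving marriage is the\t1850\t3\t3\t3",
    "a loving marriage is the\t1950\t40\t38\t30",
    "the history of the world\t1950\t500\t450\t300",
]

by_matches = cooccurrence_by_year(sample, "marriage", ["loving", "arranged"])
by_volumes = cooccurrence_by_year(sample, "marriage", ["loving", "arranged"], measure="volume")
```

Switching `measure` from `"match"` to `"page"` or `"volume"` swaps raw instances for the page or work counts that the datasets expose and the public viewer does not.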

How do you plan to use the Google Books Ngram Viewer and its associated data? I would love to hear your ideas for smart work in history and the humanities in the comments, and will update this post with my own further thoughts as they occur to me.

Categories
Books Text Mining

New York Times Covers Victorian Books Project

Patricia Cohen of the New York Times has been working on an excellent series on digital humanities, and her second article focuses on our text mining work on Victorian books, which was directly enabled by a grant from Google and more broadly enabled by a previous grant from the National Endowment for the Humanities to explore text mining in history. I’m glad Cohen (no relation) captured the nuances and caveats as well as the potential of digital methods. I also liked how the graphics department did a great job converting and explaining some of our graphs.

I previously posted a rough transcript of my talk on Victorian history and literature that Cohen mentions in the piece. She also covered my work earlier this year in an article on peer review that was much debated in academia.

Categories
Academia Books Collaboration Conferences and Workshops Programming Unconferences

Thoughts on One Week | One Tool

Well that just happened. It’s hard to believe that last Sunday twelve scholars and software developers were arriving at the brand-new Mason Inn on our campus and now have created and launched a tool, Anthologize, that created a frenzy on social and mass media.

If you haven’t already done so, you should first read the many excellent reports from those who participated in One Week | One Tool (and watched it from afar). One Week | One Tool was an intense institute sponsored by the National Endowment for the Humanities that strove to convey the Center for History and New Media’s knowledge about building useful scholarly software. As the name suggests, the participants had to conceive, build, and disseminate their own tool in just one week. To the participants’ tired voices I add a few thoughts from the aftermath.

Less Talk, More Grok

One Week director (and Center for History and New Media managing director) Tom Scheinfeldt and I grew up listening to WAAF in Boston, which had the motto (generally yelled, with reverb) “Less Talk, More Rock!” (This being Boston, it was actually more like “Rahwk!”) For THATCamp I spun that call-to-action into “Less Talk, More Grok!” since it seemed to me that the core of THATCamp is its antagonism toward the deadening lectures and panels of normal academic conferences and its attempt to maximize knowledge transfer with nonhierarchical, highly participatory, hands-on work. THATCamp is exhausting and exhilarating because everyone is engaged and has something to bring to the table.

Not to over-philosophize or over-idealize THATCamp, but for academic doubters I do think the unconference is making an argument about understanding that should be familiar to many humanists: the importance of “tacit knowledge.” For instance, in my field, the history of science, scholars have come to realize in the last few decades that not all of science consists of cerebral equations and concepts that can be taught in a textbook; often science involves techniques and experiential lessons that must be acquired in a hands-on way from someone already capable in that realm.

This is also true for the digital humanities. I joked with emissaries from the National Endowment for the Humanities, which took a huge risk in funding One Week, that our proposal to them was like Jerry Seinfeld’s and George Costanza’s pitch to NBC for a “show about nothing.” I’m sure it was hard for reviewers of our proposal to see its slightly sketchy syllabus. (“You don’t know what will be built ahead of time?!”) But this is the way in which the digital humanities is close to the lab sciences. There can of course be theory and discussion, but there will also have to be a lot of doing if you want to impart full knowledge of the subject. Many times during the week I saw participants and CHNMers convey things to each other—everything from little shortcuts to substantive lessons—that wouldn’t have occurred to us ahead of time, without the team being engaged in actually building something.

MTV Cops

The low point of One Week was undoubtedly my ham-fisted attempt at something of a keynote while the power was out on campus, killing the lights, the internet, and (most seriously) the air conditioning. Following “Less Talk, More Grok,” I never should have done it. But one story I told at the beginning did seem to have modest continuing impact over the week (if frequently as the source of jokes).

Hollywood is famous for great (and laughable) idea pitches—which is why that Seinfeld episode was amusing—but none is perhaps better than Brandon Tartikoff’s brilliantly concise pitch for Miami Vice: “MTV cops.” I’m a firm believer that it’s important to be able to explain a digital tool with something close to the precision of “MTV cops” if you want a significant number of people to use it. Some might object that we academics are smart folks, capable of understanding sophisticated, multivalent tools, but people are busy, and with digital tools there are so many clamoring for attention and each entails a huge commitment (often putting your scholarship into an entirely new system). Scholars, like everyone else, are thus enormously resistant to tools that are hard to grasp. (Case in point: Google Wave.)

I loved the 24 hours of One Week from Monday afternoon to Tuesday afternoon where the group brainstormed potential tools to build and then narrowed them down to “MTV Cops” soundbites. Of course the tools were going to be more complex than these reductionistic soundbites, but those soundbites gave the process some focus and clarity. It also allowed us to ask Twitter followers to vote on general areas of interest (e.g., “Better timelines”) to gauge the market. We tweeted “Blog->Book” for idea #1, which is what became Anthologize.

And what were most of the headlines on launch day? Some variant on the crystal-clear ReadWriteWeb headline: “Scholars Build Blog-to-eBook Tool in One Week.”

Speed Doesn’t Kill

We’ve gotten occasional flak at the Center for History and New Media for some recent efforts that seem more carnival than Ivory Tower, because they seem to throw out the academic emphasis on considered deliberation. (However, it should be noted that we also do many multi-year, sweat-and-tears, time-consuming projects like the National History Education Clearinghouse, putting online the first fifteen years of American history, and creating software used by millions of people.)

But the experience of events like One Week makes me question whether the academic default to deliberation is truly wise. One Weekers could have sat around for a week, a month, a year, and still I suspect that the tool they decided to build was the best choice, with the greatest potential impact. As programmers in the real world know, it’s much better to have partial, working code than to plan everything out in advance. Just by launching Anthologize in alpha and generating all that excitement, the team opened up tremendous reserves of good will, creativity, and problem-solving from users and outside developers. I saw at least ten great new use cases for Anthologize on Twitter in the first day. How are you supposed to come up with those ideas from internal deliberation or extensive planning?

There was also something special about the 24/7 focus the group achieved. The notion that they had to have a tool in one week (crazy on the face of it) demanded that the participants think about that tool all of the time (even in their sleep, evidently). I’ll bet there was the equivalent of several months’ worth of thought that went on during One Week, and the time limit meant that participants didn’t have the luxury of overthinking certain choices that were, at the end of the day, either not that important or equally good options. Eric Johnson, observing One Week on Twitter, called this the power of intense “singular worlds” to get things done. Paul Graham has similarly noted the importance of environments that keep one idea foremost in your mind.

There are probably many other areas where focus, limits, and, yes, speed might help us in academia. Dissertations, for instance, often unhealthily drag on as doctoral students unwisely aim for perfection, or feel they have to write 300 pages even though their breakthrough thesis is contained in a single chapter. I wonder if a targeted writing blitz like the successful National Novel Writing Month might be ported to the academy.

Start Small, Dream Big

As dissertations become books through a process of polish and further thought, so should digital tools iterate toward perfection from humble beginnings. I’ve written in this space about the Center for History and New Media’s love of Voltaire’s dictum that “the perfect is the enemy of the good [enough],” and we communicated to One Week attendees that it was fine to start with a tool that was doable in a week. The only caveat was that the tool should be conceived with such modularity and flexibility that it could grow into something very powerful. The Anthologize launch reminds me of what I said in this space about Zotero on its launch: it was modest, but it had ambition. It was conceived not just as a reference manager but as an extensible platform for research. The few early negative comments about Anthologize similarly misinterpreted it myopically as a PDF-formatter for blogs. Sure, it will do that, as can other services. But like Zotero (and Omeka) Anthologize is a platform that can be broadly extended and repurposed. Most people thankfully got that; it sparked the imagination of many, even though it’s currently just a rough-around-the-edges alpha.

Congrats again to the whole One Week team. Go get some rest.

Categories
Books Crowdsourcing

Crowdsourcing the Title of My Next Book

Already put this out on Twitter but will reblog here:

I’m crowdsourcing the title of my next book, which is about the way in which common web tech/methods should influence academia, rather than academia thinking it can impose its methods and genres on the web. The title should be a couplet like “The X and the Y” where X can be “Highbrow Humanities,” “Elite Academia,” “The Ivory Tower,” “Deep/High Thought,” [insert your idea] and Y can be “Lowbrow Web,” “Common Web,” “Vernacular Technology/Web,” “Public Web,” [insert your idea]. So possible titles are “The Highbrow Humanities and the Lowbrow Web” or “The Ivory Tower and the Wild Web,” etc. What’s your choice? Thanks in advance for the help and suggestions.

Categories
Blogs Books Software Tools

Introducing Anthologize

A long-running theme of this blog has been the perceived gulf between new forms of online scholarship—including the genre of the blog itself—and traditional forms such as the book and journal. I’m obviously delighted, then, about the outcome of One Week | One Tool, a week-long institute funded by the National Endowment for the Humanities and run by the Center for History and New Media at George Mason University. As the name suggests, twelve humanities scholars with technical chops hunkered down for one week to produce a digital tool they thought could have an impact in the humanities and beyond.

Today marks the launch of this effort: Anthologize, software that converts the popular open-source WordPress system into a full-fledged book-production platform. Using Anthologize, you can take online content such as blogs, feeds, and images (and soon multimedia), and organize it, edit it, and export it into a variety of modern formats that will work on multiple devices. Have a poetry blog? Anthologize it into a nice-looking ePub ebook and distribute it to iPads the world over. A museum with an RSS feed of the best items from your collection? Anthologize it into a coffee table book. Have a group blog on a historical subject? Anthologize the best pieces quarterly into a print or e-journal, or archive it in TEI. Get all the delicious details on the newly revealed Anthologize website.

Anthologize is free and open source software. Obviously in one week it’s impossible to have feature-complete, polished software. There will be a few rough edges. But it works right now (see below) and it’s just the start of a major effort. The grant from NEH anticipates more work for the One Week team over the next year to refine the tool, culminating in a follow-up meeting at THATCamp 2011.

I suspect there will be many users and uses for Anthologize, and developers can extend the software to work in different environments and for different purposes. I see the tool as part of a wave of “reading 2.0” software that I’ve come to rely on for packaging online content for long-form consumption and distribution, including the Readability browser plugin and Instapaper. This class of software is particularly important for the humanities, which remains very bookish, but it is broadly applicable. Anthologize is flexible enough to handle different genres of writing and content, opening up new possibilities for scholarly communication. Personally, I plan to use Anthologize to run a journal and to edit and write two upcoming books.

Credit for Anthologize goes to the amazing team that produced it: Jason Casden, Boone Gorges, Kathie Gossett, Scott Hanrath, Effie Kapsalis, Doug Knox, Zachary McCune, Julie Meloni, Patrick Murray-John, Steve Ramsay, Patrick Rashleigh, and Jana Remy. It is notable that the One Weekers ranged from a recent college grad to tenured professors, programmers and designers and interface experts who also are humanities scholars, and professionals from libraries, museums, and instructional technology. Remarkably, they first met last Sunday night and had production-ready code by Saturday morning, a website to market and support the software, an outreach plan, and a vision for the future of the software beyond its original state. Not to mention a logo to go on nice-looking swag (personally, I’ll take the book bag).

Credit also goes to the great Center for History and New Media team that instructed and supported the One Weekers in the ways we like to conceive, design, and build digital humanities tools: Sharon Leon, Jeremy Boggs, Sheila Brennan, Trevor Owens, and many others who dropped in to help out. Two huge final credits: one to Tom Scheinfeldt for conceiving and running the structured madness that was One Week | One Tool, and one to the National Endowment for the Humanities, which took a big risk on a very untraditional institute. We hope they, and others, like the idea and the execution of Anthologize.

And just to give you some idea of what Anthologize can do, here’s the Anthologize ePub version of this blog post on an iPad, created in five minutes:

Categories
Academia Books Hacking Publishing Scholarly Communication

One Week, One Book: Hacking the Academy

[Reblogged from the THATCamp website. Please note that you don’t need to be a THATCamper to participate. We are soliciting submissions from everyone, worldwide. Join us by writing something in the next week, or if you’ve already written something you think deserves to be included, let us know!]

Tom Scheinfeldt and I have been brewing a proposal for an edited book entitled Hacking the Academy. Let’s write it together, starting at THATCamp this weekend. And let’s do it in one week.

Can an algorithm edit a journal? Can a library exist without books? Can students build and manage their own learning management platforms? Can a conference be held without a program? Can Twitter replace a scholarly society?

As recently as the mid-2000s, questions like these would have been unthinkable. But today serious scholars are asking whether the institutions of the academy as they have existed for decades, even centuries, aren’t becoming obsolete. Every aspect of scholarly infrastructure is being questioned, and even more importantly, being hacked. Sympathetic scholars of traditionally disparate disciplines are cancelling their association memberships and building their own networks on Facebook and Twitter. Journals are being compiled automatically from self-published blog posts. Newly-minted Ph.D.’s are forgoing the tenure track for alternative academic careers that blur the lines between research, teaching, and service. Graduate students are looking beyond the categories of the traditional C.V. and building expansive professional identities and popular followings through social media. Educational technologists are “punking” established technology vendors by rolling their own open source infrastructure.

“Hacking the Academy” will both explore and contribute to ongoing efforts to rebuild scholarly infrastructure for a new millennium. Contributors can write on these topics, which will form chapters:

  • Lectures and classrooms
  • Scholarly societies
  • Conferences and meetings
  • Journals
  • Books and monographs
  • Tenure and academic employment
  • Scholarly Identity and the CV
  • Departments and disciplines
  • Educational technology
  • Libraries

In keeping with the spirit of hacking, the book will itself be an exercise in reimagining the edited volume. Any blog post, video response, or other media created for the volume and tweeted (or tagged) with the hashtag #hackacad will be aggregated at hackingtheacademy.org. The best pieces will go into the published volume (we are currently in talks with a publisher to do an open access version of this final volume). The volume will also include responses such as blog comments and tweets to individual pieces. If you’ve already written something that you would like included, that’s fine too, just be sure to tweet or tag it (or email us the link to where it’s posted).

You have until midnight on May 28, 2010. Ready, set, go!

UPDATE: [5/23/10] 48 hours in, we have 65 contributions to the book. There’s a running list of contributions.

Categories
Audience Books Promotion and Tenure Publishing Scholarly Communication

The Social Contract of Scholarly Publishing

When Roy Rosenzweig and I finished writing a full draft of our book Digital History, we sat down at a table and looked at the stack of printouts.

“So, what now?” I said to Roy naively. “Couldn’t we just publish what we have on the web with the click of a button? What value does the gap between this stack and the finished product have? Isn’t it 95% done? What’s the last five percent for?”

We stared at the stack some more.

Roy finally broke the silence, explaining the magic of the last stage of scholarly production between the final draft and the published book: “What happens now is the creation of the social contract between the authors and the readers. We agree to spend considerable time ridding the manuscript of minor errors, and the press spends additional time on other corrections and layout, and readers respond to these signals—a lack of typos, nicely formatted footnotes, a bibliography, specialized fonts, and a high-quality physical presentation—by agreeing to give the book a serious read.”

I have frequently replayed that conversation in my mind, wondering about the constitution of this social contract in scholarly publishing, which is deeply related to questions of academic value and reward.

For the ease of conversation, let’s call the two sides of the social contract of scholarly publishing the supply side and the demand side. The supply side is the creation of scholarly works, including writing, peer review, editing, and the form of publication. The demand side is much more elusive—the mental state of the audience that leads them to “buy” what the supply side has produced. In order for the social contract to work, for engaged reading to happen and for credit to be given to the author (or editor of a scholarly collection), both sides need to be aligned properly.

The social contract of the book is profoundly entrenched and powerful—almost mythological—especially in the humanities. As John Updike put it in his diatribe against the digital (and most humanities scholars and tenure committees would still agree), “The printed, bound and paid-for book was—still is, for the moment—more exacting, more demanding, of its producer and consumer both. It is the site of an encounter, in silence, of two minds, one following in the other’s steps but invited to imagine, to argue, to concur on a level of reflection beyond that of personal encounter, with all its merely social conventions, its merciful padding of blather and mutual forgiveness.”

As academic projects have experimented with the web over the past two decades we have seen intense thinking about the supply side. Robust academic work has been reenvisioned in many ways: as topical portals, interactive maps, deep textual databases, new kinds of presses, primary source collections, and even software. Most of these projects strive to reproduce the magic of the traditional social contract of the book, even as they experiment with form.

The demand side, however, has languished. Far fewer efforts have been made to influence the mental state of the scholarly audience. The unspoken assumption is that the reader is more or less unchangeable in this respect, only able to respond to, and validate, works that have the traditional marks of the social contract: having survived a strong filtering process, near-perfect copyediting, the imprimatur of a press.

We need to work much more on the demand side if we want to move the social contract forward into the digital age. Despite Updike’s ode to the book, there are social conventions surrounding print that are worth challenging. Much of the reputational analysis that occurs in the professional humanities relies on cues beyond the scholarly content itself. The act of scanning a CV is an act fraught with these conventions.

Can we change the views of humanities scholars so that they may accept, as some legal scholars already do, the great blog post as being as influential as the great law review article? Can we get humanities faculty, as many tenured economists already do, to publish more in open access journals? Can we accomplish the humanities equivalent of FiveThirtyEight.com, which provides in-depth political analysis as good as, if not better than, that of most newspapers, earning the grudging respect of journalists and political theorists? Can we get our colleagues to recognize outstanding academic work wherever and however it is published?

I believe that to do so, we may have to think less like humanities scholars and more like social scientists. Behavioral economists know that although the perception of value can come from the intrinsic worth of the good itself (e.g., the quality of a wine, already rather subjective), it is often influenced by many other factors, such as price and packaging (the wine bottle, how the wine is presented for tasting). These elements trigger a reaction based on stereotypes—if it’s expensive and looks well-wrapped, it must be valuable. The book and article have an abundance of these value triggers from generations of use, but we are just beginning to understand equivalent value triggers online—thus the critical importance of web design, and why the logo of a trusted institution or a university press can still matter greatly, even if it appears on a website rather than a book.

Social psychologists have also thought deeply about the potent grip of these idols of our tribe. They are aware of how cultural norms establish and propagate themselves, and tell us how the imposition of limits creates hierarchies of recognition. Thinking in their way, along with the way the web works, one potential solution on the demand side might come not from the scarcity of production, as it did in a print world, but from the scarcity of attention. That is, value will be perceived in any community-accepted process that narrows the seemingly limitless texts to read or websites to view. Curation becomes more important than publication once publication ceases to be limited.

[image credit: Priki]

Categories
Amazon Blogs Books Google

Digital Campus #45 – Wave Hello

If you’ve wondered what an academic trying to podcast while on Google Wave might sound like, you need listen no farther than the latest Digital Campus podcast. In addition to an appraisal of Wave, we cover the FTC ruling on bloggers accepting gifts (such as free books from academic presses), the great Kindle-on-campus experiment, and (of course) another update on the Google Books (un)settlement. Joining Tom, Mills, and me is another new irregular, Lisa Spiro. She’s the intelligent one who’s paying attention rather than muttering while watching Google waves go by. [Subscribe to this podcast.]

Categories
Books Google Libraries Microsoft

Digital Campus #44 – Unsettled

The latest edition of the Digital Campus podcast marks a break from the past. After three years of our small roundtable of Tom, Mills, and yours truly, we pull up a couple of extra seats for our first set of “irregulars,” Amanda French and Jeff McClurken. I think you’ll agree they greatly enliven the podcast and we’re looking forward to having them back on an irregular basis. On the discussion docket was the falling apart of the Google Books settlement, reCAPTCHA, Windows 7, and the future of libraries. [Subscribe to this podcast.]

Categories
Academia Books Open Access Open Source Reviews

Idealism and Pragmatism in the Free Culture Movement

[A review of Gary Hall’s Digitize This Book! The Politics of New Media, or Why We Need Open Access Now (University of Minnesota Press, 2009). Appeared in the May/June 2009 issue of Museum.]

Beginning in the late 1970s with Richard Stallman’s irritation at being unable to inspect or alter the code of software he was using at MIT, and accelerating with 22-year-old Linus Torvalds’s release of the whimsically named Linux operating system and the rise of the World Wide Web in the early 1990s, with its emphasis on openly available, interlinked documents, the free software and open access movements are among the most important developments of our digital age.

These movements can no longer be considered fringe. Two-thirds of all websites run on open source software, and although many academic resources remain closed behind digital gates, the Directory of Open Access Journals reports that nearly 4,000 publications are available to anyone via the Web, a number that grows rapidly each year. In the United States, the National Institutes of Health mandated recently that all articles produced under an NIH grant—a significant percentage of current medical research—must be available for free online.

But if the movement toward shared digital openness seems like a single groundswell, it masks an underlying tension between pragmatism and idealism. If Stallman was a seer and the intellectual justifier of “free software” (“free” meaning “liberated”), it was Torvalds’s focus on the practical as well as a less radical name—“open source”—that convinced tech giant IBM to commit billions of dollars to Linux starting in the late 1990s. Similarly, open access efforts like the science article sharing site arXiv.org have flourished because they provide useful services—including narcissistic ones such as establishing scientific precedent—while furthering idealistic goals. Successful movements need both Stallmans and Torvalds, as uneasily as they may coexist.

Gary Hall’s Digitize This Book! clearly falls more on the idealistic side of today’s open movements than the pragmatic side. Although he acknowledges the importance of practice—and he has practiced open access himself—Hall emphasizes that theory must be primary, since, unlike any particular website or technology, theory contains the full potential of what digitization might bring. He pursues this idealism by drawing from the critical theory—and the critical posture—of cultural studies, one of the most vociferous antagonists to traditional structures in higher education and politics.

Hall’s book is less accessible than others on the topic because of long stretches involving this cultural theory, with some chapters rife with the often opaque language developed by Jacques Derrida and his disciples. Digitize This Book! gets its name, of course, from Abbie Hoffman’s 1971 hippie classic, Steal This Book, which provided practical advice on a variety of uniformly shady (and often illegal) methods for rebelling against The Man. But Digitize This Book! reads less like a Hoffmanesque handbook for the digital age and more like a throw-off-your-chains political manifesto couched in academic lingo.

Those unaccustomed to the lingo and associated theoretical constructions might find the book offputting, but its impressive intellectual ambition makes Digitize This Book! an important addition to a growing literature on the true significance of digital openness. Hall imagines open access not merely in terms of the goods of universal availability and the greater dissemination of knowledge, but as potentially leading to energetic opposition to the “marketization and managerialization of the university,” that is, the growing tendency of administrations to treat universities as businesses rather than as places of learning and free intellectual exchange—a development that has upset many people, including some well beyond cultural studies departments. Similar worries, of course, cloud cultural heritage institutions such as museums and libraries.

Despite his emphasis on theory, Hall knows that any positive transformation must ultimately come from effective action in addition to advocacy. As Stallman unhappily discovered after starting the Free Software Foundation in 1985 and working for many years on his revolutionary software called GNU, it was Torvalds, a clever tactician and amiable community builder rather than theoretician or firebrand, who helped (along with others of similar disposition) to break open source into the mainstream by finding pathways for his Linux operating system to insinuate itself into institutions and companies that normally might have rejected the mere idea of it out of hand.

Hall does understand this pragmatism, and much to his credit he has real experience with creating open access materials rather than simply thinking about how they might affect the academy. He is a co-founder of the Open Humanities Press, a founder and co-editor of the open access journal Culture Machine, and director of CSeARCH, an arXiv.org for cultural studies.

Yet Hall sees his efforts as ongoing “experiments,” not the final (digital) word. Indeed, he worries that his compatriots in the open access and open source software movements are congratulating themselves too early, and for accomplishing lesser goals. Yes, open source software has made significant inroads, Hall acknowledges, but it has also been “coopted” by the giants of industry, as the IBM investment shows. (The book would have benefited from a more comprehensive analysis of open source, especially in the Third World, where free software is more radically challenging the IBMs and Microsofts.) Similarly, Hall claims, open access journals are flourishing, but too often these journals merely bring online the structures and strictures of traditional academia.

Here is where Hall’s true radicalism comes to the fore, building toward a conclusion with more expansive aims (and more expansive words, such as “hypercyberdemocracy” and “hyperpolitics”). He believes that open access provides a rare opportunity to completely rethink and remake the university, including its internal and external relationships. Paper journals ratified what and who was important in ways we may not want to replicate online, Hall argues. Even if one disagrees with his (hyper)politics, Hall’s insight that new media forms are often little more than unimaginative digital reproductions of the past, which bring forward old conventions and inequities, seems worthy of consideration.

A wag might note at this point that Digitize This Book! is oddly not itself available as a digital reproduction. (As part of the research for this review, I looked in the shadier parts of the Internet but could not locate a free electronic download of the book, even in the shadows.) Other recent books on the open access movement are available for free online (legally), including James Boyle’s The Public Domain: Enclosing the Commons of the Mind (Yale University Press) and John Willinsky’s The Access Principle: The Case for Open Access to Research and Scholarship (MIT Press). Drawing attention to this disconnect is less a cheap knock against Hall than a recognition that the actualization of open access and its transformative potential are easier said than done.

Assuming things will not change overnight and that few professors, curators, or librarians are ready to move, like Abbie Hoffman, to a commune (though many might applaud the lack of administrators there), the key questions are: How does one take concrete steps toward a system in which open access is the normal mode of publishing? Which structures must be dissolved and which created? And how does one convince various stakeholders to make this transition together?

These are the kinds of practical—political—questions that advocates of open access must address. Gary Hall has helpfully provided the academic purveyors of open access much food for thought. Now comes the difficult work of crafting recipes to reach the future he so richly imagines.