Author: Dan Cohen

Using WordPress as a Book-Writing Platform

I’ve had a few people ask about the writing environment I’m using for The Ivory Tower and the Open Web (introduction posted a couple of days ago). I’m writing the book entirely in WordPress, which really has matured into a terrific authoring platform. Some notes:

1) The addition of the TinyMCE WYSIWYG text-editing tools made WordPress today’s version of the beloved Word 5.1, the lean, mean, writing machine that Word used to be before Microsoft bloated it beyond recognition.

2) WordPress 3.2 joined the distraction-free trend mainstreamed by apps like Scrivener and Instapaper, where computer administrative debris (as Edward Tufte once called the layers of eye-catching controls that frame most application windows) fades away. If you go into full-screen mode in the editor, everything disappears but your text. WordPress devs even thoughtfully added a zen “Just write” prompt to get you going. Go full-screen in your browser for extra zen.

3) For footnotes, I’m using the excellent WP-Footnotes plugin, which is not only easy to use but (perhaps critically for the future) degrades gracefully into parenthetical embedded citations outside of WordPress.

4) I’m of course using Zotero to insert and format those footnotes, using one of the features that makes Zotero better (IMHO) than other research managers: the ability to drag and drop formatted citations right from the Zotero interface into a textarea in the browser. (WP-Footnotes handles the automatic numbering.)

5) I’ve done a few tweaks to WordPress’s wp-admin CSS to customize the writing environment (there’s an “editorcontainer” that styles the textarea). In particular, I found the default width too wide for comfortable writing or reading. So I resized it to 500 pixels, which is roughly the line width of a standard book.

The Ivory Tower and the Open Web: Burritos, Browsers, and Books

In the summer of 2007, Nate Silver decided to conduct a rigorous assessment of the inexpensive Mexican restaurants in his neighborhood, Chicago’s Wicker Park. Figuring that others might be interested in the results of his study, and that he might be able to use some feedback from an audience, he took his project online.

Silver had no prior experience in such an endeavor. By day he worked as a statistician and writer at Baseball Prospectus—an innovator, to be sure, having created a clever new standard for empirically measuring the value of players, an advanced form of the “sabermetrics” vividly described by Michael Lewis in Moneyball. ((Nate Silver, “Introducing PECOTA,” in Gary Huckabay, Chris Kahrl, Dave Pease et al., eds., Baseball Prospectus 2003 (Dulles, VA: Brassey’s Publishers, 2003): 507-514. Michael Lewis, Moneyball: The Art of Winning an Unfair Game (New York: W. W. Norton & Company, 2004).)) But Silver had no experience as a food critic, nor as a web developer.

In time, his appetite took care of the former and the open web took care of the latter. Silver knit together a variety of free services as the tapestry for his culinary project. He set up a blog, The Burrito Bracket, using Google’s free Blogger web application. Weekly posts consisted of his visits to local restaurants, and the scores (in jalapeños) he awarded in twelve categories.

Home page of Nate Silver’s Burrito Bracket
Ranking system (upper left quadrant)

Being a sports geek, he organized the posts as a series of contests between two restaurants. Satisfying his urge to replicate March Madness, he modified another free application from Google, generally intended to create financial or data spreadsheets, to produce the “bracket” of the blog’s title.

Google Spreadsheets used to create the competition bracket

Like many of the savviest users of the web, Silver started small and improved the site as he went along. For instance, he had started to keep a photographic record of his restaurant visits and decided to share this documentary evidence. So he enlisted the photo-sharing site Flickr, creating an off-the-rack archive to accompany his textual descriptions and numerical scores. On August 15, 2007, he added a map to the site, geolocating each restaurant as he went along and color-coding the winners and losers.

Flickr photo archive for The Burrito Bracket (flickr.com)
Silver’s Google Map of Chicago’s Wicker Park (shaded in purple) with the location of each Mexican restaurant pinpointed

Even with the site’s do-it-yourself enthusiasm and the allure of carne asada, Silver had trouble attracting an audience. He took to Yelp, a popular site for reviewing restaurants, to plug The Burrito Bracket, and even thought about creating a Super Burrito Bracket to cover all of Chicago. ((Frequently Asked Questions, The Burrito Bracket, http://burritobracket.blogspot.com/2007/07/faq.html)) But eventually he abandoned the site following the climactic “Burrito Bowl I.”

With his web skills improved and a presidential election year approaching, Silver decided to try his mathematical approach on that subject instead: “an opportunity for a sort of Moneyball approach to politics,” as he would later put it. ((http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf)) Initially, and with a nod to his obsession with Mexican food, he posted his empirical analyses of politics under the chili-pepper pseudonym “Poblano,” on the liberal website Daily Kos, which hosts blogs for its engaged readers.

Then, in March 2008, Silver registered his own web domain, with a title that was simultaneously and appropriately mathematical and political: fivethirtyeight.com, a reference to the total number of electors in the United States electoral college. He launched the site with a slight one-paragraph post on a recent poll from South Dakota and a summary of other recent polling from around the nation. As with The Burrito Bracket it was a modest start, but one that was modular and extensible. Silver soon added maps and charts to bolster his text.

FiveThirtyEight two months after launch, in May 2008

Nate Silver’s real name and FiveThirtyEight didn’t remain obscure for long. His mathematical modeling of the competition between Barack Obama and Hillary Clinton for the Democratic presidential nomination proved strikingly, almost creepily, accurate. Clear-eyed, well-written, statistically rigorous posts began to be passed from browsers to BlackBerries, from bloggers to political junkies to Beltway insiders. From those wired early subscribers to his site, Silver found an increasingly large audience of those looking for data-driven, deeply researched analysis rather than the conventional reporting that presented political forecasting as more art than science.

FiveThirtyEight went from just 800 visitors a day in its first month to a daily audience of 600,000 by October 2008. ((Adam Sternbergh, “The Spreadsheet Psychic,” New York, October 12, 2008, http://nymag.com/news/features/51170/)) On election day, FiveThirtyEight received a remarkable 3 million visitors, more than most daily newspapers. ((http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf))

All of this attention for a site that most media coverage still called, with a hint of deprecation, a “blog,” or “aggregator” of polls, despite Silver’s rather obvious, if latent, journalistic skills. (Indeed, one of his roads not taken had been an offer, straight out of college, to become an assistant at The Washington Post. ((http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf)) ) An article in the Colorado Daily on the emergent genre represented by FiveThirtyEight led with Ken Bickers, professor and chair of the political science department at the University of Colorado, saying that such sites were a new form of “quality blogs” (rather than, evidently, the uniformly second-rate blogs that had previously existed). The article then swerved into much more ominous territory, asking whether reading FiveThirtyEight and similar blogs was potentially dangerous, especially compared to the safe environs of the traditional newspaper. Surely these sites were superficial, and they very well might have a negative effect on their audience:

Mary Coussons-Read, a professor of psychology at CU Denver, says today’s quick turnaround of information helps to make it more compelling.

“Information travels so much more quickly,” she says. “(We expect) instant gratification. If people have a question, they want an answer.”

That real-time quality can bring with it the illusion that it’s possible to perceive a whole reality by accessing various bits of information.

“There’s this immediacy of the transfer of information that leads people to believe they’re seeing everything … and that they have an understanding of the meaning of it all,” she says.

And, Coussons-Read adds, there is pleasure in processing information.

“I sometimes feel like it’s almost a recreational activity and less of an information-gathering activity,” she says.

Is it addiction?

[Michele] Wolf says there is something addicting about all that data.

“I do feel some kind of high getting new information and being able to process it,” she says. “I’m also a rock climber. I think there are some characteristics that are shared. My addiction just happens to be information.”

While there’s no such mental-health diagnosis as political addiction, Jeanne White, chemical dependency counselor at Centennial Peaks Hospital in Louisville, says political information seeking could be considered an addictive process if it reaches an extreme. ((Cindy Sutter, “Hooked on information: Can political news really be addicting?” The Colorado Daily, November 3, 2008, http://www.coloradodaily.com/ci_13105998))

This stereotype of blogs as the locus of “information” rather than knowledge, of “recreation” rather than education, was—and is—a common one, despite the wide variety of blogs, including many with long-form, erudite writing. Perhaps in 2008 such a characterization of FiveThirtyEight was unsurprising given that Silver’s only other credits to date were the Player Empirical Comparison and Optimization Test Algorithm (PECOTA) and The Burrito Bracket. Clearly, however, here was an intelligent researcher who had set his mind on a new topic to write about, with a fresh, insightful approach to the material. All he needed was a way to disseminate his findings. His audience appreciated his extraordinarily clever methods—at heart, academic techniques—for cutting through the mythologies and inadequacies of standard political commentary. All they needed was a web browser to find him.

A few journalists saw past the prevailing bias against non-traditional outlets like FiveThirtyEight. In the spring of 2010, Nate Silver bumped into Gerald Marzorati, the editor of the New York Times Magazine, on a train platform in Boston. They struck up a conversation, which eventually turned into a discussion about how FiveThirtyEight might fit into the universe of the Times, which ultimately recognized the excellence of his work and wanted FiveThirtyEight to enhance their political reporting and commentary. That summer, a little more than two years after he had started FiveThirtyEight, Silver’s “blog” merged into the Times under a licensing deal. ((Nate Silver, “FiveThirtyEight to Partner with New York Times,” http://www.fivethirtyeight.com/2010/06/fivethirtyeight-to-partner-with-new.html)) In less time than it takes for most students to earn a journalism degree, Silver had willed himself into writing for one of the world’s premier news outlets, taking a seat in the top tier of political analysis. A radically democratic medium had enabled him to do all of this, without the permission of any gatekeeper.

FiveThirtyEight on the New York Times website, 2010

* * *

The story of Nate Silver and FiveThirtyEight has many important lessons for academia, all stemming from the affordances of the open web. His efforts show the do-it-yourself nature of much of the most innovative work on the web, and how one can iterate toward perfection rather than publishing works in fully polished states. His tale underlines the principle that good is good, and that the web is extraordinarily proficient at finding and disseminating the best work, often through continual, post-publication, recursive review. FiveThirtyEight also shows the power of openness to foster that dissemination and the dialogue between author and audience. Finally, the open web enables and rewards unexpected uses and genres.

Undoubtedly it is true that the path from The Burrito Bracket to The New York Times may only be navigated by an exceptionally capable and smart individual. But the tools for replicating Silver’s work are just as open to anyone, and just as powerful. It was with that belief, and the desire to encourage other academics to take advantage of the open web, that Roy Rosenzweig and I wrote Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web. ((Daniel J. Cohen and Roy Rosenzweig, Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web (University of Pennsylvania Press, 2006).)) We knew that the web, although fifteen years old at the time, was still somewhat alien to many professors, graduate students, and even undergraduates (who might be proficient at texting but know nothing about HTML), and we wanted to make the medium more familiar and approachable.

What we did not anticipate was another kind of resistance to the web, based not on an unfamiliarity with the digital realm or on Luddism but on the remarkable inertia of traditional academic methods and genres—the more subtle and widespread biases that hinder the academy’s adoption of new media. These prejudices are less comical, and more deep-seated, than newspapers’ penchant for tales of internet addiction. This resistance has less to do with the tools of the web and more to do with the web’s culture. It was not enough for us to conclude Digital History by saying how wonderful the openness of the web was; for many academics, this openness was part of the problem, a sign that it might be like “playing tennis with the net down,” as my graduate school mentor worriedly wrote to me. ((http://www.dancohen.org/2010/11/11/frank-turner-on-the-future-of-peer-review/))

In some respects, this opposition to the maximal use of the web is understandable. Almost by definition, academics have gotten to where they are by playing a highly scripted game extremely well. That means understanding and following self-reinforcing rules for success. For instance, in history and the humanities at most universities in the United States, there is a vertically integrated industry of monographs, beginning with the dissertation in graduate school—a proto-monograph—followed by the revisions to that work and the publication of it as a book to get tenure, followed by a second book to reach full professor status. Although we are beginning to see a slight liberalization of rules surrounding dissertations—in some places dissertations could be a series of essays or have digital components—graduate students infer that they would best be served on the job market by a traditional, analog monograph.

We thus find ourselves in a situation, now more than two decades into the era of the web, where the use of the medium in academia is modest, at best. Most academic journals have moved online but simply mimic their print editions, providing PDF facsimiles for download and having none of the functionality common to websites, such as venues for discussion. They are also largely gated, resistant not only to access by the general public but also to the coin of the web realm: the link. Similarly, when the Association of American University Presses recently asked its members about their digital publishing strategies, the presses tellingly remained steadfast in their fixation on the monograph. All of the top responses were about print-on-demand and the electronic distribution and discovery of their list, with a mere footnote for a smattering of efforts to host “databases, wikis, or blogs.” ((Association of American University Presses, “Digital Publishing in the AAUP Community; Survey Report: Winter 2009-2010,” http://aaupnet.org/resources/reports/0910digitalsurvey.pdf, p. 2)) In other words, the AAUP members see themselves almost exclusively as book publishers, not as publishers of academic work in whatever form that may take. Surveys of faculty show comfort with decades-old software like word processors but an aversion to recent digital tools and methods. ((See, for example, Robert B. Townsend, “How Is New Media Reshaping the Work of Historians?”, Perspectives on History, November 2010, http://www.historians.org/Perspectives/issues/2010/1011/1011pro2.cfm)) The professoriate may be more liberal politically than the most latte-filled ZIP code in San Francisco, but we are an extraordinarily conservative bunch when it comes to the progression and presentation of our own work. We have done far less than we should have by this point in imagining and enacting what academic work and communication might look like if it were digital first.

To be sure, as William Gibson has famously proclaimed, “The future is already here—it’s just not very evenly distributed.” ((National Public Radio, “Talk of the Nation” radio program, 30 November 1999, timecode 11:55, http://discover.npr.org/features/feature.jhtml?wfId=1067220)) Almost immediately following the advent of the web, which came out of the realm of physics, physicists began using the Los Alamos National Laboratory preprint server (later renamed ArXiv and moved to arXiv.org) to distribute scholarship directly to each other. Blogging has taken hold in some precincts of the academy, such as law and economics, and many in those disciplines rely on web-only outlets such as the Social Science Research Network. The future has had more trouble reaching the humanities, and perhaps this book is aimed slightly more at that side of campus than the science quad. But even among the early adopters, a conservatism reigns. For instance, one of the most prominent academic bloggers, the economist Tyler Cowen, still recommends to students a very traditional path for their own work. ((“Tyler Cowen: Academic Publishing,” remarks at the Institute for Humane Studies Summer Research Fellowship weekend seminar, May 2011, http://vimeo.com/24124436)) And far from being preferred by a large majority of faculty, quests to open scholarship to the general public often meet with skepticism. ((Open access mandates have been tough sells on many campuses, passing only by slight majorities or failing entirely. For instance, such a mandate was voted down at the University of Maryland, with evidence of confusion and ambivalence. http://scholarlykitchen.sspnet.org/2009/04/28/umaryland-faculty-vote-no-oa/))

If Digital History was about the mechanisms for moving academic work online, this book is about how the digital-first culture of the web might become more widespread and acceptable to the professoriate and their students. It is, by necessity, slightly more polemical than Digital History, since it takes direct aim at the conservatism of the academy that twenty years of the web have laid bare. But the web and the academy are not doomed to an inevitable clash of cultures. Viewed properly, the open web is perfectly in line with the fundamental academic goals of research, sharing of knowledge, and meritocracy. This book—and it is a book rather than a blog or stream of tweets because pragmatically that is the best way to reach its intended audience of the hesitant rather than preaching to the online choir—looks at several core academic values and asks how we can best pursue them in a digital age.

First, it points to the critical academic ability to look at any genre without bias and asks whether we might be violating that principle with respect to the web. Upon reflection many of the best things we discover in scholarship are found by disregarding popularity and packaging, by approaching creative works without prejudice. We wouldn’t think much of the meandering novel Moby-Dick if Carl Van Doren hadn’t looked past decades of mixed reviews to find the genius in Melville’s writing. Art historians have similarly unearthed talented artists who did their work outside of the royal academies and the prominent schools of practice. As the unpretentious wine writer Alexis Lichine shrewdly said in the face of fancy labels and appeals to mythical “terroir”: “There is no substitute for pulling corks.” ((Quoted in Frank J. Prial, “Wine Talk,” New York Times, 17 August 1994, http://www.nytimes.com/1994/08/17/garden/wine-talk-983519.html.))

Good is good, no matter the venue of publication or what the crowd thinks. Scholars surely understand that on a deep level, yet many persist in valuing venue and medium over the content itself. This is especially true at crucial moments, such as promotion and tenure. Surely we can reorient ourselves to our true core value, to honor creativity and quality, which will still guide us to many traditionally published works but will also allow us to consider works in some nontraditional venues such as new open access journals or articles written and posted on a personal website or institutional repository, or digital projects.

The genre of the blog has been especially cursed by this lack of open-mindedness from the academy. Chapter 1, “What is a Blog?”, looks at the history of the blog and blogging, the anatomy and culture of a genre that is in many ways most representative of the open web. Saddled with an early characterization as being the locus of inane, narcissistic writing, the blog has had trouble making real inroads in academia, even though it is an extraordinarily flexible form and the perfect venue for a great deal of academic work. The chapter highlights some of the best examples of academic blogging and how they shape and advance arguments in a field. We can be more creative in thinking about the role of the blog within the academy, as a venue for communicating our work to colleagues as well as to a lay audience beyond the ivory tower.

This academic prejudice against the blog extends to other genres that have proliferated on the open web. Chapter 2, “Genres and the Open Web,” examines the incredible variety of those new forms, and how, with a careful eye, we might be able to import some of them profitably into the academy. Some of these genres, like the wiki, are well-known (thanks to Wikipedia, which academics have come to accept begrudgingly in the last five years). Other genres are rarer but take maximal advantage of the latitude of the open web: its malleability and interactivity. Rather than imposing the genres we know on the web—as we do when we post PDFs of print-first journal articles—we would do well to understand and adopt the web’s native genres, where helpful to scholarly pursuits.

But what of our academic interest in validity and excellence, enshrined in our peer review system? Chapter 3, “Good is Good,” examines the fundamental requirements of any such system: the necessity of highlighting only a minority of the total scholarly output, based on community standards, and of disseminating that minority of work to communities of thought and practice. The chapter compares print-age forms of vetting with native web forms of assessment and review, and proposes ways that digital methods can supplement—or even replace—our traditional modes of peer review.

“The Value, and Values, of Openness,” Chapter 4, broadly examines the nature of the web’s openness. Oddly, this openness is both the easiest trait of the web to understand and its most complex, once one begins to dig deeper. The web’s radical openness not only has led to calls for open access to academic work, which has complicated the traditional models of scholarly publishers and societies; it has also challenged our academic predisposition toward perfectionism—the desire to only publish in a “final” format, purged (as much as possible) of error. Critically, openness has also engendered unexpected uses of online materials—for instance, when Nate Silver refactored poll numbers from the raw data polling agencies posted.

Ultimately, openness is at the core of any academic model that can operate effectively on the web: it provides a way to disseminate our work easily, to assess what has been published, and to point to what’s good and valuable. Openness can naturally lead—indeed, is leading—to a fully functional shadow academic system for scholarly research and communication that exists beyond the more restrictive and inflexible structures of the past.

Introducing PressForward

For some time here at the Roy Rosenzweig Center for History and New Media we have been thinking about the state of scholarly publishing, and its increasing disconnect with how we have come to communicate online. Among our concerns:

• A variety of scholarly work is flourishing online, ranging from long-form writing on blogs, to “gray literature” such as conference papers, to well-curated corpora or data sets, to entirely novel formats enabled by the web

• This scholarship is decentralized, thriving on personal and institutional sites, as well as the open web, but could use some way to receive attention from scholarly communities so works can receive credit and influence others

• The existing scholarly publishing infrastructure has been slow-moving in accounting for this growing and multifaceted realm of online scholarship

• Too much academic publishing remains inert—publication-as-broadcast rather than taking advantage of the web’s peer-to-peer interactivity

• Too much scholarship remains gated when it could be open

Legacy formats like the journal of course have considerable merit, and they are rightly valued: they act as critical, if sometimes imperfect, arbiters of the good and important. At the same time, the web has found ways to filter the abundance of online work, ranging from the tech world (Techmeme) to long-form posts (The Browser), which act as screening agents for those interested in an area of thought or practice.

What if we could combine the best of the scholarly review process with the best of open-web filters? What if we had a scholarly communication system that was digital first?

Today we’re announcing a new initiative to do just that: PressForward, generously supported by an $862,000 grant from the Alfred P. Sloan Foundation’s Digital Information Technology program.

PressForward will bring together the best scholarship from across the web, producing vital, open publications scholarly communities can gather around. PressForward will:

• Develop effective methods for collecting, screening, and drawing attention to the best online scholarship, including scholarly blogs, digital projects, and other web genres that don’t fit into traditional articles or books, as well as conference papers, white papers, and reports

• Encourage the proliferation of open access scholarship through active new forms of publication, concentrating the attention of scholarly communities around high-quality, digital-first scholarship

• Create a new platform that will make it simple for any organization or community of scholars to launch similar publications and give guidance to institutions, scholarly societies, and academic publishers who wish to supplement their current journals with online outlets

We hope you’ll join us in making this new form of scholarly communication a reality. You may be a researcher in a field that is underserved by traditional outlets, because it is new, interdisciplinary, or involves non-textual media. Perhaps you have a digital project that can only be “published” if you describe it in an article. You may be an editor of a journal who would like to supplement standard articles with digital content from across the web, or a scholarly society that wants to find and feature online work. As PressForward evolves, we hope to serve all of these constituencies, as well as a broad audience currently locked out of gated scholarship.

Learn more about PressForward on our new site, or by sending us an email. You can also follow us on Twitter or via RSS.

 

The Roy Rosenzweig Center for History and New Media

On April 15, 2011, the Center for History and New Media became the Roy Rosenzweig Center for History and New Media. This was made possible by the incredible generosity of hundreds of donors, who gave over a million dollars to rename the center. I’m enormously grateful to those contributors, many of whom read this blog. Thank you. It was especially touching that in addition to the tremendous donations from Roy’s friends, there were scores of donations from people who had never met Roy, such as a $5 contribution from a student in rural India who learned from our websites and tools.

Here’s what I said at the beginning of the dedication ceremony:

On behalf of the entire staff of the center and also for the Department of History and Art History that the center is a part of, let me welcome you on this wonderful occasion to the Roy Rosenzweig Center for History and New Media. We wouldn’t be here today without your extraordinary friendship and generosity, and so let me give the first of many thanks to everyone for making this day happen.

We have several speakers this afternoon that I have the privilege of introducing, who will each say a few words about Roy and the center. But I wanted to start off the proceedings by giving voice to something miraculous that is happening as we celebrate today.

Right now, silently in the background, literally thousands of people worldwide are connected to the center’s servers, studying and conversing and learning. I checked our server logs just before this celebration, and just today, tens of thousands of people have visited CHNM sites. And just in the last thirty days well over a million visitors took advantage of the center’s open access resources and open source tools. For this reason, I can’t imagine that any academic in history has affected more people than Roy has.

And from this day forward, all of these millions of visitors to the many sites of the Roy Rosenzweig Center for History and New Media will know who Roy is. They will see the new logo that is behind this curtain on our many websites, and know whom to thank for the incredible riches and generosity that Roy envisioned when he came up with the then-radical mission statement for the center: “to use digital media and computer technology to democratize history—to incorporate multiple voices, reach diverse audiences, and encourage popular participation in presenting and preserving the past.” On a personal note, it’s truly a joy to be able to carry on the work of a good friend every day.

The endowment you have helped to raise will support the work of the Roy Rosenzweig Center for History and New Media for years to come. Thank you.

A Million Syllabi

Today I’m releasing a database of over a million syllabi gathered by my Syllabus Finder tool from 2002 to 2009. My hope is that this unique corpus will be helpful for a broad range of researchers. I’m fairly sure this is the largest collection of syllabi ever gathered, probably by several orders of magnitude.

I created the Syllabus Finder in 2002 when Google released their first API to access their search engine. The initial API included the ability to grab cached HTML from millions of web pages, which I realized could then be scanned using high-relevancy keywords to identify pages that were most likely syllabi. In addition to my lousy PHP code that got it up and running, the brilliant Simon Kornblith wrote some additional code to make it work well. The result was a tool that was quite popular (1.3 million queries) until Google deprecated their original API in 2009 in favor of (what I consider to be) a less useful API. (With the original API you could basically clone google.com, which I’m sure was not popular at the Googleplex.)
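For readers curious what scanning cached HTML with high-relevancy keywords might look like, here is a minimal sketch in Python. It is not the Syllabus Finder's actual code (which was PHP); the keyword list, the weights, the threshold, and the function names are all assumptions made up for illustration.

```python
import re

# Hypothetical keywords and weights, chosen only for illustration.
SYLLABUS_KEYWORDS = {
    "syllabus": 3,
    "office hours": 2,
    "required texts": 2,
    "course description": 2,
    "readings": 1,
    "week": 1,
}

TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(html: str) -> str:
    """Crudely replace HTML tags with spaces and lowercase the rest."""
    return TAG_PATTERN.sub(" ", html).lower()


def syllabus_score(html: str) -> int:
    """Sum the weights of every keyword that appears at least once."""
    text = strip_tags(html)
    return sum(
        weight for keyword, weight in SYLLABUS_KEYWORDS.items() if keyword in text
    )


def looks_like_syllabus(html: str, threshold: int = 5) -> bool:
    """Treat a page as a likely syllabus if its score reaches the threshold."""
    return syllabus_score(html) >= threshold
```

A page would be passed in as a string of cached HTML, for example `looks_like_syllabus(cached_page)`. Simple substring matching like this is noisy ("week" also matches "weekly"), so any real use would need tuning against known syllabi.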

If you are interested in the kind of research that can be done on these syllabi, please read my Journal of American History article “By the Book: Assessing the Place of Textbooks in U.S. Survey Courses.” For that article I used regular expressions to pull book titles out of a thousand American history surveys to see how textbooks and other works are used by instructors. Some hidden elements emerged. I’m excited to see what creative ideas other scholars and researchers come up with for this large database.
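For a rough sense of what regular-expression title extraction might look like, here is a minimal sketch. It assumes, purely for illustration, that book titles appear inside italic, emphasis, or cite tags in the cached HTML; the patterns used for the JAH article are not described in this post, and the function names and sample titles below are hypothetical.

```python
import re
from collections import Counter

# Assumption: titles are marked up with <i>, <em>, or <cite>.
TITLE_PATTERN = re.compile(
    r"<(i|em|cite)\b[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL
)


def extract_titles(html_text):
    """Return candidate titles: emphasized phrases of two or more words."""
    titles = []
    for _tag, inner in TITLE_PATTERN.findall(html_text):
        cleaned = re.sub(r"<[^>]+>", "", inner)  # drop any nested tags
        cleaned = re.sub(r"\s+", " ", cleaned).strip(" .,:;")
        if len(cleaned.split()) >= 2:  # skip one-word emphasis like "Note"
            titles.append(cleaned)
    return titles


def count_titles(pages):
    """Count how many pages mention each candidate title at least once."""
    counts = Counter()
    for page in pages:
        counts.update(set(extract_titles(page)))
    return counts


if __name__ == "__main__":
    # Made-up sample pages with invented titles.
    pages = [
        "<p>Required: <i>A Sample Survey Textbook</i>. <em>Note</em>: bring it.</p>",
        "<li><i>A Sample Survey Textbook</i></li>"
        "<li><cite>Another Example Reader</cite></li>",
    ]
    for title, n in count_titles(pages).most_common():
        print(n, title)
```

Counting each title once per page, rather than once per mention, keeps a single syllabus that repeats a title every week from dominating the tally.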

Some important clarifications and caveats:

1) I’m providing this archive in the same spirit (and under the same regulations) that the Internet Archive provides web corpora (indeed, this corpus could probably be recreated from the Internet Archive’s Wayback Machine, albeit after a lot of work). To the best of my knowledge, and because of the way they were obtained, all of the documents this database contains were posted on the open web, and were cached (or not) respecting open-web standards such as robots.txt. It does not contain any syllabi that were posted in private places, such as gated Blackboard installations. Indeed, I suspect that most of these syllabi come from universities where it is expected that professors post syllabi in an open fashion (as is the case here at Mason), or from professors like me who believe that openness is good for scholarship and teaching. But as with the Internet Archive, if you are the creator of a syllabus and really can’t sleep unless it is purged from this research database, contact me.

2) This database is provided as is and without support. I get enough email and unfortunately cannot answer questions. If you are appreciative, you can make a tax-free donation to the Center for History and New Media, for which you will receive a hug from me. The database is intended for non-commercial use of the type seen in my JAH article.

3) The database is an SQL dump consisting of 1.4 million rows. The columns are syllabiID (the Syllabus Finder’s unique identifier), url (web address of the syllabus at the time it was found), title (of the web page the syllabus was on), date_added (when it was added to the Syllabus Finder database), and chnm_cache (the HTML of the page on the date it was added). The database is 804 MB uncompressed. The corpus is heavily U.S.-centric because web pages were matched to English-language words, and for a time the Syllabus Finder only took pages from .edu domains (thus leaving out, e.g., .ac.uk URLs).

4) Because the Syllabus Finder was completely automated, some percentage of the 1.4 million documents are not syllabi (my best guess is about 20%). Most often these incorrect matches are associated course documents such as assignments, which are interesting in their own right. But some are oddball documents that just looked like syllabi to the algorithms. I have made no attempt to weed them out.
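The column layout described in item 3 can be explored with a few lines of code once the dump is loaded into a database. The sketch below uses an in-memory SQLite table with made-up rows so that it runs on its own. The table name `syllabi`, the column types, and the sample URLs are assumptions; only the five column names come from the description above, and the actual dump may need adjusting before SQLite will accept it.

```python
import sqlite3
from collections import Counter
from urllib.parse import urlparse


def build_demo_db():
    """Create an in-memory table with the five described columns and fake rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE syllabi (        -- table name is hypothetical
               syllabiID  INTEGER PRIMARY KEY,
               url        TEXT,
               title      TEXT,
               date_added TEXT,
               chnm_cache TEXT
           )"""
    )
    rows = [  # invented sample data, not taken from the corpus
        (1, "http://www.example.edu/hist101/syllabus.html",
         "HIST 101 Syllabus", "2004-09-01", "<html>...</html>"),
        (2, "http://history.example.edu/~smith/hist202.htm",
         "HIST 202", "2005-01-15", "<html>...</html>"),
        (3, "http://www.example.ac.uk/modules/hi100/",
         "HI100 Module Outline", "2008-10-02", "<html>...</html>"),
    ]
    conn.executemany("INSERT INTO syllabi VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def domain_suffix_counts(conn):
    """Count rows by the last label of each URL's hostname (edu, uk, ...)."""
    counts = Counter()
    for (url,) in conn.execute("SELECT url FROM syllabi"):
        host = urlparse(url).hostname or ""
        counts[host.rsplit(".", 1)[-1]] += 1
    return counts


if __name__ == "__main__":
    print(domain_suffix_counts(build_demo_db()))
```

A tally like this is one quick way to see how U.S.-centric a given slice of the corpus is before drawing conclusions from it.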

If you understand all of this clearly, then here’s a million syllabi for you: CHNM Syllabus Finder Corpus, Version 1.0 (30 March 2011) (265 MB download, zipped SQL file)

UPDATE 1 (11pm 3/30/11): Matt Burton has helpfully provided a torrent for this file. If you can, please use it instead of the direct download.

UPDATE 2 (9pm 3/31/11): Unfortunately, I should have checked the exported database before posting. Version 1.0 does indeed have the URLs, titles, and dates of about 1.45 million syllabi, but it is missing a majority of the HTML caches of those syllabi. I am working to recreate the full database, which will be much larger and more useful.

A Lesson from the Past about Genres and Bias

In my sophomore year of college I took a new course with more buzz than a summer blockbuster: “Postmodernism.” Students literally ran to sign up for it, partly because it was taught by the coolest, mustard-suited professor on campus, Andrew Ross, and partly because it promised a semester filled with graphic novels, Survival Research Labs, and Blade Runner.

Beyond the discussions of mechanical reproduction and simulacra, I remember several things vividly. One was Ross’s lecture on cyborgs in which he described Arnold Schwarzenegger in Terminator as “a condom filled with walnuts.” The second was my preceptor, a brand-new assistant professor named Jeff Nunokawa. Nunokawa was whip-smart and a great teacher, and he introduced my nineteen-year-old self to the incredible revelation that Batman had a homoerotic subtext. (I’ll pause here for you to snicker at my youthful ignorance.) Finally, and most importantly, both Ross and Nunokawa repeatedly emphasized in the course that any genre in any medium could have value—and on occasion sustained creativity and insight.

So I was glad to see a cover story on the boundless energy and intelligence of Nunokawa in the Princeton Alumni Weekly (which is actually produced monthly, in postmodern fashion), especially since the article highlighted Nunokawa’s writing of thousands of online posts about literature and philosophy, art and ideas. I cheered what I thought was a great example of a professor blogging, until I hit this paragraph:

For the record, he does not call this a blog, partly, he says, because “I hate that particular syllable,” but also, more importantly, because “it doesn’t catch what I’m really trying to do, whether successfully or not. These are essays. When I think of a blog — and maybe I’m being unfair to bloggers because I don’t spend much time in the blogosphere — my sense of blogs is that they’re written very quickly. This is stuff that I compose and recompose, and then recompose and recompose and recompose. It’s very written.”

This is precisely the bias I’m arguing against in The Ivory Tower and the Open Web. There is no reason a blog has to be quickly or poorly written; the comment made me want to time-travel the Nunokawa of 1988, Terminator-like, to confront the Nunokawa of 2011. And if Nunokawa can have this prejudice against blogs, instead of viewing them as potential outlets for good writing owned by scholars themselves, imagine what Nunokawa’s more traditional colleagues think of the genres of the open web.

As in the Oscar Wilde plays Nunokawa often dissects, there’s a final, amusing irony to this story. Where does Nunokawa do his sophisticated blog…er, essaying? Facebook.

THATCamp 2011: Even Bigger, More Open, More Educational, More Fun

We decided to pull out all the stops for this year’s THATCamp (now called THATCamp Prime or THATCamp CHNM or that THATCamp since there are now so many regional THATCamps). From the THATCamp blog:

All year has been THATCamp time, seems like, but we’re now talking about that THATCamp, which will take place

June 3-5, 2011
Center for History and New Media, Fairfax, VA

We’ve instituted some changes this year:

  • THATCamp will be larger: we’re planning on having about 125 people who do all kinds of work related to the humanities and technology;
  • THATCamp will be truly open to all: instead of having an application process, we’ll be accepting all registrations up to 125 people until April 22;
  • THATCamp will have a BootCamp: the unconference will happen as usual on the weekend over a day and a half, but the Friday beforehand will be devoted to a series of workshops dedicated to improving technical skills; and
  • THATCamp is planning on at least two virtual sessions in which we get to talk to campers at THATCamp Liberal Arts Colleges and to Jon Voss about the outcome of his Linked Open Data in Libraries, Archives, and Museums Summit.

Needless to say, we’re psyched. See you there.

If you haven’t been to THATCamp yet, I can’t recommend it enough. It’s intense and fun, and you’ll learn more and meet more interesting, great people than anywhere else. There’s also a bit of Woodstock to it, and no big registration fee, just a very small suggested donation. We also have on-campus accommodations this year at the very nice new Mason Inn.

Register right now to reserve your slot. Hope to see you in June!

Defining Digital Humanities, Briefly

I’m participating in the Day of Digital Humanities this year, and the organizers have asked all participants to briefly define “digital humanities.” It’s a helpful exercise, and for those new to the field, it might be useful to give the many responses a quick scan. I wrote this one-sentence answer out fairly hastily, but think it’s not so bad:

Broadly construed, digital humanities is the use of digital media and technology to advance the full range of thought and practice in the humanities, from the creation of scholarly resources, to research on those resources, to the communication of results to colleagues and students.

The best answer to “How do you define digital humanities?” came from Lou Burnard: “With extreme reluctance.”

What Scholars Want from the Digital Public Library of America

[A rough transcript of my talk at the Digital Public Library of America meeting at Harvard on March 1, 2011. To permit unguarded, open discussion, we operated under the Chatham House Rule, which prevents attribution of comments, but I believe I’m allowed to violate my own anonymity.]

I was once at a meeting similar to this one, where technologists and scholars were discussing what a large digital library should look like. During a breakout session, the technologists huddled and talked about databases, indices, search mechanisms; the scholars, on the other side of the room, painted a vision of what the archive would look like online, in their view a graphical representation as close to the library as possible, where one could pull down boxes from the shelves, and then open those boxes and leaf through the folios one by one.

While the technologists debated digital infrastructure, the scholars were trying to replicate or maintain what they liked about the analog world they knew: a trusted order, the assurance of the physical, all of the cues they pick up from the shelf and the book. If we want to think about the Digital Public Library of America from the scholar’s point of view, we must think about how to replicate those signals while taking advantage of the technology. In short: the best of the single search box with the trust and feel of the bookshelf.

So how can this group translate those scholarly concerns into elements of the DPLA? I did what any rigorous, traditionally trained scholar would do: I asked my Twitter followers. Here are their thoughts, with my thanks for their help:

First, scholars want reliable metadata about scholarly objects like books. Close enough doesn’t count. Although Google has relatively few metadata errors (given that they handle literally a trillion pieces of metadata), these errors drive scholars mad, and make them skeptical of online collections.

Second, serendipity. Many works of scholarship come from the chance encounter of the scholar with primary sources. How can that be enhanced? Some in my feed suggested a user interface with links to “more like this,” “recent additions in your field,” or “sample collections.” Others advocated social cues, such as user-contributed notes on works in the library.

Third, there are different modes of scholarly research, and the interface has to reflect that: a simple discovery layer with a sophisticated advanced search underneath, faceted search, social search methods for collaborative practice, the ability to search within a collection or subcollection.

Fourth, connection with the physical. We need better representations of books online than the sameness of Google Books, where everything looks like a PDF of the same size. Scholars also need the ability to go from the digital to the analog by finding a local copy of a work.

Finally, as I have often said, scholars have uses for libraries that libraries can’t anticipate. So we need the DPLA to enable other parties to build upon, reframe, and reuse the collection. In technical terms, this means open APIs.

Video: The Ivory Tower and the Open Web

Here’s the video of my plenary talk “The Ivory Tower and the Open Web,” given at the Coalition for Networked Information meeting in Washington in December, 2010. A general description of the talk:

The web is now over twenty years old, and there is no doubt that the academy has taken advantage of its tremendous potential for disseminating resources and scholarship. But a full accounting of the academic approach to the web shows that compared to the innovative vernacular forms that have flourished over the past two decades, we have been relatively meek in our use of the medium, often preferring to impose traditional ivory tower genres on the web rather than import the open web’s most successful models. For instance, we would rather digitize the journal we know than explore how blogs and social media might supplement or change our scholarly research and communication. What might happen if we reversed that flow and more wholeheartedly embraced the genres of the open web?

I hope the audience for this blog finds it worthy viewing. I enjoyed talking about burrito websites, Layer Tennis, aggregation and curation services, blog networks, Aaron Sorkin’s touchiness, scholarly uses of Twitter, and many other high- and low-brow topics all in one hour. (For some details in the images I put up on the screen, you might want to follow along with this PDF of the slides.) I’ll be expanding on the ideas in this talk in an upcoming book with the same title.

[youtube=http://www.youtube.com/watch?v=yeNjiuw-6gQ&w=480&h=385]