Author: Dan Cohen

Zotero News, Big and Small

So much for a modest, stealthy launch of Zotero. I promised a couple of weeks ago that I would return to my blog soon with a few updates about user feedback, some hints about new features, and perhaps some additional news items. With a modest private beta test and a few pages explaining the software on our new site, I assumed that Zotero would quietly and slowly enter into public consciousness. Little did I know that within two weeks I would get over 400 emails asking to join the beta test, help develop and extend Zotero, make it work better with resources on the web, and evangelize it on campuses and in offices around the globe. (Sorry to those I haven’t responded to yet; I’m still working on my email backlog.) Better yet, we received some fantastic news about support for the project, which is where I’ll begin this update.

The big news is that the Center for History and New Media has received an incredibly generous grant from the Andrew W. Mellon Foundation to help build major new features into the 2.0 release of Zotero (coming in 2007). Included in this substantial upgrade are great capabilities that beta testers are already clamoring for (as I’ll describe below). I’m deeply appreciative to the Mellon Foundation and especially Ira Fuchs and Chris Mackie for their support of the project, and we’re delighted to join the stable of other Mellon-funded, open-source projects that are trying to revolutionize higher education and the scholarly enterprise through the use of innovative information technology. We have a very ambitious set of goals we would like to accomplish in the next two years under Mellon funding, and we’re really excited to get started and push these advances out to an eager audience.

My thanks also to the beta testers who have reported bugs and sent in suggestions. (For a few early reviews and thoughts about Zotero, see posts on the blogs of Bill Turkel, Bruce D’Arcus (1, 2), Adrian Cooke, Jeanne Kramer-Smyth, and Mark Phillipson.) We’re planning on rolling all of the bug fixes and a few of the suggestions that we’ve already implemented into the public beta that will be released shortly. The most requested new features were auto-completion/suggestions for tags, better support for non-Western and institutional authors, full-text searches of articles that are saved into one’s Zotero collection, more import/export options, support for other online collections and resources, and the detection of duplicate records. The developers are working feverishly on all of these fronts, and I think the Beta 2 release (our public beta) will be considerably better because of all of this helpful feedback.

I have intentionally left out perhaps the most wanted feature: tools for collaboration. Some of those who have started to hack the software have noticed what we at the Center for History and New Media have been thinking about from the start—that it seems very easy to add ways to send and receive information to and from Zotero (it does reside in the web browser, after all). What if you could share a folder of references and notes with a colleague across the country? What if you could receive a feed of new resources in your area of interest? What if you could synchronize your Zotero library with a server and access it from anywhere? What if you could send your personal collection to other web services, e.g., a mapping service or text analyzer or translation engine?

I’m glad so many of us are thinking alike. Those are the issues we’ve just started to work on, thanks to the Mellon Foundation. Stay tuned for the Zotero server and additional exciting extensions to the Zotero platform.

And despite my email backlog, please do contact me if you would like to join the Zotero movement.

Introducing Zotero

Regular readers of this blog know that over the last year I have been trumpeting our forthcoming software tool for research that will enable vastly simplified citation management, note taking, and advanced scholarly research right within the Firefox browser. Over the past year, I have called this tool SmartFox, Firefox Scholar, and Scholar for Firefox. The domain for the original name was already taken, and the latter two names were too confusing (“Is that the same as Google Scholar?”). Last Friday, a final name was given to the project, a website was launched, and a lucky group of people received the first beta. The word that will be on everyone’s lips this fall: Zotero (zoh-TAIR-oh).

I’ll write much more in this space about Zotero over the coming year (and beyond), since I conceive it not just as a free EndNote replacement (actually, it’s already much better than EndNote in only its 1.0 release), but as a platform for new kinds of digital research. The best place to begin to see what Zotero can do is by heading over to the site’s home page and the quick start guide.

But I wanted to devote this first post on Zotero to those who did the incredible job of developing the software: Dan Stillman, Simon Kornblith, and David Norton. While several of us at the Center for History and New Media thought deeply about what such a tool should look like, Dan, Simon, and David brilliantly executed our plan—and added countless touches and ideas of their own. When you see how amazing the results are, you’ll really appreciate their work.

Even though we’ve been relatively low-key about promoting Zotero as we fix some last-minute bugs, I’ve gotten dozens of messages over the last few days about the project. My blanket answer: we’ll have a public beta by the end of September 2006—thanks, of course, to Dan, Simon, and David.

Stay tuned to this blog and I’ll explain some of the more innovative features of Zotero. I’ll also show how researchers can best use the tool, describe how other software developers can extend it and link it to other web tools and services, and drop hints about our ambitious long-range plans.

Raw Archives and Hurricane Katrina

Several weeks ago during my talk on the “Possibilities and Problems of Digital History and Digital Collections” at the joint meeting of the Council of State Archivists, the National Association of Government Archives and Records Administrators, and the Society of American Archivists (CoSA, NAGARA, and SAA), I received a pointed criticism from an audience member during the question-and-answer period. Having just shown the September 11 Digital Archive, the questioner wanted to know how this qualified as an “archive,” since archives are generally based upon rigorous principles of value, selection, and provenance. It’s a valid critique—though a distinction that might be lost on a layperson who is unaware of archival science and might consider their shoebox of photos an “archive.” Maybe it’s time for a new term: the raw archive. On the Internet, these raw archives are all around us.

Just think about Flickr, Blogger, or even (dare I speak its name) YouTube. These sites are documenting—perhaps in an exhibitionist way, but documenting nonetheless—the lives of millions of people. They are also aggregating that documentation in an astonishing way that was not possible before the web. They are not archives in the traditional sense, instead eschewing selection biases for a come one, come all attitude that has produced collections of photos, articles, and videos several orders of magnitude larger than anything in the physical world. They may be easy to disparage, but I suspect they will be extraordinarily useful for future historians and researchers.

Or I should say would be, if they were being run by entities that are concerned with the very long run. But the Flickrs of the web are companies, and have little commitment to store their contents for ten, much less a hundred, years.

That’s why more institutions with a long-term view, such as universities, libraries, and museums, need to think about getting into the raw archive business. We in the noncommercial world should be incredibly thankful for the Internet Archive, which has probably done the most in this respect. Institutions that are oriented toward the long run have to think about adding the raw to their already substantial holdings of the “cooked” (that is, traditional archives).

Our latest contribution to this effort is the Hurricane Digital Memory Bank, which has just undergone a redesign and which now has over 5000 contributions. It’s a great example of what can be done with the raw, when thought about with the researcher, rather than voyeur, in mind. On this anniversary of Hurricane Katrina, I invite you to add your recollections, photos, and other raw materials to the growing archive. And please tell others. We have a come one, come all attitude toward contributions, and need as many people as possible to help us build the (raw) archive.

Professors, Start Your Blogs

With a new school year about to begin, I want to reach out to other professors (and professors-to-be, i.e., graduate students) to try to convince more of them to start their own blogs. It’s the perfect time to start a blog, and many of the reasons academics state for not having a blog are, I believe, either red herrings or just plain false. So first, let me counter some biases and concerns I hear from a lot of my peers (and others in the ivory tower) when the word “blog” is mentioned.

Despite the fact that tens of millions of people now have blogs, the genre is still considered by many—especially those in academia—to be the realm of self-involved, insecure, oversexed teens and twentysomethings. To be sure, there are plenty of blogs that trace the histrionics of adolescence and its long, tortured aftermath. And there’s no denying that other blogs cover such fascinating, navel-gazing topics as one man’s love of his breakfast (preferably eggs Benedict, if you must know). And—before I throw too many stones in this glass house—I too have indulged in the occasional narcissistic act in this column (not to mention the “shameless plug” for my book, Digital History, in the right column of my home page).

But this common criticism of the genre of the blog has begun to ring hollow. As Bryan Alexander of the National Institute for Technology and Liberal Education recently noted at a meeting I attended on emerging web technologies and higher education, a remarkably wide range of blog styles and genres now exist—including many noteworthy examples by professors. There are blogs by historians providing commentary on current events, blogs by journalism professors dissecting mass media coverage of health news, and blogs by whole academic departments, like Saint Cloud State University’s astronomy department.

Blogs are just like other forms of writing, such as books, in that there’s a whole lot of trash out there—and some gems worth reading. It just depends on what you choose to read (or write). And of course many (most? all?) other genres of writing have elements of self-promotion and narcissism. After all, a basic requirement of writing is the (often mistaken) belief that you have something to say that’s important.

Second, no rule book mandates that one adopt the writing style of a hormone-crazed college student. Professors, especially those in the humanities, have spent a great deal of their lives learning how to write prose, and to write in a variety of styles for different purposes: monographs, popular works, reviews, lectures to students, presentations to colleagues. For this blog I’ve adopted a plainspoken prose style with (I hope) a little humor here and there to lighten the occasional technically complex post. I’ve also carefully avoided the use of extreme adjectives and hyperbole that are common on the blogs the academic critics love to hate. I’m proud to say I’ve used only a handful of exclamation points so far. This “casual rationalist” voice is but one option among many, but it’s a style I’ve crafted to disarm those who believe that blogs can be nothing but trouble for the careers of graduate students and professors.

Another factor that has distanced professors from blogs is anonymity. Most early blogs, and especially the ones the media liked to cover, were anonymous or pseudonymous. But I would say that the vast majority of new blogs are clearly attributed (even if they have odd monikers, unlike the boring dancohen.org). Attribution and its associated goods, such as responsibility and credit, should make academics feel better about the genre.

Moreover, as I pointed out when I began this blog last year, a blog is really just a series of “posts” (whatever those are; I began the post you’re reading by calling it an “article,” because at almost 2,000 words it feels less like a post-it note than a legal pad). There’s no blogging requirement to discuss botox or baked beans or boyfriends, or to write short, snarky bits rather than long, balanced, thoughtful essays. A failure to understand this simple point has kept too many serious folks like professors on the sidelines as the blogosphere has exponentially expanded.

The addition of professorial blogs to the web will enrich the medium greatly. The critics of blogging are perhaps onto something when they note that the blogosphere has too many people writing on too few topics (does the world really need another blog on the latest moves of Apple Computer?). Although they frequently teach broad, introductory courses, professors are hired and promoted because they are specialists who discover and explain things that few others understand. For these theorists and researchers, blogging can be a powerful way to provide “notes from the field” and glosses on topics that perhaps a handful of others worldwide know a lot about. While I tend to avoid the hot term of the moment, professors are the true masters of the “long tail” of knowledge.

When I was in graduate school, the Russian historian Paul Bushkovitch once told me that the key to being a successful scholar was to become completely obsessed with a historical topic, to feel the urge to read and learn everything about an event, an era, or a person. In short, to become so knowledgeable and energetic about your subject matter that you become what others immediately recognize as a trusted, valuable expert.

As it turns out, blogs are perfect outlets for obsession. Now, there’s good and bad obsession. What the critics of blogs are worried about is the bad kind—the obsession that drives people to write about their breakfast in excruciating detail.

Yet, as Bushkovitch’s comment implied, obsession—properly channeled and focused on a worthy subject—has its power. It forges experts. It stimulates a lifelong interest in learning (think, for a moment, about the countless examples of “retired” professors still writing influential books). The most stimulating, influential professors, even those with more traditional outlets for their work (like books and journals), overflow with views and thoughts. Shaped correctly, a blog can be a perfect place for that extra production of words and ideas. The Chronicle of Higher Education may love to find examples of Ph.D.s losing a tenure-track job because of their tell-all (anonymous) blogs, but I suspect that in the not too distant future the right type of blog—the blog that shows how a candidate has full awareness of what’s going on in a field and has potential as a thought leader in it—will become an asset not to be left off one’s CV.

The best bloggers inevitably become a nexus for information exchange in their field. Take, for instance, Lorcan Dempsey’s blog on matters relating to libraries and digital technology. It has become a touchstone for many in his field—my estimate is that he has a thousand subscribers who get updates from his blog daily. Overall, I suspect his blog has more actual readers than some print publications in his field. Looking for influence? A large blog audience is as good as a book or seminal article. A good blog provides a platform to frame discussions on a topic and point to resources of value.

Altruistic reasons for writing a blog also beckon. Writing a blog lets you reach out to an enormous audience beyond academia. Some professors may not want that audience, but I believe it’s part of our duty as teachers, experts, and public servants. It’s great that the medium of the web has come along to enable that communication at low cost.

Concerned about someone stealing your ideas if you post them to a blog? Don’t. Unless you decide otherwise, you have the same copyright on words you write on a blog as those published on paper. And you have the precedence that comes with making those words public far earlier than they would appear in a journal or book.

Worried about the time commitment involved in writing a blog? The constant pressure to post something daily or weekly? This was my stumbling block a year ago when I was thinking of starting a blog. I’m busy; we’re all busy. What I’ve found, however, is that writing a blog does not have to take a lot of time. Promoters of blogs often tell prospective bloggers it’s critical to post frequently and reliably. Nonsense. Such advice misunderstands what’s so great about RSS (Really Simple Syndication), the underlying technology of blogs that notifies people when you have a new post. RSS “pushes” new material to readers no matter the interval between posts. RSS is thus perfect for busy people with blogs who are naturally inconsistent or infrequent in their posting schedule. If you post every day, then readers can just visit your site daily; if you post six times a year, randomly (when you really have something to say), RSS is the technology for you. Without it, no one would ever remember to visit your website.

RSS also allows aggregation of blog “feeds” so that by mixing together a number of RSS files an audience can track the goings-on in a field in a single view. I would love to see a hundred historians of Victorian science have blogs to which they post quarterly. That would mean an average of one thoughtful post a day on a subject in which I’m greatly interested.

For those who need further prodding to get past these worries and biases, blogging as we know it (or don’t know it, if you are unfamiliar with the use of RSS “news readers”) is about to change. Seamless support for RSS is now being written into the most commonly used software: email programs and web browsers. Rather than having to figure out how to manage subscriptions to blogs in a news reader or on an off-putting “Web 2.0” site, the average user will soon find new posts along with their email, or beckoning them from within their browser. And new versions of Blogger and other blog software have made it easier than ever to start a blog. In other words, blogs are about to become much more accessible and integrated into our digital lives.

Now, I’m aware of the irony of imploring, on a blog, professors who don’t have a blog to start a blog. I fear I’m preaching to the choir here. Surely the subscribers to this blog’s feed are blog-savvy already, and many undoubtedly have their own blogs. So I need your help: please tell other professors or professors-to-be about this post, or forward the URL for the post to appropriate email lists or forums (if you’re worried that the long URL is difficult to cite, here’s a tiny URL that will redirect to this page: http://tinyurl.com/ptsje).

But wait—haven’t I just asked you to be an accomplice in a shameless, narcissistic act typical of blogs? Perhaps.

Mapping What Americans Did on September 11

I gave a talk a couple of days ago at the annual meeting of the Society for American Archivists (to a great audience—many thanks to those who were there and asked such terrific questions) in which I showed how researchers in the future will be able to intelligently search, data mine, and map digital collections. As an example, I presented some preliminary work I’ve done on our September 11 Digital Archive combining text analysis with geocoding to produce overlays on Google Earth that show what people were thinking or doing on 9/11 in different parts of the United States. I promised a follow-up article in this space for those who wanted to learn how I was able to do this. The method provides an overarching view of patterns in a large collection (in the case of the September 11 Digital Archive, tens of thousands of stories), which can then be prospected further to answer research questions. Let’s start with the end product: two maps (a wide view and a detail) of those who were watching CNN on 9/11 (based on a text analysis of our stories database, and colored blue) and those who prayed on 9/11 (colored red).

[Image: Google Earth map of the United States showing September 11 Digital Archive stories with CNN viewing (blue) and stories with prayer (red).]

[Image: Detail of the Eastern United States.]

By panning and zooming, you can see some interesting patterns. Some of these patterns may be obvious to us, but a future researcher with little knowledge of our present could find out easily (without reading thousands of stories) that prayer was more common in rural areas of the U.S. in our time, and that there was especially a dichotomy between the very religious suburbs (or really, the exurbs) of cities like Dallas and the mostly urban CNN-watchers. (I’ll present more surprising data in this space as we approach the fifth anniversary of 9/11.)

OK, here’s how to replicate this. First, a caveat. Since I have direct access to the September 11 Digital Archive database, as well as the ability to run server-to-server data exchanges with Google and Yahoo (through their API programs), I was able to put together a method that may not be possible for some of you without some programming skills and direct access to similar databases. For those in this blog’s audience who do have that capacity, here’s the quick, geeky version: using regular expressions, form an SQL query into the database you are researching to find matching documents; select geographical information (either from the metadata, or, if you are dealing with raw documents, pull identifying data from the main text by matching, say, 5-digit numbers for zip codes); put these matches into an array, and then iterate through the array to send each location to either Yahoo’s or Google’s geocoding service via their maps API; take the latitude and longitude from the result set from Yahoo or Google and add these to your array; iterate again through the array to create a KML (Keyhole Markup Language) file by wrapping each field with the appropriate KML tag.
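
For the programmers in that camp, here is a minimal sketch of that pipeline in Python. Everything specific in it is a stand-in I’ve invented for illustration: the “stories” table with its “story_text” and “zip” columns, the database file name, and the stubbed-out geocode() function (which you would replace with a call to whichever geocoding API you have a key for).

import re
import sqlite3

def geocode(zip_code):
    # Stub: replace with a request to your geocoding service of choice
    # (Yahoo or Google) that returns a (latitude, longitude) pair.
    raise NotImplementedError("plug in your geocoding API here")

def matching_zips(db_path, pattern):
    # Pull every story, then keep the zip codes of the stories whose
    # text matches the regular expression (e.g., r"\bCNN\b").
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT story_text, zip FROM stories").fetchall()
    conn.close()
    regex = re.compile(pattern, re.IGNORECASE)
    return [z for text, z in rows if z and regex.search(text or "")]

def write_kml(points, outfile, color="7fff0000"):
    # Wrap each point in KML Placemark tags. Note that KML wants
    # coordinates in longitude,latitude,altitude order.
    with open(outfile, "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<kml xmlns="http://earth.google.com/kml/2.0"><Document>\n')
        f.write('<Style id="A"><IconStyle><color>%s</color>'
                '<scale>0.8</scale></IconStyle></Style>\n' % color)
        for lat, lon in points:
            f.write('<Placemark><styleUrl>#A</styleUrl><Point>'
                    '<coordinates>%f,%f,0</coordinates></Point></Placemark>\n'
                    % (lon, lat))
        f.write('</Document></kml>\n')

# Example usage (requires a real geocode() implementation):
zips = matching_zips("september11.db", r"\bCNN\b")
points = [geocode(z) for z in zips]
write_kml(points, "cnn.kml")

Opened in Google Earth, the resulting file behaves just like the hand-built KML described in the rest of this walkthrough.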

For everyone else, here’s the simplest method I could find for reproducing the maps I created. We’re going to use a web-based front end for Yahoo’s geocoding API, Phillip Holmstrand’s very good free service, and then modify the results a bit to make them a little more appropriate for scholarly research.

First of all, you need to put together a spreadsheet in Excel (or Access or any other spreadsheet program; you can also just create a basic text document with columns and tabs between fields so it looks like a spreadsheet). Hopefully you will not be doing this manually; if you can get a tab-delimited text export from the collection you wish to research, that would be ideal. One or more columns should identify the location of the matching document. Make separate columns for street address, city, state/province, and zip codes (if you only have one or a few of these, that’s totally fine). If you have a distinct URL for each document (e.g., a letter or photograph), put that in another column; same for other information such as a caption or description and the title of the document (again, if any). You don’t need these non-location columns; the only reason to include them is if you wish to click on a dot on Google Earth and bring up the corresponding document in your web browser (for closer reading or viewing).

Be sure to title each column, i.e., use text in the topmost cell with specific titles for the columns, with no spaces. I recommend “street_address,” “city,” “state,” “zip_code,” “title,” “description,” and “url” (again, you may only have one or more of these; for the CNN example I used only the zip codes). Once you’re done with the spreadsheet, save it as a tab-delimited text file by using that option in Excel (or Access or whatever) under the menu item “Save as…”
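
To make the layout concrete, here is what the top of such a tab-delimited file might look like (the rows are invented for illustration, and each column is separated by a tab character):

street_address	city	state	zip_code	title	description	url
	New York	NY	10013	Story 124	Watched CNN all morning	http://example.org/stories/124
	Arlington	VA	22201	Story 217	Prayed with neighbors	http://example.org/stories/217

Columns you have no data for, like street_address here, are simply left empty.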

Now open that new file in a text editor like Notepad on the PC or Textedit on the Mac (or BBEdit or anything else other than a word processor, since Word, e.g., will reformat the text). Make sure that it still looks roughly like a spreadsheet, with the title of the columns at the top and each column separated by some space. Use “Select all” from the “Edit” menu and then “Copy.”

Now open your web browser and go to Phillip Holmstrand’s geocoding website and go through the steps. “Step #1” should have “tab delimited” selected. Paste your columned text into the big box in “Step #2” (you will need to highlight the example text that’s already there and delete it before pasting so that you don’t mingle your data with the example). Click “Validate Source” in “Step #3.” If you’ve done everything right thus far, you will get a green message saying “validated.”

In “Step #4” you will need to match up the titles of your columns with the fields that Yahoo accepts, such as address, zip code, and URL. Phillip’s site is very smart and so will try to do this automatically for you, but you may need to be sure that it has done the matching correctly (if you use the column titles I suggest, it should work perfectly). Remember, you don’t need to select each one of these parameters if you don’t have a column for every one. Just leave them blank.

Click “Run Geocoder” in “Step #5” and watch as the latitudes and longitudes appear in the box in “Step #6.” Wait until the process is totally done. Phillip’s site will then map the first 100 points on a built-in Yahoo map, but we are going to take our data with us and modify it a bit. Select “Download to Google Earth (KML) File” at the bottom of “Step #6.” Remember where you save the file. The default name for that file will be “BatchGeocode.kml”. Feel free to change the name, but be sure to keep “.kml” at the end.

While Phillip’s site takes care of a lot of steps for you, if you try right away to open the KML file in Google Earth you will notice that all of the points are blazing white. This is fine for some uses (show me where the closest Starbucks is right now!), but scholarly research requires the ability to compare different KML files (e.g., between CNN viewers and those who prayed). So we need to implement different colors for distinct datasets.

Open your KML file in a text editor like Notepad or Textedit. Don’t worry if you don’t know XML or HTML (if you do know these languages, you will feel a bit more comfortable). Right near the top of the document, there will be a section that looks like this:

<Style id="A"><IconStyle><scale>0.8</scale>
<Icon><href>root://icons/palette-4.png</href><x>30</x><w>32</w><h>32</h></Icon>
</IconStyle><LabelStyle><scale>0</scale></LabelStyle></Style>

To color the dots that this file produces on Google Earth, we need to add a set of “color tags” between <IconStyle> and <scale>. Using your text editor, insert “<color></color>” at that point. Now you should have a section that looks like this:

<Style id="A"><IconStyle><color></color><scale>0.8</scale>
<Icon><href>root://icons/palette-4.png</href><x>30</x><w>32</w><h>32</h></Icon>
</IconStyle><LabelStyle><scale>0</scale></LabelStyle></Style>

We’re almost done, but unfortunately things get a little more technical. Google uses what’s called an ABGR value for defining colors in Google Earth files. ABGR stands for “alpha, blue, green, red,” and that is the order in which the parts appear. In other words, you will have to tell the program how much blue, green, and red you want in the color, plus the alpha value, which determines how opaque or transparent the dot is. Alas, each of these four parts must be expressed in a two-digit hexadecimal format ranging from “00” (no amount) to “ff” (full amount). Combining each of these two-digit values gives you the necessary full string of eight characters. (I know, I know: why not just <color>red</color>? Don’t ask.) Anyhow, a fully opaque red dot would be <color>ff0000ff</color>, since that value has full (“ff”) opacity and full (“ff”) red value (opacity being the first and second places of the eight characters and red being the seventh and eighth places). Welcome to the joyous world of ABGR.
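
If you would rather not puzzle out the hexadecimal by hand, a few lines of Python can assemble the eight-character string for you. This helper is just an illustration of the alpha-blue-green-red ordering, not a required part of the walkthrough:

def kml_color(red, green, blue, opacity_percent=100):
    # Build a KML color string in alpha-blue-green-red (aabbggrr) order
    # from 0-255 color values and an opacity percentage.
    alpha = int(255 * opacity_percent / 100.0)
    return "%02x%02x%02x%02x" % (alpha, blue, green, red)

print(kml_color(255, 0, 0))        # fully opaque red: ff0000ff
print(kml_color(0, 0, 255, 50))    # half-transparent blue: 7fff0000

Those two outputs match the values used in the examples that follow.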

Let me save you some time. I like to use 50% opacity so I can see through dots. That helps give a sense of mass when dots are close to or on top of each other, as is often the case in cities. (You can also vary the size of the dots, but let’s wait for another day on that one.) So: semi-transparent red is “7f0000ff”; semi-transparent blue is “7fff0000”; semi-transparent green is “7f00ff00”; semi-transparent yellow is “7f00ffff”. (Yes, mixing full red and full green gives yellow in this scheme, just as it does on any screen.) So for blue dots that you can see through, as in the CNN example, the final code should have “7fff0000” inserted between <color> and </color>, resulting in:

<Style id="A"><IconStyle><color>7fff0000</color><scale>0.8</scale>
<Icon><href>root://icons/palette-4.png</href><x>30</x><w>32</w><h>32</h></Icon>
</IconStyle><LabelStyle><scale>0</scale></LabelStyle></Style>

When you’ve inserted your color choice, save the KML document in your text editor and run the Google Earth application. From within that application, choose “Open…” from the “File” menu and select the KML file you just edited. Google Earth will load the data and you will see colored dots on your map. To compare two datasets, as I did with prayer and CNN viewership, simply open more than one KML file. You can toggle each set of dots on and off by clicking the checkboxes next to their filenames in the middle section of the panel on the left. Zoom and pan, add other datasets (such as population statistics), add a third or fourth KML file. Forget about all the tech stuff and begin your research.

[For those who just want to try out using a KML file for research in Google Earth, here are a few from the September 11 Digital Archive. Right-click (or control-click on a Mac) to save the files to your computer, then open them within Google Earth, which you can download from here. These are files mapping the locations of: those who watched CNN; those who watched Fox News (far fewer than CNN since Fox News was just getting off the ground, but already showing a much more rural audience compared to CNN); and those who prayed on 9/11.]

ACLS Fellowships, Chicago Colloquium, Scholar for Firefox Update

I’m back from vacation and have lots to catch up on, but wanted to pass along some quick notes about upcoming opportunities and deadlines that might be of interest to this blog’s audience.

I feel incredibly fortunate to have received an American Council of Learned Societies’ Digital Innovation Fellowship for the next year. 2006-7 will be the first year for this fellowship, which is supporting five projects, including efforts involving GPS, corpus digitization, map mashups, text and data mining, and software development. The call for applications for 2007-8 has already gone out, and the paperwork is due in just a couple of months, on September 27, 2006. Having written one of these applications, I recommend getting an early start. Beyond the normal “what I’m going to do” narrative, you need to come up with a budget and think about institutional support for your project (digital projects often require such things).

The Chicago Colloquium on Digital Humanities and Computer Science has sent out a call for papers in anticipation of a meeting on November 5-6, 2006. The meeting is going to expand upon the topics discussed in the March 2006 issue of D-Lib Magazine. I’m going to try to be there. The deadline for applications is August 15, 2006.

Finally, we still have a few spaces for beta testers for our upcoming release of Scholar for Firefox. For those who are hearing about this for the first time, Scholar is a citation manager and note-taking application (like EndNote) that integrates right into the Firefox web browser. Since it lives in the browser, it has some very helpful—and, we think, innovative—features, such as the ability to sense when you are viewing the record for a book (on your library’s website or at Amazon or elsewhere) and to offer to save the full citation information to your personal library of references (unlike del.icio.us or other bookmarking tools, it actually grabs the author, title, and copyright information, not just the URL). Scholar will have “smart folder” and “smart search” technology and other user interface capabilities that are reminiscent of iTunes and other modern software. And we hope to unveil some collaborative features soon as well (such as the ability to share and collaborate on bibliographies and notes, find new books and articles that might be of interest to you based on what you’ve already saved to your library, etc.). If you’re interested in testing the software, please email me. The limited release beta should be available around August 15, 2006.

The New Center for History and New Media

This month the Center for History and New Media moved into a wonderful new space on the campus of George Mason University. We couldn’t be more delighted; it’s a tremendous new building (provisionally named “Research I”; if you have several million dollars lying around and want a building named after you, please contact GMU). CHNM takes up about half of the top floor, where we are neighbors with the ominously named “Autonomous Robotics Laboratory.” Perhaps the most amusing part of the building is the sign in the lobby listing the other tenants. Needless to say, we’re the only historians in the building.

Research I

The other side of the building, with the observatory (our conference room is just below, in the tower)

CHNM’s main computer lab

The “West Wing” of CHNM, where my office is

The lobby sign

The Last Six Months

I’ll be away from my blog for the next two weeks, so until then, here’s a look back at what I consider to be my best posts from the last six months. As I explained when I started this blog, my goal has been to try to avoid adding yet more echo to the echo chamber of the blogosphere, and instead to try to write mostly longer pieces on the intersection of computing, scholarship, and the humanities. I haven’t always succeeded—I have occasionally succumbed, like so many others, to mindlessly blogging about the latest moves of the Googles and Microsofts—but for the most part I’m pleased with what I’ve written, especially the following list. More importantly, I hope you’ve found this blog helpful.

My series on creating this blog from scratch (includes thoughts about the nature of blogs, RSS, search, and other topics):
Part 1: What is a Blog, Anyway?
Part 2: Advantages and Disadvantages of Popular Blog Software
Part 3: The Double Life of Blogs
Part 4: Searching for a Good Search
Part 5: What is XHTML, and Why Should I Care?

Practical discussions about using web technology in academia and elsewhere:
Using AJAX Wisely
Search Engine Optimization for Smarties
Measuring the Audience of a Digital Humanities Project

Thoughts about the nature and uses of digital works:
The Wikipedia Story That’s Being Missed
Wikipedia vs. Encyclopaedia Britannica for Digital Research
Wikipedia vs. Encyclopaedia Britannica Keyword Shootout Results
The Perfect and the Good Enough: Books and Wikis
When Machines Are the Audience
What Would You Do With a Million Books?
Rough Start for Digital Preservation

The impact of the web on learning, teaching, and testing:
The Single Box Humanities Search
No Computer Left Behind
Mapping Recent History

On copyright and related matters:
2006: Crossroads for Copyright
Impact of Field v. Google on the Google Library Project
Clifford Lynch and Jonathan Band on Google Book Search

Google Fingers

No, it’s not another amazing new piece of software from Google, which will type for you (though that would be nice). Just something that I’ve noticed while looking at many nineteenth-century books in Google’s massive digitization project. The following screenshot nicely reminds us that at the root of the word “digitization” is “digit,” which is from the Latin word “digitus,” meaning finger. It also reminds us that despite our perception of Google as a collection of computer geniuses, and despite their use of advanced scanning technology, their library project involves an almost unfathomable amount of physical labor. I’m glad that here and there, the people doing this difficult work (or at least their fingers) are being immortalized.

[The first page of a Victorian edition of Plato’s Euthyphron, a dialogue about the origin and nature of piety. Insert your own joke here about Google’s “Don’t be evil” motto.]

The Perfect and the Good Enough: Books and Wikis

As you may have noticed, I haven’t posted to my blog for an entire month. I have a good excuse: I just finished the final edits on my forthcoming book, Equations from God: Pure Mathematics and Victorian Faith, due out early next year. (I realized too late that I could have capitalized on Da Vinci Code fever and called the book The God Code, thus putting an intellectual and cultural history of Victorian mathematics in the hands of numerous unsuspecting Barnes & Noble shoppers.) The process of writing a book has occasionally been compared to pregnancy and childbirth; as the awe-struck husband of a wife who bore twins, I suspect this comparison is deeply flawed. But on a more superficial level, I guess one can say that it’s a long process that produces something of which one can be very proud, but which can involve some painful moments. These labor pains are especially pronounced (at least for me) in the final phase of book production, in which all of the final adjustments are made and tiny little errors (formatting, spelling, grammar) are corrected. From the “final” draft of a manuscript until its appearance in print, this process can take an entire year. Reading Roy Rosenzweig’s thought-provoking article on the production of the Wikipedia, just published in the Journal of American History, was apropos: it got me thinking about the value of this extra year of production work on printed materials and its relationship to what’s going on online now.

Is the time spent getting books as close to perfection as possible worth it? Of course it is. The value of books comes from an implicit contract between the reader and those who produce the book, the author and publisher. The producers ensure, through many cycles of revision, editing, and double checking, that the book contains as few errors as possible and is as cogent and forceful as possible. And the reader comes to a book with an understanding that the pages they are reading entail a tremendous amount of effort to reach near-perfection—thus making the book worthy of careful attention and consideration.

On the other hand, I’ve become increasingly fond of Voltaire’s dictum that “the perfect is the enemy of the good”; that is, in human affairs the (often nearly endless) search for perfection often means you fail to produce a good-enough solution. Roy Rosenzweig and I use the aphorism in Digital History, because there’s so much to learn and tinker with in trying to put history online that if you obsess about it all you will never even get started with a basic website. As it turns out, the history of computing includes many examples of this dynamic. For instance, Ethernet was not as “perfect” a technology as IBM’s Token-Ring, which, as its name implies, passed a “token” around so that every item on a network wouldn’t talk at once and get in each other’s way. But Ethernet was good enough, had decent (but not perfect) solutions to the problems that IBM’s top-notch engineers had elegantly solved, and was cheaper to implement. I suspect you know which technology triumphed.

Roy’s article, “Can History Be Open Source? Wikipedia and the Future of the Past,” suggests that we professional historians (and academics who produce books in general) may be underestimating good-enough online publishing like Wikipedia. Yes, Wikipedia has errors—though not as many as the ivory tower believes. Moreover, it is slowly figuring out how to deal with its imperfections, such as the ability of anyone to come along and edit a topic about which they know nothing, by using fairly sophisticated social and technological methods. Will it ever be as good as a professionally produced book? Probably not. But maybe that’s not the point. (And of course many books are far from perfect too.) Professors need to think carefully about the nature of what they produce given new forms of online production like wikis, rather than simply disparaging them as the province of cranks and amateurs. Finishing a book is as good a time to do that as any.