History Tagging Text Mining

American Studies Tagline

Dave Lester provides an interesting visualization of the history of American Studies over the last fifty years by running Lucy Maddox’s Locating American Studies: The Evolution of a Discipline through a tag cloud creator and then putting it on a slider timeline. Note the rise and fall of Leo Marx’s influence on the field, among other things.

History Research Software Tagging Zotero

Social and Semantic Computing for Historical Scholarship

Since I assume that many readers of this blog don’t receive the American Historical Association’s magazine Perspectives, you might be interested in this article I wrote for the May 2007 issue. In the piece I discuss the Zotero project’s connection to several recent trends in computing, and think ahead to what the Zotero server might mean for academic fields like history.

Collaboration Scholarship Tagging

Blackboard’s Entry into Web 2.0 Unveiled:

Maybe they should have kept it veiled. I’m surprised at how poorly designed this site is (surely a freshman who knows Ruby on Rails and a little Photoshop could have put together a better social bookmarking site in a week), not to mention that additions to the site are limited to users of the Blackboard course management system. How do they plan to get the scale necessary for network effects? From students who are thrilled by the new functionality of the website they have to go to for their classes?

Blogs Tagging

Creating a Blog from Scratch, Part 7: Tags, What Are They Good For?

Evidently quite a few things. In the past few years, tags have been attached to virtually everything, from web links to photos to bars. The University of Pennsylvania has recently introduced a way for those on campus to tag items in their online catalog, Franklin. With the arrival of the Zotero server this year, it will be possible for the community of Zotero users to collaboratively tag almost any object of research, from books to sculptures to letters. For their promoters, tags are a low-cost, democratic advance over traditional systems of cataloging. Detractors disparage tags as lacking the rigor of those tried-and-true methods. As I started to think about the composition of this blog, all I wanted to know was, why do so many blogs have tags all over them and what function or functions do they serve? Do I need them? What are they good for?

I have to admit that when I started this blog I had a visceral dislike of tags, probably because I was approaching them from the perspective of an academic who liked the precision and professionalism of the card catalog and encyclopedia. Tags seemed fatally flawed as putative successors to Library of Congress subject headings or the indexes in the back of books. I still believe the much-ballyhooed “tag clouds,” or sets of tags of various sizes arranged in a pattern to show the contents of a blog or book or site, are poor substitutes for a good index of a work, not only because indexes are usually done by professionals who know what to highlight and how to summarize those topics, but also because indexes tell little stories through their levels, modifiers, and page numbers. For instance, here’s a section of the index the talented Jim O’Brien did for my book Equations from God:

Euclid, 165; in mathematics education, 147, 148, 214n185; Elements by, 21, 106, 138, 179, 180, 214n185; long-lasting influence of, 21, 58, 79, 147, 164, 174; waning influence of, in late Victorian era, 138, 148, 164, 178-179, 180 (see also non-Euclidean geometry)

At a glance you can tell the story line about Euclid—the ancient Greek mathematician’s incredibly long relevance (well into the modern era), and his eventual fall from grace in the nineteenth century in the face of a new kind of geometry. Some have proposed adding the hierarchical levels and other index-like features to tags to approach this level of usefulness, but that misses the point of tagging: it works because it’s done in a simple, generally offhand way. Add a lot of thought and hurdles to the process, and you’ll kill tagging. Tagging is a classic case of the “good enough” besting the “perfect” in new media.

Despite my hesitancy, I figured that there must be some reason to use tags on this blog. So I included them in the database but chose, due to my initial aversion, not to show them all over my site like many blogs do. They would just sit in the background and in the RSS feed. It turned out that was a very good compromise as I began to appreciate that tags are good at some functions that traditional taxonomies don’t address.

Much of the antagonism between the promoters and detractors of tags seems to arise from the sense (I believe, the incorrect sense) that tags and traditional catalogs are competitors for the same market. But when you actually look at tags in action, it’s clear that they serve a number of functions that are distinct from the traditional cataloging functions and that make them poor replacements for high-quality categorization.

For example, look at the variety of tags on a highly used folksonomic site, the granddaddy of social bookmarking. To be sure, there are some fine categorizations of websites. But the site also harbors a large number of tags with other aims. Coexisting with tags that might be at home in a Library of Congress subject heading (e.g., “history”) are tags like “readlater” (busy people marking a site as worth going back to when they get the chance), “hist301” (a tag used by students in a particular class for a particular semester), “natn” (used by listeners of the podcast “Net at Nite” to submit websites to the hosts for consideration), and of course every possible variation of “cool” (to signify a site’s…coolness).

Awareness of these other kinds of tags made me realize that what distinguishes tags from traditional forms of categorization, aside from the obvious amateur/democratic vs. professional distinction, is that while both are forms of description, tags often have specific audiences and time frames in mind, while traditional categorizations (such as Library of Congress subject headings) have only a vague general audience in mind and try to be as timeless as possible.

This distinction is particularly true when you realize that tags are strongly interwoven with feeds (RSS). Since people can subscribe to the feed of a tag, tagging a blog post in effect places it into a live, running stream of alerts to an awaiting audience. Want to alert John Musser, who maintains the list of APIs I have frequently referred to in this space, about a new API? Just tag a blog post “API” or “APIs” and I suspect John will hear about it very soon, as will a very large audience of those interested in knitting together information on the web.
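
To make the "running stream of alerts" idea concrete, here is a minimal sketch of how a tag feed might route a new post to its subscribers. It is an illustration only, not how Technorati or any real feed service works; the class name TagFeeds, the reader names, and the post title are all made up for the example.

```python
class TagFeeds:
    """Hypothetical in-memory model of per-tag feed subscriptions."""

    def __init__(self):
        # Maps a normalized tag to the readers subscribed to that tag's feed.
        self._subscribers = {}

    @staticmethod
    def _normalize(tag):
        # Treat "API " and "api" as the same tag.
        return tag.strip().lower()

    def subscribe(self, reader, tag):
        self._subscribers.setdefault(self._normalize(tag), []).append(reader)

    def publish(self, title, tags):
        """Return the readers alerted by a post, each listed once."""
        alerted = []
        for tag in tags:
            for reader in self._subscribers.get(self._normalize(tag), []):
                if reader not in alerted:
                    alerted.append(reader)
        return alerted


feeds = TagFeeds()
feeds.subscribe("john", "API")
feeds.subscribe("john", "APIs")
feeds.subscribe("history class", "hist301")

# A post tagged "APIs" reaches John once, even though he follows two tags.
api_readers = feeds.publish("A new mapping API", ["APIs", "maps"])
# A post with a tag nobody follows reaches no one.
no_readers = feeds.publish("Sourdough notes", ["cooking"])
```

The point of the sketch is that the tag does no describing for posterity here; it simply decides who hears about the post, and when.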

Thus tags have a great utility on the “live” web, as the blog search engine Technorati calls it, as well as for personal uses of an individual or microaudiences like a college class or even for inane commentary (“awesome”). Yet I still feel that as an entrée into a blog, as the equivalent of scanning a table of contents or the index of a book, they are fairly poor. I had planned to expose my internal tags of posts to the audience of this blog in some “traditional” blog way—at the bottom of each post, down the left sidebar, in a tag cloud—but it didn’t seem helpful. If someone wants to find all of my posts on copyright, they can search for them in the upper right search box. And the tag clouds I’ve tried all seem to misrepresent the overall thrust of this blog since (like everyone else using tags) I haven’t put a lot of thought into the tags.
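
For the curious, here is a rough sketch of how a tag cloud might turn tag counts into font sizes, which shows why offhand tagging skews the picture. It assumes a simple linear scaling between a minimum and maximum size; real tag cloud tools may scale differently, and the function name and the sample tags are invented for the example.

```python
from collections import Counter


def tag_cloud_sizes(posts_tags, min_size=10, max_size=30):
    """Map each tag to a font size, scaled linearly by how often it is used."""
    counts = Counter(tag for tags in posts_tags for tag in tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for tag, n in counts.items():
        if hi == lo:
            # Every tag is equally common, so nothing stands out.
            sizes[tag] = min_size
        else:
            sizes[tag] = round(min_size + (n - lo) * (max_size - min_size) / (hi - lo))
    return sizes


# Tags for four hypothetical posts.
posts = [
    ["copyright", "google"],
    ["copyright", "zotero"],
    ["copyright", "cool"],
    ["google"],
]
sizes = tag_cloud_sizes(posts)
```

Size here follows nothing but raw frequency, so whichever tag happens to get typed most often dominates the cloud, whether or not it reflects what the blog is really about.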

My hunch early on was that tags are best heard from but not seen, and I think I was mostly right about that.

Next up in the series: I make my first change to the blog, from a partial feed to a full feed, and explain the advantages and disadvantages of both—and why I’ve decided to switch.

Part 8: Full Feeds vs. Partial Feeds

Blogs Google History Maps Mashups Tagging

Hurricane Digital Memory Bank Featured on CNN

I was interviewed yesterday by CNN about a new project at the Center for History and New Media, the Hurricane Digital Memory Bank, which uses digital technology to record memories, photographs, and other media related to Hurricanes Katrina, Rita, and Wilma. (CNN is going to feature the project sometime this week on its program The Situation Room.) The HDMB is a democratic historical project similar to our September 11 Digital Archive, which saved the recollections and digital files of tens of thousands of contributors from around the world; this time we’re trying to save thousands of perspectives on what occurred on the Gulf Coast in the fall of 2005. What amazes me is how the interest in online historical projects and collections has exploded recently. Several of the web projects I’ve co-directed over the last five years have engaged in collecting history online. But even a project with as prominent a topic as September 11 took a long time to be picked up by the mass media. This time CNN called us just a few weeks after we launched the website, and before we’ve done any real publicity. Here are three developments from the last two years that I think account for this sharply increased interest.

Technologies enabling popular writing (blogs) and image sharing (e.g., Flickr) have moved into the mainstream, creating an unprecedented wave of self-documentation and historicizing. Blogs, of course, have given millions of people a taste for daily or weekly self-documentation unseen since the height of diary use in the late nineteenth century. And it used to be fairly complicated to set up an online gallery of one’s photos. Now you can do it with no technical know-how whatsoever, and it’s become much easier for others to find these photos (partly due to tagging/folksonomies). The result is that millions of photographs are being shared daily and the general public is getting used to the instantaneous documentation of events. Look at what happened after the London subway bombings: the photographic documentation of the event that appeared on photo-sharing sites within two days would formerly have taken archivists months or even years to compile.

New web services are making combinations of these democratic efforts at documentation feasible and compelling. Our big innovation for the HDMB is to locate each contribution on an interactive map (using the Google Maps API), which allows one to compare the experiences and images from one place (e.g. an impoverished parish in New Orleans) with another (e.g., a wealthier suburb of Baton Rouge). (Can someone please come up with a better word for these combinations than the current “mashups”?) Through the savvy use of unique Technorati or Flickr tags, a scattered group of friends or colleagues can now automatically associate a group of documents or photographs to create an instant collection on an event or issue.

The mass media has almost completely reversed its formerly antagonistic posture toward new media. CNN now has at least two dedicated “Internet reporters” who look for new websites and scan blogs for news and commentary—once disparaged as the last refuge of unpublishable amateurs. In the last year the blogosphere has actually broken several stories (e.g., the Dan Rather document scandal), and many journalists have started their own blogs. The Washington Post has just hired its first full-time blogger. Technorati now tracks over 24 million blogs; even if 99% of those are discussing the latest on TomKat (the celebrity marriage) or Tomcat (the Linux server technology for Java), there are still a lot of new, interesting perspectives out there to be recorded for posterity.