Audience RSS Stats

Measuring the Audience of a Digital Humanities Project

Karen Motylewski of the Institute of Museum and Library Services recently pressed an audience of recent IMLS grantees to think about how they might measure the success of their digital projects. As she was well aware, academics often bristle at the quantitative measurement of the audience for their websites because it smacks of commercialism. Also, we professors and librarians and curators generally avoid taking classes in such base topics as marketing. But Karen has a point. Indeed, Roy Rosenzweig and I devote an entire chapter in Digital History to how to build an audience—not for commercial or narcissistic reasons, but because an academic digital project should be, as we say, “useful and used.” I started this blog to explain in greater depth some of the projects and research I’m working on in the digital humanities, but I also did it (as readers of my five-part series on “Creating a Blog from Scratch” will know; 1, 2, 3, 4, 5) to learn first-hand about the composition of blogs and the technologies behind them. Writing my own code for this blog forced me to examine in detail—and occasionally rethink—some blogging conventions (technical, design, and content). And one of the benefits of doing so has been a realization that I have significantly underestimated the power of RSS. I now think it may be the best measurement of utility for an academic website, far better than server logs or other quantitative measurements. Let me explain why.

Think of your reading habits—specifically, periodicals. You probably subscribe to a newspaper, a magazine or two (or three), and perhaps some academic or specialist journals. Every time you go to the dentist, you also probably voraciously read all of those salacious magazines and lifestyle handbooks you don’t subscribe to. If you’re in a particularly bad waiting room, you probably read anything that’s lying around, even if you would never buy those magazines at a newstand. As anyone in the magazine or newspaper business will tell you, what they really want is subscribers, not casual, one-time readers. Subscribers have shown a level of interest in, and dedication to, a periodical that is several levels above all other readers.

Now look carefully at web server logs—the trail of a website’s readers. Most visitors to a typical website are like the third type of magazine reader—simply passing through on the way to get their cavities filled. They generally come from search engines, quickly scan a page, and leave, their IP address never to be seen again.

Moreover, up to three-quarters of traffic to most websites is from bots (i.e., Google’s indexing spider)—a machine audience that you probably care little about, except as a way to drive traffic to your site from search requests. On this site in March 2006, the human audience looked at about 10,000 pages; machines requested over 26,000 pages. This doesn’t even take into account “server spam,” which consists of fake requests to your server to make it look like another website is sending a lot of traffic your way. In March, was the number one “referrer” to this blog. Great.

So now we are down to about 10% of the top line number of “visitors” to your website. You are likely getting depressed. But here’s where another point Roy and I make comes into play: “think about community, not numbers of visitors.” That other 10% includes a number of people who love your site and what it has to offer but only visit every once in a while.

Then there are the subscribers. RSS truly provides an online analog to periodical subscriptions; “subscriptions” is a very good word for it since subscribers receive each update automatically. RSS finally allows digital humanities projects to assess how many people are really committed to a site. Notably, this number may or may not follow overall site traffic patterns. For instance, here’s a comparison of server logs for this site with RSS subscriptions:

In the noise of all of the bot traffic and disinterested visitors (top chart; the orange bar represents unique visitors, the dark blue is page views), I’m grateful that subscriptions to this blog (bottom chart) have climbed steadily since its inception four months ago. Should this blog have the enormous traffic of a BoingBoing? No. That’s not why I started it. I’m trying to reach a fairly specific audience that is several orders of magnitude smaller than the big tech/geek audience for BoingBoing. Success means reaching and having a conversation with those people—the people who I believe are doing critical work for the future of education, libraries, and the humanities—not with a mass audience. I hope this site is slowly creeping toward that modest goal. By tracking RSS subscriptions, other digital humanities projects can also see if they’re reaching their envisioned audience.

But how do you use RSS if your site isn’t a blog? If your site is a digital collection or archive, you can add a “news about this site” or “new features/new additions” RSS feed, as we have done for the Hurricane Digital Memory Bank. If your project involves software development, you can put code update announcements into an RSS feed. Even if your site is relatively static, new services such as will send out notifications of site changes to interested parties. Once you have an RSS feed (you should link to it from your home page so that RSS-aware browsers can find it quickly), you can then use services such as Feedburner to track RSS subscriptions more carefully.

With all of its faults and problems, I suspect we will soon be saying, “The server log is dead.” Long live RSS.

Blogs Programming RSS Software

Creating a Blog from Scratch, Part 3: The Double Life of Blogs

In the first two posts in this series, I discussed the origins of blogs and how they led to certain elements in popular blog software that were in some cases good and in others bad for my own purposes—to start a blog that consisted of short articles on the intersection of digital technology, the humanities, and related topics (rather than my personal life or links with commentary). What I didn’t realize as I set about writing my own blog software from scratch for this project was that in truth a blog leads two lives: one as a website and another as a feed, or syndicated digest of the more complete website. Understanding this double life and the role and details of RSS feeds led to further thoughts about how to design a blog, and how certain choices are encoded into blogging software. Those choices, as I’ll explain in this post, determine to a large extent what kind of blog you are writing.

Creating a blog from scratch is a great first project for someone new to web programming and databases. It involves putting things into a database (writing a post), taking them out to insert into a web page, and perhaps coding some secondary scripts such as a search engine or a feed generator. It stretches all of the scripting muscles without pulling them. And in my case, I had already decided (as discussed in the prior post) I didn’t need (or want) a lot of bells and whistles, like a system for allowing comments or trackback/ping features. So I began to write my blogging application with the assumption that it would be a very straightforward affair.

Indeed, at first designing the database couldn’t have been simpler. The columns (as database fields are called in MySQL) were naturally an ID number, a title, the body of the post (the text you’re reading right now), and the time posted (while I disparaged time in my last post I thought it was important enough to have this field for reference). But then I began to consider two things that were critical and that I thought popular blogging software did well: automatically generating an RSS feed (or feeds) and—for some blog software—creating URLs for each post that nicely contain keywords from the title. This latter feature is very important for visibility in search engines (the topic of my next post in this series).

I know a lot about search engines, but knew very little about RSS feeds, and when I started to think about what my blogging application would need to autogenerate its own feed, I realized the database had to be slightly more elaborate than my original schema. I had of course heard of RSS before, and the idea of syndication. But before I began to think about this blog and write the software that drives it I hadn’t really understood its significance or its complexity. It’s supposed to be “Really Simple Syndication” as one of the definitions for the acronym RSS asserts, and indeed the XML schemas for various RSS feeds are relatively simple.

But they also contain certain assumptions. For instance, all RSS feeds have a description of each blog post (in the Atom feed, it’s called the “summary”), but these descriptions vary from feed to feed and among different RSS feed types. For some it is the first paragraph of the post, for others the first 100 characters, and for still others it’s a specially crafted “teaser” (to use the TV news lingo for lines like “Coming up, what every parent needs to know to prevent a dingo from eating their baby”). Along with the title to the post this snippet is generally the only thing many people see—the people who don’t visit your blog’s website but only scan it in syndication.

So what kind of description did I want, and how to create it? I liked the idea of an autogenerated snippet—press “post” and you’re done. On the other hand, choosing a random number of characters out of a hat that would work well for every post seemed silly. I’m used to writing abstracts for publications, so that was another possibility. But then I would have to do even more work after I finished writing a post. And I also needed to factor in that the description had to prod people to look at the whole post on my website, something the “teaser” people understood. So I decided to compromise and leave part to the code and part to me: I would have the software take the first paragraph of the entire post and use that as the default summary for the feed, but I had the software put a copy of this paragraph in the database so I could edit it if I wanted to. Automated but alterable. And I also decided that I needed to change my normal academic writing style to have more of an enticing end to every first paragraph. The first paragraph would be somewhere between an abstract and a teaser.

Having made this decision and others like it for the other fields in the feed (should I have “channels”? why does there have to be an “author” field in a feed for a solo blog?), I had to choose which feed to create: RSS 1.0, RSS 2.0, Atom? Why there are so many feed types somewhat eludes me; I find it weird to go to some blogs that have little “chicklets” for every feed under the sun. I’ve now read a lot about the history of RSS but it seems like something that should have become a single international standard early on. Anyway, I wanted the maximum “subscribability” for my blog but didn’t want to suffer writing a PHP script to generate every single kind of feed.

So, time for outsourcing. I created a single script to generate an Atom 1.0 feed (I think the best choice if you’re just starting out), which is picked up by FeedBurner and recast into all of those slightly incompatible other feed types depending on what the subscriber needs.

This would not be the first time I would get lazy and outsource. In the next part of this series, I discuss the different ways one can provide search for a blog, and why I’m currently letting Google do the heavy lifting.

Part 4: Searching for a Good Search