Here’s a possible conundrum worthy of the New York Times’s ethicist, Randy Cohen (no relation to your’s truly). I have been a major proponent of reCAPTCHA, the red and yellow box at the bottom of my blog posts that uses words from books scanned by the Internet Archive/Open Content Alliance as a system to prevent comment spam. At the same time visitors decipher the words in that box to add a comment, they help to turn old texts into accurate, useful transcriptions. My glee about killing two birds with one stone has soured a bit after discovering something unsettling: I still get comment spam on my blog, and a lot of it–thousands and thousands of bogus comments.
My investigation of these comments–checking IP addresses, looking at patterns of posting and the links therein, and other discussions of how solid reCAPTCHA’s technology is (e.g., it doesn’t seem susceptible to a “relay attack,” where a puzzle is redirected by the spammer to a unsuspecting person logging onto another site)–leads me to the depressing conclusion that these comments are not done by bots or unwitting third parties. Rather, they are added by hand, one at a time, intentionally. Real human beings are figuring out the blurry words from those old books to insert vaguely plausible comments (“Nice post! Check out my site for more on the same topic.”).
I suppose it’s good news that the spammers are being used as human OCR. By my calculations they’ve decoded, word by word, about 50 pages of text on my blog alone. (Real commenters have transcribed about a half a page.) But I suspect–and would be happy to be proven wrong in real comments, below–that many of the actual people solving the reCAPTCHA are being paid pennies an hour by spam overlords to boost the Google rankings of their clients by adding keyword-rich linked comments to sites with high PageRank.
So in a sense, reCAPTCHA leads to a kind of indirect outsourcing similar to sending a book to be “rekeyed” by low-paid, third-world typists.