Enhancing Historical Research With Text-Mining and Analysis Tools

I’m delighted to announce that beginning this summer the Center for History and New Media will undertake a major two-year study of the potential of text-mining tools for historical (and by extension, humanities) scholarship. The project, entitled “Scholarship in the Age of Abundance: Enhancing Historical Research With Text-Mining and Analysis Tools,” has just received generous funding from the National Endowment for the Humanities.

In the last decade, the library community and other providers of digital collections have created an incredibly rich digital archive of historical and cultural materials. Yet most scholars have not yet figured out how to take full advantage of the digitized riches suddenly available on their computers. Indeed, the abundance of digital documents has actually exacerbated the problems of some researchers, who now find themselves overwhelmed by the sheer quantity of available material. Meanwhile, some of the most profound insights lurking in these digital corpora remain locked up.

For some time, computer scientists have been pursuing text mining as a solution to the problem of abundance, and there have even been a few attempts at bringing text-mining tools to the humanities (such as the MONK project). Yet there is less research than one might hope on what scholars who are not technically savvy (especially historians) might actually want and use in their research, and on how we might integrate sophisticated text analysis into their workflow.

We will first conduct a survey of historians to examine closely their use of digital resources and to identify particularly helpful uses of digital technology. We will then explore three main areas where text mining might help in the research process: locating documents of interest in the sea of texts online; extracting and synthesizing information from these texts; and analyzing large-scale patterns across these texts. A focus group of historians will assess the efficacy of different methods of text mining and analysis in real-world research situations, so that we can offer recommendations, and even some tools, for the most promising approaches.

In addition to other forms of dissemination, I will of course provide project updates in this space.

[Image credit: Matt Wright]

February 4, 2008

In History, Research, Text Mining, Tools

Comments

PhDinHistory says:

February 5, 2008 at 1:32 am

That’s awesome. I had hoped this day would come. If you can develop software that can understand texts and find meaning in them, as opposed to just extracting and manipulating their information, I think you will really be onto something. But that may require revisiting our debate about the semantic web.

Steve Ramsay says:

February 5, 2008 at 2:13 pm

Congratulations, on behalf of the MONK Project! The more the merrier, we say. 😉

Heather Munro Prescott says:

February 6, 2008 at 8:30 pm

This sounds fascinating. Sign me up!

schrattenkalk.com says:

March 15, 2008 at 4:26 pm

[…] for history (as a subject) and historians in general. Also on history, but more technical, an article by Dan Cohen on the research on tools for researchers. There are not only blogs, but also podcasts. […]

George Grubbs says:

June 12, 2008 at 4:20 pm

I have been planning to do the same thing that you’re already doing. I would definitely like to participate in any way that I can. I am a computer scientist at RTI International specializing in databases, data warehouses and data/text mining, particularly in bioinformatics. I am very interested in the potential of applying text mining to the analysis of large numbers of documents of all types. Keep up the good work.

edwired » Blog Archive » Visualization as an Introduction to Text Mining? says:

September 23, 2008 at 9:44 am

[…] days are gone, it’s up to us to teach our students some new techniques. But where to start? Text mining–the new big thing in digital humanities–is a relatively higher order skill. Should […]

Shai OPhir says:

November 16, 2008 at 4:18 pm

Great news!
Meanwhile, until this project's tools can be downloaded, can someone (Steve Ramsay?) tell me what has happened to the MONK project? Why is its site empty? Can I download MONK or the NORA project and run it on a standard PC?
Many thanks,
Shai

What I Would Like To See in Text Mining for Historians « Clio Machine says:

January 29, 2009 at 10:32 pm

[…] embark on a two-year project to adapt the technology of text mining to the work of historians (see here and here). Often, projections of what text mining can do for historians have left readers with the […]

Stephen Ramsay » Anthologize It says:

August 12, 2010 at 11:45 am

[…] editors, editorial boards, and “outside” reviewers remain the best way to manage what some have called the “Age of Abundance?” Is it really better to have a restricted set of […]

edwired » Blog Archive » The History Curriculum in 2023 (Mining) says:

January 3, 2013 at 8:48 am

[…] sectors of the information economy and historians and other humanists have already begun working on exciting projects [see also and also] that are helping us find ways to mine emerging super massive datasets of […]

Las armas de Corocotta | Trifinium says:

October 31, 2014 at 7:20 am

[…] Dan Cohen (Center for History and New Media, George Mason University) […]
