How “great” is this article? « Infomotions Mini-Musings


During Digital Humanities 2010 I participated in the THATCamp London Developers’ Challenge and tried to answer the question, “How ‘great’ is this article?” This posting outlines the functionality of my submission, links to a screen capture demonstrating it, and provides access to the source code.

Given any text file — say an article from the English Women’s Journal — my submission tries to answer the question, “How ‘great’ is this article?” It does this by:

1. returning the most common words in a text
2. returning the most common bigrams in a text
3. calculating a few readability scores
4. comparing the text to a standardized set of “great ideas”
5. supporting a concordance for browsing
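Functions #1, #2, #3, and #5 can each be illustrated in a few lines. The following is a minimal Python sketch, not the original submission. It assumes plain-text input, uses a small hypothetical stop-word list, and picks the standard Flesch Reading Ease formula as the one readability score (the post does not say which scores the program calculates). The syllable counter is a rough vowel-group heuristic, and all function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration; the original program's list is unknown.
STOP_WORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "in", "is",
              "it", "of", "on", "that", "the", "this", "to", "was", "with"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def common_words(text, n=10):
    """Function #1: the n most frequent words, ignoring stop words."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    return Counter(words).most_common(n)


def common_bigrams(text, n=10):
    """Function #2: the n most frequent adjacent word pairs, skipping pairs with a stop word."""
    words = tokenize(text)
    pairs = (f"{a} {b}" for a, b in zip(words, words[1:])
             if a not in STOP_WORDS and b not in STOP_WORDS)
    return Counter(pairs).most_common(n)


def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a final silent 'e'."""
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """Function #3 (one possible score): Flesch Reading Ease; higher means easier."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def concordance(text, keyword, width=30):
    """Function #5: keyword-in-context lines for each occurrence of keyword."""
    flat = " ".join(text.split())  # collapse newlines so each hit fits on one line
    lines = []
    for m in re.finditer(rf"\b{re.escape(keyword)}\b", flat, re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


if __name__ == "__main__":
    sample = "True love is patient. True love is kind. Love never fails."
    print(common_words(sample, 3))
    print(common_bigrams(sample, 1))
    print(round(flesch_reading_ease(sample), 1))
    for line in concordance(sample, "love", width=12):
        print(line)
```

Filtering stop words out of both the word list and the bigrams keeps pairs such as “love is” from crowding out more telling pairs such as “true love”; a real implementation would likely use a much larger stop list.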

Functions #1, #2, #3, and #5 are relatively straightforward and well understood. Function #4 needs some explanation.

In 1952 a set of books called the Great Books of the Western World was published. The set is organized around 102 “great ideas” (such as art, love, honor, truth, justice, wisdom, and science). By summing, for each book, the TFIDF scores of all of these ideas, a “great ideas coefficient” can be computed. Through this process we find that Shakespeare wrote seven of the top ten books when it comes to love, that Kant wrote the “greatest book”, and that the Articles of Confederation ranks highest when it comes to war. This coefficient can then be used as a standard, an index, against which other documents are compared. This is exactly what this program does. (See the screen capture for a demonstration.)
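The coefficient described above can be sketched in Python. This is not the original program: it assumes each idea is a single lowercase word, uses TF = occurrences divided by document length and IDF = log(N / df) over a reference corpus (one common TF-IDF variant; the post does not say which was used), and the function names `idf_table`, `great_ideas_coefficient`, `top_for_idea`, and `rank_against_corpus` are hypothetical. The three-document corpus in the usage example is made-up toy data, not the Great Books.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def idf_table(corpus, ideas):
    """IDF of each idea over the reference corpus: log(N / df), or 0 if df == 0."""
    doc_words = [set(tokenize(text)) for text in corpus.values()]
    n = len(doc_words)
    table = {}
    for idea in ideas:
        df = sum(1 for words in doc_words if idea in words)
        table[idea] = math.log(n / df) if df else 0.0
    return table


def great_ideas_coefficient(text, idf):
    """Sum over all ideas of TF * IDF, with TF = occurrences / document length."""
    words = tokenize(text)
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[idea] / len(words) * weight for idea, weight in idf.items())


def top_for_idea(corpus, idea, k=10):
    """Rank corpus texts by the TF-IDF score of a single idea, e.g. "love"."""
    idf = idf_table(corpus, [idea])
    scores = {title: great_ideas_coefficient(t, idf) for title, t in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def rank_against_corpus(text, corpus, ideas):
    """Score a new text with the corpus IDF values and count the corpus texts it beats."""
    idf = idf_table(corpus, ideas)
    scores = [great_ideas_coefficient(t, idf) for t in corpus.values()]
    score = great_ideas_coefficient(text, idf)
    beaten = sum(1 for s in scores if s < score)
    return score, beaten, len(scores)


if __name__ == "__main__":
    corpus = {"A": "love love war", "B": "war peace", "C": "art art art"}
    ideas = ["love", "war", "art"]
    print(top_for_idea(corpus, "love", 1))
    print(rank_against_corpus("love love love war", corpus, ideas))
```

Scoring a new article with IDF values taken from the reference corpus, rather than recomputing them with the article included, keeps the index fixed, which is what lets it serve as a standard for comparison.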

The program can be improved in a number of ways:

it could be Web-based
it could process non-text files
it could graphically illustrate a text’s “greatness”
it could hyperlink returned words directly to the concordance

Thanks to Gerhard Brey and the folks at the Nineteenth-Century Serials Edition for providing the data. Very interesting.

Tags: dh2010, digital humanities, thatcamp

This entry was posted on Friday, July 9th, 2010 at 3:33 am and is filed under Hacks.

3 Responses to “How ‘great’ is this article?”

Király Péter says:

July 14, 2010 at 4:03 am

Hi Eric,

very interesting post and demonstration. One suggestion: maybe you should use a synonym dictionary (and a formal taxonomy) to map words in the text that are synonyms of the words in the __DATA__ section. E.g. algebra is a child concept of mathematics, but if the text doesn’t mention math, just algebra, the text won’t match the idea.

Regards,
Péter
Eric Lease Morgan says:

July 14, 2010 at 6:48 am

Péter, yes, you are exactly correct, and in one of the next iterations of my informal Great Books/Ideas Project I plan to implement exactly the sort of thing you suggest. Thank you.

—
ELM
Digital Humanities 2010: A Travelogue « Infomotions Mini-Musings says:

July 25, 2010 at 12:52 pm

[…] How “great” is this article? […]

Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Date created: 2008-05-26
Date updated: 2010-05-09
