Counting words

When I talk about “services against text” I usually get blank stares from people. When I think about it more, many of the services I enumerate are based on the counting of words. Consequently, I spent some time doing just that — counting words.

I wanted to analyze the content of a couple of the mailing lists I own/moderate, specifically Code4Lib and NGC4Lib. Who are the most frequent posters? What words are used most often in the subject lines, and what words are used most often in the body of the messages? Using a hack I wrote (mine-mail.pl) I was able to generate simple tables of data:

I then fed these tables to Wordle to create cool looking images. I also fed these tables to a second hack (dat2cloud.pl) to create not-even-close-to-valid HTML files in the form of hyperlinked tag clouds. Below is are the fruits of these efforts:


image of names

tag cloud of names

image of subjects

tag cloud of subjects

image of words

tag cloud of words

The next step is to plot the simple tables on a Cartesian plane. In other words, graph the data. Wish me luck.

Tags:

Comments are closed.