Archive for February, 2009

Henry David Thoreau’s Walden

Monday, February 9th, 2009

As I sit here beside my fire at the cabin, I reflect on the experiences documented by Henry David Thoreau in his book entitled Walden.

Being human

On one level, the book is about a man who goes off to live in a small cabin by a pond named Walden. It describes how he built his home, tended his garden, and walked through the woods. On another level, it is a collection of self-observations and reflections on what it means to be human. “I went to the woods because I wished to live deliberately, to front only the essential facts of life, and see if I could not learn what it has to teach, and not, when I came to die, discover that I had not lived… I wanted to live deep and suck out all the marrow of life, to live so sturdily and Spartan-like as to put to rout all that was not life, to cut a broad swath and shave close, to drive life into a corner, and reduce it to its lowest terms, and, if it proved to be mean, why then to get the whole and genuine meanness of it, and publish its meanness to the world.”

Selected chapters

The book doesn’t really have a beginning, a middle, and an end. There is no hero, no protagonist, no conflict, and no climax. Instead, the book is made up of little stories amassed over the period of one and a half years while living alone. The chapter called “Economy” is an outline of the necessities of life such as clothing, shelter, and food. It cost him $28 to build his cabin, and he grew much of his own food. “Yet men have come to such a pass that they frequently starve not for want of necessities, but for want of luxuries.”

I also enjoyed the chapter called “The Bean-Field”. “I have come to love my rows, my beans, though so many more than I wanted.” Apparently he had as many as seven miles of beans, if they were all strung in a row. Even over two acres of ground, I find that hard to believe. He mentions woodchucks often in the chapter as well as throughout the book, and he dislikes them because they eat his crop. I always thought woodchucks (groundhogs) were particularly interesting since they were abundant around the property where I grew up. In relation to economy, Thoreau spent just less than $14 on gardening expenses, and after selling his crop made a profit of almost $9. “Daily the beans saw me come to their rescue armed with a hoe, and thin their ranks of the enemies, filling up the trenches with weedy dead.”

The chapter called “Sounds” is full of them or allusions to them: voice, rattle, whistle, scream, shout, ring, announce, hissing, bells, sung, lowing, serenaded, music chanted, cluck, buzzing, screech, wailing, trilled, sighs, hymns, threnodies, gurgling, hooting, baying, trump, bellowing, crow, bark, laughing, cackle, creaking, and snapped. Almost a cacophony, but at the same time a possible symphony. It depends on your perspective.

While he lived alone, he never seemed lonely. In fact, he seemed to attract visitors or sought them out himself. Consider the wood chopper who was especially skilled at his job. Reflect on the Irish family who lived “rudely”. Compare and contrast the well-to-do professional with manners to the man who lived in a hollow log. (I wonder whether or not that second man really existed.)

Thoreau’s descriptions of the pond itself were arscient. [1] He describes its color, its depth, and overall size. He ponders where it got its name, its relation to surrounding ponds, and where its water comes from and goes. He fishes in it regularly, and walks upon its ice in the winter. He describes how men harvest its ice and how the pond keeps most of the effort. He appreciates the appearance of the pond as he observes it during different times of year as well as from different vantage points. In my mind, it is a good thing to observe anything and just about everything from many points of view, both literally and figuratively.

Conclusion

The concluding chapter has a number of meaty thoughts. “I left the woods for as good a reason as I went there. Perhaps it seemed to me that I had several more lives to live, and could not spare any more time for that one… I learned this, at least, by my experiment: that if one advances confidently in the direction of his dreams, and endeavors to live the life which he has imagined, he will meet with a success unexpected in common hours… If a man does not keep pace with his companions, perhaps it is because he hears a different drummer. Let him step to the music which he hears, however measured or far away… However mean your life is, meet it and live it; do not shun it and call it hard names… Love your life, poor as it is… Rather than love, than money, than fame, give me truth.”

Word cloud

As a service against the text, and as a means to learning about it more quickly, I give you the following word cloud (think “concordance”) complete with links to the places in the text where the words can be found:

life  pond  most  house  day  though  water  many  time  never  about  woods  without  much  yet  long  see  before  first  new  ice  well  down  little  off  know  own  old  nor  good  part  winter  far  way  being  last  after  heard  live  great  world  again  nature  shore  morning  think  work  once  same  walden  thought  feet  spring  earth  here  perhaps  night  side  sun  things  surface  few  thus  find  found  summer  must  true  got  also  years  village  enough  myself  half  poor  seen  air  better  put  read  till  small  within  wood  cannot  fire  ground  deep  end  bottom  left  nothing  went  away  place  almost  least  

Note

[1] Arscience — art-science — is a term I use to describe a way of thinking incorporating both artistic and scientific elements. Arscient thinking is poetic, intuitive, free-flowing, and at the same time it is systematic, structured, and repeatable. To my mind, a person requires both in order to create a cosmos from the apparent chaos of our surroundings.

Eric Lease Morgan’s Top Tech Trends for ALA Mid-Winter, 2009

Monday, February 9th, 2009

This is a list of “top technology trends” written for ALA Mid-Winter, 2009. They are presented in no particular order. [This text was originally published on the LITA Blog, but it is duplicated here because “lots of copies keep stuff safe.” –ELM]

Indexing with Solr/Lucene works well – Lucene seems to have become the gold standard when it comes to open source indexer/search engine platforms. Solr, a Web Services interface to Lucene, is increasingly the preferred way to read & write Lucene indexes. Librarians love to create lists. Books. Journals. Articles. Movies. Authoritative names and subjects. Websites. Etc. All of these lists beg for organization. Thus, (relational) databases. But lists need to be short, easily sortable, and/or searchable in order to be useful as finding aids. Indexers make things searchable, not databases. The library profession needs to get its head around the creation of indexes. The Solr/Lucene combination is a good place to start (er, catch up).
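To make the difference between a list and an index a bit more concrete, here is a minimal sketch, in Python, of an inverted index, which is the kind of data structure indexers such as Lucene are built around. It is only an illustration: the sample titles and the function names (tokenize, build_index, search) are made up for this example, and a real indexer adds relevance ranking, stemming, fielded search, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of the documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# A hypothetical list of titles, standing in for a library's "list".
documents = {
    1: "Walden; or, Life in the Woods",
    2: "A Week on the Concord and Merrimack Rivers",
    3: "The Maine Woods",
}
index = build_index(documents)
print(search(index, "woods"))        # ids of the titles mentioning "woods"
print(search(index, "Maine Woods"))  # ids of the titles mentioning both words
```

The list itself stays in the database; the index is a second structure, built from the list, whose only job is to answer "which items contain this word?" quickly.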

Linked data is a new name for the Semantic Web – The Semantic Web is about creating conceptual relationships between things found on the Internet. Believe it or not, the idea is akin to the ultimate purpose of a traditional library card catalog. Have an item in hand. Give it a unique identifier. Systematically describe it. Put all the descriptions in one place and allow people to navigate the space. By following the tracings it is possible to move from one manifestation of an idea to another, ultimately providing the means to the discovery, combination, and creation of new ideas. The Semantic Web is almost exactly the same thing except the “cards” are manifested using RDF/XML on computers through the Internet. From the beginning RDF has gotten a bad name. “Too difficult to implement, and besides the Semantic Web is a thing of science fiction.” Recently the term “linked data” has been used to denote the same process of creating conceptual relationships between things on the ‘Net. It is the Semantic Web by a different name. There is still hope.
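The card catalog analogy can be sketched in a few lines of Python. Each "card" becomes a set of subject-predicate-object statements about an identified thing, and following a tracing becomes a lookup on a shared value. This is a simplification under stated assumptions: the example.org identifiers are hypothetical, the predicates are plain strings rather than the URIs real RDF would use, and no RDF library is involved.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers below are hypothetical and exist only for illustration.
triples = [
    ("http://example.org/item/walden", "title", "Walden"),
    ("http://example.org/item/walden", "creator", "http://example.org/person/thoreau"),
    ("http://example.org/item/maine-woods", "title", "The Maine Woods"),
    ("http://example.org/item/maine-woods", "creator", "http://example.org/person/thoreau"),
    ("http://example.org/person/thoreau", "name", "Henry David Thoreau"),
]


def describe(subject):
    """Return every (predicate, object) pair recorded for a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


def related_items(item, predicate):
    """Follow a "tracing": find other subjects sharing a value for the predicate."""
    values = {o for s, p, o in triples if s == item and p == predicate}
    return sorted(
        {s for s, p, o in triples if p == predicate and o in values and s != item}
    )


# Start with one item in hand and move to another by way of its creator.
print(related_items("http://example.org/item/walden", "creator"))
print(describe("http://example.org/person/thoreau"))
```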

Blogging is peaking – There is no doubt about it. The Blogosphere is here to stay, yet people have discovered that it is not very easy to maintain a blog for the long haul. The technology has made it easier to compose and distribute one’s ideas, much to the chagrin of newspaper publishers. On the other hand, the really hard work is coming up with meaningful things to say on a regular basis. People have figured this out, and consequently many blogs have gone by the wayside. In fact, I’d be willing to bet that the number of new blogs is decreasing, and the number of postings to existing blogs is decreasing as well. Blogging was “kewl”, and it is still cool, but it is also hard work. Blogging is peaking. And by the way, I dislike those blogs which are only partially syndicated. They allow you to read the first 256 characters or so of an entry, and then encourage you to go to their home site to read the whole story, where you are bombarded with loads of advertising.

Word/tag clouds abound – It seems very fashionable to create word/tag clouds nowadays. When you get right down to it, word/tag clouds are a whole lot like concordances, one of the first types of indexes. Each word (or tag) in a document is itemized and counted. Stop words are removed, and the results are sorted either alphabetically or numerically by count. This process, especially if it were applied to significant phrases, could be a very effective and visual way to describe the “aboutness” of a file (electronic book, article, mailing list archive, etc.). An advanced feature is to hyperlink each word, tag, or phrase to specific locations in the file. Given a set of files on similar themes, it might be interesting to create word/tag clouds against them in order to compare and contrast. Hmmm…
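The counting process described above is short enough to sketch in Python. The stop word list and the sample sentence are made up for illustration (real stop word lists are much longer), and this is not necessarily how any particular word cloud tool works.

```python
import re
from collections import Counter

# A tiny, made-up stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "i", "in", "is", "it", "my", "of", "the", "to"}


def word_counts(text, stop_words=STOP_WORDS):
    """Itemize and count each word, skipping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def by_count(counts, limit=None):
    """Sort numerically by count, highest first, breaking ties alphabetically."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


def alphabetically(counts):
    """Sort the (word, count) pairs by word."""
    return sorted(counts.items())


# A made-up sample sentence standing in for a whole file.
sample = "The pond in winter, the pond in spring, and the woods in winter."
counts = word_counts(sample)
print(by_count(counts, limit=2))
print(alphabetically(counts))
```

Rendering the cloud is then a matter of presentation, for example scaling each word's font size by its count.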

“Next Generation” library catalogs seem to be defined – From my perspective, the profession has stopped asking questions about the definition of “next generation” library catalogs. I base this statement on two things. First, the number of postings and discussions on a mailing list called NGC4Lib has dwindled. There are fewer questions and even less discussion. Second, the applications touting themselves, more or less, as “next generation” library catalog systems all have similar architectures. Ingest content from various sources. Normalize it into an internal data structure. Store the normalized data. Index the normalized data. Provide access to the index as well as services against the index such as tag, review, and Did You Mean? All of this is nice, but it really isn’t very “next generation”. Instead it is slightly more of the same. An index allows people to find, but people are still drinking from the proverbial fire hose. Anybody can find. In my opinion, the current definition of “next generation” does not go far enough. Library catalogs need to provide an increased number of services against the content, not just services against the index. Compare & contrast. Do morphology against. Create word cloud from. Translate. Transform. Buy. Review. Discuss. Share. Preserve. Duplicate. Trace idea, citation, and/or author forwards & backwards. It is time to go beyond novel ways to search lists.
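The shared architecture (ingest, normalize, store, index, provide access) can be sketched as a toy pipeline in Python. Everything specific here is an assumption made for illustration: the two sources, their field names, and the sample records are hypothetical, and no real catalog system is being described.

```python
def normalize(record, source):
    """Map a source-specific record onto one internal structure."""
    if source == "catalog":  # hypothetical field names
        return {"id": record["bib_id"], "title": record["245a"], "author": record["100a"]}
    if source == "repository":  # hypothetical field names
        return {"id": record["identifier"], "title": record["dc_title"], "author": record["dc_creator"]}
    raise ValueError(f"unknown source: {source}")


def ingest(sources):
    """Ingest every record from every source, normalize it, and store it by id."""
    store = {}
    for source, records in sources.items():
        for record in records:
            item = normalize(record, source)
            store[item["id"]] = item
    return store


def build_index(store):
    """Index the normalized data: map each title word to the ids containing it."""
    index = {}
    for item_id, item in store.items():
        for word in item["title"].lower().split():
            index.setdefault(word, set()).add(item_id)
    return index


def find(index, store, word):
    """Provide access to the index: return the stored items matching a word."""
    return [store[item_id] for item_id in sorted(index.get(word.lower(), set()))]


# Made-up sample records from two made-up sources.
sources = {
    "catalog": [{"bib_id": "c1", "245a": "Walden", "100a": "Thoreau, Henry David"}],
    "repository": [{"identifier": "r1", "dc_title": "Walden Pond in Winter", "dc_creator": "Anonymous"}],
}
store = ingest(sources)
index = build_index(store)
print([item["id"] for item in find(index, store, "Walden")])
```

Note that find() is where the sketch stops, which is the point of the complaint: services against the content itself would begin from the items it returns.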

SRU is becoming more viable – SRU (Search/Retrieve via URL) is a Web Services-based protocol for searching databases/indexes. Send a specifically shaped URL to a remote HTTP server. Get back a specifically shaped response. SRU has been joined with a no-longer competing standard called OpenSearch in the form of an Abstract Protocol Definition, and the whole is on its way to becoming an OASIS standard. Just as importantly, an increasing number of the APIs supporting the external-facing OCLC Grid Services (WorldCat, Identities, Registries, Terminologies, Metadata Crosswalk) use SRU as the query interface. SRU has many advantages, but some of those advantages are also disadvantages. For example, its query language (CQL) is expressive, especially compared to OpenSearch or Google, but at the same time, it is not easy to implement. Second, the nature of SRU responses can range from rudimentary and simple to obtuse and complicated. Moreover, the response is always in XML. These factors make transforming the response for human consumption sometimes difficult to implement. Despite all these things, I think SRU is a step in the right direction.
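For the curious, here is roughly what a "specifically shaped URL" looks like when built in Python. The parameter names (operation, version, query, startRecord, maximumRecords) come from the SRU searchRetrieve operation; the server address is hypothetical, the helper name sru_url is made up, and the dc.title and dc.creator indexes assume the server supports the Dublin Core context set.

```python
from urllib.parse import urlencode


def sru_url(base_url, cql_query, start_record=1, maximum_records=10, version="1.1"):
    """Build an SRU searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # urlencode takes care of escaping spaces, quotes, and equals signs.
    return base_url + "?" + urlencode(params)


# Hypothetical server; substitute the address of a real SRU endpoint.
url = sru_url(
    "http://example.org/sru",
    'dc.title = "walden" and dc.creator = "thoreau"',
)
print(url)
```

Sending the URL is an ordinary HTTP GET. The XML that comes back still has to be parsed and transformed, which is where the difficulty mentioned above comes in.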

The pendulum of data ownership is swinging – I believe it was Francis Bacon who said, “Knowledge is power”. In my epistemological cosmology, knowledge is based on information, and information is based on data. (Going the other way, knowledge leads to wisdom, but that is another essay.) Therefore, he who owns or has access to the data will ultimately have more power. Google increasingly has more data than just about anybody. They have a lot of power. OCLC increasingly “owns” the bibliographic data created by its membership. Ironically, this data, in the case of both Google and OCLC, is not freely available, even when the data was created for the benefit of the wider whole. I see this movement as akin to the movement of a pendulum swinging one way and then the other. On my more pessimistic days I view it as a battle. On my calmer days I see it as a natural tendency, a give and take. Many librarians I know are in the profession, not for the money, but to support some sort of cause. Intellectual freedom. The right to read. Diversity. Preservation of the historical record. If I have a cause, then it is about the free and equal access to information. This is why I advocate open access publishing, open source software, and Net Neutrality. When data and information are “owned” and “sold”, an environment of information haves and have-nots manifests itself. Ultimately, this leads to individual gain but not necessarily the improvement of the human condition as a whole.

The Digital Dark Age continues – We, as a society, are continuing to create a Digital Dark Age. Considering all of the aspects of librarianship, the folks who deal with preservation, conservation, and archives have the toughest row to hoe. It is ironic. On one hand there is more data and information available than just about anybody knows what to do with. On the other hand, much of this data and information will not be readable, let alone available, in the foreseeable future. Somebody is going to want to do research on the use of blogs and email. What libraries are archiving this data? We are writing reports and summaries in binary and proprietary formats. Such things are akin to music distributed on 8-track tapes. Where are the gizmos enabling us to read these formats? We increasingly license our most desired content (scholarly journal articles), and in the end we don’t own anything. With the advent of Project Gutenberg, Google Books, and the Open Content Alliance the numbers of freely available electronic books rival the collections of many academic libraries. Who is collecting these things? Do we really want to put all of our eggs into one basket and trust these entities to keep them for the long haul? The HathiTrust understands this phenomenon, and “Lots of copies keep stuff safe.” Good. In the current environment of networked information, we need to re-articulate the definition of “collection”.

Finally, regarding change. It manifests itself along a continuum. At one end is evolution. Slow. Many false starts. Incremental. At the other end is revolution. Fast. Violent. Decisive. Institutions and their behaviors change slowly. Otherwise they wouldn’t be the same institutions. Librarianship is an institution. Its behavior changes slowly. This is to be expected.

YAAC: Yet Another Alex Catalogue

Monday, February 2nd, 2009

I have implemented another version of my Alex Catalogue of Electronic Texts. More specifically, I have dropped the use of one indexer and replaced it with Solr/Lucene. See http://infomotions.com/alex/ for the result. This particular implementation does not have all the features of the previous one. No spell check. No thesaurus. No query suggestions. On the other hand, it does support paging, and since it runs under mod_perl, it is quite responsive.

As always I am working on the next version, and you can see where I’m going at http://infomotions.com/sandbox/alex4/ in my sandbox. Like the implementation above, this one runs under mod_perl and supports paging. Unlike the implementation above, it also supports query suggestions, a thesaurus, and faceted browsing. It also sports the means to view metadata details. Content-wise, it includes images, journal titles, journal articles, and some content from the HathiTrust.

It would be great if I were to get some feedback regarding these implementations. Are they easy to use?

ISBN numbers

Monday, February 2nd, 2009

I’m beginning to think about ISBN numbers and the Alex Catalogue of Electronic Texts. For example, I can add ISBN numbers to Alex, link them to my (fledgling) LibraryThing collection, and display lists of recently added items here:

Interesting, but I think the list will change over time, as new things get added to my collection. It would be nice to link to a specific item. Hmm…

On the other hand, I could exploit ISBN numbers and OpenLibrary using a WordPress plug-in called OpenBook Book Data by John Miedema. It displays cover art and a link to OpenLibrary as well as WorldCat.

Again, very interesting. For more details, see the “OpenBook WordPress Plugin: Open Source Access to Bibliographic Data” in Code4Lib Journal.

A while ago I wrote a CGI script that took ISBN numbers as input and fed them to xISBN and/or ThingISBN to suggest alternative titles. I called it Send It To Me.

Then of course there is the direct link to Amazon.com.

I suppose it is nice to have choice.