Building the "next generation" library catalog

How will we, the library profession, build the "next generation" library catalog, and to what degree will the process include vendor support and open source software?

I must admit that few things succeed over time without some sort of commercial interest. Think OCLC. JSTOR. Even NOTIS. The only exception to the rule seems to be when government subsidizes the process.

Be that as it may, I will still advocate a large dose of grass-roots effort led by the library community exploiting open source software over something created by a commercial institution. At least for now. Moreover, when your fellow librarians say things like, "We tried those 'homegrown' systems a long time ago, and where are they now? We need vendor-supported software", I can give you a number of reasons why this is not necessarily the case in today's environment:

  1. Computer hardware & software - Twenty or more years ago, when the library profession was supporting "homegrown" systems, the hardware used was vendor-specific. Maybe you had a Prime. A Unisys. An IBM 360. A DEC Watchamacallit. A Sun Something. Etc. These computers had less RAM, less disk space, and less processing power than the computer you have on your desktop right now. Each of these computers had its own operating system and set of programming languages used to create applications. The applications created for these systems were not sharable between computers, and consequently it was difficult, if not impossible, to share code between libraries. Now-a-days the applications will be written for Unix/Linux or Java -- platforms that are not computer hardware specific. (If someone creates a Microsoft-based "next generation" library catalog, run the other way, very fast.) The code written for one computer will run on the next computer (no pun intended) without much modification, and this will enable the library community to collaborate to a greater degree.
  2. Relational databases - Relational databases and the technology used to implement them were embryonic when libraries were supporting their "homegrown" systems. There were few, if any, well-supported best practices for managing large sets of information. And even when such practices existed, you sat around worrying whether you should allocate two bytes of disk space to denote the name of a state or twelve. These problems are far less challenging now, given the cost of disk space and the availability of any number of relational database applications. The problem of storing the data is much less limiting than it was twenty years ago.
  3. Indexing technology - Databases are great for storing and manipulating information. Ironically, they are poor at searching. To search a database you must know the underlying structure of the data. Indexes remove this problem. They invert the content of the database, creating lists of words and pointers to records. No knowledge of the database's structure is necessary. Couple this with statistical analysis and indexing technology begins to appear "smart" -- think relevance ranking. Indexing technology has matured to a very large degree in the past twenty years, and there are a large number of freely available indexers. How many indexers were available twenty years ago? One, maybe. BRS.
  4. Skills - Computers twenty or more years ago were expensive, very expensive. Far fewer people had access to computers, and proportionally fewer people had computer expertise. Now-a-days hackers abound. [1] If they didn't we wouldn't have email, Web servers, MySQL, Perl, PHP, Linux, or just about anything related to the Internet. Put another way, there are many, many more people now-a-days who know how to make computers do the things they do. There are computer programmers around; they just don't work in libraries to a large degree. "Libraries are about books. Right?"
  5. Communication - Communication via the telephone is dirt cheap. You can make long-distance telephone calls for pennies. From my workplace here in Indiana I can talk on the telephone with people in the United Kingdom for $0.02/minute. At those rates it is silly not to pick up the telephone. The biggest thing the Internet does is facilitate communication. People-to-people communication. People-to-computer communication. Computer-to-computer communication. Twenty years ago the story was much different. You were lucky to have a 2400 baud modem, and you dared not make a long-distance telephone call. Because of our increasingly seamless ability to communicate across long distances, it will be easier for libraries to coordinate their efforts and create something from the community.
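The inverting described in point 3 above can be sketched in a few lines of code. This is a toy illustration only, not any particular indexer or library system; all record text and function names here are invented for the example. It builds word-to-record pointers and ranks results by how many query terms each record matches -- a crude stand-in for relevance ranking.

```python
# Toy inverted index: map each word to the set of records containing it,
# then rank search results by the number of matching query terms.
from collections import defaultdict

def build_index(records):
    """Invert a {record_id: text} dict into {word: set of record_ids}."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index

def search(index, query):
    """Return record ids ranked by how many query terms they contain."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for rec_id in index.get(word, set()):
            scores[rec_id] += 1
    return sorted(scores, key=lambda r: -scores[r])

# Hypothetical catalog records, for illustration only.
records = {
    1: "cataloging rules for library catalogs",
    2: "open source software for libraries",
    3: "relational database design",
}
index = build_index(records)
print(search(index, "library catalogs"))  # record 1 matches both terms
```

Notice that searching never touches the structure of the original records; the index alone answers the query, which is exactly why indexers can sit on top of any database.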

In short, don't let people write you off when you say, "We can build it ourselves." Explain to them how the computing environment is substantially different from previous times. Enumerate the things outlined above. Yes, the human challenges still exist. Building consensus. Setting priorities. Keeping things on schedule. Creating communities. Bringing people physically together. Allocating time, space, people, and money. But are those the things you want to pay a vendor for? The other things are "as free as a free kitten."

Food for thought on a Friday afternoon.

[1] Hackers in this context are contrasted with "crackers". Hackers are the good guys. They look at source code and figure out ways to improve it or modify it for their own purposes. Crackers, on the other hand, are malicious. They look for ways to exploit software for immoral purposes.


Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Source: This was originally a blog posting on the LITA blog at http://litablog.org/2006/09/01/building-the-next-generation-library-catalog/.
Date created: 2006-09-01
Date updated: 2007-12-28
Subject(s): next generation library catalogs;
URL: http://infomotions.com/musings/building-ngc/