Open source software in libraries

Introduction

This is an essay about open source software and libraries. It outlines what open source software is and is not. It discusses its relationships to the integrated library system. It compares open source software to open access journals and the evolutionary shift academe is experiencing in the world of scholarly communication. Finally, it very briefly reviews select pieces of open source software and describes how they can be used in libraries.

Open source software as a philosophy and a process

Open Source Software (OSS) is both a philosophy and a process. As a philosophy it describes the intended use of software and methods for its distribution. Depending on your perspective, the concept of OSS is a relatively new idea, being only four or five years old. On the other hand, the GNU Software Project -- a project advocating the distribution of "free" software -- has been operational since the mid '80s. Consequently, the ideas behind OSS may have been around longer than you think.

The story begins with a man named Richard Stallman, who worked at MIT in an environment where software was shared. In the mid 1980s, Stallman resigned from MIT to begin developing GNU -- a software project intended to create an operating system much like Unix. (GNU is pronounced "guh-NEW" and is a recursive acronym for GNU's Not Unix.) His desire was to create "free" software, but the term "free" should be equated with freedom, and as such people who use "free" software should be: 1) free to run the software for any purpose, 2) free to modify the software to suit their needs, 3) free to redistribute the software gratis or for a fee, and 4) free to distribute modified versions of the software.

In other words, in the context of GNU software, the term "free" should be equated with the Latin word "liberare", meaning to liberate, and not necessarily "gratis", meaning without return made or expected. In the words of Stallman, we should "think of 'free' as in 'free speech,' not as in 'free beer.'"

At the height of the dot-com boom the phrase "open source software" was coined. As the story goes, a number of people from Red Hat, a company that sells Linux distributions as well as support, were sitting around one day trying to figure out how to market their products and services. The idea of "free" software was, and still is, a difficult idea for many people to understand. Consequently they were trying to come up with a new phrase that conveyed the idea of free software without using the word free. Since it is the source code that is given away, and people are encouraged to read and modify it, the phrase "open source software" was selected. It stuck.

The process of creating and maintaining open source software revolves around a communication process akin to the scholarly communications process of academia. The process begins with a programmer's "itch", a problem the programmer wants to solve. To continue with the metaphor, the programmer then "scratches their itch" by writing a computer program. They are proud of their creation, and they share it with sets of their peers. Through this initial communication process, others become aware of a possible solution to their own problems. Soon a community develops as many people begin to use the software. Someone, whose problems are similar to but different from the initial problem, decides to enhance the original program. This enhancement is given back to the original programmer. If the enhancement is not detrimental to the original concept of the program, then the enhancement is often incorporated into the program, the new piece of software is redistributed, and the process begins anew.

The process is similar to the scholarly communications process because open source software goes through a sort of peer review process. Fellow programmers examine a program's source code, find flaws, and suggest improvements. Eric Raymond, author of a book called The Cathedral and the Bazaar, describes this process in detail and posits that "given enough eyeballs, all bugs are shallow."

Costs of open source software

People often advocate open source software because it is free. While you will not pay for the source code directly, open source software is only "as free as a free kitten".

Suppose you were offered a free kitten. It is soft. It purrs. It plays with a ball of string. It is cute and adorable, so you take it home. First you buy a collar. Then you buy food and a food bowl. Next you take it to the veterinarian and they charge you a fee for shots. Alas, the kitten starts to cost you money. Moreover, the kitten escapes outdoors. It is lost overnight and you worry yourself sick. Not only have you invested time and money into your "free" kitten, but you have also invested emotional energy. Free kittens do not come without costs.

The same is true of commercial as well as open source software. Both types of software cost time and money to install, configure, and implement. Training costs may be involved in learning how to use either type of software. Technical support may be included in the up-front costs of the commercial software. Similarly, it is quite possible to purchase support from open source software vendors or third parties. Once the software is up and running, an institution will spend emotional energy and become attached to particular features of its implementation. These are all real costs.

The difference in costs between commercial software and open source software is two-fold. First, open source software does not include the up-front costs of commercial software. With open source software you get to "try before you buy", and you get to do this with a full-blown version of the product. No time-limited trials. No lack of documentation. No crippled features. You have the opportunity to see exactly what you are getting.

The second difference concerns support costs. Commercial software will offer support, maybe for an extra fee. Maybe not. Most open source software does not come with formal support; there is rarely someone you can call on the telephone. Sometimes you can buy support from the vendor or third parties. Otherwise, if you want support, you are expected to ask for help through online forums such as mailing lists or discussion forums. Since the original developers have personal stakes in the success of their application, it is quite likely they will be participating in the discussions and providing advice. If the particular piece of open source software is popular, then it is likely others will provide support. You, as an individual or institution, are expected to implement your own changes, customizations, or enhancements.

The time spent implementing changes, customizations, and enhancements is a real cost in both commercial software and open source software, but I contend that such costs are more akin to investments in personnel when it comes to open source software. For the most part, open source software is very standards compliant. There are few proprietary "enhancements" to standard file formats and protocols. Consequently, implementing changes in open source software is time spent learning skills that are transferable from computer program to computer program. The software skills applied against commercial software are more likely to be application-specific. Since open source software is more likely to be standards compliant, it is usually an easy task to export your data from one system and import it into another without having to remove or overcome the proprietary aspects of the data. The personnel costs associated with open source software are really investments in the institution. Institutions that make this investment will be effective at managing the change and risk of their computing environments.

Open source software and the integrated library system

Libraries are a lot about the collection, storage, organization, dissemination, and sometimes evaluation of information and knowledge. With the advent of computer technology in libraries many of these processes have been implemented through a library's "integrated library system" or ILS. The primary purpose of the ILS seems to be the management of lists of MARC records and the facilitation of services against these lists. The online public access catalog (OPAC) provides searching functions against the list. Cataloging provides functions to add and edit items on the list. Acquisitions provides some accounting functions. Reserve room modules, circulation modules, and interlibrary loan modules allow the locations of items to be temporarily moved from one place to another. Serials modules provide functions for inventory control.

Problems appear when the ILS does not keep up with changing expectations or does not function the way a library desires. Suppose a librarian wants to create a statistical report against the library's holdings. They want to know how much money was spent on books classified as science materials. If the ILS does not support this sort of function then the only recourse is to ask the vendor for an upgrade. If your ILS is implemented as a relational database, then a competent database administrator should be able to read the database's entity relationship diagram and extract the necessary information. But this is only possible if the vendor supplies you with information about their database. Alternatively, suppose you wanted to provide a "virtual new bookshelf" allowing people to browse new acquisitions on a regular basis. If this is not an explicit function of your ILS, then you must find a workaround, and it is likely to be specific to your particular ILS. Patrons' expectations are changing too. The Google Did You Mean? service is very popular. The Amazon.com People Like You Also Read service seems to be popular too. What can we do to incorporate these things into our systems if they are not initially a part of the systems' makeup?
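To make the holdings report mentioned above concrete, here is a minimal sketch in Python of the sort of query a database administrator might write. Everything about the schema is an assumption made for illustration: the table name "items", its columns, the sample rows, and the convention that science materials are those whose Library of Congress call numbers begin with "Q". A real ILS database will be laid out differently, which is exactly why the vendor's documentation is needed.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with a hypothetical 'items' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (title TEXT, call_number TEXT, price_cents INTEGER)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [
            ("Cosmos", "QB981 .S24", 2500),
            ("On the Origin of Species", "QH365 .O2", 1800),
            ("Moby Dick", "PS2384 .M6", 1200),
        ],
    )
    return conn


def spent_on_class(conn, prefix):
    """Return the total spent, in whole currency units, on items whose
    call numbers begin with the given classification prefix."""
    row = conn.execute(
        "SELECT COALESCE(SUM(price_cents), 0) FROM items "
        "WHERE call_number LIKE ?",
        (prefix + "%",),
    ).fetchone()
    return row[0] / 100


if __name__ == "__main__":
    conn = build_demo_database()
    print(spent_on_class(conn, "Q"))  # prints 43.0
```

Prices are stored as integer cents so the sum is exact. The point is not the particular query but that the report becomes a few lines of work once the structure of the database is known.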

The problems are compounded when we realize that much of the information our patrons desire is not accessible in our ILS at all but through full-text and citation indexes. The catalog has traditionally been defined as an inventory list of the things a library physically owns or, nowadays, licenses. These licensed materials are oftentimes full-text or citation indexes to journal literature. People want to find an article on a particular topic, say, global warming. As librarians we must train patrons not to look in the catalog for such things but in a journal article index. Patrons find this increasingly difficult to understand since their experiences are driven by Google. One box. One button. Lots of stuff. "Library, why can't you do that?" Federated search engines are a hot topic these days. People are expecting simple search interfaces and one-stop shopping. Since much of this content is not owned by libraries but only licensed, libraries have a difficult time creating truly seamless access to the variety of licensed content as well as the content from our catalogs.

Short-term solutions to the problems are really "hacks" compounding the problem. These solutions include the use of traditional Z39.50 connections between computers. This solution is not great because Z39.50 is not universally supported and is inconsistently implemented. Other solutions include "screen scraping" techniques where HTML pages are received by a program and the necessary information is extracted. This technique breaks as soon as the remote service changes its interface.
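The fragility of screen scraping is easy to demonstrate. The following Python sketch, which is only an illustration, pulls titles out of a results page by looking for span elements with a class of "title". Both sample pages and the class names in them are invented for this example. When the hypothetical remote service renames its markup, the very same scraper silently returns nothing.

```python
from html.parser import HTMLParser


class TitleScraper(HTMLParser):
    """Collect the text found inside <span class="title"> elements."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())


def scrape_titles(page):
    """Return the list of titles found in a page of HTML."""
    scraper = TitleScraper()
    scraper.feed(page)
    scraper.close()
    return scraper.titles


# Invented sample pages: the same results before and after a redesign.
OLD_PAGE = ('<ul><li><span class="title">Walden</span></li>'
            '<li><span class="title">Emma</span></li></ul>')
NEW_PAGE = ('<ul><li><div class="result-title">Walden</div></li>'
            '<li><div class="result-title">Emma</div></li></ul>')

if __name__ == "__main__":
    print(scrape_titles(OLD_PAGE))  # prints ['Walden', 'Emma']
    print(scrape_titles(NEW_PAGE))  # prints []
```

Nothing in the second case signals an error. The program simply finds no titles, which is why such hacks tend to fail unnoticed.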

If libraries, as institutions, are willing to take more responsibility for their computing environments, then open source software techniques can play a role in filling these gaps. For example, there exist a number of open source tools allowing people to create and edit MARC records. The Perl module named MARC::Record is the most popular. Given this tool and an indexer/search engine (I currently endorse swish-e), the process of creating a virtual new bookshelf is almost trivial:

  1. Insert dates into bibliographic records denoting items' availability to the public.
  2. On a regular basis, extract all the MARC records from your ILS whose availability dates fall on or after a particular cutoff date. This is your set of "new" items.
  3. Use something like MARC::Record to extract the data/information from the records people desire to know (title, author, subject, notes, etc.) and save the extracted information as sets of HTML files.
  4. Provide a browsable interface to the sets of HTML files.
  5. Index the HTML files.
  6. Provide an interface to search the index.
  7. Go to Step #1.
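
The steps above can be sketched in a few dozen lines. The following Python example is a minimal illustration under stated assumptions, not a finished implementation. Plain dictionaries stand in for parsed MARC records (in practice a tool such as MARC::Record would supply the fields), "new" is taken to mean available on or after a cutoff date, and a small in-memory word index stands in for a real indexer such as swish-e. All record data, field names, and function names are hypothetical.

```python
import html
import re
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical records standing in for MARC records extracted from an ILS.
RECORDS = [
    {"id": "b1", "title": "Global Warming Primer", "author": "Ada Reed",
     "subject": "Climate", "available": date(2004, 5, 1)},
    {"id": "b2", "title": "Cataloging Basics", "author": "Sam Hill",
     "subject": "Library science", "available": date(2003, 1, 15)},
    {"id": "b3", "title": "Warming Oceans", "author": "Lee Wong",
     "subject": "Oceanography", "available": date(2004, 4, 20)},
]

FIELDS = ("title", "author", "subject")


def select_new(records, cutoff):
    """Step 2: keep records made available on or after the cutoff date."""
    return [r for r in records if r["available"] >= cutoff]


def write_pages(records, directory):
    """Steps 3 and 4: save one HTML file per record plus a browsable index."""
    directory = Path(directory)
    links = []
    for r in records:
        rows = "".join(
            f"<p>{name}: {html.escape(r[name])}</p>" for name in FIELDS)
        (directory / f"{r['id']}.html").write_text(
            f"<html><body>{rows}</body></html>", encoding="utf-8")
        links.append(
            f'<li><a href="{r["id"]}.html">{html.escape(r["title"])}</a></li>')
    (directory / "index.html").write_text(
        f"<html><body><ul>{''.join(links)}</ul></body></html>",
        encoding="utf-8")


def build_index(records):
    """Step 5, simplified: map each lowercased word in the extracted fields
    (rather than in the HTML files) to the ids of records containing it."""
    index = {}
    for r in records:
        text = " ".join(r[name] for name in FIELDS)
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(r["id"])
    return index


def search(index, query):
    """Step 6: return the sorted ids of records containing every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


if __name__ == "__main__":
    new_items = select_new(RECORDS, date(2004, 4, 1))
    with tempfile.TemporaryDirectory() as shelf:
        write_pages(new_items, shelf)
    print(search(build_index(new_items), "warming"))  # prints ['b1', 'b3']
```

Running the selection on a schedule, with a later cutoff date each time, corresponds to the final step of returning to the top of the list.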

What's really nice about this algorithm is its vendor independence. As long as you can extract sets of MARC records from your ILS, then you can provide this sort of functionality. Even if MARC::Record or swish-e go away, there will be other tools available providing similar functionality.

The problem with this solution is the fact that most libraries, as institutions, do not have the necessary computing expertise to make the solution a reality. There does not seem to be a critical mass of people working in libraries who know how to write computer programs. Consequently library processes and computing environments are often held hostage by library-specific software vendors.

Open source software and open access journals

I assert that the cultural factors and economic pressures making open source software a viable option are the same ones making open access journal literature more appealing.

As you know, the prices of scholarly journals have been increasing at rates much higher than inflation. If my salary had increased by the same rate as scholarly journals I would have doubled my salary more than five years ago and doubled it again since then. The problem is compounded by at least two other things. First, there is a shrinking number of publishers since the smaller publishers are getting bought by the bigger ones. Second, it is considered in very bad taste to publish the same article in more than one journal. The shrinking number of publishers combined with the veritable monopoly writers grant publishers makes for higher prices.

This environment was considered okay as long as the prices did not get out of control, but the prices are now out of control. Because of these high prices fewer libraries (and therefore people) have access to this literature. Many of the libraries that still subscribe to these journals are doing so through electronic-only means. No print issues are delivered, and libraries are really licensing permission to access the information, not download, archive, or keep it. When subscriptions lapse, access is gone.

For-profit publishers who license their content and do not make it available for wholesale downloading and archiving are similar to commercial software vendors who do not open up their source code. When scholarly materials are widely distributed and archived, then the historical record is preserved. In the model we see becoming more prevalent, scholarly materials are housed at the central location of a publisher in an unknown format. How is a person to know that this information will not be changed or inadvertently deleted? If access is restricted, how are we to "fix bugs" in the literature? What happens to the content if for some reason the publishers go out of business?

Open access journal literature might have other problems, but it doesn't have these. Open access journal literature is freely disseminated. It is mirrored and archived around the world. The authenticity of any open access journal article can easily be verified by comparing it to versions from other archives. People require unfettered access to information (read software) in order to build on the good work of others. While nobody wants to deny the ability of people to make a living, this living should not come at the cost of making it more difficult to improve the human condition. For the most part, selling physical things like paper journals or automobiles is considered good business. On the other hand, selling information, while it is never free, does not seem to go over well in democratic societies. Open access journal literature, just like open source software, should make it easier to improve the human condition. After all, aren't both things intended to expand our knowledge and improve our lives?

Short reviews of selected pieces of open source software

There are many pieces of open source software directly relevant to the ongoing work in libraries. The first few listed here are general purpose applications. The following set are library-specific.

The following is a list of more library-specific open source software distributions.

Open source software and libraries

The principles and practices of open source software are very similar to the principles and practices of modern librarianship. Both value free and equal access to data, information, and knowledge. Both value the peer review process. Both advocate open standards. Both strive to promote human understanding and to make our lives better. Both make efforts to improve society as a whole, assuming the whole is greater than the sum of its parts.

The use of open source software in libraries enables libraries to have greater control over their computing environments. Nobody is saying that all librarians should know how to compile relational database programs or debug Perl programs. On the other hand, it behooves libraries, as institutions, to know how to do this. If librarians want to be leaders in the fields of information and knowledge, then libraries need to know how to exploit the current technology that makes this happen. Open source principles, practices, and results can assist librarians in their fulfillment of day-to-day tasks as well as the goals of the profession.

Open source software represents a way for librarians to retain control over their computing environments instead of having their computing environments control them.

About the author

Eric Lease Morgan <emorgan@nd.edu> is a librarian at the University of Notre Dame, Notre Dame, Indiana, United States. His primary job is to help the Libraries implement and facilitate digital library services and digital library collections. He considers himself to be a librarian first and a computer user second. His professional goal is to discover new ways to improve library service through the use of computer technology. He was the original developer of MyLibrary and has been giving his software away for more than twenty years. He is also the maintainer of the Alex Catalogue of Electronic Texts at infomotions.com. In his spare time he can be seen folding defective floppy disks into intricate flora and fauna.


Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Source: This is the pre-edited, English language version of the French article entitled "Logiciels libres et bibliotheques", BiblioAcid 1(2-3), May-June 2004, pgs. 1-8.
Date created: 2004-05-04
Date updated: 2004-12-12
Subject(s): articles; open source software; librarianship;
URL: http://infomotions.com/musings/biblioacid/