Open Source Software in libraries

This guide is an introduction to open source software in libraries, with descriptions of a variety of software packages and successful library projects. But before we get to the software itself, I want to describe the principles and techniques of open source software (OSS) and explain why I advocate the adoption of OSS in the implementation of library services and collections.

As you will see, there are many shared principles between OSS and librarianship, especially the free and equal access to information. Because of the freedom we gain with the use of OSS is it possible to have greater control over the ways computers function and therefore greater control over how libraries operate. Anybody who works with computers on a daily basis can contribute to OSS because things like information architecture, usability testing, documentation, and staffing are key skills required for successful projects, and these skills are inherent in the people who use computers as a primary tool in their work. The implementation of OSS in libraries represents a method for improving library services and collections.

What is OSS

OSS is both a philosophy and a process. As a philosophy it describes the intended use of software and methods for its distribution. Depending on your perspective, the concept of OSS is a relatively new idea being only four or five years old. On the other hand, the GNU Software Project -- a project advocating the distribution of "free" software -- has been operational since the mid '80's. Consequently, the ideas behind OSS have been around longer than you may think. It begins when a man named Richard Stallman worked for MIT in an environment where software was shared. In the mid '80's Stallman resigned from MIT to begin developing the GNU -- a software project intended to create an operating system much like Unix. (GNU is pronounced "guh-NEW" and is a recursive acronym for GNU's Not Unix.) His desire was to create "free" software, but the term "free" should be equated with freedom, and as such people who use "free" software should be:

  1. free to run the software for any purpose
  2. free to modify the software to suit their needs
  3. free to redistribute of the software gratis or for a fee
  4. free to distribute modified versions of the software

Put another way the term "free" should be equated with the Latin word "liberat" meaning to liberate, and not necessarily "gratis" meaning without return made or expected. In the words of Stallman, we should "think of 'free' as in 'free speech,' not as in 'free beer.'"[1]

Fast forward to the late '90's after Linus Torvalds successfully develops Linux, a "free" operating system on par with any commercial Unix distribution. Fast forward to the late '90's when globally networked computers are an every day reality and the .com boom is booming. There you will find the birth of the term "open source" and it is used to describe how software is licensed:

Techniques for developing and implementing OSS

OSS is also a process for the creation and maintenance of software. This is not a formalized process, but rather a process of convention with common characteristics between software projects. First and for most, the developer of a software project almost always is trying to solve a specific computer problem commonly called "scratching an itch." The developer realizes other people may have the same problem(s), and consequently the developer makes the project's source code available on the 'Net in the hopes other people can use it too.

If there seems to be a common need for the software, a mailing list is usually created to facilitate communication, and the list is hopefully archived for future reference. Since the software is almost always in a state of flux, developers need some sort of version control software to help manage the project's components. The most common version control software is called CVS (Concurrent Versions System). Co-developers then "hack away" at the project adding features they desire and/or fixing bugs of previous releases. As these features and fixes are created the source code's modifications, in the form of "diff" files -- specialized files explicitly listing the differences between two sets of programming code -- are sent back to the project's leader. The leader examines the diff files, assesses their value, and decides whether or not to integrate them into the master archive. The cycle then begins anew. Much of a project's success relies on the primary developer's ability to foster communication and a sense of community around a project. Once accomplished the "two heads are better then one" philosophy takes effect and the project matures.

Writing computer programs is only one part of the software development. Software development also requires things such as usability testing, documentation, beta-testing, and a knowledge of staff issues. Consequently, in any environment where computers are used on a daily basis are places where the techniques of OSS can be practiced. Knowledge of computer programming is not necessary. In fact, a lack of computer programming is desireable. You do not have to know how to write computer programs in order to participate in OSS development.

Anybody who uses computers on a daily basis can help develop OSS. For example, you can be a beta-tester who tries to use the software and finds its faults. You can write documentation instructing people how to use the software. You can conduct usability tests against the software discovering how easy the software is to use or not use, and how it meets people's expectations. If computer software is intended to make our lives easier, you can evaluate the use of the software and see what sorts of things can be eliminated or how resources can be reallocated in order to run operations more efficiently. All of these things have nothing to do with computer programming, but rather, the use of computers in a work place.

OSS Compared to Librarianship

One the most definitive sets of writings describing OSS is Eric Raymond's The Cathedral and the Bazaar.[3] These texts, available online as well as in book form, compare and contrast the software development processes of monolithic organizations (Cathedrals) with the software processes of less structured, more organic collections of "hackers" (Bazaars).[4] The book describes the environment of free software and tries to explain why some programers are willing to give away the products of their labors. It describes the "hacker milieu" as a "gift culture":

Gift cultures are adaptations not to scarcity but to abundance. They arise in populations that do not have significant material scarcity problems with survival goods. We can observe gift cultures in action among aboriginal cultures living in ecozones with mild climates and abundant food. We can also observe them in certain strata of our own society, especially in show business and among the very wealthy.[5]

Raymond alludes to the definition of "gift cultures", but not enough to satisfy my curiosity. The literature, more often than not, refers to information about "gift exchange" and "gift economies" as opposed to "gift cultures." Probably one of the earliest and more comprehensive studies of gift exchange was written by Marcell Mauss.[6] In his analysis he says gifts, with their three obligations of giving, receiving, and repaying, are in aspects of almost all societies. The process of gift giving strengthens cooperation, competitiveness, and antagonism. It reveals itself in religious, legal, moral, economic, aesthetic, morphological, and mythological aspects of life.[7]

As Gregory states, for the industrial capitalist economies, gifts are nothing but presents or things given, and "that is all that needs to be said on the matter." Ironically for economists, gifts have value and consequently have implications for commodity exchange.[8] He goes on to review studies about gift giving from an anthropological view, studies focusing on tribal communities of various American indians, cultures from New Guinea and Melanesia, and even ancient Roman, Hindu, and Germanic societies:

The key to understanding gift giving is apprehension of the fact that things in tribal economics are produced by non-alienated labor. This creates a special bond between a producer and his/her product, a bond that is broken in a capitalistic society based on alienated wage-labor.[9]

Ingold, in "Introduction To Social Life" echoes many of the things summarized by Gregory when he states that industrialization is concerned:

exclusively with the dynamics of commodity production. ... Clearly in non-industrial societies, where these conditions do not obtain, the significance of work will be very different. For one thing, people retain control over their own capacity to work and over other productive means, and their activities are carried on in the context of their relationships with kin and community. Indeed their work may have the strengthening or regeneration of these relationships as its principle objective.[10]

In short, the exchange of gifts forges relationships between partners and emphasizes qualitative as opposed to quantitative terms. The producer of the product (or service) takes a personal interest in production, and when the product is given away as a gift it is difficult to quantify the value of the item. Therefore the items exchanged are of a less tangible nature such as obligations, promises, respect, and interpersonal relationships.

As I read Raymond and others I continually saw similarities between librarianship and gift cultures, and therefore similarities between librarianship and OSS development. While the summaries outlined above do not necessarily mention the "abundance" alluded to by Raymond, the existence of abundance is more than mere speculation. Potlatch, a ceremonial feast of the American Indians of the northwest coast marked by the host's lavish distribution of gifts or sometimes destruction of property to demonstrate wealth and generosity with the expectation of eventual reciprocation, is an excellent example.

Libraries have an abundance of data and information. I won't go into whether or not they have an abundance of knowledge or wisdom of the ages. That is another essay. Libraries do not exchange this data and information for money; you don't have to have your credit card ready as you leave the door. Libraries don't accept checks. Instead the exchange is much less tangible. First of all, based on my experience, most librarians just take pride in their ability to collect, organize, and disseminate data and information in an effective manner. They are curious. They enjoy learning things for learning's things sake. It is a sort of Platonic end in itself. Librarians, generally speaking, just like what they do and they certainly aren't in it for the money. You won't get rich by becoming a librarian.

Even free information is not without financial costs. Information requires time and energy to create, collect, and share, but when an information exchange does take place, it is usually intangible, not monetary, in nature. Information is intangible. It is difficult to assign information a monetary value, especially in a digital environment where it can be duplicated effortlessly:

An exchange process is a process whereby two or more individuals (or groups) exchange goods or services for items of value. In Library Land, one of these individuals is almost always a librarian. The other individuals include tax payers, students, faculty, or in the case of special libraries, fellow employees. The items of value are information and information services exchanged for a perception of worth -- a rating valuing the services rendered. This perception of worth, a highly intangible and difficult thing to measure, is something the user of library services "pays", not to libraries and librarians, but to administrators and decision-makers. Ultimately, these payments manifest themselves as tax dollars or other administrative support. As the perception of worth decreases so do tax dollars and support. [11]

Therefore when information exchanges take place in libraries librarians hope their clientele will support the goals of the library to administrators when issues of funding arise. Librarians believe that "free" information ("think free speech, not free beer") will improve society. It will allow people to grow spiritually and intellectually. It will improve humankind's situation in the world. Libraries are only perceived as beneficial when they give away this data and information. That is their purpose, and they, generally speaking, do this without regards to fees or tangible exchanges.

In many ways I believe OSS development, as articulated by Raymond, is very similar to the principles of librarianship. First and foremost with the idea of sharing information. Both camps put a premium on open access. Both camps are gift cultures and gain reputation by the amount of "stuff" they give away. What people do with the information, whether it be source code or journal articles, is up to them. Both camps hope the shared information will be used to improve our place in the world. Just as Jefferson's informed public is a necessity for democracy, OSS is necessary for the improvement of computer applications.

Second, human interactions are a necessary part of the mixture in both librarianship and open source development. Open source development requires people skills by source code maintainers. It requires an understanding of the problem the computer application is trying to solve, and the maintainer must assimilate patches with the application. Similarly, librarians understand that information seeking behavior is a human process. While databases and many "digital libraries" house information, these collections are really "data stores" and are only manifested as information after the assignment of value are given to the data and inter-relations between datum are created.

Third, it has been stated that open source development will remove the necessity for programers. Yet Raymond posits that no such thing will happen. If anything, there will an increased need for programmers. Similarly, many librarians feared the advent of the Web because they believed their jobs would be in jeopardy. Ironically, librarianship is flowering under new rubrics such as information architects and knowledge managers.

OSS also works in a sort of peer review environment. As Raymond states, "Given enough eyeballs, all bugs are shallow." Since the source code to OSS is available for anybody to read, it is possible to examine exactly how the software works. When a program is written and a bug manifests itself, there are many people who can look at the program, see what it is doing, and offer suggestions or fixes.

Instead of relying on marketing hype to promote an application, OSS relies on its ability to satisfy particular itches to gain prominence. The better a piece of software works, the more people are likely to use it. User endorsements are usually the way OSS is promoted. The good pieces of software float to the top because they are used the most often. The ones that are poorly written or do not satisfy enough itches sink to the bottom.

In a peer review process many people look at an article and evaluate its validity. During this evaluation process the reviews point out deficiencies in the article and suggest improvements. The reviewers are usually anonymous but authoritative. The evaluation of OSS often works in the same vein. Software is evaluated by self-selected reviewers. These people examine all aspects of the application from the underlying data structures, to the way the data is manipulated, to the user interface and functionality, to the documentation. These people then offer suggestions and fixes to the application in an effort to enhance and improve it.

Some people may remember the "homegrown" integrated library systems developed in the '70's and '80's, and these same people may wonder how OSS is different from those humble beginnings. There are two distinct differences. The first is the present-day existence of the Internet. This global network of computers enables people to communicate over much greater distances and it is much less expensive than twenty-five years ago. Consequently, developers are not as isolated as they once were, and the flow of ideas travels more easily between developers -- people who are trying to scratch that itch. Yes, there were telephone lines and modems but the processes for using them was not as seemlessly integrated into the computing environment (and there were always long-distance communications charges to contend with.[12])

Second, the state of computer technology and its availability has dramatically increased in the past twenty-five years. Twenty-five years ago computers, especially the sorts of computers used for large-scale library operations, were almost always physically large, extremely expensive, remote devices whose access was limited to a group of few specialized individuals. Now-a-days, the computers on most people's desktops have enough RAM, CPU horsepower, and disk space to support the college campus of twenty-five years ago.[13]

In short, the OSS development process is not like the homegrown library systems of the past simply because there are more people with more computers who are able to examine and explore the possibilities of solving more computing problems. In the times of the homegrown systems people were more isolated in their development efforts and more limited in their choice of computing hardware and software resources.

Prominent OSS Packages

There are quite a number of mainstream OSS applications. Many of these applications literally run the Internet or are used for back-end support. The Apache Project is one of the more notable (www.apache.org). Apache is a World Wide Web (HTTP) server. It started out its life in the mid '90's as NCSA's httpd application, the Web server beneath the first graphical Web browser. The name for the application -- Apache -- is a play on words. It has nothing to do with indians. Instead, in an effort to write a more modular computer program, the original httpd application was rewritten as a set of parts, or patches, and consequently the application is called "a patchy server." Few experts would doubt the popularity of the Apache server. According to Netcraft, more HTTP servers are Apache HTTP server than any other kind. [14]

MySQL is a popular relational database application. It is very often used to support database-driven websites. It adhears to the SQL standard while adding a number of features of its own (as does Oracle and other database vendors). MySQL is known for its speed and stability. The cononical address for MySQL is www.mysql.org.

Sendmail is an email (SMTP) server used on the vast majority of Unix computers. This application, developed quite a number of years ago is responsible for trafficing much of the email messages sent throughout the world. Sendmail is a good example of an application supported by both a commercial institution as well as a non-profit organization. There is a free version of sendmail, complete with source code, as well as a commercial version that comes with formal support. See www.sendmail.org.

BIND is an acronym for the Berkeley Internet Name Domain, a program converting Internet Protocol (IP) numbers, such as 17.112.144.32 into human-readable names such as www.apple.com. It is a sort like an old fashioned switchboard operator associating telephone numbers with the telephones in people's homes. BIND is supported by the Internet Software Consortium at www.isc.org.

Perl is a programming language written by Larry Wall in the late '80's. It too runs much of the Internet since it is used as the language of many common gateway interface (CGI) scripts of the internet. Wall originally created Perl to help him do systems administration task, but the language worked so well others adopted it and it has grown significantly. Perl is supported at www.perl.com.

Linux is the most familiar OSS application. This program is really an operating system -- a program directly responsible converting human-readable commands into computer (machine) language. It is the software that really makes computers run. Linux was originally conceived by Linus Torvols in the late '80's because he wanted to run a Unix-sort of operating system on Intel-based computer. Linux is becoming increasingly popular with many information technology (IT) professionals as an alternative to Windows-based server applications or proprietary versions of Unix. See www.linux.org.

State of OSS in Libraries

Daniel Chudnov has been the library profession's OSS evangelist for the past three or four years. He is also the original author of the open source program jake (jointly administered knowledge environment). Chudnov has done a lot to raise the awareness of OSS in libraries. To that end he and others help maintain a website called OSS4Lib (www.oss4lib.org). The site lists library-related applications including applications for document delivery, Z39.50 clients and servers, systems to manage collections, MARC record readers and writers, integrated library system, and systems to read and write bibliographies. For more information visit OSS4Lib and subscribe to the mailing list.

The state of OSS in libraries is more than sets of computer programs. It also includes the environment where the software is intended to be used -- a socio-economic infrastructure. Any computing problem can be roughly divided into 20% technology issues and 80% people issues. It is this 80% of the problem that concerns us here. Given the current networked environment, the affinity of OSS development to librarianship, and the sorts of projects enumerated above what can the library profession do to best take advantage of the currently available OSS? I posed this question to the OSS4Lib mailing list in April and May of 2000 and it generated a lively discussion. [15] A number of themes presented themselves, each of which are elaborated upon below.

National leadership

One of the strongest themes was the need for a national leader. It was first articulated by David Dorman as the OSLN (Open Source Library Network). Karen Coyle and Aaron Trehab elaborated on this idea by suggesting organizations such as ALA/LITA, the DLF, OCLC, or RLG help fund and facilitate methods for providing credibility, publicity, stability, and coordination to library-based OSS projects.

Mainstreaming, workshops, and training

Along theses same lines was the expressed desire for the mainstreaming of OSS articulated by Carol Erkens, Rachel Cheng, and Peter Schlumpf. This mainstreaming process would include presentations, workshops, and training sessions on local, regional, and national levels. These activities would describe and demonstrate open sources software for libraries. They would enumerate the advantages and disadvantages of open sources software. They would provide extensive instructions on the staffing, installation, and maintenance issue of OSS.

Usability and packaging

In its present state, open sources software is much like microcomputer computing of the '70's as stated by Blake Carver. It is very much a build it yourself enterprise; the systems are not very usable when it comes to installation. This point was echoed by Cheng who recently helped facilitate a NERCOMP workshop on OSS. Peter Schulmpf points to the need for easier installation methods so maintainers of the system can focus on managing content and not software. Using OSS should not be like owning an automobile in the 1920's. "I shouldn't necessarily need to know how to fix it in order to make it go."

Economic viability

OSS needs to be demonstrated as an economically viable method of supporting software and systems. This was pointed out by Eric Schnell and David Dorman. Libraries have spent a lot of time, effort, and money on resource sharing. Why not pool these same resources together to create software satisfying our professional needs? OSS is not like the "homegrown" systems. Spaghetti code and GOTO statements should be a thing of the past. More importantly, a globally networked computer environment provides a means of sharing expertise in a manner not feasible twenty-five years ago. We need to demonstrate to administrators and funding sources that money spent developing software empowers our collective whole. It is an investment in personnel and infrastructure. OSS is not a fad, yet is will not necessarily replace commercial software. On the other hand, OSS offers opportunities not necessarily available from the commercial sector.

Redefining the ILS

There are many open source library application available today. Each satisfies a particular need. Maybe each of these individual applications can be brought together into a collective, synergistic whole as described by Jeremy Frumkin and we could redefine the integrated library system. Presently our ILS's manage things like books pretty well. With the addition of 856 fields in MARC records they are beginning to assist in the management of networked resources, but libraries are more than books and networked resources. Libraries are about services too: reserves, reading lists, bibliographies, reader advisory services of many types, current awareness, reference, etc. Maybe the existing OSS can be glued together to form something more holistic resulting in a sum greater than its parts. This is also an opportunity, as described by Schnell, for vendors to step in and provide such integration including installation, documentation, and training.

Open source data

OSS relates to data as well as systems as described by Krichel. The globally networked computer environment allows us to share data as well as software. Why not selectively feed URL's to Internet spiders to create our own, subject-specific indexes? Why not institutionalize services like the Open Directory Project or build on the strength of INFOMINE to share records in a manner similar to the manner of OCLC?

Conclusion and next steps

This essay has described what OSS is and it compared OSS to the principles of librarianship. The balance of the book details particular systems of OSS for libraries. After reading this book I hope you go away understanding at least one thing. OSS provides the means for the profession to take greater control over the ways computers are used in libraries. OSS is free, but it is free in the same way freedom exists in a democracy. With freedom comes choice. With freedom comes the ability to manifest change. At the same time, freedom comes at a price, and that price is responsibility. OSS puts its users in direct control of computer operations, and this control costs in terms of accountability. When the software breaks down, you will be responsible for fixing it. Fortunately, there is a large network at your disposal, the Internet, not to mention the creator of the software who has the same problems you do and has most likely previously addressed the same problem. Open source provides the means to say, "We are not limited by our licensed software because we have the ability to modify the software to meet our own ends." Instead of blaming vendors for supporting bad software, instead of confusing the issues with contractual agreements and spending tens of thousands of dollars a year for services poorly rendered, OSS offers an alternative. Be realistic. OSS is free, but not without costs.

This being the case, what sorts of things need to happen for OSS to become a more viable computing option in libraries? What are the next steps? The steps fall into two categories: 1) making people more aware of OSS and 2) improving the characteristics of OSS.

Librarians need to become more aware of the options OSS provides. This can be done in a number of ways. For example, a formal study analyzing the desirability and feasibility of libraries making a formal commitment to OSS might demonstrate to other libraries the benefits of OSS. Library boards and directors need feel comfortable commiting funds to OSS installation and development, but before doing so the boards and directors need to know what OSS is and how its principles can be applied in libraries. By mentoring existing librarians to become more computer literate the concepts of OSS will become easier to understand. Similarly, by mentoring librarians to be more aware of the ways of administration these same librarians will have more authority to make decisions and direct energies to OSS development. All librarians should not be afraid of the idea of open sources software because they think computer programming experience is necessary. There is much more to software development than writing computer programs. Simple training exercises will also make more people aware of the potential of open sources software. Finally, communication -- testimonials -- will help disseminate the successes, as well as failures, of OSS.

OSS itself needs to be improved. The installation processes of OSS are not as simple as the installation procedures of commercial software. This is area that needs improvement, and if done, fewer people would be intimidated by the installation process. Additionally, there are opportunities for commercial institutions to support OSS. These institutions, like Red Hat or O'Reilly & Associates, could provide services installing, documenting, and trouble shooting OSS. These institutions would not be selling the software itself, but services surrounding the software.

The principles of OSS of very similar to the principles of librarianship. Let's take advantage of these principles and use them to take more control of over our computing environments.

Notes

  1. The ideas behind GNU software and its definition as articulated by Richard Stallman can be found at http://www.gnu.org/philosophy/free-sw.html. Accessed April 25, 2002.
  2. Much of the preceeding section was derived from Dave Bretthaur's excellent article, "OSS: A History" in Information Technology and Libraries 21(1) March, 2002. pg. 3-10.
  3. The Cathedral and the Bazaar is also available online at http://www.tuxedo.org/~esr/writings/cathedral-bazaar/. Accessed April 25, 2002.
  4. It is important to distinguish here the difference between a "hacker" and a "cracker". As defined by Raymond, a hacker is person who writes computer programs because they are "scratching an itch" -- trying to solve a particular computer problem. This definition is contrasted with the term "cracker" denoting a person who maliciously tries to break computer systems. In Raymond's eyes, hacking is a noble art, cracking is immoral. It is unfortunate, the distinction between hacking and cracking seems to have been lost on the general population.
  5. Raymond, E.S., The cathedral and the bazaar: musings on Linux and open source by an accidental revolutionary. 1st ed. 1999, [Sebastopol, CA]: O'Reilly. pg. 99.
  6. Mauss, M., The gift; forms and functions of exchange in archaic societies. The Norton library, N378. 1967, New York: Norton.
  7. Lukes, S., Mauss, Marcel, in International encyclopedia of the social sciences, D.L. Sills, Editor. 1968, Macmillan: [New York] volume 10, pg. 80.
  8. Gregory, C.A, "Gifts" in Eatwell, J., et al., The New Palgrave : a dictionary of economics. 1987, New York: Stockton Press. volume 3, pg. 524.
  9. Ibid.
  10. Ingold, T., Introduction To Social Life, in Companion encyclopedia of anthropology, T. Ingold, Editor. 1994, Routledge: London ; New York. p. 747.
  11. 11. Morgan, E.L., "Marketing Future Libraries", http://www.infomotions.com/musings/marketing/. Accessed April 25, 2002.
  12. As an interesting aside, read "Stalking the wily hacker" by Clifford Stoll in the Communications of the ACM May 1988 31(5) pg. 484. The essay describes how Clifford tracked a hacker via a 75 cent error in his telephone bill. It is on the Web in many places. Try http://eserver.org/cyber/stoll2.txt. Accessed April 25, 2002
  13. It is believed a past chairman of IBM, Thomas Watson, said in 1943, "I think there is a world market for maybe five computers."
  14. See http://www.netcraft.com for more information. Accessed April 25, 2002.
  15. An archive of the oss4lib mailing list is available at this URL http://www.geocrawler.com/lists/3/SourceForge/6067/0/. Accessed April 25, 2002.

Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Source: This essay appeared in Open Source Software for Libraries, a LITA Guide, in 2002.
Date created: 2002-04-25
Date updated: 2005-05-08
Subject(s): articles; open source software;
URL: http://infomotions.com/musings/ossnlibraries-lita/