All things open

Things "open" abound. Open source software. Open access publishing. The open archives initiative. OpenURL. Some of these things are fundamental to the inner workings of the Internet. Others are a natural consequence of it. Some groups of people "believe" in things open with an almost religious fervor. At the other end of the spectrum are some people who see the same things as a drain on intellectual property. The key to progress lies in a middle ground. This presentation describes "all things open" in greater detail, elaborates on how they affect librarianship, and finally demonstrates some of their applicability in librarianship.

(This presentation is also available in a one-page hand-out version as well as a PowerPoint presentation.)

Open source software

What is open source software, and what role can it play in libraries? There are a number of answers. Open source software is about: 1) community, 2) things "free as a free kitten", 3) an investment in personnel, 4) taking responsibility for your computing environment, and 5) greater opportunities for innovation.

Open source software is about community and it works because of the 'Net.

Open source software is about sharing one's expertise with others. It is about solving computing problems in an environment where others have the same problems. By working together the community solves common problems and grows through the process. Open source software works because the Internet facilitates communication across a large number of people. It flattens institutional hierarchies and enables diverse interests to coalesce into larger communities.

The open source software community works much like the peer-review process in academia. Works are put forward, people examine the works and make suggestions for improvement, the works are edited, and the process begins anew. In the open source software world, it is said that, "Given enough eyeballs, all bugs are shallow."

Some people are leery about open source software because of the apparent lack of technical support. "Who are you going to call?" How satisfied are you now with the technical support you get from your commercial software vendors? How much do you annually pay for this support? Do you feel like you are getting your money's worth? Open source software is supported through mailing lists and some commercial agreements. If you demonstrate that you have tried to resolve your issues after reading the documentation and if you are still having problems, then mailing lists do work. Well-worded questions get responses.

Open source software is as free as a free kitten.

You see a kitten and a sign next to it that says free. The kitten purrs. It plays with a ball of yarn. It is cute and fuzzy. You take it home. You then buy cat food. You take it to the veterinarian for shots. It claws your furniture. It then escapes into the night and returns the next day. While you didn't pay money for the kitten you did incur costs both financial as well as emotional. Open source software works the same way. While you don't pay money for it up front you do pay in terms of supporting hardware, emotional time and energy, and personnel. This is true for commercial software too, but with commercial software you have additional costs, the initial cost and the ongoing costs of licensing.

Investment in open source software is an investment in personnel because the learned skills are transferable.

More often than not open source software is standards-compliant; there are few "special features" in open source software that try to lock you into a particular product. There are few proprietary file formats in open source software. Files are usually saved in plain text (human-readable) formats, and binary file formats are well-documented. All of this means there are no "black boxes" in open source software and users of open source software need to learn a basic set of computing techniques: reading and writing plain text files, maintaining content in databases and writing reports against them, making content searchable with indexers, and transforming content into human-readable forms.

By combining standard file formats (such as MARC and XML), established computing technologies (such as relational databases and indexers), and open protocols (such as OAI, Z39.50 or SRW/U) it is more than possible to create modular digital library collections and services. It is then a library's responsibility to learn these computing tasks and mix-and-match them to meet its particular needs. (Brenda Chawner brought to my attention the need for "open file formats" allowing people to exchange information without the need to have a specialized computer program. This goes hand-in-hand with the idea of MARC and XML, but it needs to be stated in terms of word processors and/or presentation software as well.)

Put another way, because everything is "open" and there are no "black boxes", open source software enables you to control your computing environment, not the other way around.

Open source software requires a greater degree of computing responsibility than commercial software.

The use of open source software requires a greater degree of computer sophistication. Libraries, as a whole, will need to know the fundamentals of relational database design. They need to understand the subtle differences between search with a database and search with an indexer. Libraries need to know how to read, write, and transform raw MARC records and XML files. Good reference librarians should know a bit about MARC records. Good catalogers are aware of the breadth and depth of the local collection. Good bibliographers understand the needs of the library patrons. None of these library specialties are islands unto themselves; each needs to know a bit about the work of the others. Similarly, libraries need to have a greater degree of computing knowledge, but no one is expecting every librarian to become a computer programmer.
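The difference between search with a database and search with an indexer can be made concrete with a small sketch. The following Python is an illustration only: the three records are invented, the function names are hypothetical, and a production indexer does far more (stemming, ranking, and so on) than this toy inverted index.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample records (id, title, author), invented for illustration.
RECORDS = [
    (1, "Walden", "Thoreau, Henry David"),
    (2, "Civil disobedience", "Thoreau, Henry David"),
    (3, "Life in the woods and other essays", "Anonymous"),
]

def database_search(author):
    """Database-style search: an exact match against one known field."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE books (id INTEGER, title TEXT, author TEXT)")
    connection.executemany("INSERT INTO books VALUES (?, ?, ?)", RECORDS)
    rows = connection.execute(
        "SELECT id FROM books WHERE author = ? ORDER BY id", (author,)
    ).fetchall()
    connection.close()
    return [row[0] for row in rows]

def build_index(records):
    """Indexer-style preparation: map every word to the records containing it."""
    index = defaultdict(set)
    for identifier, title, author in records:
        text = (title + " " + author).lower().replace(",", " ")
        for word in text.split():
            index[word].add(identifier)
    return index

def index_search(index, query):
    """Indexer-style search: records containing every word of a free-text query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)
```

The database answers precise questions about known fields ("which records have exactly this author?"), while the index answers loose questions about words appearing anywhere in a record. A query such as "thoreau" finds nothing as an exact author match but finds both of his books through the index.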

Open source software makes it easier for libraries to innovate.

People's expectations regarding data and information have dramatically changed with the advent of globally networked computers. Libraries never were the center of the information universe, and they are even less so now. People don't come to libraries to do their learning and scholarship as much as they did twenty years ago. People expect most of their information needs to be met through email messages, RSS feeds, and Web browsers.

At the same time there is still a need for the fundamental services of libraries, namely the collection, organization, preservation, and dissemination of data and information. Through the use of open source software libraries can mix and match computer technologies to provide these services. Libraries, as a community, can do this in lieu of a vendor who is only going to create products and services after they have become well-established and well-articulated needs. The vendor community is neither geared nor designed to keep up with the changes in user expectations, only librarian expectations.

Open source software also allows libraries to go beyond collection, organization, preservation, and access. With the advent of the Internet individuals can do much of this without libraries. This is a good thing because it enables librarians to evolve the definition of librarianship to include the use of data and information, not just its access.

Open Access Publishing

Open access publishing is about freely accessible content, usually scholarly in nature.

The Open Access Movement, if you could call it a "movement", can be traced back to an email message called a Subversive Proposal written by Stevan Harnad in 1994. The Proposal outlined a method for academia to make scholarly content accessible by depositing and distributing it through institutional repositories as well as through traditional publishing venues. Such a process, if adopted by the majority of academia, would not only facilitate scholarly communication and exploit the use of the Internet as a publication/distribution medium, but it would also circumvent some of the issues surrounding the escalating prices of the scholarly journal literature.

While the phrase "open access" was not used to describe them, a number of things were implemented around and after the Subversive Proposal that can be called open access publishing. Examples include the arXiv.org pre-print server, a number of electronic journals (PACS Review, Postmodern Culture, and Bryn Mawr Classical Review), Virginia Tech's Electronic Theses and Dissertations project and the Scholarly Publishing and Academic Resources Coalition. Since then the World Wide Web happened, libraries got hooked on the Big Deal, and scholarly literature was increasingly licensed by publishers and made accessible through Web browsers.

Fast forward to 2002 and the drafting of the Budapest Open Access Initiative. The Initiative endorsed self-archiving coupled with OAI (described below) as a means of distributing scholarly content. It also advocated "open access" publishing, a type of journal publication where authors retain copyrights to their work and no fees are paid to access the content. The Initiative was one of the first and most important statements endorsing the fundamental concepts of Harnad's Proposal. Around this same time there were a number of other things in development: the creation of DSpace by MIT and Hewlett-Packard, an archiving/preservation tool called LOCKSS (Lots of Copies Keep Stuff Safe), and eprints.org. OAI was also becoming well-established.

Fast forward to the present day and a growing number of academic institutions have drafted statements encouraging their faculty to publish in open access venues and to practice open access publishing. Peter Suber has become a leading advocate for open access through his SPARC Open Access Newsletter. A number of colors have been associated with publication copyright policies. ("Green" publishers give authors the "green light" to self-archive.) The Directory of Open Access Journals lists more than 3,000 open access titles, and a number of institutional repository directories listing hundreds of entries are on the 'Net.

Creating and maintaining an institutional repository has become a fashionable project for universities, colleges, and libraries. While open access is not turning scholarly communication on its ear, it is slowly demonstrating new ways of expanding the sphere of knowledge. Because authors tend to be skeptical regarding changes in traditional scholarly communication, because they are for the most part ignorant of copyright issues, and because they are keenly interested in promotion and tenure procedures, institutions creating institutional repositories need to do more than just install DSpace or the eprints software. Just because you build it does not mean they will come.

Open access publishing is another opportunity for libraries to direct their own future. Open access publishing represents an opportunity to more actively participate in the scholarly communication process. There are opportunities for collecting, organizing, preserving, and disseminating content. Successful projects are projects where libraries are proactive in the process and provide value-added services to the content. Examples include the creation of author and departmental citation lists, the syndication of content (via email messages, RSS feeds, the campus-wide portal, etc.), the creation of What's New? services, the demonstration of impact by listing Google PageRank integers or links from remote sites to local content, or the creation of reports such as "These people looked at my article this past month".

To some degree, a library's participation in open access publishing is akin to the collection of the "gray literature". Remember the gray literature? What is old is new again.

Open Archives Initiative

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a standardized and well-accepted method for sharing and harvesting metadata.

The Initiative began in 2000/2001 and was spearheaded by Herbert Van de Sompel and Carl Lagoze. It grew out of the needs of the then-fledgling open access publishing movement. More specifically, the ePrints people wanted ways to create services against their collections of publications. By sharing their metadata they believed those services could be created. Example services included things such as search, browse, What's New?, enhanced distribution, and amalgamation.

With these goals in mind a protocol -- an agreed upon communication method -- was designed. The protocol would work over the Web (HTTP) and therefore exploit the client-server computing model. One computer (the metadata harvester) would send another computer (the metadata repository) commands, usually in the form of a URL. The second computer would then respond with an agreed upon XML stream. In OAI there are only six commands, called "verbs", a harvester can send:

  1. Identify - Who are you?
  2. ListMetadataFormats - What metadata formats do you support?
  3. ListSets - How have you organized your data?
  4. ListIdentifiers - Return the unique keys of your records
  5. ListRecords - Return a set of records
  6. GetRecord - Return a single record
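As a rough illustration of the list above, here is a minimal Python sketch of how a harvester might turn a verb into the URL it sends to a repository. The repository address and the record identifier are invented for the example, and the function name is hypothetical. The verb names and the metadataPrefix and identifier arguments come from the protocol itself.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH.
VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)

# Hypothetical repository address, used only for illustration.
BASE_URL = "http://www.example.com/oai"

def oai_request(base_url, verb, **arguments):
    """Return the URL a harvester would send to a repository."""
    if verb not in VERBS:
        raise ValueError("unknown OAI verb: " + verb)
    parameters = {"verb": verb}
    parameters.update(arguments)
    return base_url + "?" + urlencode(parameters)

# Who are you?
print(oai_request(BASE_URL, "Identify"))

# Return a set of records, described in Dublin Core.
print(oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))

# Return a single record; the identifier here is made up.
print(oai_request(BASE_URL, "GetRecord",
                  identifier="oai:example.com:1", metadataPrefix="oai_dc"))
```

Nothing here contacts a repository; the sketch only shows that each "verb" is an ordinary URL parameter, which is a large part of why the protocol is easy to implement.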

Using these commands it is possible to collect the metadata from one or more repositories, copy it to one or more central caches, and provide enhanced services against them. Since OAI data repositories (the servers) are required to support the Dublin Core metadata format, it is very easy for OAI harvesters to be created and caches to be generated. OAI can support any number of metadata formats, but few repositories support more than Dublin Core, for better or for worse.
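To give a feel for what a harvester does with a repository's reply, the following Python sketch reads the Dublin Core elements out of a ListRecords response and collects them into a simple cache. The response is an abbreviated, invented example (real responses carry additional elements such as the response date and datestamps), and the function name is hypothetical. The XML namespaces are the ones used by OAI-PMH and Dublin Core.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# An abbreviated, invented ListRecords response.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.com:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Librarianship: Profession of Opportunity</dc:title>
          <dc:creator>Dewey, Melville</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def harvest(xml_text):
    """Return one dictionary of Dublin Core values per record in the response."""
    root = ET.fromstring(xml_text)
    cache = []
    for record in root.findall(".//oai:record", NAMESPACES):
        cache.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NAMESPACES),
            "title": record.findtext(
                ".//dc:title", default="", namespaces=NAMESPACES),
            "creator": record.findtext(
                ".//dc:creator", default="", namespaces=NAMESPACES),
        })
    return cache

for entry in harvest(SAMPLE_RESPONSE):
    print(entry["identifier"], "-", entry["title"])
```

Once the metadata sits in a local cache like this, services such as search, browse, and What's New? can be built against it without touching the original repository.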

OAI has been successful because it is simple. There are only six commands, and Dublin Core must be supported. It was created by the library community with a lot of give and take. Finally, it has been successful because it is open, extensible, and designed to do one thing and one thing well.

OpenURL

OpenURL is an ANSI/NISO standard (Z39.88) used to describe information resources in an unambiguous way in order to facilitate information services against those resources. In the form of link resolvers, OpenURL was and still is primarily used to address the "appropriate copy problem" created by our globally networked computer environment. Not coincidentally, Herbert Van de Sompel played a significant role in the early development of OpenURL while working closely with Oren Beit-Arie, an employee of Ex Libris, the vendor of a very popular link resolver called SFX.

As content has become digital in nature it gets copied. Each copy has characteristics that go beyond the original version. For example, one copy may be in the form of an HTML file and another copy may be a PDF file. "Which one is the 'best' or 'correct' copy?" One version may be accessible from a traditional library as a printed something, and other versions may be accessible through one or more serial aggregators. In a world where content is licensed, it is quite possible that one person at one institution may have access to the content and another person at another institution may not. To make matters even more complicated, even though many people may have access to the content a library may not provide the same level of service to all people. Faculty members or graduate students may be granted interlibrary loan privileges, but not undergraduates. By first encoding the metadata of information resources in an agreed upon format and then sending the result to a "resolver" libraries can present users with options -- services -- regarding the information resource. Most of the time these services include the retrieval of the full text of the resource, a search of the local or regional library catalogs, or the initiating of an interlibrary loan request.
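The decision a resolver makes can be sketched in a few lines of Python. This is a simplified illustration, not a description of how any particular link resolver works: the function name, the patron categories, and the policy that only faculty and graduate students may request interlibrary loans are assumptions drawn from the example above.

```python
# Assumed local policy, taken from the example in the text.
ILL_ELIGIBLE = ("faculty", "graduate student")

def available_services(patron_status, licensed_online, held_locally):
    """Return the services a resolver might offer one person for one resource."""
    services = []
    if licensed_online:
        services.append("retrieve full text")
    if held_locally:
        services.append("search the library catalog")
    if patron_status in ILL_ELIGIBLE:
        services.append("initiate an interlibrary loan request")
    return services

# The same resource, presented to two different people.
print(available_services("faculty", licensed_online=False, held_locally=True))
print(available_services("undergraduate", licensed_online=False, held_locally=True))
```

The same citation yields different options depending on who is asking and what the institution holds or licenses, which is the heart of the "appropriate copy problem".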

Over the past decade a number of possible solutions to this, the "appropriate copy problem", have been articulated. Almost every solution revolves around the creation of a URI (Uniform Resource Identifier) and sending it to a system which will look up the URI and present a set of results in the form of services. For example, remember OCLC's PURL system? Given a unique URL the PURL server redirects a user's Web browser to a Web page. Remember Jake? Create lists of serials and associate them with aggregators. Then create a look-up service to identify which aggregator to use in order to search for specific journals or articles. Remember Serials Solutions? Same thing as Jake except you get the list of serials from a vendor. Know DOI (Digital Object Identifier)? Associate a journal article with a unique number, send the number to the DOI home planet in order to be redirected to the official version of the article. Use DSpace? In DSpace digital objects are associated with "handles" and resolved through services at a domain called handle.net.

At the expense of increased complexity, OpenURL can provide the same functionality as the other "appropriate copy problem" solutions outlined above, but it does so in a more robust/non-proprietary fashion, and it provides for greater functionality. OpenURLs are usually implemented in the form of HTTP GET requests (URLs), but they can also be manifested as HTTP POST requests or XML streams sent as SOAP messages. The URLs are made up of three parts: 1) the scheme (almost always http or https), 2) the resolver (the host plus the full HTTP path to the resolver application), and 3) the ContextObject (the information resource in question). The ContextObject is the most complicated part of the URL and it is a set of key/encoded-value (KEV) pairs. These KEVs must denote at least one "Referent" (the information resource in question), but they can also denote things such as: the "Requester" (the name of the user requesting services), the "ReferringEntity" (like an article with a footnote pointing to the Referent), the "ServiceType" (get full text, find in a catalog, initiate an ILL request, etc.), the "Resolver" (just like the second part of the URL), and the "Referrer" (the resource that generated the ContextObject).

Here is an outline of an OpenURL broken up into its various parts and elaborated upon:

  1. http://www.example.com/resolver? - the scheme and full path to the resolver
  2. url_ver=Z39.88-2004 - the version number of this OpenURL
  3. &rft_val_fmt=info:ofi/fmt:kev:mtx:journal - a "name space" where the following items are defined
  4. &rft.genre=article - what type of information object is this thing
  5. &rft.atitle=Librarianship: Profession of Opportunity - what is the thing's title
  6. &rft.jtitle=Future Librarianship - what is the journal title where this thing is located
  7. &rft.aulast=Dewey - what is the author's last name
  8. &rft.aufirst=Melville - what is the author's first name
  9. &rft.date=1999 - what is the date of the thing
  10. &rft.volume=3 - in what volume is this thing found
  11. &rft.issue=1 - in what issue is this thing found
  12. &rft.spage=8 - what is the starting page number of this thing
  13. &rft.epage=13 - what is the ending page number of this thing
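The outline above can be assembled programmatically. The Python below is a minimal sketch, assuming a journal-article Referent and the example resolver address used in this presentation; the function name is hypothetical. It simply prefixes each key with "rft." and percent-encodes the values.

```python
from urllib.parse import urlencode, quote

# The example resolver address used in the outline above.
RESOLVER = "http://www.example.com/resolver"

def build_openurl(resolver, referent):
    """Combine a resolver address and a journal-article Referent into an OpenURL."""
    pairs = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    # Every key describing the Referent carries the "rft." prefix.
    pairs.extend(("rft." + key, value) for key, value in referent.items())
    # Encode spaces as %20 and leave colons and slashes readable.
    return resolver + "?" + urlencode(pairs, quote_via=quote, safe=":/")

article = {
    "genre": "article",
    "atitle": "Librarianship: Profession of Opportunity",
    "jtitle": "Future Librarianship",
    "aulast": "Dewey",
    "aufirst": "Melville",
    "date": "1999",
    "volume": "3",
    "issue": "1",
    "spage": "8",
    "epage": "13",
}

print(build_openurl(RESOLVER, article))
```

Leaving colons and slashes unencoded is a readability choice made for this sketch; a stricter encoder would escape them as well.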

In reality, the URL would look like this when it is a part of an OpenURL system:

  http://www.example.com/resolver?url_ver=Z39.88-2004
  &rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article
  &rft.atitle=Librarianship:%20Profession%20of%20Opportunity
  &rft.jtitle=Future%20Librarianship
  &rft.aulast=Dewey&rft.aufirst=Melville
  &rft.date=1999&rft.volume=3&rft.issue=1
  &rft.spage=8&rft.epage=13

(Line breaks have been added for readability.)
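On the receiving end, a resolver has to take such a URL apart again. The following Python sketch shows one way that might be done with the standard library; the function name is hypothetical, and a real resolver would also examine the other entities of the ContextObject (Requester, Referrer, and so on), which are ignored here.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_openurl(openurl):
    """Split an OpenURL into its resolver address and the Referent's metadata."""
    parts = urlsplit(openurl)
    resolver = parts.scheme + "://" + parts.netloc + parts.path
    referent = {}
    for key, value in parse_qsl(parts.query):
        # Keys describing the Referent begin with "rft."; drop the prefix.
        if key.startswith("rft."):
            referent[key[len("rft."):]] = value
    return resolver, referent

openurl = (
    "http://www.example.com/resolver?url_ver=Z39.88-2004"
    "&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article"
    "&rft.atitle=Librarianship:%20Profession%20of%20Opportunity"
    "&rft.date=1999&rft.volume=3&rft.issue=1"
    "&rft.spage=8&rft.epage=13"
)

resolver, referent = parse_openurl(openurl)
print(resolver)
print(referent["atitle"])
```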

Think of OpenURL as a system for encoding citations and providing services against them. These citations take a single form (unlike the various styles of traditional citations: MLA, Chicago, APA, etc.) and can be augmented with things like the name of the requester. These citations are then sent to an application providing the means to use the citation in various ways. These services are only limited by the imagination of implementors and might include: retrieve the full text of the citation, discover where this thing was previously cited, determine whether or not a library owns the thing, find more things like this one, return the email address of the author of this thing, turn this OpenURL into an MLA citation, etc.
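As one example of such a service, here is a minimal Python sketch that turns the Referent of an OpenURL into a human-readable citation. The function name is hypothetical and the output is only loosely modeled on MLA style; it has not been checked against the style manual and should be read as an illustration of the idea, not a citation formatter to rely on.

```python
def format_citation(referent):
    """Render a journal-article Referent as a one-line, roughly MLA-like citation."""
    author = "{aulast}, {aufirst}".format(**referent)
    pages = "{spage}-{epage}".format(**referent)
    return '{0}. "{1}." {2} {3}.{4} ({5}): {6}.'.format(
        author,
        referent["atitle"],
        referent["jtitle"],
        referent["volume"],
        referent["issue"],
        referent["date"],
        pages,
    )

# The example article used throughout this section.
article = {
    "atitle": "Librarianship: Profession of Opportunity",
    "jtitle": "Future Librarianship",
    "aulast": "Dewey",
    "aufirst": "Melville",
    "date": "1999",
    "volume": "3",
    "issue": "1",
    "spage": "8",
    "epage": "13",
}

print(format_citation(article))
```

Because the citation arrives as named fields instead of a pre-formatted string, producing a different style (Chicago, APA, etc.) is a matter of writing another small template.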

Opportunities ("open-unities")

All things open (open source software, open access publishing, open archives initiative, and OpenURL) and librarianship boil down to another word beginning with the letter "o" and that word is opportunities or maybe "open-unities".

Our society is moving (or has moved) from a production-based society to a service-based society. Increasingly the economic drivers are not the creation of steel, cars, and refrigerators. Instead they are services such as banking, entertainment, and computing. The output of these services is data, information, and knowledge. Couple this with almost ubiquitous networked communication and you can see how the "knowledge worker" has come into her own.

In the 1980s it was cool to be a librarian/knowledge worker. You had access to a computer and computer systems like DIALOG. You were the conduit for indexed bibliographic information. People paid a lot of money for your services. Our profession was almost required because few others had access to the necessary information so conveniently. In the 1990s this became less true because that same bibliographic information was now being distributed on CDs. For a while those CDs were only located in the library, but eventually they became networked and accessible from faculty members' offices. For the most part, people still had to come to the library to actually get the cited articles. Then came the World Wide Web and a proverbial flood of data and information suddenly became available. With this increased accessibility also came a changing set of user expectations. People expect to get the information they need to do their work, their learning, their teaching, and their scholarship through their email programs and Web browsers. They expect to find it now with queries one or two words long, not through controlled vocabulary and fielded searches complete with Boolean operators. They expect access to full text. They expect it to be free and immediately accessible. These expectations are a far cry from the ways libraries have traditionally operated.

This does not spell the demise of librarianship. Instead it is the fuel for a metamorphosis. Through all things open librarians can transform themselves and the things they do to go beyond traditional library collections and services. Through all things open it is possible to enhance the meaning of librarianship, to empower users to a greater degree, and to more proactively support lifelong learning and the pursuit of truth through scholarship. Change is not easy. Think of the time and energy a caterpillar goes through to become a butterfly. Think of your own personal development from childhood through adolescence to adulthood. Librarianship may be going through the same thing, and it will continue to grow as long as the majority of us learn to adapt to the environment, build on what we have already created, and take advantage of our "open-unities".


Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Source: This file was never officially published, but the beginning is heavily based on another essay called Open Source Software in Thirty Minutes.
Date created: 2006-03-28
Date updated: 2006-03-28
Subject(s): OpenURL; OAI (Open Archives Initiative); presentations; open access publishing; open source software; librarianship;
URL: http://infomotions.com/musings/all-things-open/