Description and evaluation of the Mr. Serials Process

Abstract

This article describes the Mr. Serials Process. The Mr. Serials Process is a systematic method being applied at the North Carolina State University (NCSU) Libraries for collecting, organizing, archiving, indexing, and disseminating electronic serials. Using readily available technologies found on the Internet (FTP, WAIS, gopher, HTTP, perl, procmail, email), the Mr. Serials Process has proven an effective means for the management of electronic serials that are consistently formatted and delivered via email.

Introduction

In the Fall of 1991 the North Carolina State University (NCSU) Libraries participated in an Association of Research Libraries (ARL) self-study project called the Collection Analysis Project (CAP). Since the Collection Management Department in the Libraries was relatively young, there were few documented collection management policies. The purpose of CAP was to develop plans for creating such policies, as well as to develop a vision for the Department.

A part of the CAP was the creation of an interim report describing the state of the NCSU Libraries specifically and libraries in general. It was during the creation of the interim report that I became acutely aware of the present "serials pricing crisis," the purpose of peer-reviewed literature, and the scholarly communications process. Consequently, I became very concerned when I learned about authors signing away their copyrights and publishers selling journals for greatly inflated prices. These concepts directly conflicted with my fundamental values as a person and librarian. "How can a person own an idea? How can anybody sell ideas? How can publishers justify selling a subscription for thousands of dollars per title? Why don't libraries 'just say no?'" Since electronic serials are manifestations of ideas that should be shared as freely as possible, libraries have a responsibility to make collections of electronic serials available. The Mr. Serials Process, described in the balance of this article, represents one solution to the problems of collecting electronic serials and demonstrates how libraries can take a more active role in the creation of information systems.

This article describes and evaluates an informal research project called the Mr. Serials Process (hereafter referred to as "Mr. Serials"). This project has two primary goals. The first goal is to discover an automated method for collecting, organizing, archiving, indexing, and disseminating electronic serials. Since our administrators are asking us to do more with less, automation is oftentimes (but not always) the solution to these requests. Furthermore, an automated process, if correctly implemented, reduces the possibility of human error and increases productivity. To a great extent the first goal has been accomplished using FTP, WAIS, HTTP, gopher, procmail, and a number of programming technologies. (See the glossary for definitions of terms used in this article.)

The second goal of the project is much broader in scope. The Mr. Serials Process hopes to demonstrate to the scholarly community that if they publish their findings in inexpensive or free, peer-reviewed Internet-based journals, then libraries can effectively help facilitate the scholarly communications process.

The peer review process is only one aspect of the scholarly communications process. Unless one or more institutions provide access to the peer-reviewed literature, the scholarly communications process breaks down. In our present environment, the institutions providing access services are publishers and libraries. Publishers print the information and sell it to libraries. Libraries, in turn, make these publications available to other scholars. Other publishers index scholarly communications. Again, these publishers sell their indexes to libraries, which make them available to their constituents.

As this article will demonstrate, libraries do not have to be dependent on traditional publishers for their materials; libraries can provide these services themselves. If the scholarly community were to develop its own peer-review processes and freely distribute its findings, then libraries could retrieve papers directly from the scholars and make them available without incurring the fees and restrictions imposed by traditional print publishers. The second objective of Mr. Serials (to create large collections of electronic serials) requires a critical mass of freely available scholarly electronic publications.

Electronic serials are seen by many as a possible solution to many of these issues, and the Mr. Serials Process tries to facilitate their acceptance. [2, 3] Assuming a project could demonstrate an effective method for collecting, organizing, archiving, and disseminating electronic serials, then there might be an added incentive for researchers to publish electronically and become less dependent on high-profit publishers.

Preliminary Testing

In the summer of 1992 the WAIS (Wide Area Information Server) and gopher technologies were becoming popular. WAIS is an indexing/document delivery technology. By incorporating WAIS and FTP (File Transfer Protocol) technologies into gopher server software, the gopher protocol is also a tool that facilitates document delivery. At NCSU, we did experiments to test these technologies. The Libraries owned a couple of Unix-based DECstation 5000s. One of these Unix computers was aptly named "dewey" (dewey.lib.ncsu.edu). Dewey was to become the host for our WAIS and gopher servers. We downloaded the WAIS software from boombox.micro.umn.edu, and by following the instructions and requesting help from the Usenet newsgroup comp.infosystems.wais, we created our first WAIS index. Inspired by this success, we repeated the process with the gopher software and met with more success. Thus, the gopher at the NCSU Libraries was born. [4]

Based on these initial successes, formal collection management issues were addressed concerning a systematic process for creating a collection of electronic serials, specifically:

  1. Selection
  2. Storage
  3. Access
  4. Organization
  5. Bibliographic control
  6. Acquisitions

Selection

Initially, for the purposes of Mr. Serials, the selection process was easy. Because we decided to create a comprehensive collection of library and information science-related electronic serials, Charles Bailey's guide was used as a selection mechanism. [5] We chose the library/information science subject area in hopes of catching the interest of the library and information science communities. Later, as Mr. Serials proved effective, we added a few scholarly titles from other disciplines to the collection. Figure 1 lists the titles presently managed by the Mr. Serials Process.

Figure 1

This table lists the serials collected by the Mr. Serials Process as of July 5, 1995, where Files is the number of files collected for a title, Size is the disk space (in megabytes) consumed by those files, Index is the space consumed by the title's WAIS index, Total is the sum of the two, an "x" in the ELL column denotes inclusion in the Electronic Library Literature index, and Inception is the date of the title's first issue:

Title Files Size (MB) Index (MB) Total (MB) ELL Inception
ACQNET 432 3.72 5.87 9.60 x December 1990
ALANEWS 67 1.55 2.69 4.25 x January 1994
ALAWON 206 2.07 2.83 4.91 x July 1992
ALCTS Network News 205 2.58 4.32 6.91 x May 1991
Bryn Mawr Classical Review 786 9.30 11.85 21.16 [January?] 1991
Bryn Mawr Medieval Review 157 1.77 3.00 4.77 August 1993
Catalyst 64 1.09 1.33 2.43 Summer 1991
Citations for Serial Literature 68 0.50 1.17 1.68 x February 1992
Conserline 6 0.11 0.20 0.32 x January 1994
Current Cites 61 0.92 1.96 2.89 x August 1990
Electronic Journal of Virtual Culture 57 2.00 2.73 4.73 March 1993
Infobits 22 0.22 0.59 0.82 July 1993
Information Policy Online 6 0.13 0.30 0.43 March 1994
Information Technology and Disabilities 61 0.83 1.17 2.01 x January 1994
INFOSYS 72 1.91 2.90 4.82 x January 1994
Interpersonal Computing and Technology 53 1.58 2.16 3.74 January 1993
IRLIST Digest 244 4.91 8.63 13.55 x December 1989
Journal of Technology Education 105 2.23 2.41 4.65 Fall 1989
LC Cataloging Newsline 20 0.30 0.64 0.95 x January 1993
LIBRES 45 0.82 1.65 2.47 x April 1993
List Review Service 28 0.14 0.37 0.52 x November 1991
LITA Newsletter 110 0.81 0.90 1.72 x Fall 1993
MC Journal 15 0.32 0.88 1.21 x Spring 1993
NCSU Libraries Newsletter 201 0.42 0.95 1.37 x August 1991
Network News 15 0.21 0.66 0.88 x October 1992
Newsletter for Serial Pricing Issues 141 2.45 3.46 5.92 x May 1991
Postmodern Culture 264 9.44 9.76 19.2 September 1990
PSYCHE 9 0.15 0.35 0.50 December 1993
Psychology Graduate Student Journal 11 0.12 0.27 0.39 October 1993
Psycoloquy 47 0.71 1.36 2.07 October 1994
Public Access Computer Systems News 60 0.51 1.21 1.73 x March 1990
Public Access Computer Systems Review 120 2.45 3.00 5.46 x [March?] 1990
Science and Technology Librarianship 14 0.71 1.41 2.12 x December 1991
Surfaces 69 2.43 3.38 5.82 October 1991
Sylvanet 5 0.15 0.00 0.15 November 1993
Technician 350 1.25 2.41 3.12 October 1994
Electronic Library Literature n/a n/a n/a 27.6 Spring 1992
Total 4,197 61.00 116.60 177.60

All the titles being collected are serials, but not all are scholarly journals. Most of the serials collected by Mr. Serials are, in reality, newsletters or moderated discussion lists. ALAWON is a good example. It is issued serially, but it is not a peer-reviewed journal; it is a newsletter listing and describing legislative events of interest to librarians.

After deciding what to collect, the next step was to subscribe to the serials. At first, this was done using the traditional method where we sent a subscription request (e.g., subscribe PACS-P Eric Morgan) to the LISTSERV hosting the electronic serial. Later, a program called ListManager was employed instead. [6] By answering a number of questions, ListManager automatically creates and sends email messages designed for LISTSERVs. These messages include subscription requests, search requests, and email customization functions. Since ListManager maintains a simple database of electronic serial email addresses, it makes it easy to query the LISTSERV hosting an electronic serial and verify the serial's continued existence. By using the database and index commands of LISTSERVs, ListManager can also be used as a simple tool for electronic serial claiming.
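As a minimal sketch of the kind of message such a tool generates (ListManager's actual implementation is not reproduced in this article, and the list addresses below are invented for the example), the idea can be expressed in a few lines of Python:

```python
# A minimal sketch, assuming a hypothetical interface; the list
# addresses below are invented for the example.

SERIALS = {
    "PACS-P": "listserv@uhupvm1.uh.edu",   # hypothetical address
    "ALAWON": "listserv@uicvm.uic.edu",    # hypothetical address
}

def make_request(command, list_name, subscriber=""):
    """Build the one-line message body a LISTSERV expects."""
    if list_name not in SERIALS:
        raise KeyError("unknown serial: " + list_name)
    body = " ".join(filter(None, [command, list_name, subscriber]))
    return {"to": SERIALS[list_name], "body": body}

# A subscription request, as in the example above:
msg = make_request("subscribe", "PACS-P", "Eric Morgan")
```

Because the addresses live in one small database, the same function can issue index or query commands for verification and claiming as well.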

FTP For Storage and Rudimentary Access

Once electronic serials began arriving, we decided the resulting texts would be saved in an FTP archive. In keeping with the library tradition of providing access to as many end-users as possible, an FTP archive was chosen to store and archive the serials because it represents a technology available to any computer with access to the Internet.

Before an archive of electronic serials could be constructed, directory structures and file naming conventions had to be developed. Each title was given a single directory to simplify the indexing process later. Since each file within a directory must have a unique name, a naming convention for each file was developed using the following format: Code-vVolumenIssue-Author-Title, where Code is a short abbreviation for the serial's title, Volume and Issue are the issue's numeric designations, and Author and Title are drawn from the article itself.

The serial codes are arbitrary; the values chosen are not especially important, as long as they are applied consistently. [7] When an article does not have an author or title, the author or title attributes are simply omitted from the file name. When a serial is issued without volume or issue information, the file name includes a date of publication used in place of the volume/number combination.

For example, ALAWON is issued with a volume/number sequence but it contains no author or title information. Thus, issues of ALAWON are saved with file names such as alawon-v1n01. ALANEWS does not include authors, titles, volumes, or numbers. Consequently, file names for ALANEWS look like alanews-940111. IRList Digest, which is issued numerically, is saved with file names like irld-0019. The Public Access Computer Systems Review contains authors, titles, volumes, and numbers. Thus, files from this serial have names such as pr-v5n05-barry-hypertext. By including leading zeros and using "backwards" dates, it is possible to sort the file names chronologically.
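The convention can be sketched as a small function. This is an illustration only, not the code actually used by Mr. Serials (the real work, as described later, is done by a perl program):

```python
def serial_file_name(code, volume=None, number=None, date=None,
                     author=None, title=None):
    """Build a file name in the Code-vVolumenIssue-Author-Title style.
    Leading zeros keep the numeric parts sortable."""
    parts = [code]
    if volume is not None and number is not None:
        parts.append("v%dn%02d" % (volume, number))
    elif number is not None:            # serials issued numerically
        parts.append("%04d" % number)
    elif date is not None:              # "backwards" date of publication
        parts.append(date)
    if author:
        parts.append(author)
    if title:
        parts.append(title)
    return "-".join(parts)
```

Given the examples above, serial_file_name("alawon", volume=1, number=35) yields alawon-v1n35, and serial_file_name("alanews", date="940111") yields alanews-940111.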

Gopher or WWW for Enhanced Access and Organization

FTP is strong on storage but weak on access. In other words, FTP file names are generally cryptic, even taking Mr. Serials' file naming conventions into consideration. An end-user would need to know the title's serial code or refer to an abbreviation table in order to effectively download particular issues or articles.

This is where a gopher or a hypertext transfer protocol (HTTP, commonly known as the World Wide Web and often abbreviated "WWW") server comes in. The gopher and HTTP protocols excel at maintaining pointers to Internet resources. The gopher technology does this with the use of "link" files. HTTP employs a markup language called the hypertext markup language (HTML). Since FTP sites are Internet resources, and since Mr. Serials has both a gopher and HTTP server in place, gopher link files and HTML files are created pointing to the collection of electronic serials.

Gopher and HTTP solve a number of problems. First, gopher link files and HTML documents provide the ability to list files within an FTP archive in a more readable fashion. Second, both of these technologies eliminate the need for the end-user to know and understand FTP commands, enhancing and simplifying access to the collection. Furthermore, the link files and HTML documents present information in greater detail than the simple FTP file names, allowing the end-user to better evaluate menu choices.

For example, to facilitate access to the collection with a gopher server, link files are created pointing to texts in the FTP archive. Below is a correctly formatted gopher link file:

Name=ALAWON v1n35 (May 5, 1992)
Type=0
Port=70
Path=ftp:dewey.lib.ncsu.edu@/pub/stacks/alawon/alawon-v1n35
Host=dewey.lib.ncsu.edu

where Name is the menu text presented to the end-user, Type=0 denotes a text file, Port and Host identify the gopher server answering the request, and Path instructs the server to retrieve the named file via anonymous FTP from dewey.lib.ncsu.edu.

When end-users use a gopher client to access the collection of ALAWON they see "ALAWON v1n35 (May 5, 1992)". This is more user-friendly than "alawon-v1n35". When the end-user selects the gopher link, the gopher server:

  1. Reads the link file
  2. Transparently opens up an anonymous FTP connection to dewey.lib.ncsu.edu
  3. Changes to the /pub/stacks/alawon directory
  4. Retrieves the file "alawon-v1n35" as a text (ASCII) file
  5. Closes the connection
  6. Displays the file to the end-user

From there the end-user can read, save, or print the file.
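To make the mechanics concrete, here is a small Python sketch, illustrative only, that parses a link file like the one above and derives the FTP host and file the gopher server would retrieve:

```python
def parse_gopher_link(link_text):
    """Read a gopher link file into a dictionary of its fields."""
    fields = {}
    for line in link_text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

def ftp_location(fields):
    """Split an ftp:host@/path style Path into the host the server
    connects to and the file it retrieves."""
    path = fields["Path"]
    if not path.startswith("ftp:"):
        raise ValueError("not an FTP link")
    host, remote = path[len("ftp:"):].split("@", 1)
    return host, remote

link = """Name=ALAWON v1n35 (May 5, 1992)
Type=0
Port=70
Path=ftp:dewey.lib.ncsu.edu@/pub/stacks/alawon/alawon-v1n35
Host=dewey.lib.ncsu.edu"""
```

Applied to the link above, ftp_location returns dewey.lib.ncsu.edu and /pub/stacks/alawon/alawon-v1n35, exactly the connection and retrieval described in steps 2 through 4.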

Similarly, a gopher link file for a particular article from the Public Access Computer Systems Review would look like this:

Name=Caplan, 'You Can't Get There From Here: E-prints and the Library'
Type=0
Port=70
Path=ftp:ftp.lib.ncsu.edu@/pub/stacks/pacsr/pr-v5n01-caplan-you
Host=dewey.lib.ncsu.edu

The end-user is presented with a menu item reading "Caplan, 'You Can't Get There From Here: E-prints and the Library'" which again makes more sense than "pr-v5n01-caplan-you".

Like the FTP archive, the gopher server was set up in such a way that every electronic serial title had its own directory. Unlike the FTP archive, each of these directories was further subdivided by volume, number, or year depending on the designations used by the electronic serial. [8]

Similar to gopher link files, HTTP servers require HTML documents to point end-users to Internet resources. Consequently, if Mr. Serials is to provide access to the collection of electronic serials, then Mr. Serials must facilitate the creation of HTML documents just as it facilitates the creation of gopher link files. Figure 2 lists an HTML document describing the locations of articles in volume 5, number 6 of the Public Access Computer Systems Review.

Figure 2

This figure, automatically created by Mr. Serials, is an HTML document listing the articles available from volume 5, number 6 of the Public Access Computer Systems Review as located in the electronic FTP archive of the North Carolina State University Libraries.

Notice how each article from the issue is listed and associated with a URL. When end-users view this document with a WWW client application, they are presented with a list of articles from this particular issue. When end-users select a particular article, their WWW client application automatically retrieves the associated file from the FTP archive.


<html>

 <head>
  <title>
  Public Access Computer Systems Review Volume 5 Number 6 (1994)
  </title> 
 </head>
 <body>
  <h2>
  Public Access Computer Systems Review Volume 5 Number 6 (1994)
  </h2> 
  <ul>
   <li>
    <a href="ftp://ftp.lib.ncsu.edu/pub/stacks/pacsr/pr-v5n06-contents">
    Public Access Computer Systems Review Table Of Contents v5n06 (1994)
    </a> 
   <li>
    <a href="ftp://ftp.lib.ncsu.edu/pub/stacks/pacsr/pr-v5n06-crawford-and">
    Crawford, 'And Only Half of What You See, Part III: I Heard It Through
    the Internet', Public Access Computer Systems Review v5n06
    </a> 
   <li>
    <a href="ftp://ftp.lib.ncsu.edu/pub/stacks/pacsr/pr-v5n06-morgan-worldwide">
    Morgan, 'World-Wide Web and Mosaic: An Overview for Librarians', Public
    Access Computer Systems Review v5n06
    </a> 
  </ul>
 <hr>
 <p>This file was last updated October 19, 1994.</p>
 </body>
</html>

When end-users browse this particular document from the WWW server at dewey.lib.ncsu.edu, they are presented with a formatted page where they can select a link like "Crawford, 'And Only Half of What You See, Part III: I Heard It Through the Internet', Public Access Computer Systems Review v5n06". Upon doing so, the end-user's WWW browsing software initiates an FTP connection to ftp.lib.ncsu.edu, retrieves the file pr-v5n06-crawford-and, and displays it to the end-user.

Unlike the gopher and FTP directory structures, there is no need for a separate directory for each electronic serial. This function is handled quite nicely since each HTML file has a unique name such as pr-v5n06.html, and these files can be grouped together using another HTML file that points to them individually. For example, such a file may be named "pr-index.html" and contain pointers to pr-v5n06.html which in turn points to pr-v5n06-crawford-and. [9]
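The generation of such issue-level HTML documents can be sketched as follows. This is an illustration of the idea rather than the actual code used by Mr. Serials; the base URL is taken from Figure 2:

```python
def issue_index_html(journal, volume, number, year, articles,
                     base="ftp://ftp.lib.ncsu.edu/pub/stacks/pacsr/"):
    """Generate an HTML issue index in the style of Figure 2 from a
    list of (file name, description) pairs."""
    heading = "%s Volume %d Number %d (%d)" % (journal, volume, number, year)
    lines = ["<html>",
             "<head><title>%s</title></head>" % heading,
             "<body>",
             "<h2>%s</h2>" % heading,
             "<ul>"]
    for file_name, description in articles:
        # each article becomes a hot link into the FTP archive
        lines.append(' <li><a href="%s%s">%s</a>' % (base, file_name, description))
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines)
```

Fed the file names and citations for an issue, the function emits a page whose links point directly into the FTP archive, just as in Figure 2.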

WAIS for Keyword/Searching Access

Mr. Serials uses FTP primarily for storage and rudimentary access. Gopher and HTTP enhance access and organize the information. Unfortunately FTP, gopher, and HTTP are only useful if end-users know exactly which issue or article they want or if they want to browse the collection. The WAIS technology allows end-users to search the indexed collection by keyword.

The concept of WAIS servers is not widely understood. WAIS is an indexing and document delivery mechanism. [10] Using WAIS, a computer can index a collection of data (simple text files, email digests, graphics, binary files, etc.), thus creating lists of terms or file names pointing to the original documents. These lists of terms or file names can then be searched using Boolean logic. The results of searches are returned in a ranked order according to "relevance". Briefly, this relevance ranking compares the number of times a term appears in a particular document, the number of times the term appears in the entire set of indexed data, and the length of the documents. The end result is a numeric value assigned to each document containing the terms in the query. Once the results of a query are returned to the end-user, s/he can select items from the results and retrieve the corresponding document(s).
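The following toy Python sketch illustrates the flavor of such relevance ranking. It is emphatically not WAIS's actual formula; it merely combines the same three ingredients (term frequency, collection-wide rarity, and document length):

```python
import math

def rank(query_terms, documents):
    """Toy relevance ranking: reward frequent terms, discount terms
    common across the whole collection, normalize by document length.
    'documents' maps a name to its text."""
    n = len(documents)
    scored = []
    for name, text in documents.items():
        words = text.lower().split()
        score = 0.0
        for term in query_terms:
            tf = words.count(term)                    # term frequency
            df = sum(1 for t in documents.values()
                     if term in t.lower().split())    # document frequency
            if tf and df:
                score += tf * math.log(1 + n / df) / len(words)
        scored.append((score, name))
    return [name for score, name in sorted(scored, reverse=True)]
```

A document mentioning "nren" twice would thus outrank one mentioning it once, and a document never mentioning it would sort last.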

Once a WAIS server is put in place, two things must be taken into consideration: data integrity and searching implementation. First, the WAIS server's index is kept up-to-date (integrity) with the aid of shell scripts. [11] These scripts re-index the entire collection every day at 3:30 A.M. Thus, access to the collection of electronic serials is never more than one day out-of-date. Second, to facilitate the actual searching, gopher link and HTML files must be written. A gopher link file allowing an end-user to search the collection of ALAWON looks like this:


Name=Search ALAWON (freeWAIS)
Numb=1
Type=7
Port=70
Path=waissrc:/.wais/alawon.src
Host=gopher.lib.ncsu.edu

When the end-user browses the ALAWON collection, one of the menu choices is "Search ALAWON (freeWAIS)". [12] Upon selecting this item, the end-user is presented with a blank field for constructing and entering a query. After completing the query, the gopher server invokes a WAIS search and returns the results to the end-user whereupon s/he has the opportunity to select an item from a list and view the desired documents. Figure 3 illustrates the same functionality put into an HTML document. When the end-user selects the item named "Search ALAWON" from the WWW-based collection of ALAWON, s/he is presented with a field to enter a search term. After submitting the query, the HTTP server passes the query on to the WAIS server. The WAIS server returns the results and the end-user can select items from the results to retrieve the actual document(s). [13]

Figure 3

This figure shows keyword searching access to the collection of ALAWON at the North Carolina State University Libraries. This HTML document provides a field where the end-user can enter a Boolean query. The results returned from the query are then "hot," enabling the searcher to retrieve entire issues from the collection of ALAWON.


<html>
<head>
	<title>Search ALAWON</title> 
</head>
<body>
<h2>Search ALAWON</h2>
<p>This service requires a <a
href="HTTP:/forms-browsers.html">forms-based WWW
browser</a>. A selection of these browsers is
available. This page allows you to search the back
issues of ALAWON. Use the form below to enter a
Boolean query. More <a
href="HTTP:/wais-instructions.html">detailed
instructions</a> for entering queries are available
online.</p>
<hr>
<form method="POST" action="HTTP://www.lib.ncsu.edu/cgi-bin/SFgate">
	<input type="hidden" name="database" value="vega.lib.ncsu.edu/alawon">
	Enter your query. 
    <!-- The following line creates a field for searching input -->
	<textarea name="text" rows="2" cols="60">
	</textarea>
	<input type="submit" value="OK"> <input type="reset" value="Reset">
</form>
</body>
</html>

Bibliographic Control

Until mid-1994 the Mr. Serials Process ignored the problem of bibliographic control (cataloging) of the electronic serial collection. At the 1994 Annual Meeting of the North American Serials Interest Group (NASIG), there was much discussion about the newly proposed MARC 856 field. This field is intended to describe the locations and holdings of electronic documents. It has provisions for such information as the name of remote files, the operating system of the remote computer, the protocol used to communicate with the remote computer (FTP, telnet, or other), the directory where the remote file resides, etc. Information about the 856 field can be found in "Proposal No: 93-4", USMARC Format: Proposed Changes 1993, No. 2, prepared by the Network Development and MARC Standards Office. [14] We at NCSU therefore decided to take two serials from the collection and include a Uniform Resource Locator (URL) in an 856 field. This way end-users could search the NCSU online public access catalog, select the URL from the screen, and paste it into their favorite WWW browser.

At the same time Tim Kambitsch (then of Butler University) was working on some scripts to search DRA OPACs with WWW browsers. These forms-based scripts allowed the end-user to specify Boolean queries to be applied to selected databases (book and journal catalogs, catalogs of government documents, and bibliographic indexes like Expanded Academic Index, Business Index, or Newspaper Index). Using Kambitsch's scripts it was now possible to search the OPAC with a WWW browser, and since it was possible to list access points to electronic items in the catalog, the next logical step was to provide hypertext links from the catalog to the electronic item itself. This is just what we did.

As an experiment, two MARC records from OCLC (OCLC record numbers 26226155 and 20987125) were downloaded and added to the NCSU Libraries OPAC. These records describe ALAWON and the Public Access Computer Systems Review, respectively. The records were then edited to include URLs (marked up in HTML) in 856 fields. Thus, using a WWW browser to search the OPAC, the end-user then has the opportunity to navigate directly to the electronic resource after locating items of interest. [15]
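The shape of such a field can be sketched with a trivial formatter. This is an illustration only; a real record is encoded in MARC communications format rather than as a display string, and the choice of first indicator "1" here follows the 856 definition's use of that value for FTP access:

```python
def marc_856(url, access_method="1"):
    """Format a display-style 856 field.  First indicator '1' denotes
    FTP as the access method; subfield $u carries the URL."""
    return "856 %s  $u %s" % (access_method, url)
```

For one of the ALAWON issues above, the function produces a field whose $u subfield an OPAC interface can turn into a hot link.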

Automating Acquisitions with Ac

Once there were mechanisms in place for organizing, archiving, and disseminating the electronic serials, a mechanism was needed for the acquisitions process. This process needs to:

  1. Identify incoming electronic serials as a particular title
  2. Extract bibliographic information from the received item
  3. Update the FTP, gopher and WWW servers accordingly

Identifying incoming titles is done with a piece of software called procmail. [16] After installing procmail, "recipes" are used to listen to an account's incoming email. Once a recipe identifies a piece of email, it processes the message in any one of a number of ways. The Mr. Serials Process uses these recipes to save the email in a specified directory with a name corresponding to the serial codes described previously. For example, the following recipe handles incoming issues of ALAWON:


MSGPREFIX=alai
:c
* ^From.*alawash@alawash.org
process-us
:a
| formail -x Subject: | mail eric_morgan@ncsu.edu

The first line specifies whether a file will be saved; if so, it will have a prefix of "alai". The second line tells procmail to begin processing a new recipe. The third line searches the "From" line of the email header for "alawash@alawash.org". If the search is successful, then procmail saves the entire message in a directory named process-us. The next line instructs procmail to continue processing. Finally, the last line of the recipe extracts the subject line of the email message and forwards it to eric_morgan@ncsu.edu. This last step notifies the maintainer of the electronic serial collection that there are new issues or articles ready for processing. It should be noted that for every electronic serial there is a corresponding recipe whose format looks very much like ALAWON's.
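For illustration, the recipe's logic (match the From line, file the message under a serial-code prefix, and extract the Subject for notification) can be emulated in a few lines of Python. This sketch is not procmail itself, and the sample message below is fabricated:

```python
import re

def handle_message(raw_message, prefix="alai",
                   sender=r"alawash@alawash\.org"):
    """Emulate what the recipe above does: match the From line, file
    the message under a serial-code prefix, and pull out the Subject
    for the notification email."""
    if not re.search(r"^From.*" + sender, raw_message, re.M):
        return None                     # not an ALAWON issue
    subject = re.search(r"^Subject: (.*)$", raw_message, re.M)
    return {
        "save_as": prefix,              # procmail appends a unique suffix
        "notify": subject.group(1) if subject else "",
    }

message = """From alawash@alawash.org Mon May  4 09:00:00 1992
Subject: ALAWON v1n35
To: serials@dewey.lib.ncsu.edu

ALA Washington Office Newsline ..."""
</```

Messages from other senders fall through this recipe, just as in procmail, where the next recipe in the rc file would get a chance to match.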

After receiving notification of new issues to process, the maintainer of the collection runs a program called ac (short for acquisitions) to extract the item's bibliographic information and update one or the other of the information servers. [17] Ac, written in a scripting language called perl, is the only original piece of programming in the entire Mr. Serials Process; every other piece of software used in the Mr. Serials Process was obtained from the Internet and written by someone else.

In a nutshell, this is how ac works. It:

  1. Gets a list of the serials to be processed from the process-us directory
  2. Reads a serial and extracts the bibliographic information (volume, number, date, author, and title)
  3. Replaces the issue's email header with the extracted bibliographic information as well as a URL denoting the future location of the serial in the FTP archive
  4. Saves the edited serial in the FTP archive
  5. Creates and saves a gopher link file or HTML document and updates the servers as appropriate
  6. Deletes the original email message
  7. Repeats the process until all issues are processed

The beauty of ac lies in its ability to extract the bibliographic information from electronic serials: volume, number, issue, date, etc. The format of many electronic serials is stable from issue to issue. In other words, for a particular serial the volume information seems to always be on the fifth line down and the fourth word from the left. Given what seemed to be consistent patterns, we originally hard coded the locations of the bibliographic information into the program, allowing ac to extract words from the text to build a citation.

Unfortunately, experience proved that the placement of bibliographic data was not always consistent. On occasion, it turned out that our volume information normally found on the fifth line was in fact on the fourth line. To handle such inconsistencies, we abandoned hard coding. Instead, we created "description files," simple text files that denote the presence or absence of bibliographic information in a particular serial as well as the location of this data. Thus, ac now reads the description files for the position of bibliographic information.
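A description-file-driven extraction might look like the following sketch. The actual format of Mr. Serials' description files is not given in this article, so the "field line word" layout here is an assumption for illustration:

```python
def read_description(description_text):
    """Parse a description file of assumed form 'field line word',
    e.g. 'volume 5 4': the volume is the fourth word of the fifth
    line of the issue."""
    positions = {}
    for entry in description_text.splitlines():
        if entry.strip():
            field, line_no, word_no = entry.split()
            positions[field] = (int(line_no), int(word_no))
    return positions

def extract(issue_text, positions):
    """Pull each bibliographic element from its stated position."""
    lines = issue_text.splitlines()
    found = {}
    for field, (line_no, word_no) in positions.items():
        found[field] = lines[line_no - 1].split()[word_no - 1]
    return found
```

When a publisher shifts the masthead by a line, only the small description file needs re-editing, not the program itself; this is precisely the flexibility that the hard-coded approach lacked.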

Ac cannot extract multi-word bibliographic information like authors and titles. To accomplish this task, the maintainer of the electronic serial collection must identify this information with the aid of the "copy and paste" functions of the maintainer's communications software. Ac can create email messages on behalf of the collection's maintainer in the cases where articles must be ordered from table-of-contents issues, but ac cannot extract the titles of the articles automatically.

Evaluation

Between Mr. Serials' inception in early 1992 and November 1994, Mr. Serials processed 4,197 files from 37 titles. The entire collection, including indexing, is just under 165 megabytes in size. (See Figure 1 for more detailed information.) This entire electronic serial collection has been maintained by one individual. At the present time, there is not enough data to study the use of the collection; a use study is a topic for further examination.

Objective studies have not been used to analyze the ease with which Mr. Serials processes electronic serials, but subjective observations suggest it takes less than 15 seconds to process an individual issue, provided the issue's description files correspond to the issue's formatting. If the formatting of the issue in question has changed, then the time it takes to process an issue is about 2 minutes. The addition of a new title to the collection requires creating and editing a number of text files. This aspect of Mr. Serials requires no more than 30 minutes for the experienced maintainer of the collection.

Early in the development of Mr. Serials, an unexpected and reassuring opportunity presented itself. Since Mr. Serials was developing a comprehensive collection of electronic serials in the field of librarianship and information science, an index of the entire collection was created containing all the library and information science titles in the collection. This index is analogous to the print index Library Literature. Consequently, not only can end-users search the collection of the Public Access Computer Systems Review for the articles containing the term "nren", but they can search for the term "nren" from the entire collection and retrieve articles from ALAWON, IRList Digest, Current Cites, etc. as well. This new index has been named Electronic Library Literature. [18] Think of the possibilities! Institutions could use a program like the Mr. Serials Process to collect, organize, archive, index, and disseminate serials from many disciplines (engineering, biology, computer science, literature, etc.). Each of these collections could then be indexed creating supplements to their printed counterparts. New technologies have presented librarians with new opportunities.

While the Mr. Serials Process is functional, there are a number of aspects of its functionality that could be improved. Its biggest drawback is its inability to handle inconsistent formatting. With the aid of another program (Mr. Serials' Helper), description files can be created and recreated by selecting bibliographic information from the screen and clicking buttons. The recreation of description files is not difficult, just tedious; it is the most time-consuming aspect of Mr. Serials.

A potential solution to this problem is to enhance the ac program to include a description file editor. Alternatively, ac could be enhanced to guess the locations of bibliographic elements if the extracted data is incorrect. For serials that are truly consistent from issue to issue, the procmail aspect of Mr. Serials can be used to update the archive automatically, obviating the need for human intervention. This entire problem could be eliminated if electronic serial publishers were to adopt a standard header for their texts and include all the bibliographic information necessary to uniquely describe the electronic serial's issue and/or article. Ironically, such a standard already exists in "The TEI Header" of TEI P3: Guidelines for Electronic Text Encoding and Interchange. [19] This header is based on the Standard Generalized Markup Language (SGML). Unfortunately, it is intended to be interpreted by SGML rendering software; to humans it reads like un-rendered HTML.

Presently, Mr. Serials can only process electronic serials delivered via email. Electronic serials originally distributed from FTP, gopher, or WWW servers are unavailable to it. A few programs on the Internet, called robots, spiders, and mirrors, regularly extract information from remote Internet servers. This same technique could eventually be applied to the Mr. Serials Process.
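The mirroring idea can be sketched with nothing more than the stock ftp client and cron. The script below only generates the command file such a mirror would feed to ftp; the remote directory is hypothetical, and a real mirror would also compare timestamps to avoid re-fetching old issues:

```shell
#!/bin/sh
# Sketch: generate the commands a periodic (cron-driven) mirror would
# feed to the stock ftp client.  The remote directory passed as $1 is
# hypothetical.
make_ftp_commands() {
    cat <<END
user anonymous mrserials@lib.ncsu.edu
cd $1
prompt
mget *
quit
END
}

# Print the command file for one serial's remote archive.
make_ftp_commands /pub/journals/pacsr
```

In practice the output would be redirected to a file and run as "ftp -n remote.host < commands", with the local working directory set to the serial's stacks directory beforehand.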

Even taking into consideration the need for such improvements to Mr. Serials, the first goal of the Mr. Serials Process, "to discover an automated method for collecting, organizing, archiving, indexing, and disseminating electronic serials," has been accomplished.

Only after a number of barriers have been removed can the second goal of Mr. Serials be accomplished, i.e., to demonstrate to the scholarly community that if faculty publish their findings in free, peer-reviewed, Internet-based journals, then libraries can effectively supplement the scholarly communications process.

The reluctance of the scholarly community to accept electronic publications as valid material for tenure consideration is one such barrier. Fortunately, more electronic, scholarly, peer-reviewed journals are being published every year. [20]

The effective, widespread use of the Mr. Serials concept by libraries can increase the value of electronic serials. Suppose a number of libraries were committed to creating comprehensive collections of electronic serials within specific academic disciplines. Using Mr. Serials, these libraries could collect, organize, archive, index, and disseminate their collections relatively easily using the Internet. Since the electronic serials are organized into one "space" and accessible via a number of means, the value of the entire collection increases because of its ease of use. Since libraries are committed to archiving materials for future generations, electronic serials will grow in value and provide the fuel for further research.

Another barrier to accomplishing the second goal of Mr. Serials is the lack of staff resources necessary to make it work. The Mr. Serials Process eliminates many hardware dependencies; its software can be used on just about any computing platform, albeit with adjustments for operating system requirements (file naming conventions and path specifications). Consequently, hardware, while a necessary part of the Mr. Serials Process, is not a great limiting factor. Rather, the lack of staff resources and sufficient knowledge of computing is much more of an impediment. Libraries are not about books but about information. Computers are excellent information tools. Therefore, next to a librarian's mind, I see computers as the primary tool of librarianship. I believe that when librarians realize the value of their own creativity and the power of the computer, tools like Mr. Serials will open up new avenues in scholarly communication.

Glossary

anonymous FTP
A procedure where an end-user is authorized to copy files to or from a remote computer. Usually end-users must log on to the remote computer hosting the FTP service as "anonymous" and supply their email address as the password.
client/server computing
A method of dividing computing tasks between two applications. Client/server computing is made of two components: the client and the server. The client's primary responsibility is to handle the end-user interface, translate the requests into the protocol of the server, and send the request to the server. The server's responsibility is to receive requests from client applications, apply these requests to a set of data, and return the results of the request back to the client. Many Internet applications like gopher, HTTP, and WAIS rely on the client/server model of computing.
gopher
A client/server application and protocol developed at the University of Minnesota in 1991. It is used to deliver information seamlessly to end-users over the Internet. By incorporating features of other Internet protocols like FTP and WAIS, gopher servers eliminate the need for end-users to remember the obscure commands of many Internet-based programs.
FTP
An abbreviation for File Transfer Protocol. FTP is a protocol allowing for the copying of files from one computer to another over the Internet. It also supports rudimentary directory commands, giving the end-user the ability to navigate a directory structure as well as create and delete directories.
FTP archive
A collection of files made available on the Internet through the FTP protocol.
HTML
An abbreviation for hypertext markup language. An application of SGML, HTML is primarily used by HTTP servers to distribute information. Like SGML, it is used to logically describe the contents of an electronic document. Additionally, HTML is used to implement the notion of "hypertext" where a reader can select items from a document and "link" to other documents. Extending the hypertext concept, URLs can be embedded within HTML documents allowing the reader to retrieve other documents and files from other Internet resources.
HTTP
An abbreviation for Hypertext Transfer Protocol. HTTP, like gopher, is a protocol intended for the purposes of sharing information over the Internet. It was developed by Tim Berners-Lee in early 1990. Unlike the gopher protocol, HTTP servers do not act as intermediaries for HTTP client applications. Instead, HTTP servers deliver HTML documents to client applications. These HTML documents include URLs which the client applications then use to retrieve information from Internet resources themselves.
link files
On gopher servers, link files are used to enumerate the name, type, and location of remote Internet resources. The contents of link files are sent to gopher client applications whereupon end-users can select an item from a menu and thus use or retrieve remote Internet resources.
procmail
A specific mail filtering application for Unix-based computers.
protocol
A language or specification used to communicate between two computing applications.
SGML
An abbreviation for standard generalized markup language. SGML is a standard used to logically describe the contents of an electronic document.
shell scripts
Analogous to batch files of the DOS operating system or DCL files of the VMS operating system, shell scripts are typically short programs made up of Unix operating system commands.
URL
An abbreviation for Uniform Resource Locator. URLs are used to unambiguously describe the location of Internet resources. They usually take the form of scheme://host/path where: 1) "scheme" is a protocol designation like gopher, http, or ftp, 2) "host" is the name or number of a remote computer, and 3) "path" is the designation of a file on the remote computer.
WWW
An abbreviation for World Wide Web. WWW is often used synonymously with HTTP.
WAIS
An abbreviation for Wide Area Information Server. WAIS, initially co-developed by Thinking Machines, Inc., Apple Computer, and Dow Jones in 1992, is an indexing/document delivery technology. It is made up of three components: the indexer, the client, and the server. The indexer evaluates sets of electronic documents and creates lists of terms pointing to those documents (an index). The client is an end-user interface to the indexed data, allowing the end-user to specify queries to be sent to a server. The server accepts queries from WAIS client applications, applies them to the index, and returns the results, whereupon the client application allows the end-user to select items from the returned results and finally retrieve a document. Since 1992, many people and organizations have improved upon WAIS. These improvements include Boolean searching, right-hand truncation, nested queries, field searching, and better relevance ranking.

Notes

  1. This article is also available electronically via the Uniform Resource Locator (URL) http://www.lib.ncsu.edu/staff/morgan/report-on-mr-serials.html . The purpose of URLs is to unambiguously describe the whereabouts of Internet resources. They are most frequently employed by World Wide Web (WWW) browsing software like Netscape, Mosaic, or Lynx. All WWW browsers implement a command to "open a URL". Use the "open a URL" command in your WWW browser to retrieve and/or display many of the notes from this article. The electronic version of this article has the links embedded in the text. For more information about URLs try Tim Berners-Lee, "WWW Names and Addresses, URIs, URLs, URNs" at http://info.cern.ch/hypertext/WWW/Addressing/Addressing.html or Eric Lease Morgan, "The World-Wide Web and Mosaic: An Overview for Librarians", The Public-Access Computer Systems Review 5, no. 6 (1994): 5-26 at http://www.lib.ncsu.edu/staff/morgan/www-and-libraries.html#uniform .
  2. Ann Okerson, "The Electronic Journal: What, Whence, and When?" The Public-Access Computer Systems Review 2, no. 1 (1991): 5-24.
  3. Richard M. Dougherty, "To Meet the Crisis in Journal Costs, Universities Must Reassert Their Role in Scholarly Publishing," Chronicle of Higher Education, 12 April 1989, A52.
  4. Much of this work is documented in Eric Lease Morgan, WAIS and gopher Servers: A Guide for Internet End-Users (Westport, CT: Mecklermedia, 1994).
  5. Originally Charles W. Bailey, Jr., "Library-Oriented Lists and Electronic Serials" at ftp://ftp.lib.ncsu.edu/pub/stacks/guides/bailey-library.txt was used. A more up-to-date guide is Steve Bonario and Ann Thornton, "Library-Oriented Lists and Electronic Serials" at gopher://una.hh.lib.umich.edu/00/inetdirsstacks/library%3abailey .
  6. The inner workings of the ListManager are described in Eric Lease Morgan, "Implementing TCP/IP communications with HyperCard", Information Technology and Libraries 11(4):421-432, December 1992 at http://infomotions.com/musings/tcp-communications/ . The software itself can be found at ftp://ftp.lib.ncsu.edu/pub/software/mac/listmanager.hqx .
  7. Try ftp://ftp.lib.ncsu.edu/pub/stacks/ to review the organization within the FTP archive.
  8. There is no single place to search and browse the collection of electronic serials made available by Mr. Serials, but the best URLs pointing to example serials include gopher://gopher.lib.ncsu.edu/11/library/disciplines/library/ejournals , http://www.lib.ncsu.edu/disciplines/library-Newsletters.html , and/or http://www.lib.ncsu.edu/disciplines/library-Scholarly.html .
  9. Like the gopher server, the electronic texts are widely distributed within the NCSU Libraries "Webbed" Information system. Access http://www.lib.ncsu.edu/stacks/stacks-Newsletters.html to get a flavor of how it works.
  10. There are many examples of the WAIS technology including WAIS, Inc.'s distribution ( http://www.wais.com/ ), the Clearinghouse for Networked Information Discovery and Retrieval's freeWAIS ( http://cnidr.org/cnidr_projects/freewais.html ), and Ulrich Pfeifer's freeWAIS-sf ( http://ls6-www.informatik.uni-dortmund.de/freeWAIS-sf/ ).
  11. The following shell script is used to re-index the collection of ALAWON:
    
    #!/bin/csh
    # Note: $wi is assumed to be set elsewhere to the path of the WAIS
    # indexer (waisindex).
    if ("$1" == "alawon") then
        set theIndex=/usr/local/gopher/data/.wais/$1
        set theDirectory=/usr/local/ftp/pub/stacks/$1
        $wi -d $theIndex -r -t first_line -export $theDirectory
        exit
    endif
    
    In a nutshell, the script first checks that its argument is alawon. It then specifies the locations of the index and of the texts to be indexed. Finally, it re-indexes the ALAWON texts using the first_line type of indexing (indexing where the first line of each file is returned when a search identifies that article as relevant). Other shell scripts, much like this one, are used to re-index the other electronic serials.
  12. See gopher://gopher.lib.ncsu.edu:70/7waissrc%3A/.wais/alawon.src .
  13. See http://www.lib.ncsu.edu/stacks/alawon-wais.html .
  14. See gopher://marvel.loc.gov:70/00/.listarch/usmarc/93-4.doc .
  15. To see the results of these labors in action:
    1. Use your WWW browser to access http://library.ncsu.edu/ .
    2. Choose either the forms-based or non-forms-based searching methods.
    3. Search for "alawon" or "public access computer systems review".
    4. Display the results in "full" or "MARC" format.
    5. Look for the links in the resulting texts and give them a try.
  16. Procmail can be found at ftp://ftp.informatik.rwth-aachen.de/pub/packages/procmail/ .
  17. Incomplete documentation for ac can be found at http://www.lib.ncsu.edu/staff/morgan/ac.html .
  18. Electronic Library Literature can be found at gopher://gopher.lib.ncsu.edu/11/library/disciplines/library/ell .
  19. A description of the header itself can be found at ftp://ftp.ifi.uio.no/pub/SGML/TEI/P3HD.DOC . The Guidelines can be found at http://www.lysator.liu.se/runeberg/teip3files.html .
  20. Ann Okerson, "Introduction to the 1994 Directory," 1994 ARL Directory of Electronic Journals and Newsletters (Washington, DC: Association of Research Libraries, 1994) at gopher://arl.cni.org/00/scomm/edir/eintro .

Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Source: This article also appears in Serials Review 21 no. 4 (Winter 1995): 1-12.
Date created: 1995-12-15
Date updated: 2004-11-17
Subject(s): electronic journals; articles;
URL: http://infomotions.com/musings/serials/