Using World Wide Web and WAIS technologies

This is the written complement to a presentation given at the 1995 USAIN Annual Conference held in Lexington, KY, April 26-29. The goal of the presentation is to describe three qualities (readability, browsability, and searchability) of useful information systems and to outline how they can be manifested in World Wide Web servers using HTML, database applications, and the WAIS technologies.

Information systems

At their very heart, World Wide Web (WWW) servers (technically known as hypertext transfer protocol or HTTP servers) are tools for disseminating information. Created effectively, WWW servers can be information systems. For our purposes, an information system is broadly defined as any collection of information. Consequently, a book can be described as an information system, as can a library, an advertisement, or even the dashboard of your car. Similarly, gopher, FTP, and WWW servers can be defined as information systems.

With everybody creating information systems these days, there is a need for some guidelines describing qualities of effective information systems. In my opinion, these guidelines can be distilled into three qualities:

Readability
Browsability
Searchability

These qualities (readability, browsability, and searchability) do not all have to be equally represented in every information system. As a collection of information grows, different aspects of these qualities take on greater significance. Thus, the amount of readability, browsability, and searchability an information system exhibits depends on the type and quality of the collected data, as well as the information needs of the clientele.

Readability means good visual design

All information systems, no matter how small, must incorporate principles of good graphic design. You and your information system are competing with a myriad of other information systems. If your data is not presented in a visually appealing, easy-to-read manner, then your chances of retaining the attention of your intended audience are significantly reduced. Try to follow these guidelines:

Use a consistent layout
White space is good
Visually organize the page; employ horizontal rules
Keep pages short
Include elements of contrast
Use all stylistic elements in moderation

Browsability connotes logical organization

As the size of an information system grows, so does the need to logically organize its data. This implies grouping conceptual sets of data with similar conceptual sets of data. Browsability becomes more effective when it is coupled with hypertext and logical groupings of information.

Technology and culture change, and this change means any logical organization of knowledge and information must change also. Despite the dynamic nature of logical groupings of information, the organization of information and knowledge seems to be a necessary part of human existence. Since the primary purpose of information servers is to disseminate knowledge, facts, and ideas, it follows that the information they disseminate must be organized in some reasonable fashion.

A browsable information system has a number of advantages.

Readers see entire system at a glance
Knowledge of a vocabulary is not necessary
Like items are grouped together
Easy to navigate
Fosters serendipity
Stimulates thinking

But a solely browsable system is not without its disadvantages as well.

Easy to get "lost"
Classification system may be foreign to reader
Classification breaks down as quantity of information increases
Classification changes over time

The creation of a useful browsable system will be better implemented if you:

Know your audience
Provide "about" texts
Use the vocabulary of your intended audience
Create a hierarchical system of ideas
Create a system that is both flexible and exhaustive
Classify by format last

Searchability addresses specific needs

The largest of information systems must include search features. These features help overcome the disadvantages of the purely browsable system. As described in the previous section, your conception of the information universe is not necessarily the same as your reader's. While you try to group things in the most logical manner, your reader's "logic" may very well be different from yours. Searchability can help overcome this discrepancy by allowing the reader to create their own set of logically similar items.

Thus, searchability allows the end-user to:

Create alternative logical classifications
Simplify the location of known items
Work independently of collection size

Yet, purely searchable systems are not perfect either. To use them to their fullest extent, the end-user:

Must know searching syntax
Must articulate a preconceived information need, idea, phrase, or term
Must know the structure of the data

All hope is not lost. Try to follow these guidelines:

Include help texts
Map located items to similar items
Provide simple as well as "power user" search mechanisms

Implementing the qualities on a WWW server

The three qualities outlined above (readability, browsability, and searchability) can be implemented on just about any WWW server. For example, the aspects of readability are best (only) employed by exploiting the features of the hypertext markup language (HTML). This can be done on any WWW server. While it is true that HTML cannot compare to the power and creative flexibility of today's desktop publishing systems, and the client/server computing nature of WWW servers is not conducive to consistent page layout, features of good page design can be implemented using HTML. The number of style guides that have appeared since the inception of the WWW and HTML is a testament to this idea. The one written by Patrick J. Lynch seems to be the most scholarly in nature. You are encouraged to read these guides for further information.

Browsability (logical organization) can be done by hand using HTML, but if you are creating an information system of any size, then the use of a database program to manage your data is strongly suggested. This is because the creation of HTML documents can be extraordinarily tedious. Furthermore, if you have information sources belonging to more than one section of your information system, then when you make a change to one section you may have to make a similar modification to another. These aspects of HTML creation introduce greater possibilities for data-entry error.

On the other hand, if you use a database program to manage the information on your system, then you can reduce the chances for error. The most important feature of the database program is whether or not it can output text files designed by the end-user. In other words, after you create your database, does the database program have the ability to create reports, in the form of HTML documents, that can be exported to ASCII files and then put into your WWW server's data directories? The answer to this question is almost always "Yes." For microcomputers, applications like FileMaker Pro or FoxPro will work well. If these do not suit your needs, then a simple database program could be written in HyperTalk (Macintosh) or Visual Basic for Microsoft Windows (Intel-based computers). Database programs for Unix- or VMS-based computers may be more difficult to find or create. Once a database is in place, you create fields to describe your data. These fields may include things like: name, title, URL, abstract, and subjects. After the data-entry is taken care of, you create a report based on the contents of your database. This report analyzes the records in the database and creates HTML files as output. Using this method will not only make the creation of your HTML easier, but it will also fulfill a requirement of readability ("Use a consistent layout") since all your pages will have a similar "look and feel."
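The report step described above can be sketched in a few lines of Python. This is only an illustration, not the output of any particular database program: the sample records, the example.org URLs, and the function names are all hypothetical. The field names are the ones listed in the text. Because a record is filed under each of its subjects, an item belonging to more than one section is entered once and appears on every relevant page.

```python
from html import escape

# Hypothetical records; the field names follow the ones listed in the text.
RECORDS = [
    {
        "name": "style-guide",
        "title": "A style guide",
        "url": "http://example.org/style/",
        "abstract": "Advice on page design.",
        "subjects": ["design"],
    },
    {
        "name": "wais-questions",
        "title": "WAIS questions & answers",
        "url": "http://example.org/wais/",
        "abstract": "Indexing and searching.",
        "subjects": ["searching", "design"],
    },
]


def records_by_subject(records):
    """Group records under each of their subjects; a record may land in several groups."""
    groups = {}
    for record in records:
        for subject in record["subjects"]:
            groups.setdefault(subject, []).append(record)
    return groups


def render_page(subject, records):
    """Return one HTML page listing every record filed under a subject."""
    items = []
    for record in sorted(records, key=lambda r: r["title"]):
        items.append(
            '<li><a href="{url}">{title}</a>: {abstract}</li>'.format(
                url=escape(record["url"], quote=True),
                title=escape(record["title"]),
                abstract=escape(record["abstract"]),
            )
        )
    # Every page shares the same template, which gives the consistent layout.
    return (
        "<html><head><title>{0}</title></head>\n"
        "<body><h1>{0}</h1>\n<hr>\n<ul>\n{1}\n</ul>\n</body></html>\n"
    ).format(escape(subject), "\n".join(items))


def build_report(records):
    """Return a mapping of file name to HTML text, one page per subject."""
    return {
        subject + ".html": render_page(subject, group)
        for subject, group in records_by_subject(records).items()
    }
```

Calling build_report(RECORDS) returns the pages as strings. Writing them into the WWW server's data directory is left to the caller, since that location differs from site to site.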

Providing searching services for your information system is the most difficult quality to implement. One way to fulfill this requirement is through the database program used to manage the system's data. To do this, common gateway interface (CGI) scripts must be written or employed. These CGI scripts, getting their input from an HTML form, construct database queries. These queries are then passed on to the database application for processing. The results of the database search are then returned to the CGI script and formatted into HTML. Finally, the HTML-formatted search results are passed from the CGI script back to the WWW server, which in turn passes the results back to the client application. This scenario will work for any programmable database application. Considering the present state of WWW technology, this particular solution requires a lot of programming.
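The round trip described above (form input, database query, HTML results) might look something like the following Python sketch. It makes several assumptions that do not come from the text: SQLite stands in for the database application, the form field is named "query", and the table, sample rows, and function names are hypothetical.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs


def open_sample_database():
    """Create a small in-memory database standing in for the real one."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE resources (title TEXT, url TEXT, abstract TEXT)"
    )
    connection.executemany(
        "INSERT INTO resources VALUES (?, ?, ?)",
        [
            ("Gopher basics", "http://example.org/gopher/",
             "An introduction to gopher servers."),
            ("WAIS indexing", "http://example.org/wais/",
             "How to index documents for searching."),
        ],
    )
    return connection


def search(connection, term):
    """Build a parameterized query from the reader's term and run it."""
    pattern = "%" + term + "%"
    cursor = connection.execute(
        "SELECT title, url FROM resources "
        "WHERE title LIKE ? OR abstract LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return cursor.fetchall()


def handle_request(connection, query_string):
    """Turn a form submission (a URL query string) into an HTML fragment."""
    term = parse_qs(query_string).get("query", [""])[0].strip()
    if not term:
        return "<p>Please enter a search term.</p>"
    rows = search(connection, term)
    if not rows:
        return "<p>No items matched {}.</p>".format(escape(term))
    items = [
        '<li><a href="{}">{}</a></li>'.format(escape(url, quote=True), escape(title))
        for title, url in rows
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

In an actual CGI script, the query string would come from the QUERY_STRING environment variable set by the WWW server, and the script would print a "Content-Type: text/html" header and a blank line before the HTML.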

Another alternative is to directly search or create an index of the HTML documents making up your information system. For the Macintosh operating system, TR-WWW and AppleSearch/AppleWebSearch provide these sorts of services, respectively. Unfortunately, the author has no experience with these sorts of tools for DOS/Windows- or VMS-based computers. For Unix-based WWW servers, the Wide Area Information Server (WAIS) technology can be used very effectively to provide searching services for your information system.
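To make the idea of an index of HTML documents concrete, here is a minimal Python sketch of an inverted index: a table mapping each word to the documents that contain it. It is not how WAIS, TR-WWW, or AppleSearch work internally, and all of the names in it are hypothetical. It only illustrates why searching an index is quicker than rereading every document for every query.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML document, ignoring the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def words_in(html_text):
    """Return the set of lowercased words found in the document's text."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))


def build_index(documents):
    """Map each word to the set of document names containing it."""
    index = {}
    for name, html_text in documents.items():
        for word in words_in(html_text):
            index.setdefault(word, set()).add(name)
    return index


def search(index, query):
    """Return the names of documents containing every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)
```

The index is built once, whenever the documents change, and each query is then answered by looking up words rather than scanning files.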

Creating a WAIS server

Since its introduction, the WAIS technology has seen many improvements including the original distribution enhanced and supported by WAIS, Inc., freeWAIS by the Center for Networked Information Discovery and Retrieval (CNIDR), and Ulrich Pfeifer's freeWAIS-sf.

Ulrich Pfeifer's freeWAIS-sf distribution represents the best value since it includes many enhancements not found in the other distributions and, more importantly, it's free. Bringing up a WAIS server is not too difficult, but it does require new skills. Just like bringing up a WWW server, all that is really needed is the ability to read the instructions and some perseverance. In a nutshell, this is how it is done:

Download and uncompress the freeWAIS-sf archive
Read the instructions
Run the configure and make files
Index some sample data
Test
Permanently install the server
Regularly index your WWW server's HTML files

Because most of the configuration needed to create the WAIS server application is handled by the configuration program, the creation of the WAIS server application has been reduced to the process of answering a few questions. If you have difficulties, then the best place to turn is the usenet newsgroup comp.infosystems.wais. There you will find people who have experienced the same problems you are experiencing and are willing to share their solutions with you.

Kidofwais.pl and/or SFGate

With a WAIS server in place, the next step is to provide a form for your end-users to send queries through your WWW server to your indexed data. As mentioned before, this is done with CGI scripts. Fortunately, there are a number of scripts already written and no additional programming will be necessary. Two of those scripts are described here: kidofwais.pl and SFGate.

Kidofwais.pl is a perl script. To use it, you first create a WAIS index of your HTML documents using the "-t URL" indexing option. This option is necessary because WAIS indexes ordinarily return the full path names of indexed documents, while your end-users need URLs. After indexing, you configure the kidofwais.pl script for your site. These configurations simply inform the script where the WAIS search engine resides and what indexes to search. Finally, you make kidofwais.pl available on your server through a URL. When this is done, a search proceeds as follows:

The client application reads the URL of kidofwais.pl
The end-user's query is sent to the kidofwais.pl script
Kidofwais.pl searches your WAIS index
The results of the WAIS search are returned to kidofwais.pl
Kidofwais.pl converts the output into HTML
The results are returned to the client application
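
The reason URLs are needed instead of full path names can be shown with a short Python sketch. It is not the code of waisindex or kidofwais.pl. The document root, the base URL, and the function names are hypothetical values chosen for illustration. The sketch translates a file path into the URL a reader could follow and formats a list of hits as HTML.

```python
from html import escape
from urllib.parse import quote

# Hypothetical site settings; neither value comes from the original text.
DOCUMENT_ROOT = "/usr/local/www/data"
BASE_URL = "http://www.example.org"


def path_to_url(path, document_root=DOCUMENT_ROOT, base_url=BASE_URL):
    """Replace the document root at the start of a file path with the base URL."""
    root = document_root.rstrip("/")
    if not path.startswith(root + "/"):
        raise ValueError("path is outside the document root: " + path)
    # quote() percent-encodes characters such as spaces but leaves "/" alone.
    return base_url.rstrip("/") + quote(path[len(root):])


def results_to_html(hits):
    """Format (headline, path) pairs from a search as an HTML list of links."""
    items = [
        '<li><a href="{}">{}</a></li>'.format(
            escape(path_to_url(path), quote=True), escape(headline)
        )
        for headline, path in hits
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

A path outside the document root raises an error rather than producing a link, since such a file is not reachable through the WWW server.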

An example of a kidofwais.pl implementation is available on the NCSU Libraries "Webbed" Information System.

SFGate works in a similar manner but includes more features, is more customizable, and has prettier output. On the other hand, it is a bit more difficult to install. Like kidofwais.pl, SFGate is a perl script. Unlike kidofwais.pl, SFGate is installed by running the configure program that comes with the distribution. You may have to run the configure program a number of times before you get SFGate successfully installed. This is because the questions it asks are not very clear. Once the installation is successful, you can edit the SFGate program itself to specify what database(s) will be searched and what type of output is desired. Finally, you make SFGate available on your server through a URL, and it will process queries much like kidofwais.pl.

An example of an SFGate implementation is also available at the NCSU Libraries "Webbed" Information System.

Summary

WWW servers are information systems. Information systems, to be useful, need to exemplify qualities of readability, browsability, and searchability. Within WWW servers, readability is best accomplished by exploiting the strengths of HTML. Browsability can be effectively implemented through the use of a database program to maintain your data. Searchability can be done by creating CGI scripts to search your database application and return the results, or it can be done by indexing your data with some sort of indexing technology like WAIS.

Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Source: Originally entitled "Using World Wide Web and WAIS Technologies to Create Electronic Information Systems." It is the written complement to a presentation given at the 1995 USAIN Annual Conference held in Lexington, KY, April 26-29, 1995.
Date created: 1995-04-01
Date updated: 2004-12-18
Subject(s): WAIS (Wide Area Information System); presentations; Lexington, KY; USAIN (United States Agriculture Information Network); information architecture;
URL: http://infomotions.com/musings/usain-95-talk/