
Other Services

There are many other search engines that index and provide access to locally developed Internet information, such as WAIS, Glimpse/Harvest, LISTSERV, and ListProc.

There is more to Internet searching than HTML files. For example, "hosts" of information exist in the archives of USENET newsgroups and mailing lists. More than a few Internet services offer the ability to search newsgroup postings; fewer allow you to search mailing list archives. Real "news", such as that produced by the Associated Press and the major broadcasting networks, is also available from quite a number of services.

Many search services found on the Internet employ "regular expressions." Regular expressions are a way of describing the shape of words as opposed to their meaning. Harvest (Glimpse) services as well as ListProc mailing lists rely on regular expressions for their searching syntax. Regular expressions are also used by the infamous grep program of Unix computers. Used effectively, grep can locate information lost in just about any file on your file system. The time spent learning regular expression syntax is not time lost.
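To illustrate, here is a small sketch in Python (standing in for grep; the pattern and sample lines are my own) showing how a regular expression matches the shape of a word rather than its meaning:

```python
import re

# A regular expression describes the shape of text, not its meaning.
# This pattern matches "color" or "colour" as a whole word, the sort
# of variant spelling grep is often asked to find.
pattern = re.compile(r"\bcolou?r\b")

lines = [
    "The color map is broken.",
    "British colour charts differ.",
    "Coloratura is a style of singing.",  # no match: not a whole word
]

matches = [line for line in lines if pattern.search(line)]
print(matches)
```

The `\b` markers (word boundaries) are what keep "Coloratura" from matching; this is shape-based searching in miniature.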

You might also consider creating your own search engine. There exist quite a number of these tools. Some of them are as free as a free kitten. Others cost thousands and thousands of dollars. WAIS was one of the first to popularize the idea. It is still a strong tool. Harvest allows you to index the content of remote sites. OpenText's and Verity's solutions are expensive, but offer features the other tools don't, such as technical support. ROADS is a nice piece of software designed for collecting and indexing Internet resources. If you are tired of maintaining your collections by hand, consider ROADS.

OTHER SERVICES

Title: Deja News
Remote HTML
Cost: 0
Ease of use: Easy
Data types: USENET newsgroup postings;
Description: Billed as a tool for business people, this service collects and archives USENET postings and makes them available through a simple searching and browsing interface. Qualifying results is done through subsequent forms, making the process longer but possibly more efficient for the uninitiated. One especially nice feature is its ability to limit searches to specific parts of the USENET hierarchy. This is done by first browsing the subject classifications and then searching. The service could be improved if the searching mechanisms were less form driven.

Title: Excite
Remote HTML
Cost: 0
Ease of use: Easy
Data types: HTML files; news; USENET newsgroup postings;
Search features: phrase searching; Boolean logic; relevance ranking; "concept extraction";
Description: Excite's search engine is much like everybody else's. Its power search form simply spells out the search features in a more descriptive manner. The features include phrase searching, Boolean logic, and relevance ranking. Excite uses the same "concept extraction" technique for locating documents as Magellan, because Magellan is owned by Excite. The site also hosts pointers to other search engines as well as collections of Internet resources on popular subjects.
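For the curious, here is a toy sketch in Python of what relevance ranking means in general; it is not Excite's actual algorithm, and the term-frequency scoring and sample documents are my own:

```python
# A toy illustration of relevance ranking (NOT Excite's actual
# algorithm): score each document by how often the query terms
# appear in it, then return the matches best-first.
def rank(query, documents):
    terms = query.lower().split()
    scored = []
    for doc in documents:
        words = doc.lower().split()
        score = sum(words.count(term) for term in terms)
        scored.append((score, doc))
    scored.sort(reverse=True)
    return [doc for score, doc in scored if score > 0]

docs = [
    "Internet search engines index HTML files",
    "Mailing lists archive postings",
    "Search the Internet with a search engine",
]
print(rank("search engine", docs))
```

Real services add many refinements (stemming, term weighting, concept extraction), but the sorted-by-score idea is the same.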

Title: Open Text
Remote HTML
Cost: 0
Ease of use: Easy
Data types: HTML files; USENET newsgroup postings; email addresses; current events;
Search features: field searching; Boolean logic; proximity;
Description: It's just a guess, but I believe OpenText's search engine is really an advertisement for its commercial document storage and retrieval software. Like a number of the other services, OpenText's interface is entirely menu-driven. It consists of a number of blank fields surrounded by qualifiers. The qualifiers are of two types: fields and operators. The fields include: summary, title, first heading, URL, anywhere. The operators include: and, or, but not, near, followed by. It is obvious from this description that OpenText is strong on phrase searching and the proximity of words. Field searching is a bit weak.
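To illustrate the idea behind a proximity operator such as "near", here is a hypothetical sketch in Python; the five-word window and sample text are my own assumptions, not OpenText's actual behavior:

```python
# A sketch of a proximity operator: two terms satisfy a "near"
# query when they occur within a given number of words of each
# other. The window size here (5 words) is an assumption.
def near(term_a, term_b, text, distance=5):
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == term_a]
    positions_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(a - b) <= distance
               for a in positions_a for b in positions_b)

text = "the quick brown fox jumps over the lazy dog"
print(near("quick", "fox", text))  # two words apart
print(near("quick", "dog", text))  # seven words apart
```

An operator like "followed by" would be the same test with the added requirement that the first term's position precede the second's.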

Title: Regular expressions
Local HTML
Data types: ASCII text files;
Search features: regular expressions;
Description: Regular expressions represent the Unix way of locating text within documents, and they frequently raise their ugly heads in many search interfaces. In a nutshell, regular expressions locate "patterns" of text within documents, not concepts; regular expressions help you locate data and information based on the shapes of words, not their meaning or relationships. The document cited above describes regular expressions in terms of the Unix grep command. An understanding of regular expressions will come in handy when using many alternative search tools such as Harvest and ListProc mailing list archives.

Title: ROADS
Remote HTML
Cost: $0
Ease of use: Challenging
Search features: Boolean logic; field searching; nested queries; phrase searching; relevance ranking; truncation/stemming;
Description: ROADS is a suite of software whose purpose is to provide a means for collecting, organizing, and searching Internet resources. Requiring a Unix computer, the software is written completely in Perl and is therefore completely open. Once installed, it provides the means for data entry into a database via HTML forms or simple ASCII text files. These forms/text files are flexible, structured records. Each record can have an unlimited number of fields of unlimited length. Consequently, it is easy to assign multiple subject headings to Internet resources. ROADS also provides the means for automatically creating HTML files for browsing as well as an interface for searching your collection. It doesn't stop there. The software also includes a built-in link checker so you can keep your data fresh. Lastly, it is possible not only to search your local data via the ROADS software, but also to search other ROADS collections via the same interface. Thus, it would be possible to create a single user interface to global collections of Internet resources. This is the best free database software I've seen for collecting and maintaining Internet resources.
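To illustrate what a flexible, structured record with repeatable fields looks like, here is a small sketch in Python; it is not ROADS's actual record format, and the field names and values are my own:

```python
# A sketch (not ROADS's real format) of a flexible record: any
# field, such as Subject, may hold several values, which is what
# makes assigning multiple subject headings easy.
record = {
    "Title": ["Pointers"],
    "URL": ["http://www.infomotions.com/pointers/"],
    "Subject": ["Internet searching", "Search engines", "Indexing"],
}

def add_field(record, field, value):
    """Append another value to a field, creating it if need be."""
    record.setdefault(field, []).append(value)

add_field(record, "Subject", "Resource discovery")
print(record["Subject"])
```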

Author: Internet Research Task Force Research Group on Resource Discovery (IRTF-RD)
Title: Harvest
Remote HTML
Cost: 0
Ease of use: Easy
Data types: ASCII text files; full-text articles; HTML files; USENET newsgroup postings; image files;
Search features: Boolean logic; field searching; nested queries; phrase searching; regular expressions; truncation/stemming; spelling options;
Description: Harvest combines an Internet spider application with a database search engine (WAIS or Glimpse). Using this tool you can "gather" data from remote sites and have it indexed in a well-documented format. This gathered data can then be "brokered" to end users for searching. Its greatest strength is its ability to combine multiple gatherers from local and remote sites and broker them from one location, reducing network traffic. Its greatest limitation is its output features; the Harvest output is, generally speaking, difficult to display and interpret in a pleasing way.
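Here is a loose sketch in Python of the gather-then-broker idea; it is not Harvest's actual record format or protocol, and all the names and sample data in it are my own:

```python
# A loose sketch of Harvest's architecture (not its real format
# or protocol): gatherers summarize documents at each site, and
# a broker merges several gatherers into one searchable index.
def gather(site, documents):
    # Summarize each document as a (site, title, word-set) record.
    return [(site, title, set(text.lower().split()))
            for title, text in documents.items()]

def broker(gatherers, term):
    # Search every gathered record for the query term.
    return [(site, title)
            for records in gatherers
            for site, title, words in records
            if term in words]

local = gather("local", {"about.html": "searching the Internet"})
remote = gather("remote", {"faq.html": "frequently asked questions"})
print(broker([local, remote], "internet"))
```

The point of the design is that the broker searches compact summaries gathered once, instead of every client re-crawling the remote sites.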

Author: Kotsikonas, Anastasios
Title: ListProc
Local HTML
Cost: $0
Ease of use: Easy
Data types: mailing lists;
Search features: phrase searching; regular expressions;
Description: ListProc was originally designed to be a mailing list server completely duplicating the functionality of LISTSERV, except that ListProc was to be run on Unix computers. Like LISTSERV mailing lists, ListProc mailing lists can be archived and searched if the mailing list administrator has turned on these features. Once turned on, you can search the archives using regular expression syntax and "get" documents from the resulting search strategies. ListProc comes with a client that can be used to search the archives interactively instead of via email, but few administrators turn this feature on. Mailing lists are a good source of timely information and are good ways of identifying self-pronounced subject experts.

Author: L-Soft International, Inc.
Title: LISTSERV
Local HTML
Cost: $0
Ease of use: Challenging
Data types: mailing lists;
Search features: Boolean logic; date ranges; field searching; nested queries; phrase searching; truncation/stemming;
Description: People are the real sources of information, and as you may or may not know, the archives of many mailing lists are searchable. LISTSERV was the first real mailing list software to be put into widespread use. Administrators of LISTSERV mailing lists may or may not turn on the archiving of mailing list distributions. Additionally, they may or may not turn on searching mechanisms for these archives. If these features are turned on, then you can search the archives of the mailing lists using either a WWW front-end supplied by the administrator, or you can search the archives via email. The reference URL above points to a simple text file describing how to search LISTSERV mailing lists. For more information, see the home page of the now-commercial version of LISTSERV.

Author: Pfeifer, Ulrich
Title: FreeWAIS-sf
Remote HTML
Cost: 0
Ease of use: Challenging
Data types: ASCII text files; HTML files; image files;
Search features: Boolean logic; field searching; nested queries; relevance ranking; truncation/stemming;
Description: This application indexes text files and the file names of non-text files. It is a powerful application brought to maturity after spending much of its life in various software shops. If you wanted to index some of your data locally, and you had a Unix computer at your disposal, then this application would be one to consider for the job.

Author: Verity
Title: Search '97
Remote HTML
Cost: $ Thousands
Ease of use: Challenging
Data types: ASCII text files; HTML files;
Search features: Boolean logic; field searching; nested queries; phrase searching; relevance ranking; truncation/stemming;
Description: If you want to provide your own searching interface to some of your local data, then you might consider purchasing Search '97. This software indexes local ASCII files (and therefore HTML source) as well as PDF and WYSIWYG data. If your source data is well structured, then Search '97 provides the means for powerful field searching. The software also includes an Internet spider/robot. This application allows you to "feed" the spider a set of URLs which it will then go out and index. Consequently, using Search '97 you could build your own Internet search service. More realistically, you could use this software to index the top N levels of the Internet resources you point to, or you could index things like electronic serials. Search '97 is very much an "information science" sort of tool since much of its output is based on the statistical relevance of query terms. The software also uses these statistical methods to allow you to "find more like this one." Search '97 is a powerful tool, but not necessarily for the faint of heart.
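As a toy illustration of "find more like this one", here is a sketch in Python; it is not Verity's actual statistical method, and ranking by shared words is my own simplification:

```python
# A toy sketch of "find more like this one" (NOT Verity's real
# statistics): rank the other documents by how many words they
# share with a chosen seed document.
def more_like_this(seed, documents):
    seed_words = set(seed.lower().split())
    scored = sorted(
        ((len(seed_words & set(doc.lower().split())), doc)
         for doc in documents if doc != seed),
        reverse=True)
    return [doc for score, doc in scored if score > 0]

docs = [
    "indexing local HTML files",
    "indexing remote HTML files with a spider",
    "mailing list archives",
]
print(more_like_this(docs[0], docs))
```

A production tool would weight rare terms more heavily than common ones, but the relevant point is that the query is an entire document rather than a few typed words.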



Version: 1.0.2
Last updated: 4/15/00. See the release notes.
Author: Eric Lease Morgan (eric_morgan@infomotions.com)
URL: http://www.infomotions.com/pointers/