|
Some services, like AltaVista, HotBot, Excite, DejaNews, and OPACs, attempt to index and provide searching mechanisms for subsets of Internet-based information.
For the most part, all the Internet search engines provide the same sorts of features: Boolean logic, relevance ranking, and phrase searching. In fact, phrase searching and relevance ranking are these tools' strong points. They are designed so the end-user can put in a few words and get something (useful) out. The problem comes when you have to know how to specify a phrase search. Do you surround your query with single quotes, double quotes, or does the engine default to phrase searching without any delimiters?
Relevance ranking made its first popular appearance with WAIS in 1991. Nobody ever understood exactly how it worked then, and few people understand how it works now. Most Internet search engines will not tell you how they determine relevance. It's "proprietary." In a nutshell, the services count the number of times a particular term appears in a document and then compare that count with the length of the document. This ratio determines relevance. Therefore, if a document were one word long and that word were your single search term, then the located document would be 100% relevant. Obviously, this sort of searching is of little help when the theme of the document is never explicitly stated. Do we chalk this up to poor writing skills?
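To make the ratio concrete, here is a minimal sketch in Python. It is not any engine's actual algorithm (those are not published); it only assumes the simple "occurrences divided by document length" idea described above. The function name `relevance` and the sample documents are made up for illustration, and words are split on whitespace only.

```python
def relevance(term: str, document: str) -> float:
    """Return occurrences of `term` divided by the number of words in `document`.

    Assumption: relevance is a plain term-count / document-length ratio.
    """
    words = document.lower().split()
    if not words:
        return 0.0  # an empty document cannot be relevant to anything
    return words.count(term.lower()) / len(words)


# Hypothetical documents, used only to illustrate the ratio.
docs = {
    "one-word": "library",
    "short": "the library catalog",
    "long": "the library opens at nine and the library closes at five",
}

# Rank the documents from most to least "relevant" for the term "library".
ranked = sorted(docs, key=lambda name: relevance("library", docs[name]), reverse=True)

for name in ranked:
    print(f"{name}: {relevance('library', docs[name]):.2f}")
```

Notice that the one-word document comes out on top even though the long document mentions the term twice. A document that never uses the term at all scores zero no matter what it is about.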
In general, these services are weak on field searching and range qualifications. Remember, these tools are designed for popular culture, not necessarily power searches. Yet, the power search features of many of these services can be quite illuminating. They are usually implemented as more extensive forms for the end-user to fill out, thus eliminating the need to know the details of the underlying database's structure.
Since the Internet search engines listed below are so similar, it is suggested you pick the service you like the most, read its instructions as thoroughly as possible, and use that service first, moving to other services as needed. Personally, I like AltaVista the best since it has the most search features and since it indexes entire documents, not just parts of them.
|
LISTS OF SEARCH ENGINES
|
Title: Collection of All the Search Engines You Will Ever Need!
Remote HTML
Cost: 0
Ease of use: Mindless
Description: This is a simple list of 24 search forms allowing you to search various Internet indexes. Rudimentary.
Title: Feeality Searches & Links
Remote HTML
Cost: 0
Description: This is a simple list of search engine forms. Rudimentary.
Title: Librarians' Index to the Internet
Remote HTML
Cost: 0
Description: "The Librarians' Index to the Internet is a searchable, annotated, subject directory of close to 3,000 Internet resources chosen for their usefulness to the public library user's information needs." Originally developed by a librarian, Carole Leita, this site has grown from a gopher bookmark file into what it is today. If you're nice to her, then maybe she will give you her Perl scripts and you can start your own collection.
Author: A2Z
Title: Internet indices, Directories, & How-To Guides
Remote HTML
Cost: 0
Description: This break-your-browser page contains Lycos's list of search engines. Unlike Excite's list, this list does offer explanations, but many of the resources seem dated.
Author: Drudge, Bob
Title: My Search Engines
Remote HTML
Cost: 0
Description: This is a list of at least 260 search engines from all over the 'Net. The engines are divided into subject areas and available via an alphabetical list as well. Like many of the other lists of search engines, My Search Engines contains forms directly searching each entry. It also provides pointers to a few Internet searching tutorials and briefly describes how to use each tool.
Author: Excite
Title: Searching
Remote HTML
Cost: 0
Description: Here is Excite's collection of Internet search engines. No commentary. No explanations.
Author: Haskin, David
Title: Right Search Engine
Remote HTML
Description: "We tested six of the leading search engines: AltaVista, Excite, HotBot, Infoseek, Lycos, and WebCrawler. We found that each can find an enormous amount of information, but a few are clearly superior in the way they home in on the most relevant information and in the interface they offer."
Author: HotWired, Inc.
Title: Wired Cybrarian
Remote HTML
Cost: 0
Description: The sidebar of this web page contains links to search engines and browsable lists of interest to academic librarians. The page sports the usual "wired" look with lots of colors, graphics, and extreme HTML. At the same time, this page concisely lists many of the more popular items for searching and browsing. The commentary is, at the very least, entertaining.
Author: Macmillan Publishing USA
Title: Search the Internet
Remote HTML
Description: This page lists and describes a few lesser known search engines, as well as a few golden oldies.
Author: McKinley Group, Inc.
Title: Search Engines
Remote HTML
Cost: 0
Description: Here is another list of search engines, but this one is ordered by Magellan's ranking system and reviews. Not a bad place to get an overview of searching mechanisms.
Author: McKinley Group, Inc.
Title: Searching the Web
Remote HTML
Cost: 0
Description: Here you will find reviews and links to resources evaluating resources. "Meta-data about meta-data?" Seriously, this site points the way to sites similar in purpose to the present guide.
Author: Mentor Marketing Services
Title: Eureka!
Remote HTML
Cost: 0
Ease of use: Easy
Description: This is a list of 46 search engines and standard forms for using them. This site also briefly describes each of the search engines.
Author: Netscape Communications, Inc.
Title: Netscape Net Search
Remote HTML
Cost: 0
Ease of use: Easy
Description: Using very sophisticated HTML and JavaScript, this page provides direct access to the major Internet indexes. Parts of each index are displayed in a window allowing direct data input. The page provides context-sensitive help based on the currently selected index. The page is also (slightly) customizable. It also points to the most popular browsable lists.
Author: Nicholson, Scott
Title: AskScott
Remote HTML
Cost: 0
Ease of use: Easy
Description: "Just as the reference librarian in a library helps you find the best reference work for your search, AskScott helps you find the most appropriate Internet reference tool for your search."
This set of pages frames sets of browsable lists and Internet search engines within the reference interview model. By asking themselves simple questions, a user of AskScott is led to various Internet tools that may help provide the answers to their questions. The entire set of pages is also available for download and use in a local library. The whole idea is a good one and represents a fresh approach to locating information on the Internet.
Author: Zamboni
Title: Zamboni's Search Engines
Remote HTML
Cost: 0
Description: This is a very simple list of remote search engines with no explanations and no forms to complete.
|
INTERNET SEARCH ENGINES
|
Title: Excite
Remote HTML
Cost: 0
Ease of use: Easy
Data types: HTML files; news; USENET newsgroup posting;
Search features: phrase searching; Boolean logic; relevance ranking; "concept extraction";
Description: Excite's search engine is much like everybody else's. Its power search form simply spells out the search features in a more descriptive manner. The features include phrase searching, Boolean logic, and relevance ranking. Excite uses the same "concept extraction" technique of locating documents as Magellan because Magellan is owned by Excite. The site also hosts pointers to other search engines as well as collections of Internet resources on popular subjects.
Title: Hotbot
Remote HTML
Cost: 0
Ease of use: Easy
Data types: HTML files; image files; sounds; USENET newsgroup posting;
Search features: date ranges; field searching; phrase searching; Boolean logic; domain;
Description: HotBot uses an almost purely forms-driven interface to its search engine. A unique feature of HotBot is its ability to select a wide range of Internet media types to seek. For example, it includes: Image, Java, Acrobat, audio, JavaScript, VB Script, Video, ActiveX, Shockwave, VRML. Some of these media types can be located in other services through the command line, but not all. HotBot also provides access to a number of specialized databases on the subjects of USENET, top news sites, classified ads, domain names, stocks, discussion groups, shareware, businesses, people, email addresses. HotBot's browsable list of Internet resources merely points to Wired's Cybrarian. While HotBot has a lot of features, I feel a bit out of control when I am limited to only a forms interface.
Title: Infoseek
Remote HTML
Cost: 0
Ease of use: Easy
Data types: HTML files; news; USENET newsgroup posting;
Search features: phrase searching; Boolean logic; relevance ranking; domain; field searching;
Description: This site does not have a "power" search feature, nor does it have a complicated forms interface for completing queries. All queries are from a simple command line. The command line supports the usual suspects: Boolean operations, phrase searching, and a few field searches. Its collection of Internet resources looks just like everybody else's too. The only difference is they publish a collection policy: "Infoseek Select sites are chosen by our editors from the index based on their editorial value, traffic and the number of links to the sites." Like Yahoo, once you are browsing a particular subject area you can use the searching mechanisms to find resources from that area.
Title: Lycos
Remote HTML
Cost: 0
Ease of use: Easy
Data types: HTML files; sounds; image files; bibliographic citations;
Search features: phrase searching; Boolean logic; natural language;
Description: Starting out in academia (Carnegie Mellon University) and then going commercial, Lycos is the granddaddy of the Internet search engines. It is also the service that really built a name for itself by rating sites as the "Top 5%" and awarding sites the distinction of displaying the Top 5% badge. The service's search features rely greatly on variations of phrase searching, but the service also has a reputation of locating more hits than the other services. As evidenced by a visit to its home page, Lycos is becoming more and more like a commercial service offering more than simple Internet searching and advertisements. Lycos also offers search links to Barnes and Noble as well as UPS. Obviously, this service is thinking about the bigger picture and is quickly leaving academia behind.
Title: MetaCrawler
Remote HTML
Cost: 0
Data types: HTML files; USENET newsgroup posting; computer products;
Search features: domain;
Description: This is one of the true all-in-one search engines. By supplying MetaCrawler with a query, MetaCrawler searches the most popular Internet indexes, collates the results, removes the duplicates, and displays them on your screen. Since the remote indexes' search features are diverse, the search features of MetaCrawler are a bit generic. You can limit your search by domains, type of data, and phrases (or not). This may very well be a service to keep an eye on.
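The collate-and-remove-duplicates step can be sketched in a few lines of Python. This is only an illustration of the general idea, not MetaCrawler's actual method, which is not described here. The engine names, the sample URLs, and the helper functions `normalize` and `collate` are all hypothetical, and the hit lists are hard-coded rather than fetched from real indexes.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a comparison key: lowercase host, path without trailing slash, query."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return f"{parts.netloc.lower()}{path}?{parts.query}"


def collate(results_by_engine: dict[str, list[str]]) -> list[str]:
    """Interleave ranked hit lists from several engines and drop duplicate URLs."""
    seen = set()
    merged = []
    longest = max((len(urls) for urls in results_by_engine.values()), default=0)
    # Take every engine's first hit, then every engine's second hit, and so on.
    for rank in range(longest):
        for urls in results_by_engine.values():
            if rank < len(urls):
                key = normalize(urls[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(urls[rank])
    return merged


# Hypothetical hit lists from two made-up engines.
sample = {
    "engine_a": ["http://example.org/a", "http://example.org/b/"],
    "engine_b": ["http://EXAMPLE.org/b", "http://example.org/c"],
}

merged = collate(sample)
print(merged)
```

Here the two spellings of the "b" page are treated as the same hit, so it is listed only once. A real service would need a smarter way to decide when two addresses point at the same document.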
Title: Northern Light Search
Remote HTML
Cost: $1-10
Ease of use: Easy
Data types: bibliographic citations; HTML files; full-text articles;
Search features: phrase searching; truncation/stemming;
Description: This is a newcomer into the fray of Internet search engines. Like the resources described in the article "Searching the Hidden Internet" by Notess, this search engine not only provides access to a database of broad WWW documents, but also to articles from selected magazines and journals (Special Collections). Presently the search features are weak. On the other hand, this engine analyzes the results of queries and creates groups of documents, clusters, much in the same way AltaVista searches are "refined." When you locate a document in the Special Collection you are interested in, you can pay as you go for the article or set up an account. Presently, articles range in price from $0-10 per copy.
Title: Open Text
Remote HTML
Cost: 0
Ease of use: Easy
Data types: HTML files; USENET newsgroup posting; email addresses; current events;
Search features: field searching; Boolean logic; proximity;
Description: It's just a guess, but I believe OpenText's search engine is really an advertisement for its commercial document storage and retrieval software. Like a number of the other services, OpenText's interface is entirely menu-driven. It consists of a number of blank fields surrounded by qualifiers. The qualifiers are of two types: fields and operators. The fields include: summary, title, first heading, URL, anywhere. The operators include: and, or, but not, near, followed by. It is obvious from this description that OpenText is strong on phrase searching and the proximity of words. Field searching is a bit weak.
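For readers unfamiliar with proximity operators, the following Python sketch shows one plausible reading of "near" and "followed by." It is not OpenText's implementation. The window of five words is an assumption chosen for illustration, the function names are made up, and words are split on whitespace only.

```python
def positions(term: str, words: list[str]) -> list[int]:
    """Return every index at which `term` occurs in the word list."""
    return [i for i, word in enumerate(words) if word == term]


def near(a: str, b: str, text: str, window: int = 5) -> bool:
    """True if `a` and `b` occur within `window` words of each other, in either order."""
    words = text.lower().split()
    return any(
        abs(i - j) <= window
        for i in positions(a.lower(), words)
        for j in positions(b.lower(), words)
    )


def followed_by(a: str, b: str, text: str, window: int = 5) -> bool:
    """True if `b` occurs after `a`, no more than `window` words later."""
    words = text.lower().split()
    return any(
        0 < j - i <= window
        for i in positions(a.lower(), words)
        for j in positions(b.lower(), words)
    )


text = "the library catalog indexes the Internet"

print(near("internet", "indexes", text))         # order does not matter
print(followed_by("indexes", "internet", text))  # "indexes" comes first
print(followed_by("internet", "indexes", text))  # wrong order
```

The difference between the two operators is only word order: "near" accepts the terms in either order, while "followed by" insists that the first term come first.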
Author: Digital Equipment Corporation
Title: AltaVista
Remote HTML
Cost: 0
Ease of use: Challenging
Data types: HTML files; USENET newsgroup posting; image files;
Search features: Boolean logic; field searching; phrase searching; relevance ranking; truncation/stemming; nested queries;
Description: Of the free Internet search engines, AltaVista is the one with the most features and seemingly widest coverage. While the searching syntax is a bit obtuse, it is the most powerful and when used correctly can provide result sets that are easy to manage. Like a growing number of search engines, AltaVista employs statistical methods for refining the results of queries. When result sets are "refined" they are grouped into categories based on the most frequently used words in the results. This method is similar to Verity's services as well as Northern Light Search. If you like AltaVista, then buy the book AltaVista Search Revolution and you will like it even more.
Author: McKinley Group, Inc.
Title: Magellan
Remote HTML
Cost: 0
Ease of use: Easy
Data types: HTML files;
Search features: Boolean logic; nested queries; phrase searching; relevance ranking; "concept extraction";
Description: The strength of this service lies in its Internet reviews. Using a 4 point scale, Magellan reviews Internet sites and ranks them in subject lists accordingly. Unfortunately, we don't know what their "collection management" policy is, so we cannot determine how or why they have picked particular items for the collection nor what they look for in particular resources. The search engine itself is divided into three parts: reviews, entire web, "green light". When you select reviews you only search the reviews. When you select the entire web, you search that. The green light section contains items that have been deemed appropriate for children. An uncommon feature of this service is its "concept extraction." This is billed as a sort of controlled vocabulary/thesaurus that locates items on "senior citizens" when you search for "elderly people." Again, Magellan does not describe how this works, so you have to take their word for it. Literally.
|