
About Searching

This is a description of the various information formats available through the Internet including: HTML pages, mailing lists, newsgroup postings, bibliographic citations, definitions, etc.

You've heard it a million times: "You can find anything on the Internet." Well, you also know this is an overstatement. Furthermore, you know that finding things on the Internet is a lot like finding a needle in a haystack or, to turn the phrase, "drinking from a firehose."

On the other hand, if you know how to make Internet search engines work for you, then you might very well be able to find what you are looking for on the Internet. The trick is knowing how to use the tools properly. In general, you should apply traditional online search techniques to your information needs. This means you must:

  1. articulate your information need as specifically as possible
  2. choose a search tool whose features accommodate the articulated need
  3. understand how to translate your need into terms the tool understands
  4. apply your search query
  5. evaluate the results
  6. repeat as necessary or until resources are exhausted


The definitions below describe the most common features implemented by the search engines in the next section. The cited items following the definitions describe in more detail the Internet searching process. Of particular interest are the items by Duda, the Internet Scout Project, Campbell, and Zorn.

Boolean logic
Articulated by George Boole, Boolean logic is a method of creating new sets of items by combining existing sets with logical union (or), intersection (and), and exclusion (not) operations. The syntax for implementing these operations differs from service to service, but generally speaking the plus sign (+) or "and" specifies intersection, a vertical bar (|) or "or" denotes union, and the minus sign (-) or "not" denotes exclusion. By default, most search engines treat multiple terms as a phrase search or combine them with a logical "and" rather than an "or."
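To make the three operations concrete, here is a minimal sketch in Python. The tiny index and its document numbers are invented for illustration, and real search engines store their indexes quite differently.

```python
# Hypothetical toy index: each term maps to the set of document numbers containing it.
index = {
    "cats": {1, 2, 3},
    "dogs": {2, 3, 4},
    "birds": {3, 5},
}

def boolean_and(a, b):
    """Intersection: documents containing both terms."""
    return index.get(a, set()) & index.get(b, set())

def boolean_or(a, b):
    """Union: documents containing either term."""
    return index.get(a, set()) | index.get(b, set())

def boolean_not(a, b):
    """Exclusion: documents containing the first term but not the second."""
    return index.get(a, set()) - index.get(b, set())

print(sorted(boolean_and("cats", "dogs")))   # [2, 3]
print(sorted(boolean_or("cats", "birds")))   # [1, 2, 3, 5]
print(sorted(boolean_not("cats", "birds")))  # [1, 2]
```

Notice how "and" shrinks the result set while "or" grows it. That is why a default of "and" tends to give fewer, more focused hits.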

"concept extraction"
Coined by Excite, Inc., "concept extraction" attempts to match search terms and phrases with other terms and phrases with similar meanings. In effect, it is a thesaurus.

date ranges
This feature allows you to specify two dates (days, months, or years) in your query and have the results contain only items whose "publication" is between those two dates.
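As a rough sketch, a date range is just a filter that keeps the items whose date falls between two endpoints. The three records below borrow names and months from citations later in this guide; the day of the month is an assumption made only so the dates are complete.

```python
from datetime import date

# Illustrative records: (author, publication date). The day "1" is assumed.
records = [
    ("Munson", date(1996, 6, 1)),
    ("Brandt", date(1996, 9, 1)),
    ("Lynch", date(1997, 3, 1)),
]

def published_between(records, start, end):
    """Keep only the items whose date falls on or between the two endpoints."""
    return [author for author, published in records if start <= published <= end]

print(published_between(records, date(1996, 7, 1), date(1997, 1, 31)))  # ['Brandt']
```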

domains
Domains are the names given to sets of computers with Transmission Control Protocol/Internet Protocol (TCP/IP) configurations. Every computer implementing TCP/IP will be associated with an IP number and/or name. These names are domains such as

field searching
Databases, by definition, contain records, items with similar characteristics. These characteristics are generally implemented as fields. Typical fields in a database of bibliographic records include author, title, date, and pages. Field searching allows you to specify a query for items in one or more of these similar characteristics.
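Here is one way field searching might look in practice. The records reuse a few citations from this guide, and the `field_search` function is a hypothetical helper written for this example, not any particular database's interface.

```python
# A tiny database of bibliographic records, each with the same fields.
records = [
    {"author": "Notess, Greg R.", "title": "Searching the hidden Internet", "date": 1997},
    {"author": "Lynch, Clifford", "title": "Searching the Internet", "date": 1997},
    {"author": "Clyman, John", "title": "Finding your needle in the Web's haystack", "date": 1996},
]

def field_search(records, **criteria):
    """Return records where every named field contains the given text (case-insensitive)."""
    hits = []
    for record in records:
        if all(str(value).lower() in str(record.get(field, "")).lower()
               for field, value in criteria.items()):
            hits.append(record)
    return hits

print([r["author"] for r in field_search(records, title="internet")])
# ['Notess, Greg R.', 'Lynch, Clifford']
print([r["author"] for r in field_search(records, title="internet", author="lynch")])
# ['Lynch, Clifford']
```

Limiting the query to the title field is what keeps an author named "Internet" (should one exist) out of the first result.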

natural language
The Holy Grail of search features, natural language queries are queries allowing you to submit search strategies representing human speech. They usually work by removing the stop words from the input and then performing some sort of Boolean operation on the remaining set.
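The stop-word approach can be sketched in a few lines. The stop-word list and the three sample documents are made up for this example, and the sketch assumes the remaining terms are combined with a Boolean "and"; actual services may do something more elaborate.

```python
# A short, illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "to",
              "is", "are", "what", "how", "do", "i", "about"}

documents = {
    1: "Search engines index pages on the World Wide Web",
    2: "Librarians evaluate search results for relevance",
    3: "Web directories are built by people, not spiders",
}

def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [w.strip(".,?!").lower() for w in text.split()]

def natural_language_search(question):
    """Drop stop words, then require every remaining term (a Boolean 'and')."""
    terms = [t for t in tokenize(question) if t and t not in STOP_WORDS]
    hits = [doc_id for doc_id, text in documents.items()
            if all(term in set(tokenize(text)) for term in terms)]
    return terms, hits

print(natural_language_search("What do search engines index on the Web?"))
# (['search', 'engines', 'index', 'web'], [1])
```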

nested queries
Nested queries are queries allowing you to perform multiple set manipulations with one command line. Nested queries allow searchers to override the default precedence of query operations like Boolean operations. They are almost always implemented with the use of parentheses.
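A small sketch shows why the parentheses matter. The sets of document numbers are invented, and the example leans on Python's own rule that intersection (`&`) is evaluated before union (`|`); a given search engine's default precedence may differ.

```python
# Hypothetical result sets for three single-term searches.
cats = {1, 2, 3}
dogs = {2, 3, 4}
birds = {3, 5}

# Without parentheses, "dogs and birds" is evaluated first, then "or cats".
unnested = cats | dogs & birds

# With parentheses, "cats or dogs" is evaluated first, then "and birds".
nested = (cats | dogs) & birds

print(sorted(unnested))  # [1, 2, 3]
print(sorted(nested))    # [3]
```

The same three terms and the same two operators return very different sets, which is exactly the ambiguity nesting resolves.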

phrase searching
Phrase searches are queries where the input terms represent matches for consecutive strings of text. These strings of text can be case-sensitive or case-insensitive. Phrase searching is the default behavior of most Internet search engines.

proximity searching
Very similar to phrase searching, proximity searches allow you to specify the number of words allowed to appear between two query terms. Proximity searches go one step beyond phrase searches since they allow you to specify a greater number of string combinations.
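The relationship between phrase and proximity searching can be sketched as a single function, where a phrase search is simply a proximity search allowing zero words in between. The function name `within` and its ordered matching (first term before second) are assumptions of this example.

```python
def within(text, first, second, max_between):
    """True if `second` follows `first` with at most `max_between` words in between."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(0 < j - i <= max_between + 1
               for i in first_positions for j in second_positions)

text = "Internet search engines index the World Wide Web"

print(within(text, "search", "engines", 0))  # True: adjacent, i.e. a phrase match
print(within(text, "search", "web", 2))      # False: five words intervene
print(within(text, "search", "web", 5))      # True
```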

ranges
Ranges allow you to specify two query terms that can be ordered along a line (numbers or letters, for example) so that result sets contain the items falling between them.

regular expressions
Regular expressions are a symbolic means of describing strings of text. They are an abstract representation of words and terms based on positions within lines of text, letter case, letters, digits, and symbols. Regular expressions are of no use when it comes to describing the meaning of words and phrases, only their "shape."
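To illustrate "shape," here is a short sketch using Python's standard `re` module. The sample lines are taken from a citation in this guide. The first pattern describes a position in the line (a label at the very start), and the second describes the form of an ISSN (four digits, a hyphen, three digits, and a final digit or X) without knowing anything about what an ISSN means.

```python
import re

lines = [
    "Author: Brandt, D. Scott",
    "Title: Relevancy and searching the Internet.",
    "Citation: Computers in Libraries ISSN:1041-7915 Sept 1996, v16, n8, p35(3)",
]

# Position: "Author" or "Title" followed by a colon at the very start of a line.
field = re.compile(r"^(Author|Title):")

# Shape: four digits, a hyphen, three digits, then a digit or the letter X.
issn = re.compile(r"\b\d{4}-\d{3}[\dX]\b")

print([line for line in lines if field.match(line)])
print(issn.findall(lines[2]))  # ['1041-7915']
```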

relevance ranking
Relevance ranking is a method of ordering lists of search results so that items of the most statistical significance appear at the top and items of the least significance appear at the bottom. A relevance ranking is a number calculated through a combination of the number of term occurrences, their position(s) in the located document, the length of the located document, the "weight" of the search terms, and the number of term appearances in the entire database.
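As an illustration only, the sketch below scores documents with a simple term-frequency and inverse-document-frequency formula, one well-known way of combining several of the factors just listed. It is not the formula used by any particular search engine, it ignores term position, and the sample documents are invented.

```python
import math

documents = {
    "a": "search engines search the web",
    "b": "the web is large",
    "c": "librarians search catalogs and the web every day of the week",
}

def score(query_terms, text, all_texts):
    """Sum, over the query terms, of (frequency in this document) x (rarity in the database)."""
    words = text.lower().split()
    total = 0.0
    for term in query_terms:
        tf = words.count(term) / len(words)  # occurrences, adjusted for document length
        containing = sum(1 for t in all_texts if term in t.lower().split())
        if containing == 0:
            continue  # term appears nowhere in the database
        idf = math.log(len(all_texts) / containing) + 1  # rarer terms weigh more
        total += tf * idf
    return total

def rank(query):
    """Return document names ordered from most to least relevant, omitting non-matches."""
    terms = query.lower().split()
    texts = list(documents.values())
    scored = [(score(terms, text, texts), name) for name, text in documents.items()]
    return [name for s, name in sorted(scored, reverse=True) if s > 0]

print(rank("search"))  # ['a', 'c']
```

Document "a" outranks "c" because "search" appears twice in a short document rather than once in a long one.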

spelling options
My catalog is not necessarily your catalogue. Search engines that account for spelling differences process queries in such a way that multiple spellings of search terms can be retrieved. This is done through the use of regular expressions or thesaurus terms.

truncation/stemming
Words in Western languages are built upon shorter strings of letters with discrete meanings. Through the use of truncation/stemming techniques, search queries can be constructed to locate items containing variations of the shorter strings, thus increasing recall. Truncation/stemming implementations vary from tool to tool, but are usually performed by concatenating dollar ($) or number (#) signs to the ends of search terms.
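A minimal sketch of truncation follows, assuming a trailing dollar or number sign means "any ending." The function name and the word list are invented for the example.

```python
def truncation_match(query, words):
    """Treat a trailing '$' or '#' as 'any ending'; otherwise require an exact match."""
    if query.endswith(("$", "#")):
        stem = query[:-1].lower()
        return [w for w in words if w.lower().startswith(stem)]
    return [w for w in words if w.lower() == query.lower()]

words = ["library", "libraries", "librarian", "liberty", "Librarians"]

print(truncation_match("librar$", words))
# ['library', 'libraries', 'librarian', 'Librarians']
print(truncation_match("library", words))
# ['library']
```

The truncated query retrieves four items where the exact query retrieves one, which is the increase in recall described above.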

wild card
Wild card searches are a variation of truncation/stemming searches, except that queries may specify variations in spelling within terms and not only at the ends of terms.
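For example, Python's standard `fnmatch` module implements shell-style wild cards, where a question mark stands for exactly one character and an asterisk for any run of characters. These particular symbols are an assumption of this sketch; search tools choose their own.

```python
import fnmatch

words = ["woman", "women", "womens", "color", "colour", "collar"]

def wild_card_match(pattern, words):
    """Return the words matching a shell-style pattern ('?' = one character, '*' = any run)."""
    return [w for w in words if fnmatch.fnmatchcase(w, pattern)]

print(wild_card_match("wom?n", words))  # ['woman', 'women']
print(wild_card_match("col*r", words))  # ['color', 'colour', 'collar']
```

The second query also retrieves "collar," a reminder that wild cards increase recall at some cost to precision.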


Title: Surfing and Searching
Remote HTML
Description: This easy-to-read article from Fortune magazine outlines why it is important not only to use Internet search engines, but to combine these tools with browsable lists. The article quotes Reva Basch who says, rightly so, that one good way to find information is to monitor USENET newsgroups for experts in a subject area and then query those experts directly. It just goes to show that people are the real sources of information, not computers.

Author: Bates, Mary Ellen
Title: Seven Deadly Sins of Online Searching
Remote HTML
Cost: 0
Description: This article humorously defines seven things you should avoid when doing online searching: pride, haste, avarice, apathy, sloth, narrow-mindedness, and ignorance. These "sins" are put into the perspective of online searching and are intended to be kept in mind when doing traditional online searches as well as Internet-based search strategies. Not only is this a quick and easy read, it makes a lot of sense too.

Author: Brandt, D. Scott
Title: Relevancy and searching the Internet.
Citation: Computers in Libraries ISSN:1041-7915 Sept 1996, v16, n8, p35(3)
Remote HTML
Description: Relevancy has long been important in the world of information gathering. It's vital in tasks such as locating the most relevant resources, using the most relevant retrieval methods, and making sure the information found is relevant to the need. Relevancy has many synonyms (applicability, correspondence, pertinence) and relates to many kinds of decisions we make in our lives. Because so many of these decisions require sifting through massive amounts of information, relevancy is an important factor in making the best decisions we can.

Author: Brooks, Monica
Title: Research on the Internet
Remote HTML
Description: In one (slightly lengthy) page, this Argus Clearinghouse approved site outlines how to do research on the Internet. It lists objectives and Web basics, compares and contrasts search engines, and describes Boolean logic, OPACs, FTP, and important aspects of government information. This is a fine research aid, and a great starter page for students in academe.

Author: Campbell, Karen
Title: Understanding and Comparing Search Engines
Remote HTML
Description: This document lists reviews of Internet search engines. It is a good place to begin when you want to see what other people have said about the available tools.

Author: Clyman, John
Title: Finding your needle in the Web's haystack.
Citation: PC Magazine ISSN:0888-8507 July 1996, v15, n13, p39(3)
Remote HTML
Description: "Six popular World Wide Web search engines are reviewed. DEC's Alta Vista is the most powerful and comprehensive full-text search system, often generating five to ten times as many matching documents as other engines. The default Simple Search is easy to use, but most users will want to learn the Advanced Search mode, which supports Boolean operators and case-sensitive and proximity searches. Excite is one of the oldest search engines and has fallen behind competitors. Its quirky interface is annoying, and users cannot modify the original query from the results page. InfoSeek generally finds the desired information and has a good query-by-example function. Lycos combines full-text searching with two searchable indexes, but neither index is comprehensive. ... WebCrawler is adequate but not especially powerful. Yahoo! is purely a searchable index built by real people instead of automated agents. Users can search only for words in the category name or site summary."

This article, while just slightly dated, provides a more than adequate overview of the most popular Internet search engines.

Author: Community Networking
Title: Research Works
Remote HTML
Cost: 0
Description: Like AskScott, this service models its format on a librarian's reference interview. It frames the services it provides in the form of questions:

  1. Are you looking for a particular research resource
  2. Can we suggest the best resources to approach
  3. Would you like some training in professional research
  4. Or perhaps our past projects, business plan and background will interest you!

Number 2, "Can we suggest the best Resources to Approach", is the most interesting since it presents you with a tiny form to complete and based on your input suggests tools for searching.

Personally, I think this sort of service represents a future opportunity for librarians to share their expertise with information seekers. There is no reason why the sort of service represented here could not be expanded and improved to become a more comprehensive service.

Author: Duda, Andrea L. editor
Title: Untangling the Web
Remote HTML
Description: These are the "Proceedings of the Conference Sponsored by the Librarians Association of the University of California, Santa Barbara and Friends of the UCSB Library," held April 26, 1996 at the University Center, University of California, Santa Barbara. The linked abstracts and papers describe all aspects of World Wide Web development in a library setting. Of particular interest are "Yahoo! Cataloging the Web" by Anne Callery, "Spinning a Web Search" by Mark Lager, and "Spiders and Worms and Crawlers, Oh My: Searching on the World Wide Web" by Ann Eagan and Laura Bender.

Author: Haskin, David
Title: Right Search Engine
Remote HTML
Description: "We tested six of the leading search engines: AltaVista, Excite, HotBot, Infoseek, Lycos, and WebCrawler. We found that each can find an enormous amount of information, but a few are clearly superior in the way they home in on the most relevant information and in the interface they offer."

Author: Hearst, Marti A.
Title: Interfaces for searching the Web.
Citation: Scientific American ISSN:0036-8733 March 1997, v276, n3, p68(5)
Remote HTML
Description: "New user interfaces are being developed to help users find information on the Internet using an intuitive and explorative approach. This system places a topic within an information tree which users can follow toward the specific information they need."

This article outlines possibilities for spatially organizing information and search results for the purposes of better retrieval and analysis.

Author: Internet Scout Project
Title: Searching the Internet
Remote HTML
Cost: 0
Description: This is an excellent set of pages! It not only lists search engines and general as well as specific browsable lists, but it describes these things in greater detail than most of the other "About Internet Searching" pages listed in the present guide. This is one of the better places to start if you want a no-nonsense introduction to searching the Internet.

Author: Lynch, Clifford
Title: Searching the Internet.
Citation: Scientific American ISSN:0036-8733 March 1997, v276, n3, p52(5)
Remote HTML
Description: The increasing number of Web sites on the Internet will require changes in present day search engines to enable them to find the information that the user specifically requires. This may also involve changes in the way data or information is formatted for entry into the Internet.

Author: Mauldin, Michael L.
Title: Searching the World Wide Web; Lycos: design choices in an Internet search service.
Citation: IEEE Expert ISSN:0885-9000 Jan-Feb 1997, v12, n1, p8(7)
Remote HTML
Description: Lycos is a search engine that can be used for collecting, storing and retrieving information about pages on the World Wide Web. Lycos is based on the LongLegs program, and incorporates the Pursuit retrieval engine and the Lycos Catalog of the Internet. It uses a proprietary spider program written in C for foraging. Its search is based on a popularity heuristic and is biased towards more popular and useful Web pages. The Pursuit retrieval program uses an inverted file containing document identifiers. Lycos simplifies the search for relevant information in the Web.

Author: Munson, Kurt I.
Title: World Wide Web indexes and hierarchical lists: finding tools for the Internet.
Citation: Computers in Libraries ISSN:1041-7915 June 1996, v16, n6, p54(4)
Description: Indexes and hierarchical lists are two types of search tools for locating information on the World Wide Web and other Internet resources. Indexes, such as Lycos and Open Text, provide access to records by matching search terms against descriptive cataloging. Hierarchical lists, such as Yahoo!, use descriptive and subject cataloging to group common resources by location.

Author: Notess, Greg
Title: Search Engines Showdown
Remote HTML
Description: "This site summarizes, reviews, and compares the search features and database scope of the Internet search engines and finding aids." It does this by dividing its content into the following nine parts: search engines, directories, multi-search, USENET & others, strategies, statistics, reviews, definitions, and bibliography. The bibliography alone is worth the price of admission.

Author: Notess, Greg R.
Title: Searching the hidden Internet
Citation: Database ISSN:0162-4105 June-July 1997, v20, n3, p37(4)
Remote HTML
Description: There are several sites on the Internet that cannot be found through the automated indexing of various sites. Some of this information can only be found in PDF files. A new level of Internet databases and smart searching techniques will make these sites more accessible. Some sites on the World Wide Web require registration or a log-in procedure.

Author: Notess, Greg R.
Title: Searching the Web with Alta Vista
Remote HTML
Description: The following sentence from the article itself pretty well sums up its content:

"I compared a single, non-truncated keyword search on Alta Vista with the same search on the best known of the other search engines: Inktomi, InfoSeek, Open Text Index, Lycos, Excite, and WebCrawler. Searching on a fairly distinctive single word eliminates the disparity among the search engines in how they handle multiple word searches. In each of the five searches, the Alta Vista search resulted in a much higher number of hits. In fact, Alta Vista searches came up with two to six times the number of hits found by the second ranking search engine."

The article goes on to describe the strengths of AltaVista when compared to other search engines.

Author: Stix, Gary
Title: Finding pictures on the Web.
Citation: Scientific American ISSN:0036-8733 March 1997, v276, n3, p54(2)
Remote HTML
Description: "Search engines are under development to increase their ability to find graphic information on the Internet. Current search engines rely on text captions to access graphic information. Future developments would incorporate the ability to compare various visual features, such as contrast, coarseness, directionality, shapes and color."

The article describes some future possibilities for locating graphics on the Internet.

Author: Tweney, Dylan
Title: Searching is my business: a gumshoe's guide to the Web.
Citation: PC World ISSN:0737-8939 Dec 1996, v14, n12, p182(8)
Remote HTML
Description: The Web can be a powerful research tool, but users must know what they are looking for and focus carefully to avoid wasting time. Techniques for maximizing Web productivity are presented. Web directories such as Magellan and Yahoo are fast, no-nonsense tools that point directly to useful sites but cover only a small fraction of all Web content. Search engines use automated 'spider' programs to locate information but tend to generate too many irrelevant matches if the user is not careful. Techniques for narrowing a search include being specific and adding Boolean operators. There are also 'meta' search tools on the market that organize and consolidate search results by sending queries to multiple search engines simultaneously. Numerous search assistants are available, but few are useful; three of the better ones are Knowledge Discovery's More Like This, Symantec's Internet FastFind and Quarterdeck's WebCompass 2.0. Offline browsers such as FreeLoader and First Floor's Smart Bookmarks save time and money.

Author: Yahoo!
Title: Searching the Web
Remote HTML
Cost: 0
Description: This site represents the most comprehensive collection in this guide. It includes pointers to hundreds of search engines, tutorials, and Internet directories. The collection is divided into many subject areas, including: Indices, All-in-One Search Pages, Comparing Search Engines, How to Search the Web, Indices to Web Documents, Regional Robots, Spiders, etc. Documentation, Search Engines, Web Directories.

Author: Zorn, Peggy; Emanoil, Mary; Marshall, Lucy; Panek, Mary
Title: Advanced web searching: tricks of the trade
Citation: ONLINE (WILTON, CONN), vol. 20, no. 3, 12 pp., 1996
Remote HTML
Description: "The purpose of this report is to look closely at several Web search systems that provide advanced search features and search a comprehensive and authoritative database of Internet sites. Based on these two key requirements, the Alta Vista, InfoSeek, Lycos, and Open Text are considered for evaluation. The search features looked for include complex Boolean, duplicate detection, keyword(s) in context, limiting retrieval by field, proximity and/or phrase searching, relevancy ranking of results, retrieval display options, search set manipulation, and truncation."


Version: 1.0.2
Last updated: 4/15/00. See the release notes.
Author: Eric Lease Morgan (