The greatest challenge in doing online research is finding quality, relevant material. There are many ways to find information on the Internet, but there are two sources that people access most often: directories and search engines. Sites that combine both a directory and a search engine are sometimes called hybrids. It is important to know the difference between these.
Directories are lists of links organized by topic by human indexers. A directory's staff read and evaluate web pages and place them in appropriate categories. The main benefit is that a search tends to return only relevant results. The drawbacks are that, because of the human element, many relevant sites cannot be listed, and you must rely on the discretion and indexing of someone who may be less expert than you.
The most popular directory is Yahoo! (www.yahoo.com), but there are many others. The Argus Clearinghouse (www.clearinghouse.net) offers a highly selective alternative: web sites must request to be indexed on Argus, and only 5–10% meet its rather strict criteria. Because its standards are so high, Argus makes a good starting point for academic research; whatever you find there is almost guaranteed to be useful.
So which directory should you use? That is impossible to answer, as it depends on what you are searching for and on personal preference. Note, however, that the listings in different directories overlap substantially, so as you check more and more directories, you will get diminishing returns on your time.
Table 2. Internet Directories
Independent, supplemented by AltaVista.
Open Directory, supplemented by a spider.
Independent, supplemented by Inktomi.
Independent, volunteer categorizers.
Independent, supplemented by Inktomi.
Independent, supplemented by Inktomi (see search engines).
Search engines work on a different principle from directories. Human involvement is minimal, because search engines rely on programs called spiders. Spiders automatically visit web pages, read them, and then follow the links on each page to repeat the process over and over. How many links a spider follows depends on the search engine, as does the number of pages it reads within a single web site. After the spider reads a page and omits the most common words (called stop words), such as the, of and a, the results are placed in the search engine's index. When you visit the search engine, it is this index you search.

When you follow a link to a URL and get a “File not Found” error, climb back up the URL by editing the address in your address bar. http://bob.com/truth/meaning/life.htm might be gone, but try http://bob.com/truth/meaning; if that fails, try bob.com/truth/ and finally bob.com. There is a reasonable chance you will find what you are looking for.
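The address-bar tip about climbing back up a URL after a “File not Found” error is mechanical enough to sketch in code. The short Python function below (its name, parent_urls, is our own) uses only the standard urllib.parse module to list the addresses to try, from the full URL up to the site's root. It does not fetch anything, so you would still paste each candidate into your browser.

```python
from urllib.parse import urlsplit, urlunsplit

def parent_urls(url):
    """Return url followed by each parent directory, ending at the site root."""
    parts = urlsplit(url)
    # Non-empty path segments, e.g. ['truth', 'meaning', 'life.htm'].
    segments = [s for s in parts.path.split("/") if s]
    candidates = [url]
    # Drop one trailing segment at a time; each shorter path is one "climb" up.
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if depth:
            path += "/"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates

for candidate in parent_urls("http://bob.com/truth/meaning/life.htm"):
    print(candidate)
```

For the bob.com example this prints the original address, then http://bob.com/truth/meaning/, http://bob.com/truth/ and finally http://bob.com/. Any query string is dropped from the parent addresses, since it rarely makes sense one level up.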
Most search engines rank sites in some way, usually by the location and number of times the term you are searching for appears on the matching page. Other ranking techniques include counting the number of links on other web pages to the web page, or counting the number of times users of the search engine select the web page in question to visit. At two sites, GoTo.com and AltaVista, web sites can pay a fee to receive a higher ranking. Ranking is not a conscious evaluation of quality: spiders cannot discern the intellectual value of a site.
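The indexing and ranking steps described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular engine actually worked: the three page texts and the short stop-word list are invented, and pages are ranked only by how many times the query words occur, ignoring term location, link counts and click-through data. A query returns only pages containing every query word, mirroring the AND default some engines use.

```python
import re
from collections import Counter, defaultdict

# A deliberately tiny stop-word list; real engines used longer ones.
STOP_WORDS = {"the", "of", "a", "and", "in", "to"}

def tokenize(text):
    """Lower-case the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP_WORDS]

def build_index(pages):
    """Build an inverted index: word -> Counter of {page name: occurrences}."""
    index = defaultdict(Counter)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word][name] += 1
    return dict(index)

def rank(index, query):
    """Return pages containing every query word, most total occurrences first."""
    postings = [index.get(word, Counter()) for word in tokenize(query)]
    if not postings:
        return []  # the query held nothing but stop words
    matches = set(postings[0]).intersection(*postings[1:])
    scores = {page: sum(p[page] for p in postings) for page in matches}
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores, key=lambda page: (-scores[page], page))

pages = {  # invented sample pages
    "page1": "George Washington and the cherry tree. The cherry tree story is a fable.",
    "page2": "Washington state apples and cherry orchards.",
    "page3": "A cherry tree in the garden.",
}
index = build_index(pages)
print(rank(index, "cherry tree"))  # page1 mentions both words twice, so it comes first
```

Note that rank(index, "washington") returns page2, about Washington state, just as readily as page1: a word count cannot tell the state from the president, which is why the limiters discussed later matter.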
Choosing a search engine is more difficult than choosing a directory: not only are there more of them, but their listings overlap even more. If you have ever checked several search engines and felt you were getting the same results, that is because you were. Below is a guide to the more useful search engines. All of them are, in fact, fine; a negative recommendation only means there are better choices for academic research.
Remember that the Internet changes quickly; while this information was correct at the time of writing (May 1999), the situation may since have changed. To learn more about search engines and follow the latest developments, visit Search Engine Watch at www.searchenginewatch.com.
Hybrids include both a search engine and a directory and are increasingly popular. As most search engines are moving in this direction, the issue need not be belabored here.
Metasearches allow the user to query multiple search engines simultaneously, rather than using one search engine, then another, then another. Popular metasearch engines include MetaCrawler (www.go2net.com), SavvySearch (www.savvysearch.com), WebInfoSearch (www.webinfosearch.com) and Dogpile (www.dogpile.com). All of these are fine, with Dogpile having the edge in power and SavvySearch the edge in ease of use. Metasearches are not as helpful as one might think. It is probably best not to start with one, but if you want to be absolutely sure you have found everything, visit a metasearch.
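A metasearch service sends one query to several engines and merges the answers. The Python sketch below shows only the merging step, assuming each engine's ranked list of URLs has already been fetched; the placeholder URLs are invented. It interleaves the lists round-robin and drops duplicate URLs, which is one plausible strategy, not a description of how MetaCrawler, SavvySearch or Dogpile actually combined results.

```python
from itertools import zip_longest

def merge_results(*ranked_lists):
    """Interleave several ranked URL lists, keeping only the first copy of each URL."""
    seen = set()
    merged = []
    # Take every engine's #1 hit, then every #2 hit, and so on.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            # zip_longest pads shorter lists with None; skip those and repeats.
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

engine_a = ["a.example/1", "b.example/2", "c.example/3"]  # hypothetical engine A
engine_b = ["b.example/2", "d.example/4", "a.example/1"]  # hypothetical engine B
print(merge_results(engine_a, engine_b))
```

Because the two engines overlap, six hits collapse to four unique pages, which illustrates why a metasearch often adds less than one might expect.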
Table 3. Favorite Search Utilities
Independent, Ask Jeeves
AltaVista has the largest index and is the place to go for an exhaustive search. Other engines, however, present results in a slightly more usable manner.
To be sure you have everything.
Direct Hit, then Inktomi
HotBot will give you almost everything, but places the most popular sites first. Popularity is not everything, but it is an indicator. HotBot has easy-to-use filter options and groups pages from the same site to reduce clutter; this alone makes HotBot's results the most efficient to read.
The author usually starts here.
Northern Light has the second largest index. Northern Light clusters results in a usable format. Northern Light also has a “special collection”—magazine articles and other material you can buy online for reasonable prices. If your library is small, check here for help.
Sometimes. Because the index is small, you have to wade through fewer results, but you can also miss things.
GoTo returns listings from paid advertisers first, then Inktomi results.
No, unless you like advertising.
With a medium-size index, Go will not find everything. What Go excels at is grouping and presenting the results in a lean, usable format. Its ‘Recommended Sites’ are designated as such because they are financial partners.
Sometimes for quick results.
Direct Hit supplements other search engines by monitoring what choices users click on, and giving those choices top ranking.
The third largest index. Inktomi cannot be queried directly, but powers many search engines.
The Microsoft Network offers some excellent personalization choices and varied services all from one page. Some users like that, others find it too cluttered.
Independent, but run by Excite
WebCrawler prides itself on being small but thorough. Not the best for research, but if you are searching for something incredibly common, this might be a good choice.
Knowing where to search is not enough; the researcher must also know how to search. A successful Internet search requires both the knowledge of how to make the search engines work and the ingenuity to pick search terms that will yield the most precise results.
The “how” is usually a matter of search limiters. This may sound complicated and formidable, but it is essential. Search engines are rather heavy-handed tools; experienced users will agree that the usual problem is getting far too many results, so a successful search depends on limiting them. Limiters tell the search engine how to treat the several terms you enter. This matters: if you search for the words george washington cherry tree, you need to know what the search engine is doing. Is it looking for documents containing any one of these words, all of them, all of them in this order, or something else? Likewise, you can force the search to combine and limit words into very specific phrases to find what you want.
Search engines tend to use limiters differently. In the following chart, you will find the most common and useful limiters and whether our favorite search utilities (HotBot, AltaVista, Northern Light, Yahoo! and Argus Clearinghouse) support them. [Remember, Yahoo! and Argus are not search engines and should not be used as such.] If you are using another search utility, you will have to check its instructions. The chart also shows the symbols commonly used to represent limiters; not all limiters have symbols, but use them when they are available. There are many other search limiters you may want to learn; the best method is to check the instructions of the search engine you like. “Default” in the chart means this is the limiter the service uses unless told otherwise.
A successful search depends not just on knowing the technical details, but also on a good sense of what words and phrases to search for. The basic rules are simple. Three or four search words will give you more specific results: george and washington can pull up almost anything; george and washington and cherry and tree is more specific. The more particular the word, the less likely you are to get irrelevant results. Spend a few minutes thinking about what you want to find and the most distinctive words that might be used in discussing it. Phrases are more powerful than a string of single words. George and washington can find all kinds of things unrelated to George Washington, but if you search for the specific phrase “George Washington,” you will get, for the most part, only the president himself.
As you will see in our example search below, it is important to modify the search as you go along to narrow your results. If, for example, you are finding a slew of irrelevant results from one specific web server, use limiters to exclude that server from the search. If you find other specific search terms that will help, add them in. If you find a common theme that is leading you to irrelevant results, find a word to exclude that will keep those sites out of your results.
Proficiency Assignment 2
Pick a (very narrow) historical topic and do a complete Internet search on it. Compare the results from the favorite directories and search engines listed in this book, and explain why you got those results. Make a list of search words, phrases and limiters that proved most useful. Create a search routine that yields no more than 100 results. See if you can create search criteria that will return no irrelevant sites. Create a bibliography of all relevant books on the subject published in the last 10 years. At what stage in your search did you reach maximum benefit for the amount of labor involved?
A class may decide that all its students will search the same subject; this lets students gauge how complete their searches were by comparing their results with others'.
Table 5. Search Engine Limiters

AND (+)
Function: Returns pages that contain all the words.
Example: george AND washington (george + washington) returns pages that contain both the word george and the word washington.
When to Use: This is the most common use; select distinctive words that will find what you want.
Supported by: All. Default for HotBot, Northern Light and Argus.

NOT (-)
Function: Returns pages that do not contain the word.
Example: george NOT washington (george - washington) returns pages that contain the word george but not the word washington.
When to Use: Remove words that are leading you to irrelevant information.
Supported by: All except Argus.

OR
Function: Returns pages that include one or the other or all of the search terms.
Example: george OR washington returns all pages that contain george or washington, as well as pages containing both words.
When to Use: When different topics or words will all reveal acceptable results.
Supported by: All. Default for AltaVista and Yahoo!.

Phrase (“ ”)
Function: Returns the precise phrase.
Example: “george washington” returns only pages containing the two words next to each other and in this order.
When to Use: Allows high selectivity.

Domain (host:)
Function: Returns results only from the specified domain.
Example: “george washington” host:edu returns only pages containing the phrase located on servers in the educational domain. Exclude specific domains with a NOT operator, e.g. “george washington” -host:smithsonian.edu.
When to Use: To check specific web sites or domains, and to filter out multiple entries from an unwanted source.
Supported by: Only HotBot and AltaVista. For AltaVista use host:

Wildcard (*)
Function: Allows any character(s) to be substituted for the *.
When to Use: To find variations on uncommon terms or alternate spellings.
Recognizes only symbols (+, -, “”, *) unless the Advanced Search Page is selected.
Uses a drop-down menu to select limiters—use advanced search page for many useful limiters.
Recognizes only symbols.
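The limiters in Table 5 amount to simple set logic applied to each page's words. As a rough illustration, this Python sketch runs AND, OR, NOT and exact-phrase tests against three invented page texts; the page names and function names are hypothetical, and it models the logic only, not any service's actual query syntax, which differed from engine to engine as the notes above indicate.

```python
import re

# Three invented page texts standing in for a search engine's index.
PAGES = {
    "mountvernon": "George Washington and the cherry tree: true or false?",
    "jungle": "George of the Jungle takes a vacation in Washington state.",
    "orchard": "Planting a cherry tree in Washington.",
}

def words(text):
    """Return the lower-case words of a text, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def has_all(text, terms):
    """AND (+): every term appears somewhere on the page."""
    page = set(words(text))
    return all(term.lower() in page for term in terms)

def has_any(text, terms):
    """OR: at least one term appears on the page."""
    page = set(words(text))
    return any(term.lower() in page for term in terms)

def has_phrase(text, phrase):
    """Phrase: the words appear next to each other, in this order."""
    target, page = words(phrase), words(text)
    n = len(target)
    return any(page[i:i + n] == target for i in range(len(page) - n + 1))

def search(test):
    """Return the sorted names of the pages for which test(text) is true."""
    return sorted(name for name, text in PAGES.items() if test(text))

# george AND washington: the jungle page matches too.
print(search(lambda t: has_all(t, ["george", "washington"])))
# "george washington" AND "cherry tree": only the page about the tale.
print(search(lambda t: has_phrase(t, "george washington") and has_phrase(t, "cherry tree")))
# "cherry tree" NOT george: excludes any page mentioning george.
print(search(lambda t: has_phrase(t, "cherry tree") and not has_all(t, ["george"])))
# cherry OR jungle: any page with either word.
print(search(lambda t: has_any(t, ["cherry", "jungle"])))
```

The first query shows the George O. Jungle problem described in the example search below: plain AND matches a vacation page, while the two phrase limiters together isolate the page about the tale.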
As you already know, or will soon learn, no student of history can find everything on the Internet. The wealth of historical information still resides in libraries and archives. The Internet has, however, greatly increased students' ability to use libraries. Unless you attend a major research university, your library is probably limited in size, and libraries vary in the quality and extent of their holdings. Sometimes you can supplement your library with on-line holdings. In general, however, only works old enough to be out of copyright will be found on-line. Do not look for on-line versions of the latest academic books or articles. Do, however, look for primary sources and older but still relevant articles.
Table 6: On-Line Libraries
Internet Public Library
An attempt to catalog what is already out there. Do not miss the journals and newspapers. For example, IPL can guide you to the on-line contents of the American Historical Review.
Literary texts and reference books. This is a good resource for primary sources.
E-Text, University of Virginia
Focuses on the Humanities. Also a good place to find journal indices.
Books do not necessarily have to be on-line for the Internet to benefit you in your research. At small libraries, just finding what books may be out there was until recently rather difficult. Traditional methods of checking periodical indices, book reviews, and book catalogs are so slow, frustrating and limited in their results that many students never quite get around to it. With the Internet, there is no excuse not to know about every important book written that may touch on your subject. Almost all major university libraries, as well as the Library of Congress, have their catalogs available via the WWW. A massive list of academic libraries can be found at Yahoo! (dir.yahoo.com/Reference/Libraries/Academic_Libraries/).
For history, a search of the library consortia, university libraries and Library of Congress listed below would be quite complete. Another good place to search, especially if you are concerned only with recent works, is a virtual bookstore, since these maintain massive databases.
Table 7: Library Catalogs and Bookstores
Barnes and Noble
California Digital Library (UC shared collection)
Library of Congress
Ohio Link (colleges and universities in Ohio)
University of Michigan
Of course, knowing about the books is not the same as having them available to you, but it is a start. If you know of a work, you can search out book reviews, check other area libraries, or at least mark it as something you would like to read someday. Many colleges and universities have an interlibrary loan office that may be able to obtain books from other libraries for you if you can provide the author, title and publication information.
To illustrate the search technique, we will work on a hypothetical assignment: “Write a 10-page essay on the story of George Washington and the Cherry Tree. Be sure to include a discussion of the origins and purposes of the tale.” This topic is narrow enough that you will not quickly find the answer in the library without some further indication of where to look, so we will begin with the Internet. To get the most benefit, do these searches yourself.
Stop One—Argus Clearinghouse. Maybe some academic has already made a page on this subject. But a check of its History directory shows no links to anything close.
Stop Two—Yahoo! Maybe someone, somewhere has already made a page on this subject. But a check of Social Science, History, U.S. History, 18th Century reveals nothing of use.
Stop Three—HotBot. We can be fairly sure that there are thousands of web pages that mention not just Washington and cherries, but also many pages containing this very story. At HotBot we search for george washington (remember HotBot defaults to AND), but with 335,220 pages listed, the search clearly must be limited. After all, george + washington would find a web page about George O. Jungle's vacation in Washington, which is not much help to us. When the search is limited to the phrases “george washington” and “cherry tree,” only 700 pages are matched. This is a lot, but not so many that we cannot look through them to see how else to limit the search. A quick look at the results reveals that many of these pages are just a joke about President Clinton. So we try a new search, “george washington” “cherry tree” - clinton, which removes all the pages containing the word “clinton” and reduces the matches to 450. For a very popular tale, that is not too bad, so it is time to see whether any of these pages will help us.
At www.mountvernon.org we find a little quiz: George Washington chopped down a cherry tree, True or False. We select false and learn that the tale is a fabrication by someone named Mason Weems. That is hardly enough to write our paper, but it is quite helpful. We return to HotBot and do a new search: “george washington” “cherry tree” “Mason Weems.” Surprisingly, this yields no matches at all. Knowing that people often refer to authors and presidents by last name only, we try washington weems “cherry tree.” This yields 140 matches, a reasonable number to look at. Among other things, we find a note that Weems was an Anglican minister who lived from 1759 to 1825 and wrote a largely fictional biography of Washington as an exhortation to honesty and virtue; an excerpt from his life of Washington; and a few other useful tidbits, including an image of the 1939 Grant Wood painting Parson Weems’ Fable. There is also a lot of repetition and junk (essays on conspiracies, rantings against Christians and so on) that we can simply ignore. A search on “Mason Weems” alone, as he is clearly going to be important in the paper, yields nothing additional. This is a very useful start, but not enough to write the paper, so more research is called for.

Do Not Stop Here! Instructors are increasingly frustrated by students who perform only a cursory check of material on the Internet. If you have not tried at least three or four search methods, you have not really tried. It is easy to find the tidbits, but the really useful stuff takes some time.
Stop Four—On-Line Libraries. We know we must deal with Mason Weems’ Life of Washington, so we check at the Internet Public Library and other on-line libraries. Unfortunately, we cannot find it.
Stop Five—Northern Light. Northern Light has a larger index than HotBot, and some additional resources, so we need to check here. We search for weems washington (Northern Light defaults to AND), omitting “cherry tree” because we do not want to eliminate useful information about Weems’ mythology that is not directly about the cherry tree. The search yields 3,990 results, and we scroll through the first hundred just in case. In addition to a few useful tidbits of information, we find an on-line copy of Weems’ book at xroads.virginia.edu/~cap/GW/weems.html! From there we follow a link to a web page, The Apotheosis of George Washington (xroads.virginia.edu/~cap/GW/gwmain.html), that contains a wealth of useful information and a bibliography current through 1991. While we are at Northern Light we check the Special Collection folder that appears on the left side of the screen. There are no articles worth purchasing, but we do pick up a few references to recent books. We could check AltaVista, but at this point we have enough information to proceed, so there is no need.
Stop Six—Libraries. By this point we have an idea about where to go with this paper: we want to compare the known actions of Washington with the fabricated actions and see if we can detect a difference that will tell us why Weems made things up. We already have a bibliography courtesy of the web site on The Apotheosis of George Washington, and all we need to do is update it through the on-line library catalogs. Our search at the Library of Congress reveals a new edition of Weems’ work with a commentary that might prove useful. With the list in hand, we head off to our library, our research well underway.
Just finding the information on the Internet is not sufficient; the material must also be evaluated. This is a far greater concern than with books you check out at the library. When something is published by a major publishing house, it usually goes through at least a minimal review process. Academic presses employ a rigorous review process. These reviews check the accuracy of the material, the legitimacy of the sources cited, the soundness of the conclusions and so forth. With limited budgets, libraries tend to purchase only what they think are quality books. Together these factors act as a safeguard: when you check something out from the library, chances are it is at least not horribly unreliable. If it is a well-documented work from a university press, you need not worry much at all.
As you know, the Internet does not work that way. Anyone, for just a few dollars, can make a web site saying anything they would like. Sometimes there are out-and-out fabrications, especially on political and religious issues. People can, and do, make up evidence, as well as deliberately misquote and distort. But as advanced students, you will not have much trouble detecting these people—their strongly vocalized agenda will reveal them.
The more insidious problem is the plethora of self-proclaimed experts and well-meaning “amateurs.” Their goal is certainly not to deceive, but that does not mean that they are propagating correct information. With no formal review process, it becomes your job to evaluate the material. There is a reasonably good example of this problem in our George Washington research. Our very first HotBot search listed a web page entitled “The REAL Story of George Washington and the ‘Cherry Tree’” (www.edsanders.com/hist001.htm). The document begins: “What follows is the true story of Washington and the "cherry tree." There was no cherry tree. . . Read on to discover what really happened.” What follows is a story about, among other things, a young Washington accidentally killing his mother’s favorite colt while trying to break it.
You could just take this story at face value, cite the web page and incorporate it into your paper. Instructors see such uncritical use fairly frequently, and it never helps the student's grade (or learning). A healthy dose of skepticism is called for: why should I believe this any more than the cherry-tree story? As a historian, you will probably first ask, “What is the source for this story?” The web page does not say, nor is it really clear who made the web page, and why. By following some of the internal links, you can find out that the page was made by a private individual who is also interested in selling you some products. That does not, of course, invalidate the information, but the author does not speak with the same authority as someone who, for example, has spent 20 years researching Washington. Moreover, nowhere on the site can we find the source of this information. That should be a big warning sign, because history is a discipline that depends on documented sources; it should be enough to make you realize that this page cannot be used in your paper.
If you search the author’s history pages long enough, you can find this statement: “From time to time I will be posting interesting historical accounts you probably won't find anywhere else. Most are from books printed in the mid 1800's before "history" was watered down and re-written.” That’s another good clue, as the colt-breaking story reads just like the cherry-tree story: more about morality than history. If you were not convinced before, this should certainly indicate that this Washington story is not going to help your research. Just to be sure, we can use one of the author’s favorite tricks and try to identify the source. As you know, you can search for phrases on the Internet. If the story is taken verbatim from a source that appears elsewhere on the Internet, we can probably find it. [Note that this also makes material plagiarized from the Internet fairly easy to detect.] Punctuation sometimes confuses search engines, so we pick a short but unique text string. The first few strings we tried failed, but the phrase “Washington's father died when George was only eleven years old” got a match in AltaVista. On another page, “Washington’s Rules of Conduct” (www.eagleforum.org/educate/washington/conduct.htm), we find that this story is taken word for word from one of two late-nineteenth-century readers for children; it is anything but the “true story.”
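Choosing a short, unique, punctuation-free string to paste into a phrase search can be partly automated. The Python sketch below is our own illustration, not a feature of AltaVista or any other engine: it splits a passage at punctuation marks (keeping apostrophes inside words) and returns, in quotation marks, each run of words long enough to be distinctive. The six-word minimum is an arbitrary assumption, and the second sentence of the sample passage is invented filler.

```python
import re

def phrase_candidates(passage, min_words=6):
    """Split a passage at punctuation and return quoted word runs of min_words or more."""
    # Break wherever a character other than a letter, digit, apostrophe or whitespace appears.
    chunks = re.split(r"[^A-Za-z0-9'\s]+", passage)
    candidates = []
    for chunk in chunks:
        run = chunk.split()
        if len(run) >= min_words:
            candidates.append('"' + " ".join(run) + '"')
    return candidates

sample = ("Washington's father died when George was only eleven years old. "
          "Short clause, then another one; and a final clause of several more words.")
for candidate in phrase_candidates(sample):
    print(candidate)
```

The short clauses are discarded as too common to identify a source, while the first sentence survives intact as a candidate, the same string that found the match in the example above.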
While there can be no hard-and-fast rules for evaluating Internet sources, there are some general guidelines that should be heeded.
What are you looking for? For example, are you seeking documentable historical facts, or the opinions people hold about history? A web page might serve one purpose horribly and the other wonderfully.
Why did the person or organization post the material? Intent is important—an organization whose sole goal is to forward an idea or agenda will not have the same concerns as someone who seeks to present many different sides of a story. Some people will have a vested interest in what they are writing about, others will not. You would not want, for example, to use Exxon.com or greenpeace.org uncritically as sources in the study of oil spills.
Does the web site have any quality control? Organizations and companies control very closely what appears on their web sites. Educational institutions tend not to monitor as closely, but researchers are usually monitored by their peers. An historian’s page on history is probably okay; an historian’s page on increasing your gas mileage might not be so reliable.
Is the web site rational and well-argued? Differentiate between unsupported claims and statements, and claims and statements that are bolstered by reason and evidence.
Do the authors have appropriate qualifications? Authority and academic degrees do not make a web site accurate—but they help. Ask if the person writing seems sufficiently trained to make the claims being presented. Do they know the requisite languages? Have they studied the primary sources?
Does the site look professional and reliable? This does not mean the graphics must be good and the colors pleasing, but rather that the words are spelled correctly and the grammar is acceptable. A good rule of thumb is that people who make sloppy presentations also do sloppy research. But do not hold too tightly to that, as you may encounter a web site in the middle of being created.
Is this part of a current “hot” topic? Unreliable web sites are much more likely to be found about issues that are current points of debate. Someone might have a reason to distort evidence about oil spills. No one is going to fabricate a list of Latin synonyms for “pine tree.”
Can the material be confirmed by an external source? Can you find the information elsewhere, perhaps at the library? Other Internet sites saying the same thing may just be copies with no independent means of verification.
Is the site objective? This does not mean the site cannot have a strong point to make, but it does mean that the site acknowledges that other points of view exist and that issues are rarely black-and-white.
Proficiency Assignment 3
Web Site Evaluation
Your task is to find something akin to objective evidence. The topic is the Shroud of Turin, an artifact whose origins are being debated, at least in theory, on scientific and scholarly terms. Below are the URLs of two web sites about the Shroud. Do not be surprised if you become frustrated and confused about what is accurate and what is not; it is confusing!
Here is something to make things even more complicated—you will have to follow links from the two chosen sites to other locations. Be sure to answer questions 1 and 2 for all these locations. You will certainly want to visit (www.shroud.com/papers.htm) for additional material.
It might be easiest to make a chart like the one below for all the sites you visit. Do not forget how to do a search limited to a domain. For example, you might search for + blood + host:shroud.com to speed your work; it would take many, many hours to search manually.
Schwortz, B. M. “The Shroud of Turin.” [http://www.shroud.com/]. 1999.
Shafersman, S. D. “The Skeptical Shroud of Turin Website.” [http://humanist.net/skeptical
Who is responsible for the website, and what are their qualifications?
Does the site have any supervisory authority? If so, who makes up that authority and what are their qualifications?
Does the site work on the basis of claims, supported arguments, or both?
What does the C14 evidence suggest? Where were the first C14 results published? Is the journal peer-reviewed? What objections have been raised? Are those objections taken seriously by authorities?
Is there blood on the shroud? What is the evidence for and against? Which answer is more convincing, and why?
Does the image realistically represent a bleeding body? What is the forensic evidence? Which answer is more convincing, and why?
What does the pollen evidence suggest? What objections have been raised? Are those objections taken seriously by authorities?
Are there coins and other objects represented in the image? What are the arguments for and against?
Which web sites, if any, can be called unbiased and objective?
Which web sites are useful?
Is there enough information available on the Internet to reach a conclusion?
As in any research, it is important that you cite your sources, even if they are from the Internet. Citation methods vary, and complete consensus on how to cite the Internet has not yet been reached. The basic idea is simple—record the information necessary to identify author, work and where to find it. If there is a standard for History, it is probably this variation on the APA style. If you wish more detailed information, visit Bibliographic Formats for Citing Electronic Information (www.uvm.edu/~ncrane/
For Bibliographic Entries:
Author's Last Name, First Name [author's e-mail]. "Title of Work" or "title line of message." In “Title of list/site” as appropriate. [URL]. Date (of last revision).
For Footnotes and Endnotes
Author's First name and Last name, [author's e-mail], "Title of Work" or "title line of message," in "Title of list/site" as appropriate, [URL], date (of last revision).
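Because the two patterns above differ only in name order and punctuation, a small helper can generate both from one record, which keeps a bibliography and its notes consistent. This Python sketch assumes a simple record of our own design (the Citation class and its field names are hypothetical) and follows the two patterns exactly as given above. The sample reuses the discussion-list example from this section, with placeholder addresses standing in for the real ones; the e-mail message format, which adds a recipient, is not covered.

```python
from dataclasses import dataclass

@dataclass
class Citation:
    first: str     # author's first name(s)
    last: str      # author's last name
    email: str     # author's e-mail address
    title: str     # title of work or title line of message
    site: str      # title of list/site
    location: str  # URL or list address
    date: str      # date of last revision

    def bibliography(self):
        """Last, First [e-mail]. "Title." In "Site." [URL]. Date."""
        return (f'{self.last}, {self.first} [{self.email}]. "{self.title}." '
                f'In "{self.site}." [{self.location}]. {self.date}.')

    def note(self):
        """First Last, [e-mail], "Title," in "Site," [URL], date."""
        return (f'{self.first} {self.last}, [{self.email}], "{self.title}," '
                f'in "{self.site}," [{self.location}], {self.date}.')

entry = Citation("Lonny J.", "Watro", "author@example.org",   # placeholder e-mail
                 "Re: Washington & Weems & fort", "VA-Hist",
                 "list@example.org", "7 Jan 1999")              # placeholder list address
print(entry.bibliography())
print(entry.note())
```

Keeping the data in one record means a correction, such as a revised date, reaches both the bibliography and the endnote at once.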
For a WWW page:
Sanders, Ed [firstname.lastname@example.org]. “The REAL Story of George Washington and the ‘Cherry Tree.’” In “EdSanders.com.” [http://www.edsanders.com
Footnote or Endnote Entry
Ed Sanders, [email@example.com], “The REAL Story of George Washington and the ‘Cherry Tree,’” in “EdSanders.com.” [http://www.edsanders.com/
For a Discussion List:
Watro, Lonny J. [firstname.lastname@example.org]. “Re: Washington & Weems & fort.” In “VA-Hist.” [email@example.com]. 7 Jan 1999.
Footnote or Endnote Entry
Lonny J. Watro, [firstname.lastname@example.org], “Re: Washington & Weems & fort,” in “VA-Hist,” [email@example.com], 7 Jan 1999.
For an E-Mail Message:
Weems, Mason [firstname.lastname@example.org]. "Why I Wrote About George Washington." Private e-mail message to Grant Wood, [email@example.com]. 6 June 1929.
Footnote or Endnote Entry
Mason Weems, [firstname.lastname@example.org], "Why I Wrote About George Washington," private e-mail message to Grant Wood, [email@example.com], 6 June 1929.
Proficiency Assignment 4
Take the material collected for your Web Search assignment and convert it into a bibliography following the format guidelines presented here. Create an example “endnote” page that includes an example of each entry in the bibliography.