The Invisible Web Revealed



Those Dark Hiding Places: The Invisible Web Revealed

Robert J. Lackie, Associate Professor-Librarian, Rider University

"If only I had known!" was the bitter cry of the searcher who relied only on search engines to search the Web. Although many popular search engines boast about their ability to index information on the Web, more and more of it (dynamically generated pages, certain file formats, and information held within numerous databases) has become invisible to their searching spiders. Much of the Web is hiding information from us, but we can access this hidden content! Learn how you can reveal the secrets of these dark hiding places.

Hidden Content on the Web

"The Web," according to Chris Sherman, Internet search expert and Associate Editor of SearchEngineWatch.com, "is increasingly moving away from being a collection of documents and becoming a multidimensional repository for sounds, images, audio, and other formats." Because much of this information is not accessible to many general search engines' software spiders, we need to look for specific search tools that will lead us to this hidden content. Some of these tools include directories, searchable sites, free Web databases, and a few general and many specialized search engines.

Begin searching with...
Directories and Portals when you:
have a broad topic
want selected, evaluated, and annotated collections
prefer quality over quantity
Invisible or Deep Web [searchable sites and databases] when you:
are looking for information that is likely in a database
are looking for information that dynamically changes in content
Search engines [general and specialized] when you:
have a narrow topic
want to take advantage of the newer retrieval technologies

 

Directories

Directories are Web sites that provide a large collection of links, arranged according to a classification scheme that enables browsing by subject area. I really like directories, but I want to point out right away that I am not against using search engines: I consider directories to be complements to search engines, not replacements for them. However, there is a developing trend toward the use of directories because, in addition to being classified, their content is pre-screened, evaluated, and annotated by humans. Sometimes, though, this annotation and classification process makes the information less timely than it could be. This is usually true of very large directories, so look at several, both large and small. Let's look at a few smaller, more selective directories that can also lead you to some of the Web's hidden content.
Librarians' Index to the Internet (http://lii.org/) - Information You Can Trust: A "searchable, annotated subject directory of more than 14,000 Internet resources," LII provides to librarians and the general public a "well-organized point of access for reliable, trustworthy, librarian-selected Internet resources." You can also use LII to find Invisible Web databases by typing in a broad topic and adding the words "and databases" (e.g., biology and databases).

 
FindLaw (http://www.findlaw.com/) - "The world's most popular legal Web site," FindLaw is a "comprehensive set of legal resources on the Internet for lawyers, businesses, students and individuals." To find an annotated list of free databases on many law-related topics, go to the "For Legal Professionals" section of the main page, click on the "Browse by Practice Area" link, pick a practice area/subject heading (e.g., "Health Law"), and then look for "Databases" under the Web Guide for that subject heading.

 
InfoMine (http://infomine.ucr.edu) - This scholarly resource collection includes more than 120,000 sites, grouped into 9 annotated, indexed categories (databases) for easy retrieval. This librarian-built "virtual library of Internet resources [is] relevant to faculty, students, and research staff at the university level," and it is also very useful for advanced high school students and professionals.

 
About.com (http://www.about.com/) - This portal neatly organizes thousands of topics, including the Invisible Web, with good news and commentary. Try typing "Invisible Web" as a phrase in quotes to find many links to hidden content on the Web, including "Invisible Web: The Cloaked Internet" and "Visible versus Invisible Web" from their Internet for Beginners guide. You will see links to other pertinent articles, too--all worth reading and exploring.



Invisible Web Searchable Sites

Chris Sherman states that "vast expanses of the Web are completely invisible to general purpose search engines," but there are ways "to find the hidden gems search engines can't see."

Four Recommended Links to Invisible Web Databases:
Direct Search (http://www.freepint.com/gary/direct.htm) - Gary Price, MLIS, of Gary Price Library & Internet Research Consulting, one of the foremost authorities on invaluable Invisible Web resources, has assembled a massive collection at this site. Visit one of Gary's newest endeavors, his ResourceShelf (http://www.resourceshelf.com/) Weblog & Newsletter, for "Search Engine, Info Industry, & Library News," among other items.

 
The Invisible Web Directory (http://www.invisible-web.net/) - Chris Sherman & Gary Price co-wrote The Invisible Web: Uncovering Information Sources Search Engines Can't See. Because it comes from these experts, I recommend their popular book and its accompanying site for locating some of the Web's hidden content.

 
ProFusion (http://www.profusion.com) - From Intelliseek, who first gave us The InvisibleWeb.com directory, ProFusion is their updated meta-search engine that lets you select your own subject category of collected search engines and subject directories, allowing you to "target your search by drilling into one of [their] vertical search groups."

 
CompletePlanet (http://www.completeplanet.com/) - BrightPlanet Corporation's site contains "70,000+ searchable databases and specialty search engines." Check out their BrightPlanet FAQs (http://www.brightplanet.com/deepcontent/deep_web_faq.asp), too.

Some Invisible Web Databases



Although there are thousands of Invisible Web databases available to us for free on the Web, below I have listed a few of my favorites:
AnimalSearch (http://animalsearch.net/) - A database of family-safe, animal-related sites that you can also search by group, type, and geographic region.

 
Educator's Reference Desk (http://www.eduref.org/) - This site contains 2000+ lesson plans, 3000+ links to value-added online education information, and a 200+ question archive collected on the award-winning AskERIC site during the past decade. It also provides access to the ERIC database--the world's largest source of information on education research & practice, including free, full-text expert digest reports--and it links you to the Gateway to Educational Materials (GEM), which "provides quick and easy access to over 40,000 educational resources found on various federal, state, university, non-profit and commercial Internet sites."

 
NatureServe Explorer (http://www.natureserve.org/explorer) - This online encyclopedia provides authoritative "information on more than 60,000 plants, animals, and ecosystems of the United States and Canada. Explorer includes particularly in-depth coverage for rare and endangered species."

 
Nuclear Explosions Database (http://www.ga.gov.au/oracle/nukexp_query.html) -  Geoscience Australia's database provides location, time, & size of explosions worldwide since 1945. Click on "databases" under "Online Tools" to see a list of other searchable online mapping tools & databases.

 
On-Line Encyclopedia of Integer Sequences  (http://www.research.att.com/~njas/sequences/) - "Type in a series of numbers and this database will complete the sequence and provide the sequence name, along with its mathematical formula, structure, references, and links."

 
PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi) - Provides access to 14 million+ MEDLINE citations, including links to full text articles & related resources. You will also want to explore PubMed Central (PMC), an e-archive of free, full text articles from 160+ life sciences journals, as well as Bookshelf, "a growing collection of [full text] biomedical books that can be searched directly." They now offer a "new global NCBI 'Entrez' search engine" where you can search across their many life sciences databases, too.

 
LookSmart's FindArticles (http://www.findarticles.com/) - LookSmart's FindArticles database is an updated replacement of their free, searchable article Web archive, with the current service now covering "5.5 million articles from over 900 publications," 1998 to present, making it the "Web's largest free articles database." LookSmart's new FindArticles still provides articles that can be read in their entirety and printed at no cost, with the new service accessible via an "Articles" tab on the main LookSmart.com page.
 
 
Directory of Open Access Journals (http://www.doaj.org/) - Launched in May 2003 and hosted by Sweden's Lund University Libraries Head Office, this "one-stop shopping" open access directory provides no-cost access to the full text of over 1,400 journals in the sciences and humanities/social sciences, with 350 journals searchable on the article level, and it is continually growing in size.

By the way, if you like viewing accompanying Web sites from excellent books on Web research, you may also want to visit the Super Searchers Web Page (http://www.infotoday.com/supersearchers/), which "features a growing collection of links to subject-specific Web resources recommended by the world’s leading online searchers" in global business, primary research, mergers/acquisitions, news, writing, health/medicine, investment, business, entrepreneurial research, & legal information resources. The books and their Web sites can lead researchers to a wealth of hidden resources.



Search Engines

Some general and specialized search engines, like those listed below, can help you locate specific information or certain file formats, so I like to go to them first. I do use several search engines for research, but they are not all created equal when it comes to uncovering data in the Invisible Web domain. A great site for keeping up-to-date on search engines is Search Engine Watch (http://www.searchenginewatch.com/). Another great site on search engines is Search Engine Showdown (http://www.searchengineshowdown.com/). Let's explore these two sites and general & specialized search engines that allow us to find some Invisible Web data. Immediately below are a few interesting specialized search engine services/sites.
Singingfish (http://www.singingfish.com) - A superior audio/video search engine, Singingfish "only indexes multimedia formats, including Windows Media, Real, QuickTime, and mp3s." Their content is free, and you can search for both audio and video or for just one type of media.

 
Google News (http://news.google.com/) - This award-winning automated (no Google editors) service scours the Web every 15 minutes, capturing news from 4,500+ sources. Recently, Google News added a new feature: a "Top Stories" drop-down menu that allows us to select the top news stories from several different countries. Note: Yahoo! News, Topix.net, and Daypop are also impressive news-aggregating services with special features of their own.

 
Scirus (http://www.scirus.com/) - This science search engine, with over 167 million science-specific Web pages, offers excellent advanced search options for a wide variety of information types and sources of materials on the Web, including journals. Scirus has become pretty successful at pinpointing science-specific data, reports, articles, and relevant scholarly Web pages--a considerable recent improvement. Check out their Advanced Search page, as well as their Search Tips and About Us links, and see my Nov/Dec 2003 "Science and Math Web Resources for the Higher Education Community" article in ONLINE magazine dealing with Scirus and other sci-math search tools, available full text via EBSCO and ProQuest databases and on my directory page.

 
UFOSeek (http://www.ufoseek.com/) - "Yes, Mulder, the truth is really, um, out there, and you can find it using this paranormal/UFO search engine," currently indexing over 10,000 sites.

We know that information on some sites is presented in formats other than static HTML, which poses a problem for search engines. Adobe Portable Document Format (PDF) has been one example of this. If the HTML text that accompanies a PDF file describes the file well, you may find the site, but if the site provides unhelpful headings or titles, then the file is pretty much "invisible." The same is true for Flash files, for instance. Fortunately for us, a few general search engines are now bringing some PDF, Flash, and other non-HTML files to our desktops more easily.
Google (http://www.google.com/) - Google is a search engine that ranks pages based on their popularity; i.e., the degree to which other popular pages refer to a page. Google also allows you to go to the page as it is currently on the Web, or to a cached copy Google stored when it retrieved the page (nice when the current page won't connect). In addition, Google allows you to find those Invisible Web documents: PDF files. You can also view them as ASCII text (nice when you have a slow connection or the PDF is so large that you don't want to wait for it to display). From Google's Advanced Search, you will see that in addition to limiting your search to PDF files, you can limit it to or exclude other file formats, such as PostScript; Microsoft Word, Excel, or PowerPoint; & Rich Text formats. Check out their "Google Web Search Features" for other interesting items, and visit the Google Scholar (Beta) site (http://scholar.google.com/) to search for some "articles from a wide variety of academic publishers, professional societies, preprint repositories and universities, as well as scholarly articles available across the web."

 
Yahoo! Search (http://www.yahoo.com/) - Now Google's biggest competitor after dropping Google as a partner, Yahoo! also provides cached copies and locates Word, Excel, PowerPoint, and PDF files, plus something Google doesn't: RSS/XML files. Yahoo! has also had full Boolean searching capability since purchasing the AlltheWeb and AltaVista search engines, so it looks like Google is going to be keeping an eye on Yahoo!'s continued aggressive progress. Check out their interesting "Search Shortcuts," too.

 
Gigablast (http://www.gigablast.com/) - An interesting up-and-coming search engine, Gigablast also locates Word, Excel, PDF, and other non-HTML files, and like Google and Yahoo!, it provides cached copies (the most recent "archived copy") of these files. It also links you to multiple "older copies" via The Internet Archive Wayback Machine and provides full Boolean searching, so keep an eye on Gigablast, too.
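
For readers who like to script their searches, the file-format limits described above can also be typed straight into the search box. The following is only a minimal sketch in Python, not an official interface of any of these engines: it assumes the `filetype:` operator that Google accepts in ordinary queries, the helper names `filetype_query` and `search_url` are made up for illustration, and the search address and `q` parameter are assumptions that other engines may not share.

```python
from urllib.parse import urlencode


def filetype_query(terms, filetype=None, exclude=False):
    """Build a query string that limits (or excludes) one file format.

    Assumes the engine understands a `filetype:` operator, as Google does;
    other engines may use a different syntax.
    """
    if not filetype:
        return terms
    operator = "-filetype:" if exclude else "filetype:"
    return f"{terms} {operator}{filetype}"


def search_url(base, query, param="q"):
    """Attach the query to a search page address (hypothetical helper)."""
    return f"{base}?{urlencode({param: query})}"


# Look for PDF documents that mention the phrase "invisible web".
pdf_only = filetype_query('"invisible web"', "pdf")

# Look for the same phrase but leave PowerPoint files out of the results.
no_ppt = filetype_query('"invisible web"', "ppt", exclude=True)

# Assumed search address; check the engine's own help pages before relying on it.
print(search_url("https://www.google.com/search", pdf_only))
```

Pasting the printed address into a browser should run the same search you would get by choosing a file format on the Advanced Search page, assuming the engine still honors the operator.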

FYI: Below are a few of my recent articles on the invisible/hidden web (and other education-related topics) for your review; other articles/presentations can be found at my Robert J. Lackie's Selected Online Materials (http://library.rider.edu/moore/directory/rlackie/rlackieepub.html) page:
• Lackie, Robert J., and Robert J. Congleton. "Free and Fee-Based Online Science Resources for the K-12 Community." MultiMedia & Internet@Schools: The Media and Technology Specialists' Guide to Electronic Tools and Resources for K-12 11.6 (2004): 10-16. (HTML article--also available via EBSCO full text databases).

• Lackie, Robert J. "Finding Free Educational Resources for Teachers & Librarians (Track A: The Web: What's New Out There Now?)." Internet@Schools West 2004 National Conference. Monterey, CA. 15 Nov. 2004. (PowerPoint).
Accompanying proceedings article (PDF) available at above link, cited below:
Lackie, Robert J. "Internet@Schools: Finding Free Educational Resources for Teachers & Librarians." Internet Librarian 2004: Collected Presentations, Monterey, CA, 15-17 Nov. 2004. Comp. Carol Nixon and Jennifer Burmood. Medford: Information Today, 2004. 296-302.

• Lackie, Robert J. "The Evolving 'Invisible Web': Tried-and-True Methods and New Developments for Locating the Web's Hidden Content." College & Undergraduate Libraries 10.2 (2003): 65-71. (PDF article).

• Lackie, Robert J. "Science and Math Web Resources for the Higher Education Community." Online: The Leading Magazine for Information Professionals 27.6 (2003): 35-39. (PDF article--also available via EBSCO and ProQuest full text databases).
Send comments or questions about this workshop and/or Web site to Robert J. Lackie (rlackie@rider.edu), including if you would like permission to link to Those Dark Hiding Places: The Invisible Web Revealed (http://library.rider.edu/scholarly/rlackie/Invisible/Inv_Web.html) or if you are interested in attending sessions held in NJ, NY, or PA.

Posted: Fri - January 14, 2005 at 05:40 PM        

