HTML Unleashed PRE. Strategies for Indexing and Search Engines: How to Design for Search Engines | WebReference



HTML Unleashed PRE: Strategies for Indexing and Search Engines

How to Design for Search Engines


If you've read the previous chapter, you might have noticed the similarity between search spiders and people with disabilities: Neither has access to anything but the text content of web pages.  Therefore, most of the HTML authoring recommendations from the chapter on disabilities apply to search-friendly design as well.

Providing a text alternative for every piece of information on your page is an obvious requirement because spiders scan only plain text (although, unfortunately, not all of them index the alt text of images).  Making your content fully comprehensible as text alone may be difficult (it's like trying to persuade somebody by letter rather than in person, without the powerful "multimedia" of gestures and facial expressions), but it's really rewarding in the long run.

Preserving the logical flow of the text, rather than sacrificing it for the sake of layout tricks, is also very important.  This improves the chances that spiders will extract a good summary of your document, and it makes the text better suited to automatic processing or categorizing.

Similarly, logical markup is an important requirement if you care about someone being able to use your document in any way, not just read it in a graphical browser.  Besides the spiders of the major search engines, a great number of other robots and indexers roam the Web, and many of them rely on logical tags, such as H1, to figure out the structure of your data.
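To make the point about logical tags concrete, here is a minimal sketch of how a simple indexer might recover a document outline from heading tags.  It is an illustration only, not how any particular spider works; the class name `HeadingOutline` and the function `extract_outline` are hypothetical, and the parser is Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects (level, text) pairs for H1..H6 elements (hypothetical helper)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []    # finished headings, in document order
        self._level = None   # level of the heading being read, if any
        self._parts = []     # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            self.outline.append((self._level, text))
            self._level = None


def extract_outline(html):
    """Return the list of (level, heading text) found in an HTML string."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


page = """<h1>Strategies for Indexing</h1>
<p><font size="+2"><b>Looks like a heading</b></font></p>
<h2>Keyword <em>Strategies</em></h2>"""

print(extract_outline(page))
```

The line styled with FONT and B looks like a heading in a graphical browser, but a robot working this way never sees it as one; only the H1 and H2 elements end up in the outline.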


Keyword Strategies


All searches on the Web are done via keywords, so probably the most important requirement is to make sure that your documents contain all the keywords that are likely to be used to find them.  Two distinct strategies can be outlined in this respect.

  1. The first idea that comes to mind is simple: The more keywords you cram into a page, the better.  Indeed, you can never predict what particular keywords will come to users' minds, so it's always a good idea to think about all possible synonyms, variants, generic inclusive terms, subterms, and related concepts for all the main subjects of your discourse.

    Besides, remember that keywords can be entered in a different grammatical form, such as plural instead of singular for nouns.  Of the major search engines, only Alta Vista provides the "wildcard" notation to look for "table" or "tables" by specifying "table*".  So, you'd better see to it yourself by including both forms in your document.  (This problem is especially serious for languages other than English; for example, a verb in Russian may have up to 235 distinct forms.  Therefore, most Russian search engines, such as Aport, mentioned earlier, by default employ word inflection algorithms that automatically match all word forms.)

    Finally, if your main keyword is a relatively common word (such as "search"), it is likely that practiced search users will use the phrase-search feature to query for word combinations (such as "search engines") rather than single words.  Therefore, make sure that your document contains the most common collocations of the main keyword with closely related nouns, adjectives, verbs, and so on.

  2. However, one might also consider the opposite of the "keyword coverage" strategy just described.  Remember that one of the factors in results ranking, as implemented by major search engines, is frequency, which is computed as the number of keyword occurrences divided by the document size.

    One consequence of this calculation is that if two documents contain the same keyword (located at the same distance from the top of the document), the smaller one will get a higher ranking.  This gives you a clue: Select one of the root (introductory) pages on your site and try to make it as compact and concise as possible, so that it presents just the essence of your content with only the most common keywords.  This page will get a boost with respect to searches for these keywords, thereby attracting more hits to the entire site.
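The frequency factor in the second strategy can be illustrated with a few lines of arithmetic.  The sketch below assumes that document size is measured in words and that a word is any run of letters, digits, or apostrophes; real engines may measure size differently and combine frequency with other factors.  The function name `keyword_frequency` is hypothetical.

```python
import re


def keyword_frequency(text, keyword):
    """Keyword occurrences divided by document size (size in words: an assumption)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# A compact page and a longer page that mention the keyword equally often.
compact_page = "search tips for the web"
long_page = compact_page + " and many other loosely related topics follow here"

print(keyword_frequency(compact_page, "search"))   # 1 occurrence in 5 words
print(keyword_frequency(long_page, "search"))      # 1 occurrence in 13 words
```

Both pages contain the keyword once, at the same position, but the compact page scores 1/5 against 1/13 for the longer one, which is the effect the concise introductory page is meant to exploit.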

Thus, the best you can do is combine these two approaches by setting up both sorts of pages on your site: those with maximum keyword coverage and those with maximum relevance with respect to the main keywords.
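For the keyword-rich pages, it can help to check mechanically that the variants you care about (singular and plural forms, common phrases) really occur in the text.  The following is a rough sketch under simple assumptions: matching is case-insensitive, punctuation is ignored, and a phrase must appear as consecutive whole words.  The function `missing_terms` and the sample data are made up for illustration.

```python
import re


def _normalize(text):
    """Lowercase the text and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def missing_terms(text, terms):
    """Return the terms (words or phrases) that do not occur in text as whole words."""
    padded = f" {_normalize(text)} "
    missing = []
    for term in terms:
        # Padding with spaces keeps "table" from matching inside "tables".
        if f" {_normalize(term)} " not in padded:
            missing.append(term)
    return missing


page_text = ("This table compares search engines. "
             "Each search engine ranks pages differently.")
wanted = ["table", "tables", "search engine", "search engines", "spider"]

print(missing_terms(page_text, wanted))
```

Here the check reports "tables" and "spider" as missing, so a user who types the plural form into an engine without wildcard support would not find this page.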

By the way, these two keyword strategies correspond to the two types of search queries, specific and general searches.  Some search engine users are looking for very specific information; they use rare keywords, phrase searches, and various advanced features such as Boolean operators.  It's these "power users" that your keyword-rich pages should appeal to.

Other users, however, just need to find a good resource covering some fairly general topic; they enter a couple of simple keywords, get an avalanche of results, and browse the first several links found.  For such general searches, web directories (such as Yahoo) usually perform better than search engines; however, a lot of users still employ search engines for the task.  The relevance boosting technique described above could be useful in attracting such users to your site.

You might be interested to see what keywords are entered most frequently by search engine users, to better align your keyword spectrum with public preferences.  Unfortunately, this information (which would be immensely interesting from other viewpoints as well) is considered top-secret by major search engines; they never reveal their "top ten search words" lists for (rather well-grounded) fear of spamming.

WebCrawler allows only a peek at the flow of search queries in real time, as they're entered on the search page.  However, minor search engines are usually less obsessed with confidentiality, and some of them show their search statistics (for example, a Russian search engine called Rambler presents its list of the top 100 search words).

The final piece of advice concerning keywords is rather obvious: Always check your spelling.  Spiders, in contrast to human readers, cannot "overlook" spelling errors, and you risk missing a good share of your potential audience by misspelling some important keyword.  This is especially relevant given that in most cases you add your keywords to a META tag after the document itself is written, edited, and probably spell-checked.
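One cheap safeguard against META-tag typos is to compare the keywords in the tag with the words of the already spell-checked document.  The sketch below is only one possible approach, built on Python's standard `html.parser`; the names `MetaKeywordCheck` and `suspicious_keywords` are hypothetical.  It flags any keyword containing a word that never appears in the page text, so it will also flag legitimate keywords that are simply absent from the text (for example, a singular form when the page uses only the plural).  Treat its output as a list to review by hand, not as a list of errors.

```python
from html.parser import HTMLParser
import re


class MetaKeywordCheck(HTMLParser):
    """Gathers META keywords and the page's text content (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "keywords":
                content = attr.get("content") or ""
                # Keywords are assumed to be comma-separated.
                self.keywords += [k.strip() for k in content.split(",") if k.strip()]

    def handle_data(self, data):
        self.text_parts.append(data)


def suspicious_keywords(html):
    """Return META keywords containing a word that never occurs in the page text."""
    parser = MetaKeywordCheck()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts).lower()
    page_words = set(re.findall(r"[a-z0-9']+", text))
    flagged = []
    for keyword in parser.keywords:
        keyword_words = re.findall(r"[a-z0-9']+", keyword.lower())
        if any(word not in page_words for word in keyword_words):
            flagged.append(keyword)
    return flagged


page = """<html><head><title>Search engines</title>
<meta name="keywords" content="search, engines, spidder, indexing">
</head><body><p>How search engines and their spiders handle indexing.</p></body></html>"""

print(suspicious_keywords(page))
```

In this sample the misspelled "spidder" is the only keyword flagged, because every other keyword also occurs in the title or body text.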


Created: Sept. 19, 1997
Revised: Sept. 19, 1997