HTML Unleashed PRE. Strategies for Indexing and Search Engines: The Meta Controversy | WebReference



HTML Unleashed PRE: Strategies for Indexing and Search Engines


The Meta Controversy


One of the major search engines, Excite, ignores any information in META tags, and does so on purpose.  What could be the rationale for such a decision?

The reason stated on Excite's "Getting Listed" page is that META tags can be used by spammers to improve their rankings unfairly.  For Excite, any attempt to make a page appear different to search spiders than it does to human readers is an unfair practice.  Indeed, nobody can guarantee that the keywords you enter actually describe your content; in principle, you can stuff the tag with popular keywords to inflate your hits without improving the page at all.
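For reference, the META tags at issue are the keywords and description variants placed in a document's HEAD; the content values below are, of course, just hypothetical samples:

```html
<head>
<title>Sample Page</title>
<!-- keywords: comma-separated terms for engines that honor META -->
<meta name="keywords" content="HTML, indexing, search engines">
<!-- description: a short summary some engines display in their results -->
<meta name="description"
      content="Strategies for making pages index well in search engines.">
</head>
```

Neither tag affects how the page renders; both exist solely to inform programs that read the source.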

At first glance, this position might seem logical.  But is it?  Remember that I can easily put any number of "hot" keywords onto the page itself, and if I don't want to distract readers with this promotional machinery, I can make the keywords invisible by painting them the background color (as many spammers already do, simply because META tags don't let them enter enough keywords).  After all, spiders will always index what I want them to, and banning one weapon can only accelerate the arms race.
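The hidden-text trick mentioned above takes only a couple of lines; here is a minimal illustration (shown to explain the abuse, not to recommend it, and with made-up keywords):

```html
<!-- On a page whose background is white, this text is invisible
     to human readers but fully indexed by search spiders -->
<body bgcolor="#FFFFFF">
<font color="#FFFFFF">popular keyword, another popular keyword</font>
</body>
```

Because the spider sees only the markup, it cannot tell that the text is unreadable on screen, which is exactly why banning META tags alone cannot stop keyword spamming.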

Excite's policy is based on the assumption that each page has an intrinsic "value," and that this value is evident from reading the text on the page.  If this is true, then it's natural to require that spiders, in order to assign a fair "relevance" value, see exactly the same text as human readers.  But this silently assumes that a spider can read, understand, and evaluate the text just as humans do.  Therein lies the main fallacy of this approach.

The main purpose of a META tag is to provide information about the document, and it does so mostly for the benefit of computers, which cannot deduce this information from the document itself.  Keywords and descriptions, for example, are supposed to present the main concepts and subjects of the text, and no computer program can yet compile a meaningful summary or list of keywords for a given document.  (In this context, it's interesting to note that Excite is the only search engine to employ an artificial intelligence algorithm for compiling summaries based on the document text.)

True, the META mechanism is open to abuse, but so far it's the only technique capable of helping computers better understand human-produced documents.  We will have no choice but to rely on some sort of META information until computers achieve a level of intelligence comparable to that of human beings.

In view of this, it is interesting to discuss the latest development in the field of meta-information, the Meta Content Framework (MCF).  This language describes the meta-information properties, connections, and interrelations of documents, sites, channels, subject categories, and other information objects.  MCF was developed by Netscape and has been submitted to the W3 Consortium as a draft standard.

MCF may be useful for maintainers of closed information systems, such as intranets and corporate or scientific databases.  Its main promise, however, is the capability to build a meta-information network for the entire Web.  Unfortunately, given the controversy surrounding even today's rather primitive META tags, it is not very likely that the sophisticated tools of MCF, even if approved by the W3C, will gain widespread acceptance.


Created: Sept. 19, 1997
Revised: Sept. 19, 1997