Looking for Metadata in All the Wrong Places: Why a controlled vocabulary or thesaurus is in your future | WebReference

Looking for Metadata in All the Wrong Places

Why a controlled vocabulary or thesaurus is in your future

Andy King, editor of this fine newsletter, contacted me recently with what he thought was a reasonable question. Knowing me as an information architect with librarian roots, he wondered if I could help him find a basic vocabulary of Internet and technology-related terms. He was hoping to use these terms to better organize and label WebReference's content.

Andy had looked at both Library of Congress and Dewey Decimal subject headings (remember them from your library's card catalogs?) and found them wanting for his purposes. So he asked me if I could recommend a quick-and-dirty alternative.

What happens when you ask an information architect a simple question?

You get a long and complicated answer.

In each and every case.

Andy's question perfectly represents what's on so many webmasters' minds: where can I find an ideal set of descriptive terms that I can use to populate my site's information architecture (or, as some might call it, hierarchy or taxonomy)? For various reasons, which we'll cover below, the answer is simple: nowhere. There simply is no easy, shrink-wrapped, off-the-shelf solution.

Andy's reaction to my answer? "Eek!" Which is certainly reasonable. Andy and other webmasters face a big headache that unfortunately won't go away. Ever. But there are a few pain relievers on the market.

Why Andy Was Looking in the First Place

On the surface, Andy's needs seem simple. He's looking for what's called a controlled vocabulary (CV). A CV is simply a pre-determined list of standard terms used to describe a particular subject domain. A really simple example is the CV for colors that consists of the following six terms:

So it's "blue," and only "blue," not "navy" or "azure." "Green," not "mint." You get the picture.

We often use CVs to supply the terms that populate metadata fields for documents and database records. For example, LL Bean uses the term "blazers" to describe what others might term "jackets" or "sports jackets." LL Bean has made a decision to use "blazers" to consistently describe these items so that you, the user, don't need to look in more than one place to find them.
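To make the idea of a CV feeding a metadata field concrete, here is a minimal Python sketch. It assumes a hypothetical "category" field on each record; only the term "blazers" comes from the example above, and the other terms, the record layout, and the function name are invented for illustration.

```python
# Hypothetical controlled vocabulary for a "category" metadata field.
# Only "blazers" comes from the article; the other terms are illustrative.
CATEGORY_CV = frozenset({"blazers", "shirts", "trousers"})


def validate_category(record: dict) -> bool:
    """Return True if the record's category is a term from the CV."""
    term = record.get("category", "").strip().lower()
    return term in CATEGORY_CV


# Two made-up records: one tagged with the preferred term, one not.
records = [
    {"name": "Wool blazer", "category": "blazers"},
    {"name": "Tweed jacket", "category": "sports jackets"},
]

# Collect the names of records whose category is not in the CV.
rejected = [r["name"] for r in records if not validate_category(r)]
```

A check like this only enforces consistency at tagging time: it flags "sports jackets" as a non-standard term, but it does not tell the indexer which preferred term to use instead.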

What if you were considering purchasing a sports jacket and didn't know to look for the term "blazers"? That's where a thesaurus can help. Think of a thesaurus as a CV on steroids. Besides a CV's preferred terms, a thesaurus can include variant terms (like "sports jackets"). The relationship between preferred and variant terms can be leveraged to improve searching: users who enter "sports jackets" would retrieve items indexed under "blazers." Thesauri can also include broader terms (e.g., "casual clothing"), narrower terms (e.g., "silk blazers"), related terms (e.g., "shirts"), and scope notes, which may define and provide context for the preferred term.
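The relationships described above can be sketched as a small data structure. This is an illustrative Python sketch, not a reference to any real thesaurus software: the class names, the scope note wording, and the tiny product index are all assumptions, while the terms themselves are taken from the blazers example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ThesaurusEntry:
    """One preferred term plus its relationships (hypothetical layout)."""
    preferred: str
    variants: Set[str] = field(default_factory=set)
    broader: Set[str] = field(default_factory=set)
    narrower: Set[str] = field(default_factory=set)
    related: Set[str] = field(default_factory=set)
    scope_note: str = ""


class Thesaurus:
    def __init__(self, entries: List[ThesaurusEntry]) -> None:
        # Map every preferred and variant term to its preferred term.
        self._lookup: Dict[str, str] = {}
        for entry in entries:
            self._lookup[entry.preferred] = entry.preferred
            for variant in entry.variants:
                self._lookup[variant] = entry.preferred

    def resolve(self, query: str) -> Optional[str]:
        """Return the preferred term for a query, or None if unknown."""
        return self._lookup.get(query.strip().lower())


blazers = ThesaurusEntry(
    preferred="blazers",
    variants={"jackets", "sports jackets"},
    broader={"casual clothing"},
    narrower={"silk blazers"},
    related={"shirts"},
    scope_note="Illustrative note: tailored jackets sold separately.",
)
thesaurus = Thesaurus([blazers])

# Hypothetical index: items are stored under preferred terms only.
index = {"blazers": ["Navy wool blazer", "Silk blazer"]}


def search(query: str) -> List[str]:
    """Translate the query to its preferred term, then look up items."""
    preferred = thesaurus.resolve(query)
    return index.get(preferred, []) if preferred else []
```

With this setup, `search("sports jackets")` returns the items indexed under "blazers", which is the variant-to-preferred mapping at work. The broader, narrower, and related sets are stored but unused here; a fuller system might use them to widen or narrow a search.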

CVs and thesauri have huge value: not only can they improve how successful users are at searching and browsing, but they can also make it easier for content owners to manage their information. For example, tagging your records or documents becomes so much easier when you know that the preferred term to use is "cell phone" instead of "mobile phone." So it's no surprise that Andy was looking for a CV to help him manage WebReference's hi-tech content.


Created: February 1, 2001
Revised: February 2, 2001

URL: http://webreference.com/authoring/design/information/cv/