

((((((((((((((((( WEBREFERENCE UPDATE NEWSLETTER ))))))))))))))))) February 1, 2001


Sponsored by: WebTrends, Accenture and Informative Graphics __________________________________________________________________




http://www.webreference.com *- link to us today
http://www.webreference.com/new/ *- newsletter home
http://www.webreference.com/new/submit.html *- submit article

This week we are fortunate to have the info architect himself, Louis Rosenfeld, president of Argus Associates, explain to us how to classify the information on your site using controlled vocabularies (CV). As the Web doubles every few months, metadata is becoming one of the keys for what Tim Berners-Lee calls the "Semantic Web."

Finding things on the Web using the classic brute-force approach of indexing everything has its limits. Metadata, or information about information, gives content context. CVs, thesauri, and metadata containers like RDF and the Dublin Core can make finding and organizing things on the Internet easier.

New this week on WebReference.com and the Web:

1. TWO NEW CONTESTS: Submit & Win NetObjects Fusion 5!, Signup & Win!
2. FEATURED ARTICLE: Looking for Metadata in All the Wrong Places
3. NET NEWS:
   * Discounting the Internet
   * Mazda's build-to-order roadster online
   * Forget fiber optics - wireless is way to go for Internet access
   * Commentary: Two-Way Internet Desktop Web Sites On The Horizon

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1. TWO GREAT CONTESTS: Submit & Win NetObjects Fusion 5!, Signup & Win

>Submit & Win NetObjects Fusion 5!

Submit your article today and you could win NetObjects Fusion 5! If your article makes the cut, and we publish it on the site or in this newsletter, you win! See the submission page for details: http://www.webreference.com/new/submit.html


>Signup & Win!

Sign up for the WebReference Update newsletter, and you could win a killer software bundle from Ulead Systems, PhotoImpact and COOL 3D. Each week we'll draw new winners from our new subscribers - you could be next. Already a subscriber? Not a problem - just fill out the form, and you'll be automatically entered to win. Tell your friends!


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2. FEATURED ARTICLE: Looking for Metadata in All the Wrong Places: Why a controlled vocabulary or thesaurus is in your future

Andy King, editor of this fine newsletter, contacted me recently with what he thought was a reasonable question. Knowing me as an information architect with librarian roots, he wondered if I could help him find a basic vocabulary of Internet and technology-related terms. He was hoping to use these terms to better organize and label WebReference's content.

Andy had looked at both Library of Congress and Dewey Decimal subject headings (remember them from your library's card catalogs?) and found them wanting for his purposes. So he asked me if I could recommend a quick-and-dirty alternative.

What happens when you ask an information architect a simple question?

You get a long and complicated answer.

In each and every case.

Andy's question perfectly represents what's on so many webmasters' minds: where can I find an ideal set of descriptive terms that I can use to populate my site's information architecture (or, as some might call it, hierarchy or taxonomy)? For various reasons, which we'll cover below, the answer is simple: nowhere. There simply is no easy, shrink-wrapped, off-the-shelf solution.

Andy's reaction to my answer? "Eek!" Which is certainly reasonable. Andy and other webmasters face a big headache that unfortunately won't go away. Ever. But there are a few pain relievers on the market.

>Why Andy Was Looking in the First Place

On the surface, Andy's needs seem simple. He's looking for what's called a controlled vocabulary (CV). A CV is simply a pre-determined list of standard terms used to describe a particular subject domain. A really simple example is the CV for colors that consists of the following six terms:

Black
Blue
Green
Red
White
Yellow

So it's "blue," and only "blue," not "navy" or "azure." "Green," not "mint." You get the picture.
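To make the idea concrete, here is a minimal sketch in Python of how a site might check proposed tags against the six-term color CV above. The function names (is_preferred, validate_tags) and the case-insensitive matching are assumptions made for illustration, not something any particular tool prescribes.

```python
# Hypothetical sketch: a controlled vocabulary as a fixed set of allowed terms.
# The six terms come from the color example above; everything else is assumed.
COLOR_CV = frozenset({"black", "blue", "green", "red", "white", "yellow"})


def is_preferred(term, vocabulary=COLOR_CV):
    """Return True if the term (trimmed, case-insensitive) is in the vocabulary."""
    return term.strip().lower() in vocabulary


def validate_tags(tags, vocabulary=COLOR_CV):
    """Split proposed tags into (accepted, rejected), preserving input order."""
    accepted, rejected = [], []
    for tag in tags:
        if is_preferred(tag, vocabulary):
            accepted.append(tag)
        else:
            rejected.append(tag)
    return accepted, rejected


if __name__ == "__main__":
    # "navy" and "mint" are not in the CV, so they end up in the rejected list.
    print(validate_tags(["Blue", "navy", "green", "mint"]))
```

A plain CV can only say yes or no. Rejected terms such as "navy" tell you nothing about which preferred term the tagger should have used.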

We often use CVs to supply the terms that populate metadata fields for documents and database records. For example, LL Bean uses the term "blazers" to describe what others might term "jackets" or "sports jackets." LL Bean has made a decision to use "blazers" to consistently describe these items so that you, the user, don't need to look in more than one place to find them.

What if you were considering purchasing a sports jacket and didn't know to look for the term "blazers"? That's where a thesaurus can help. Think of a thesaurus as a CV on steroids. Besides a CV's preferred terms, a thesaurus can include variant terms (like "sports jackets"). The relationship between preferred and variant terms can be leveraged to improve searching: users who enter "sports jackets" would retrieve items indexed under "blazers." Thesauri can also include broader terms (e.g., "casual clothing"), narrower terms (e.g., "silk blazers"), related terms (e.g., "shirts"), and scope notes, which may define and provide context for the preferred term.
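The thesaurus relationships just described can be sketched as a small data structure. This is one possible shape in Python, not a standard format: the Entry and Thesaurus classes, the scope note wording, and the item IDs in the index are all hypothetical. Only the terms themselves ("blazers," "sports jackets," and so on) come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One thesaurus entry: a preferred term plus its relationships."""
    preferred: str
    variants: list = field(default_factory=list)   # "use for" terms
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)
    scope_note: str = ""


class Thesaurus:
    def __init__(self, entries):
        self.entries = {e.preferred.lower(): e for e in entries}
        # Map every preferred and variant term to its preferred term.
        self.lookup = {}
        for e in entries:
            for name in [e.preferred, *e.variants]:
                self.lookup[name.lower()] = e.preferred

    def resolve(self, term):
        """Return the preferred term for any known term, or None if unknown."""
        return self.lookup.get(term.strip().lower())

    def search(self, query, index):
        """Find items in an index keyed by preferred term, via variant lookup."""
        preferred = self.resolve(query)
        return index.get(preferred, []) if preferred else []


blazers = Entry(
    preferred="blazers",
    variants=["sports jackets", "jackets"],
    broader=["casual clothing"],
    narrower=["silk blazers"],
    related=["shirts"],
    scope_note="Hypothetical note: tailored jackets sold as separates.",
)
thesaurus = Thesaurus([blazers])
index = {"blazers": ["item-101", "item-102"]}  # made-up item IDs

if __name__ == "__main__":
    print(thesaurus.resolve("Sports Jackets"))
    print(thesaurus.search("sports jackets", index))
```

Keeping the variant-to-preferred mapping in one lookup table means the same thesaurus can serve both searching (resolving what users type) and tagging (telling content owners which term to apply).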

CVs and thesauri have huge value: not only can they improve how successful users are at searching and browsing, but they can also make it easier for content owners to manage their information. For example, tagging your records or documents becomes so much easier when you know that the preferred term to use is "cell phone" instead of "mobile phone." So it's no surprise that Andy was looking for a CV to help him manage WebReference's hi-tech content.

>Why Andy Is Destined to Fail, or, at least, Stumble Around a Bit

There are CVs and thesauri that have already been created from which you can learn and borrow (with appropriate permissions, of course). Here are just a few good starting points for finding CVs, thesauri, and other classification schemes:

http://web.simmons.edu/~schwartz/myalpha.html
http://www.darmstadt.gmd.de/~lutes/thesauri.html
http://www.asindexing.org/thesonet.shtml

But why am I so pessimistic about Andy's chances? Because it'll be nearly impossible to find just the right CV or thesaurus that will meet all his needs.

The problem is two-fold:

First, there are a zillion subject domains out there. If you're looking to create the world's leading e-commerce site for fly-fishing enthusiasts, it's unlikely that you'll find a decent, if any, thesaurus or CV on the topic. So you'll have to create one of your own. And that's not an easy task. After all, if it were, there'd be a lot more thesauri and CVs available. My company offers a full-day seminar on the topic (http://argus-acia.com/acia_event/seminar_roadshow.html), which covers only the tip of the iceberg.

Second, just because a CV or thesaurus on your topic exists, it isn't necessarily appropriate for your needs. Your users, content, and context will invariably be different from those for which the CV or thesaurus was originally created.

Let's say that Andy actually finds a thesaurus of Internet and technology-related terms. But as he takes a closer look, he might find that the terms are very technical (e.g., "input device"), when his site's users might be more likely to use laypersons' terms (e.g., "keyboard"). He might learn that the thesaurus' terms were built for a very small collection of documents, and are therefore not sufficiently numerous or specific for his large body of content. Or he might discover that the thesaurus was designed for a slightly different context, such as to support automatically expanding users' search queries, which might make it harder for Andy to use the thesaurus to build his site's browsing taxonomy.




>What Andy Can Do Now

So, there are no silver bullets. But there is hope! Andy has some options:

1) Borrow an existing CV or thesaurus. This is a decent option, if there is an appropriate one out there, and if permissions don't get in the way. As Andy customizes a CV or thesaurus, it's not clear when it will become his own as opposed to remaining the property of the original author. This is a sticky intellectual property issue to be aware and beware of.

2) Create a CV or thesaurus from scratch. This is a large and expensive task, and he'd better plan on a few months' work. And remember, it's really, really hard to get it "right" all at once: because users, content and context change all the time, CVs and thesauri should as well.

3) Grow a CV or thesaurus over time. This might be the most practical approach. Andy should take a little time and learn about what's involved in developing a CV or thesaurus, and then begin an iterative process of starting with something small and basic, expanding and improving it over time. Better to crawl before walking than never to walk at all.

Whatever approach Andy takes, it's crucial to explore the three issues of:

Users - Who will use it?
Content - What will it be used to describe?
Context - Where and how will it be used?

He'll need to analyze the language users employ when they search or browse; one good technique is to cluster search log results. He'll want to do some similar analysis of WebReference's content; a content inventory is a good place to start. Examining any existing indexing would also be a good idea. And he'd better make sure he understands his context; in this case, how much time and budget are available, and how the workflow of applying terms and maintaining the CV or thesaurus over time will work.
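As a rough illustration of the search-log technique mentioned above, the Python sketch below groups logged queries under the preferred terms they map to and sets aside the ones that match nothing yet. It is a deliberately simple exact-match grouping, not a statistical clustering method. The function names, the variant table, and the sample queries are assumptions made for the example (the "cell phone" and "mobile phone" pair is borrowed from earlier in the article).

```python
import re
from collections import Counter, defaultdict


def normalize(query):
    """Lowercase a query and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", query.strip().lower())


def cluster_queries(queries, variants):
    """Group queries by preferred term.

    variants maps each known term (preferred or variant) to its preferred term.
    Returns (clusters, unmatched): clusters maps a preferred term to a Counter
    of the query strings users actually typed; unmatched counts queries with
    no known mapping, which are candidates for new terms.
    """
    clusters = defaultdict(Counter)
    unmatched = Counter()
    for raw in queries:
        q = normalize(raw)
        if not q:
            continue  # skip empty searches
        preferred = variants.get(q)
        if preferred is None:
            unmatched[q] += 1
        else:
            clusters[preferred][q] += 1
    return clusters, unmatched


# Hypothetical variant table and log excerpt.
variants = {"cell phone": "cell phone", "mobile phone": "cell phone"}
log = ["Cell phone", "mobile  phone", "cell phones", "MOBILE PHONE", "keyboard"]
clusters, unmatched = cluster_queries(log, variants)

if __name__ == "__main__":
    for term, counts in clusters.items():
        print(term, dict(counts))
    print("unmatched:", dict(unmatched))
```

The unmatched list is where the iterative work happens: a query like "cell phones" suggests a missing plural variant, while "keyboard" may point to a term the vocabulary does not cover at all.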

A great big challenge? Sure. But using CVs and thesauri is a wonderful way to improve the way your site works for both users and maintainers alike. Not to mention a great big opportunity, at least for us consultants....

>Further Reading

Unfortunately there isn't much out there for the layperson who wants to learn about CV and thesaurus design. If you are willing to scratch the surface, you can't beat "Thesaurus Construction and Use: A Practical Manual" (3rd Edition) by Jean Aitchison, Alan Gilchrist, and David Bawden, as a general introduction. And though it has nothing to do with creating CVs and thesauri, Bill Bryson's highly entertaining "The Mother Tongue: English & How It Got That Way" is great food for thinking about language and meaning; it's wonderful prep for delving into the guts of thesauri and CVs.

You'll find more reading materials available from our seminar's reading list (http://argus-acia.com/seminars/seminar_resources.html).

About the author: Louis Rosenfeld is president of Argus Associates (http://argus-inc.com), a consulting firm that specializes in providing information architecture services to Fortune 500 clients. He is the author of "Information Architecture for the World Wide Web" and can be reached at: mailto:lou@argus-inc.com.

Here are some relevant links at WebReference:

Information Design Resources: http://webreference.com/authoring/design/information/

Interview: Lou Rosenfeld http://www.webreference.com/new/990104.html

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3. NET NEWS: Discounting the Internet, Mazda's build-to-order roadster online, Forget fiber optics: wireless is way to go for Internet access, Commentary: Two-Way Internet Desktop Web Sites On The Horizon

>Discounting the Internet

Learn how the Internet is changing business for retailers and how they are getting laws made to prevent producers from dealing with customers directly.
http://boston.com/dailyglobe2/032/business/Discounting_the_Internet+.shtml
Boston.com, 010201

>Mazda's build-to-order roadster online

Mazda is the first carmaker with the sense to sell built-to-order cars on the Internet.
http://www.mazda.com
Mazda.com, 010201

>Forget fiber optics - wireless is way to go for Internet access

Wireless way to go? Find out why it could be the cheaper and better way to get faster Internet access.
http://www.csmonitor.com/durable/2001/02/01/fp12s1-csm.shtml
Christian Science Monitor, 010201

>Commentary: Two-Way Internet Desktop Web Sites On The Horizon

It's time for Internet version 2.0. How is it different from the one we're on now? Read and see.
http://www.gomez.com/features/article.asp?topcat_id=0&col=82&id=7057
Gomez.com, 010130




That's it for this week. See you next time.

Andrew King
Managing Editor, WebReference.com
update@webreference.com

Alexander Rylance
Assistant Editor, WebReference.com
arylance@internet.com


This newsletter is published by Jupitermedia Corp.
http://internet.com - The Internet Industry Portal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Copyright (c) 2001 Jupitermedia Corp.