HTML Unleashed. Internationalizing HTML: Introduction | WebReference


HTML Unleashed: Internationalizing HTML



No book on HTML is complete without a section on the ways to overcome the pronounced Western bias in the language and to provide for its fruitful application in the worldwide multilingual environment.  This chapter covers the main approaches to this problem, both those used by practicing webmasters all around the world and those devised by standard-setting bodies.

The primary problem in HTML internationalization (often abbreviated i18n: i, plus the 18 letters in between, plus n) is the correct rendering of characters used by languages other than English.  This is why I start by examining the different character encoding standards (character sets).  These standards are classified by the length of the bit combinations they use, from 7-bit ASCII to Unicode and ISO 10646.

Various HTML internationalization issues were first crystallized in an important document, RFC 2070.  The provisions of RFC 2070 were then incorporated into the DTD for HTML version 4.0.  However, since no HTML 4.0 specification was available to accompany its DTD at the time of this writing, we will discuss, for the most part, the material of RFC 2070, paying special attention to the cases where it differs from the declarations of the HTML 4.0 DTD.

In the field of HTML proper, this chapter starts by investigating the new document character set as defined in RFC 2070 and HTML 4.0.  You will be introduced to the important distinction between the document character set and the external character encoding.  You'll learn about existing methods of specifying the external character encoding, proposed additions to handle multilingual form input, and a number of real-world problems related to the HTML character set.
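The distinction between the document character set and the external character encoding can be previewed with a minimal Python sketch. It is an illustration only, not material from RFC 2070: the sample markup is hypothetical, and Python's Unicode strings merely stand in for a document character set based on ISO 10646. The same document is serialized under two external encodings, and its numeric character references resolve identically in both cases because they refer to the document character set rather than to the bytes on the wire.

```python
import html

# Hypothetical sample markup: one literal non-ASCII character and two
# numeric character references (decimal code points 233 and 1078).
markup = "<p>caf\u00e9 &#233; &#1078;</p>"

# External character encoding: how the document travels as bytes.
latin1_bytes = markup.encode("iso-8859-1")
utf8_bytes = markup.encode("utf-8")

# The byte streams differ because the literal character is encoded differently.
assert latin1_bytes != utf8_bytes

# Decoding with the correct label recovers the same sequence of characters.
decoded_a = latin1_bytes.decode("iso-8859-1")
decoded_b = utf8_bytes.decode("utf-8")

# Numeric character references are resolved against the document character
# set, so the result does not depend on the external encoding that was used.
text_a = html.unescape(decoded_a)
text_b = html.unescape(decoded_b)

print(text_a == text_b)
print(text_a)
```

Note that the reference `&#1078;` survives the ISO 8859-1 round trip even though that encoding has no byte for the character it names: the reference itself consists only of ASCII characters.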

Another big part of the HTML internationalization problem is language markup, that is, specifying the language of a piece of text so that user agent software can render it according to the typographic conventions of that language.  Some language-specific aspects of text presentation are also addressed in RFC 2070, which introduces tools to control writing direction, cursive joining, rendering of quotation marks, text alignment, and hyphenation.  In conclusion, I briefly cover the font issues related to HTML internationalization.



Created: Jun. 15, 1997
Revised: Jun. 16, 1997