HTML 101

Our first tutorial aims to teach you plain HTML, from scratch. It would be nice if you forgot everything you knew about HTML and took a fresh look at things for this tutorial.


HTML 101: Back to Basics

I'm going to assume that the reader of this tutorial knows nothing about creating Web pages, but has a basic familiarity with browsing the Web. After all, if you couldn't browse the Web, you probably wouldn't be reading this! If you have some passing familiarity with creating Web pages, you can quickly go through this tutorial and make sure you're familiar with everything here. As we've mentioned before, though, even experienced Web developers might be surprised to learn something new here, and we're going to use these concepts a lot in future tutorials.

In particular, we will discuss the following:

  1. What is HTML?
  2. Authoring HTML documents
  3. HTML Elements
  4. The global structure of an HTML document
  5. Paragraph and heading elements

That's not much to start with, but it's enough to make your first complete HTML document. I know you're anxious to get to it, so let's start off by finding out what HTML is.

So what is this HTML thing?

In order to be able to write good HTML, you must first understand exactly what HTML is. The most obvious place to look for this information is in its name: HTML stands for HyperText Markup Language. If this doesn't make much sense to you, don't worry, because that's what I'm here to explain. Let's take those initials apart one by one, starting from the right:

HTML is a language, but not the kind you tell your kids to watch. It's a computer language, and as such it has some specific rules that must be followed. In other words, it has a defined syntax, a strict way in which it must be written, and, when the time comes, read. But we'll get to that later.

Now HTML is also a markup language. What this means is that it takes a document and marks specific parts of it, giving them special meaning. Why does it do this? Well, let me give you an example. Take the following piece of text:

Acme Computer Corp.

Acme Computer Corporation is a technology-based
company that seeks to offer its customers the
latest in technological innovation. Our products
are created using the latest breakthroughs in
computers and are designed by a team of top-notch
experts.

We are based in Acmetown, USA, and have offices in
most major cities around the world. Our goal is to
have a global approach to the future of computing.
Have a look at our product catalog for some
examples of our innovative approach.

What you, as a human, see in the above text is a heading ("Acme Computer Corp.") and two paragraphs of text. However, if a computer were to see the above text, all it would see is a bunch of characters, perhaps arranged in a certain way. We have an active interest in letting the computer know that the above is a heading and two paragraphs. I'm sure you can guess some of the reasons. In a larger document, we'd like to be able to have the computer produce an outline of the document containing only the headings; or we might want to display the headings in a different style. However, the computer can't do any of these things unless it can see each part of the document for what it is. Thus, we introduce markup to show the computer what is what.

HTML is also a hypertext markup language. Hypertext is text, in any format, with an added feature: parts of the text are linked to other parts of the text, making it easy to jump from one part to another. For instance, in the Acme example above, the phrase "Have a look at our product catalog" hints that such a catalog exists, but the reader doesn't know where to find it. With hypertext, you can link this phrase to the catalog itself, giving readers an easy way to get to it if they're interested.

But hypertext links aren't just shortcuts. Just like markup, they mean something. HTML is all about document semantics. A document by itself may be informative, but to be truly useful, you must have a way to get to its meaning. Once you have a way of encoding the document's semantics, you can manipulate it in many interesting ways (don't get any ideas... this isn't that kind of Web site!). By defining the links from a document to a table of contents that lists it, to the next or previous documents if it is part of a series, to a glossary or copyright notice, we give the document itself more meaning, and hence, more value. The primary purpose of any document is to convey information, and by specifying the semantics of a document we supply even more information, which can only be a good thing. That is, if it's done right.

But of course, I can't tell you how to do it right unless I tell you how to do it in the first place, can I? So let's talk briefly about how you can author an HTML document.

Now you too can be an author!

You might have noticed that I've been using the term "HTML document" instead of "HTML file." There's a reason for this; HTML documents aren't necessarily files on a computer. An HTML document is a series of characters that, through its special syntax, defines a document. These characters may be stored in a single disk file, but this is not necessary. They may be created on the fly by a program, or may (as is most often the case) be received over a network.

HTML was designed primarily as a language for creating World Wide Web pages. You're probably learning HTML in order to create a Web page. People have started to use it for other purposes, but it might be useful to note (one of my rants, you'll get plenty of these) that it's not a very good language for them. There are fine document formats for all kinds of uses, but HTML was created for the Web, and is most suitable for the Web.

So how do you publish a Web page? Well, to do this you need a Web server. A Web server is a program that runs on a computer connected to the Internet, that serves out Web pages. This tutorial will not cover the topic of setting up a Web server or publishing your HTML documents on one. Instead we're going to talk about how to create an HTML document; we'll worry about publishing it later.

Now, how are you going to create your HTML document? The easiest way is to create an HTML file using a text editor. Note that a text editor is not a word processor. A word processor is a program that creates a document ready for printing, and stores it in its own format. Recent times have seen word processors that try to store their documents as HTML, but they usually do a terrible job of doing this. What you need is a program that edits simple text files. An example would be the Windows Notepad, or SimpleText for the Macintosh. It doesn't matter which program you use, as long as it is a simple text editor. In the future, we'll have a look at some of the text editors that you can use, and even some that are specially made for creating HTML documents and can do some of the work for you. Try to avoid these for now - they might confuse you with HTML features we haven't discussed yet. So pick something really, really simple, like the Windows Notepad, SimpleText for the Macintosh, or one of the hundreds of text editors available on Unix systems.

Now create a file, and call it anything you want. On some systems such as Windows, you'll need to give it an extension of .html or .htm to indicate that it's an HTML document. For instance, you might want to call it tutorial.html. As you read through this tutorial, you'll be told to type things into your text file, and by the end of the tutorial you'll have a complete HTML document. In fact, you can stop worrying about typing anything into your file until the end of the tutorial, where we have the complete document listed, so concentrate on reading the tutorial and you can create your HTML file later on if you want to.

In order to view your HTML file, you'll need a program that can do this. The technical term for this is an HTML User Agent. A User Agent is a program that can understand HTML documents and process them in some way. One type of user agent is a Web Browser, or just browser for short. You're probably using one to read this tutorial, so I won't bother with telling you how to get one. After you've created your HTML document, you can open it with your browser and view it. If you can't be bothered to do even that, we've included a link to a copy of the document we're going to create at the end of this tutorial.

Well, here we go. It's time to create our first HTML document. We'll start with the most basic concept: Elements.

Elements: Do That SGML Thing You Do

I mentioned that HTML, as a computer language, has a defined syntax. This syntax was not thought up at random, since the idea of a markup language is hardly new. A language called SGML, the Standard Generalized Markup Language, exists that is used to define markup languages. Make sure you get this right: SGML is a markup language whose sole use is to define other markup languages. And one of those many languages is HTML.

Now wait. Before you click on our sponsor's ad and get out of here, afraid that you'll have to learn another language, let me assure you that no such thing is needed. You do not need to know SGML to learn HTML. It's just useful to know that HTML is an application of SGML, which will explain a few things.

Now let's get down to business. Remember the heading we saw in the last section? If not, here it is again.

Acme Computer Corp.

We know this is a heading, but we need a way of encoding this in the document itself. To do this, we make this heading an element. We do this by writing the following (yes, this is HTML!):

<H1>Acme Computer Corp.</H1>

This element can be split into three parts. The first part, <H1>, is called the start-tag. Then comes the element's content, which in this case is the text "Acme Computer Corp.". Finally, </H1> is the end-tag.

This element is an H1 element, which happens to mean that it is a level 1 heading (we'll get to that later). You need tags to indicate where the element starts and where it ends. Tags always start with a less-than symbol (<) and end with a greater-than symbol (>). A start-tag has the element's name in between these symbols (in this case, H1). An end-tag has a slash (/) followed by the element's name, as in </H1>. Here are some more examples of elements:

<h2>I'm an element.</h2>
<p>So am I!</p>
<p>Hey, me too! <b>And me!</b></p>
<p>There's another element after me.</p>
<hr>

The perceptive amongst you may have noticed a few things in the above elements: first, the B element is inside a P element. This is fine; you can have elements inside other elements, as long as you have proper nesting. This means that if an element starts within another element, it must end within that same element, like this:

<P>This is <B>right</B></P>
<P>This is <B>wrong</P></B>

The second line above is incorrect because the B element starts inside the P element, but ends outside the P element.
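
Nesting can go more than one level deep, and the same rule applies at every level. Here is a small sketch of that, borrowing the I element (italic text), which we haven't formally introduced and which I'm using purely for illustration:

<P>This is <B>bold, and <I>this is bold and italic</I></B>.</P>
<P>This is <B>bold, and <I>this is wrong</B></I>.</P>

In the first line, every element ends inside the element it started in. In the second, the I element starts inside the B element but ends outside it, which is the same mistake as before, just one level deeper.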

The second curious thing you should have noticed is that the HR element has no content and no end-tag. This is also allowed, for some element types. The HR element in this case is called an empty element. Using only a start-tag is permitted in this case, but only for elements that are empty.
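
To illustrate, here is a short sketch with two empty elements. HR you've already met. BR, which forces a line break, is another empty element in HTML 4; we haven't covered it yet and it appears here only as an example:

<P>Acme Computer Corp.<BR>
Acmetown, USA</P>
<HR>
<P>A horizontal rule separates this paragraph from the one above.</P>

Neither BR nor HR gets an end-tag. The P elements around them are not empty, so they keep both their start-tags and their end-tags.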

Now that you know what an element is and how to specify it in your document, it's time to learn about some of the elements in HTML and use them to make your first HTML document.

Yes, yes, but what is HTML made of?

I mentioned that HTML is an application of SGML. This means that every HTML document is also an SGML document. The first thing an SGML document must have is a Document Type Declaration. This means exactly what it sounds like: a Document Type Declaration declares the document to be of a specific type. In our case this type is HTML. I won't go into much depth on Document Type Declarations right now. For the moment, you should use the following declaration:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">

Do not let the angle brackets confuse you. The above is not an element. If you look carefully, you'll notice that the content of the above construct starts with an exclamation mark; this indicates that this is SGML code. And after looking at it a bit you might be glad you don't have to learn SGML. So just take my word for it this once, and put this at the top of your document. In a future tutorial, we'll explain what this Document Type Declaration means and show that it's really quite simple.

Now that we have specified that this is an HTML document, we can start adding elements. The first element will always be the HTML element. All HTML documents have an HTML element, which contains all the other elements. Let's put in the start-tag and end-tag for this element and we'll worry about its contents later. Here's what we've got so far:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<HTML>
</HTML>

Every HTML document is split into a head and a body, which are marked by similarly named elements, HEAD and BODY. Every HTML document must have one of each, inside the HTML element. In fact, these two are the only things you can have inside the HTML element. So let's put these in as well and see where we are:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<HTML>
 <HEAD>
 </HEAD>
 <BODY>
 </BODY>
</HTML>

Notice that I've used a slight indent for the HEAD and BODY element tags. This has no special meaning and is only there to make the HTML more legible. You might have noticed that white-space (that is, spaces, tabs and linefeeds) is collapsed in HTML. This means you can add as much of it as you want to, in order to make your HTML easier to read, without any change in the meaning of the document.
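
If you'd like to see white-space collapsing at work, here is a minimal sketch. It borrows the P element, which we'll meet properly in a moment. The two paragraphs below are typed very differently, but a browser should display them identically, because each run of spaces and linefeeds counts as a single space:

<P>Acme      Computer
        Corp.</P>
<P>Acme Computer Corp.</P>

This is why you're free to break lines and indent wherever it makes your HTML easier to read.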

The difference between the head and body of a document is that the head contains mostly information about the document, while the body contains the document itself. Before we go on to the body, we'll deal with the one element every document head must contain: a title.

The title of your document is very important. It distinguishes your document and makes it unique, as well as describing it to the reader. In this case, the title "Acme Computer Corp." is unsuitable, because it doesn't describe our document. A more descriptive title would be "About Acme Computer Corp.", but since this is the world of marketing and we can't afford to be bland, we'll give it a title of "Acme Computer Corp.: Who We Are".

The TITLE element is a very simple element. It cannot contain anything but text and that text is the title of the document. So let's insert our title into our document, which is almost complete:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<HTML>
 <HEAD>
  <TITLE>Acme Computer Corp.: Who We Are</TITLE>
 </HEAD>
 <BODY>
 </BODY>
</HTML>

We're almost finished! All that remains is to add our document body.

Paragraphs and Headings

Our document, as noted previously, consists of a heading and two paragraphs. HTML has elements specifically designed to denote headings and paragraphs. Paragraphs are simply denoted by the P element. Headings are a bit more complicated, as you can have as many as six levels of headings. These are represented by the H1, H2, H3, H4, H5 and H6 elements. The following example shows paragraphs and multiple levels of headings in action:

<H1>Memo: Designing ACC's Website</H1>
<P>Here are some notes on designing AcmeComCorp's Website.</P>
 <H2>Learning good HTML</H2>
<P>The level of HTML knowledge in this company is
terrible. We should all go over to the HTML with Style 
and get up to date with our HTML if we're going 
to design a proper Website.</P>
 <H2>Thinking of something more original</H2>
<P>Can't you people think of something less 
boring? Even our competitors, may their name 
remain unuttered, have more interesting things 
to say about themselves.</P>
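
Deeper levels work the same way. As a rough sketch (the catalog sections here are made up for illustration and aren't part of our Acme document), three levels of headings might look like this:

<H1>Acme Product Catalog</H1>
 <H2>Computers</H2>
  <H3>Desktops</H3>
<P>Machines that sit on your desk.</P>
  <H3>Laptops</H3>
<P>Machines that sit on your lap.</P>
 <H2>Software</H2>
<P>Programs to run on all of the above.</P>

The indentation is only there for legibility. What puts "Desktops" under "Computers" in the outline is the heading level, H3 following H2, not the spaces in front of the tag.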

Now let's mark up our text with the appropriate tags and insert it into our HTML document:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<HTML>
 <HEAD>
  <TITLE>Acme Computer Corp.: Who We Are</TITLE>
 </HEAD>
 <BODY>
<H1>Acme Computer Corp.</H1>
<P>Acme Computer Corporation is a technology-based
company that seeks to offer its customers the 
latest in technological innovation. Our products
are created using the latest breakthroughs in
computers and are designed by a team of top-notch
experts.</P>
<P>We are based in Acmetown, USA, and have offices in 
most major cities around the world. Our goal is to
have a global approach to the future of computing.
Have a look at our product catalog for some 
examples of our innovative approach.</P>
 </BODY>
</HTML>

That's it! We're finished! The above is a complete, valid HTML document. You can take a break and have a look at how your browser renders it.

You might have noticed that this page is not the pinnacle of Web design, for various reasons. However, it's a start, and what you've learned today are things you'll need for every single HTML document you ever design. In our next tutorial, in two weeks' time, we'll explore the concept of hypertext links and insert some in our document. And since we'll need to link with something, we'll create a product catalog for Acme as well.