Knowledge Base

A depository of useful information I have learned

This is the section in which I attempt to contribute something more useful than long-winded sermons about unimportant matters like the meaning of life. While the cupboard may be initially somewhat bare, it is, like all things in this life, a work in progress. It shall one day go on to cover a wide range of computing topics, but for now it covers website development.

Building nice websites

This document will guide the unsuspecting webmaster (or webmistress) through standards compliance, accessibility, (X)HTML, Document Type Declarations, Cascading Style Sheets, and a little dash of DOM (via JavaScript) for good measure. There will be absolutely no Flash or DHTML, or any other types of showing off. Just basic, clean mark‑up to form a solid foundation on which the intrepid may base their ambitious future projects. Long is the way, and hard, that out of FrontPage leads up to the light of HTML mastery. My aim here is to make that crooked path straighter.

This guide is strictly client-side only. More hefty topics such as server-side scripting, CGI processing, security, database management systems, security, server configuration, hosting and domain registration and security will be discussed later. When I’m better at it.

There will be a number of digressions throughout this document. They are, as you would expect, optional reading for the curious.

How to read this tutorial

This was never intended to be an exhaustive guide; such a task would take years. It would also be gratuitous because the web teems with such literature already. This tutorial deals as much with sundry issues as with those strictly relating to the coding itself. Many topics are glossed over almost entirely if a link to another site or a Wikipedia entry will suffice. If you want to learn more, then click by all means; otherwise read on. As such, this tutorial is sympathetic to both the methodical accumulator of knowledge and the impatient.

If you are in a hurry, skip the digressions and follow the links selectively: I will try to point out the links that are more necessary than others. Alternatively, if it’s more about the journey than the destination, allow yourself to take the scenic route and cast your net wide over the gateways to all world knowledge littered throughout this page.

And if anything is unclear, misleading, or completely wrong, do not hesitate to email and bring it to my attention. But for now, may your grokking be plentiful and enjoyable.

Table of Contents

  1. Tools of the Trade
  2. DTDs and the W3C
  3. First encounters with HTML
  4. The Duality of HTML/CSS

1. Tools of the Trade

If you are reading this, then you already have one obviously essential piece of equipment: a web browser. Ideally, you will need more than one. The internet is not Internet Explorer. There are in fact hundreds of browsers out there, and no two will render a website or interpret JavaScript in exactly the same way. Usually the differences are minor and negligible; sometimes they are major and catastrophic. This is the first of many pieces of bad news.

The second is that every graphical browser is customisable in many ways. People view your site on different operating systems in different resolutions. Some will disable JavaScript, some will disable images. Some may have vision impairments and use huge font sizes or assistive technologies such as screen or braille readers. You can’t make any assumptions or take anything for granted, and you certainly have no control over how somebody else views your website.

For now this is not important: neither Rome nor Disgraceland was built in a day. All that matters is that you realise the need to develop your websites with a little bit of latitude and for more than just Internet Explorer, and to do this you need to download and install additional browsers as per your operating system, or at least be aware of their existence. I recommend as many of the following as you deem necessary:

Wikipedia’s List of web browsers provides a good list and also a useful breakdown of browsers into their respective rendering engines. Instead of worrying about every browser in existence, the most common from each category will suffice under most circumstances.

Digression on HTML: Markup languages such as HTML are the reason web developers suffer so many headaches. A web browser does not summarily reject malformed HTML the way a compiler rejects malformed source code. Instead, the browser ambitiously tries to read through any errors and present to the world what it thinks the author intended, and every browser has its own (usually exclusive) ideas on how best to do this. This curious behaviour would be no big deal if it didn’t extend to valid HTML as well.

The other tool you will need is a text editor. All operating systems have their own, though some are better than others. Most modern Linux distributions ship with desktops like GNOME or KDE, both of which have perfectly good editors: gedit and Kate respectively. Windows, on the other hand, ships with Notepad. It’s physically possible to write websites with it, but that doesn’t mean you should. Disgraceland was built entirely in Notepad++, an excellent open-source editor for Windows. If you are a Mac user, I can’t help you. The one time I used a Mac to debug some JavaScript, I used vi. See Wikipedia’s Comparison of text editors for more information regarding feature support and OS compatibility. It is worth mentioning that if you use Microsoft Word to write HTML, I will cut you.

To complete the triad, you will also need an image manipulation program. Creating graphical artwork for your site is outside the scope of this tutorial, so this topic won’t receive any further mention here. Adobe Photoshop is the premier tool for such things; as far as free open-source alternatives go, The GIMP runs first in a race of one. With that, so ends Chapter 1.

2. DTDs and the W3C

The W3C develop, among many other things, standards in web development in an attempt to avoid (or at least minimise) the nightmare of incompatibility and wildly varying interpretations of HTML.

One way in which they do this is via a Document Type Definition (DTD), which defines the valid HTML elements, their attributes, and how they may be nested. If you are wondering what one looks like, here is the HTML4.01 Strict DTD. Don’t panic if it doesn’t make any sense. All you need to know at this point is that you need to declare one at the beginning of every HTML page you make, which leads to the beginning of a very contentious debate among web developers as to which DTD should be declared.

The debate is split into two camps, HTML and XHTML. Both camps are very adamant in their belief that their way is best, but the XHTML zealots are wrong. Ignore them, for they will lead you astray.

Digression on XHTML: Why then is there so much debate if the answer is so obvious? XHTML is not, as many people seem to believe, newer or better than HTML; it is simply a reformulation of HTML in XML, using essentially the same set of elements. I won’t bore you with theory and instead jump to the meat and potatoes of the issue. As of version 7, no release of Microsoft Internet Explorer supports XHTML served as application/xhtml+xml. Since Internet Explorer accounts for roughly 80–90% of the online community, this is quite significant. The only way around this is for the page to be served as Content-Type: text/html, as nearly every website in the world is. This means that every argument for XHTML superiority they have ever wielded falls flat instantly, because what is being served up to the world is not XHTML in any way, but instead malformed HTML. It is only due to the generosity of browser rendering engines that XHTML pages display correctly at all.

For the terminally curious, I invite you to read Sending XHTML as text/html Considered Harmful by Ian Hickson.
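
To make the point about Content-Type concrete, here is a minimal JavaScript sketch of the decision a browser makes before it reads a single tag. The function name parserFor is my own invention for illustration and not a real browser API; the two media types are the ones registered for HTML and XHTML.

```javascript
// Hypothetical helper: which kind of parser does this Content-Type select?
function parserFor(contentType) {
  // Drop parameters such as "; charset=utf-8" and normalise the case.
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  if (mediaType === "text/html") return "html"; // forgiving HTML parser
  if (mediaType === "application/xhtml+xml") return "xml"; // strict XML parser
  return "other";
}

console.log(parserFor("text/html; charset=utf-8"));
console.log(parserFor("application/xhtml+xml"));
```

An XHTML document served as text/html therefore goes through the forgiving HTML parser, whatever its doctype claims.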

It is now time to see what the cause of all this fuss actually looks like:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">

This should be at the very top of every webpage you create. If this chapter has taught you nothing else, then that is enough. If you look closely enough, you will notice that it says strict. There are two other DTDs for HTML4.01, transitional and frameset. We won’t be paying much attention to either of them beyond this passing mention. The transitional definition exists only to bridge the gap from the Dark Ages of HTML3.2 and should under no circumstances be used to construct new pages, while the frameset definition should never be used because neither should frames. Frames are the worst.

Digression on DTDs: Contrary to widely held belief, browsers do not retrieve or read the DTD at the URL in the document type declaration, and the declaration does nothing inherently to define or enforce any compliance with standards, nor was it ever intended to. Its presence in an HTML document merely tells some browsers not to interpret the page in “quirks mode”: a mode intended for backwards compatibility with websites written for (very) old browsers.
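
If you are curious which mode a browser has chosen for a page, the DOM exposes it as document.compatMode, which reads "BackCompat" in quirks mode and "CSS1Compat" otherwise. The sketch below wraps that in a small function so it can also be run outside a browser; the name renderingMode is an assumption of mine, chosen for illustration.

```javascript
// Translate the value of document.compatMode into a plain-English label.
function renderingMode(compatMode) {
  return compatMode === "BackCompat" ? "quirks" : "standards";
}

// Only touch the DOM when one exists (i.e. when run inside a browser).
if (typeof document !== "undefined") {
  console.log(renderingMode(document.compatMode));
}
```

Pasting the function into a browser console on a page with and without a doctype is a quick way to watch the declaration do its one real job.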

On that note, onto Chapter 3.

3. First encounters with HTML

HTML documents are written in plain ASCII text and use a hierarchical system of nested elements. What does all that mean? ASCII is American, and therefore an acronym. It defines 95 printable characters: the digits 0–9, uppercase and lowercase A–Z, and a handful of punctuation characters. In other words, every printable key on a standard QWERTY keyboard. The hierarchy of nested elements is described by the following diagram:

HTML hierarchy
Figure 1: Grey represents “mandatory” top level elements; blue represents block-level elements (under the <body> element); green represents inline elements.

At this stage it doesn’t matter if you don’t know what p, meta or div are; all that will be revealed in time. This concept of hierarchy will come back to haunt you when you learn about the Document Object Model in the coverage of client-side scripting. For now, all you need to know is that HTML documents follow a tree structure beginning with the root <html> element.
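
For those who think better in code than in diagrams, here is a rough JavaScript sketch of the same idea. The tree below is a hand-written, hypothetical model of a tiny page (it is not produced by any browser), and outline is an illustrative helper that walks it from the root down.

```javascript
// Hypothetical model of a small HTML document: each node has a tag
// name and a list of child nodes, mirroring the nesting in the markup.
const tree = {
  tag: "html",
  children: [
    {
      tag: "head",
      children: [
        { tag: "title", children: [] },
        { tag: "meta", children: [] },
      ],
    },
    {
      tag: "body",
      children: [
        { tag: "h1", children: [] },
        { tag: "p", children: [{ tag: "b", children: [] }] },
      ],
    },
  ],
};

// Walk the tree depth-first and return one indented line per element.
function outline(node, depth = 0) {
  const lines = ["  ".repeat(depth) + node.tag];
  for (const child of node.children) {
    lines.push(...outline(child, depth + 1));
  }
  return lines;
}

console.log(outline(tree).join("\n"));
```

The indentation in the printed outline corresponds to depth in the tree, with html alone at the root.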

Elements are represented by a pair of opening and closing tags and give special meaning to the enclosed text. Here is an example:

<p attribute="value">text, text, and <b>bold text</b>.</p>

You will notice that <b> is nested within <p>, and that <p> contains an attribute with a specified value within quotation marks. Beyond the mere memorisation of roughly 100 elements and all their associated attributes and values, this single example encapsulates a lot of what you need to know about HTML. Of particular importance is this checklist:

One more significant point in the above example is the difference between block and inline elements. While easy to comprehend, these differences are nonetheless difficult to explain succinctly. The clearest explanation I have come across is Tommy Olsson’s:

“All browsers, even those that lack support for CSS, have a default presentation for each element type. Block-level elements are rendered with an implicit line break before and after, while inline-level elements are rendered where they occur in the text flow. Thus it is not possible to put two block-level elements side by side without using CSS.” (The Autistic Cuckoo, 2005)

Block-level elements may contain inline elements, and sometimes other block elements. By contrast, inline elements may only ever contain other inline elements. The example at the beginning of this chapter shows a block-level paragraph element containing an inline bold element.
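
The containment rule can be sketched in a few lines of JavaScript. This is a deliberate simplification: the sets below are small hand-picked samples, canContain is a hypothetical helper, and the real rules are defined element by element in the DTD.

```javascript
// A few sample elements of each kind (nowhere near a complete list).
const INLINE = new Set(["b", "i", "em", "strong", "a", "span", "cite"]);
const BLOCK = new Set(["p", "div", "h1", "h2"]);
// Of the sample block elements, only div may hold other block elements;
// paragraphs and headings may hold inline content only.
const BLOCK_CONTAINERS = new Set(["div"]);

// Hypothetical check: may `child` be nested directly inside `parent`?
function canContain(parent, child) {
  for (const name of [parent, child]) {
    if (!INLINE.has(name) && !BLOCK.has(name)) {
      throw new Error("element not in this sketch: " + name);
    }
  }
  if (INLINE.has(parent)) return INLINE.has(child); // inline holds inline only
  if (INLINE.has(child)) return true; // these block elements accept inline
  return BLOCK_CONTAINERS.has(parent); // block in block: only sometimes
}

console.log(canContain("p", "b"));   // true: the example from this chapter
console.log(canContain("b", "p"));   // false: inline cannot hold block
console.log(canContain("div", "p")); // true
console.log(canContain("p", "div")); // false
```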

It is now time to see a real live HTML page from the inside:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
   <head>
      <title>This is my first HTML page</title>
      <meta http-equiv="Content-type" content="text/html; charset=utf-8">
   </head>
   <body>
      <h1>Hello World!</h1>
      <p>This is a paragraph with some <b>bold</b> and <i>italic</i> text.</p>
      <p>Here is a second paragraph.</p>
      <h2>A level 2 heading</h2>
      <blockquote>
         <p>&quot;No matter where you go, there you are.&quot;<br>
         <cite>--Unknown</cite></p>
      </blockquote>
      <p>EOF</p>
   </body>
</html>

See the output of the above code and compare the two to understand what is happening. The left-indentation indicates the level of nesting, and is a common coding convention (i.e., it isn’t strictly necessary, but it makes the code a lot easier to read and follow). There are some unfamiliar elements and attributes, but they aren’t important right now. What matters is the overall structure:

  1. Document type declaration: Begins at the start of line 1.
  2. Header: The <head> element and its nested elements. This defines information about the HTML page that is not rendered by the browser, such as links to external JavaScript or CSS files.
  3. Body: The <body> element and its nested elements. Its contents are directly rendered and displayed within the browser window.

One more noteworthy item to touch on is the &quot; notation which, if you looked at the output of the source code, renders as normal double-quotes. This is known as a character reference, a form of escaping. As in other languages, certain characters in HTML are given special significance, such as the < and > characters which designate the start and end of a tag. Whenever the parser encounters a <, it assumes that it is the start of a tag. What if we want to include a < as part of the document text? Escaping solves this problem by replacing the literal character with a numeric reference to its character code (such as &#60; for <) or, as with &quot;, an easy‑to‑remember entity reference. There is more to this topic that will be explained when we encounter URL encoding, but for now you only need to know that when you see some letters or numbers between an ampersand and a semicolon, it is referring to a character.
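
Should you ever need to generate HTML from arbitrary text, the substitution can be automated. The following is a minimal JavaScript sketch; escapeHtml and ENTITIES are names I have made up for illustration, and the table covers only the four characters that matter most in document text and quoted attribute values.

```javascript
// The special characters and the entity references that stand in for them.
const ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
};

// Replace each special character in a single pass, so that the
// ampersands introduced by the replacements are never escaped twice.
function escapeHtml(text) {
  return text.replace(/[&<>"]/g, (ch) => ENTITIES[ch]);
}

console.log(escapeHtml('1 < 2 & "so on"'));
// prints: 1 &lt; 2 &amp; &quot;so on&quot;
```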

Digression on semantics: What is the difference between a tag and an element? These terms are often used interchangeably, which is highly annoying to pedants. Strictly speaking, the element is the abstraction and the opening and closing tags are HTML’s way of representing the element with a surrounding pair of identifiers inside angle-brackets. The difference will become more apparent when we tackle the DOM.
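
For the pedants, the distinction can even be counted. This rough JavaScript sketch scans the paragraph example from earlier in this chapter; the regular expression is a naive assumption that works for this snippet only (it would stumble on a > inside an attribute value, and on elements that have no closing tag).

```javascript
const snippet = '<p attribute="value">text, text, and <b>bold text</b>.</p>';

// A tag: "<", an optional "/", a name, then anything up to the next ">".
const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
const tags = [...snippet.matchAll(tagPattern)];

// Every element in this snippet has exactly one opening tag
// (no leading slash), so counting those counts the elements.
const openingTags = tags.filter((match) => match[1] === "");

console.log("tags:", tags.length);            // tags: 4
console.log("elements:", openingTags.length); // elements: 2
```

Four tags, two elements: each element here is an opening tag, a closing tag, and whatever sits between them.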

In case you are wondering when all the HTML elements will be explained in full, they won’t. Some key elements will be covered in the next chapter, but a comprehensive reference falls well outside of the scope of this guide. The reasons for this are clarity and redundancy: the main aim of this guide is to explain background issues, best practices and ideological debates; and perfectly good reference sites already exist. Here are two of them:

These are links that you will want to bookmark because you will be visiting them again. And again. And again. Especially W3Schools.

4. The Duality of HTML/CSS

This chapter is almost a historical digression, but it’s important. HTML4 introduced some sweeping changes to the HTML element set: new elements were added, and many existing elements were deprecated.