Style guide for hypertexts on dpk.io

This is a small style guide for the markup of hypertext documents on dpk.io. The goal of these guidelines is to make sure there is helpful metadata in each page for later reference, to maintain the best possible typographical and linguistic standards across pages, and to make sure that stylesheets are interchangeable between documents.

Matters of linguistic style

H. W. Fowler’s Dictionary of Modern English Usage, First Edition is the first port of call for all stylistic issues. (A modern reprint is available.) (Todo: I would like to start a project to digitize the first edition, since it’s now in the public domain; and, when that’s done, to add annotations to the entries to show which ones are outdated compared to current usage. This is a large project, though, and would probably take at least a month of full-time work.)

I increasingly prefer Oxford English Dictionary spelling these days. (Standard British spelling, but with -ize instead of -ise in words like “anthologize”, “globalize”, and their derivatives like “organization”. This may seem weird, or just pandering to American usage, but etymologically it makes sense: the -ize suffix comes from the zeta (ζ) in the Greek -ιζειν; were the suffix -ise, it should be a sigma (σ).)

Punctuation

Use proper quote-marks, never straight ones.

Use proper feet-and-inches and minutes-and-seconds marks, not quote marks. And don’t use primes for closing quote marks when the quote ends with a figure, as a certain popular web-publishing tool does …

Em dashes are surrounded by one space on each side. Ideally these would be thin spaces or hair spaces, but these are rather tricky to add in HTML.
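
One possible way around the difficulty is to use named character references. The sketch below assumes an HTML5 parser (which defines &hairsp; alongside the older &thinsp;) and a typeface that actually has glyphs for these spaces; the sentence itself is only a placeholder.

```html
<!-- U+2009 Thin Space on each side of the em dash -->
<p>The answer&thinsp;&mdash;&thinsp;if there is one&thinsp;&mdash;&thinsp;lies elsewhere.

<!-- U+200A Hair Space, for a tighter setting -->
<p>The answer&hairsp;&mdash;&hairsp;if there is one&hairsp;&mdash;&hairsp;lies elsewhere.
```

Whether the result looks better than ordinary spaces depends on the font, so it is worth checking in each stylesheet.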

En dashes are used for ranges (1846–2009, the Manchester–London line), joining names and similar in technical contexts (Bose–Einstein condensate, Rivest–Shamir–Adleman), and as a ‘super hyphen’ in cases where two hyphenated forms are used together, to group the phrases. (Solid-matter–based life-forms.)

A hyphen is a U+002D Hyphen-Minus; in theory it ought to be U+2010 Hyphen, but the two are indistinguishable in most typefaces, and the latter is tricky to type.

A minus sign is a U+2212 Minus Sign. It is tricky to type, but is not used often enough for finding it in a character palette to be bothersome.

An ellipsis is a U+2026 Horizontal Ellipsis, not three periods.
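
As a rough summary of the rules above, the following sketch writes each character as a named character reference. The sentences are invented for illustration; since documents are saved in UTF-8, the literal characters can equally well be typed directly.

```html
<!-- Curly quote-marks: U+201C/U+201D and U+2018/U+2019 -->
<p>&ldquo;Proper&rdquo; and &lsquo;proper&rsquo; quote-marks, not "straight" ones.

<!-- Primes for feet-and-inches and minutes-and-seconds: U+2032, U+2033 -->
<p>A plank 6&prime;&nbsp;2&Prime; long; an arc of 12&prime;&nbsp;30&Prime;.

<!-- En dash for ranges and joined names: U+2013 -->
<p>1846&ndash;2009; the Bose&ndash;Einstein condensate.

<!-- Minus sign, U+2212, rather than a hyphen-minus -->
<p>A temperature of &minus;40 degrees.

<!-- Horizontal ellipsis, U+2026 -->
<p>And so on&hellip;
```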

Markup

In general, use markup as sparingly and as simply as possible. Don’t bother closing tags that don’t need closing; don’t include tags that are always implicitly added; quoting attributes is often a waste of time; etc.
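
A minimal sketch of what this can look like in practice, for a hypothetical page. It relies only on tag omissions that the HTML syntax permits: the head and body tags are implied, and paragraphs and list items are closed by whatever follows them.

```html
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<title>A hypothetical example page</title>

<!-- No <head> or <body> tags: the parser supplies both elements -->
<p>The first paragraph; its end tag is implied by the list that follows.
<ul>
  <li>List items need no closing tag
  <li>Nor do paragraphs or table cells
</ul>

<!-- Quotes are needed only when a value contains spaces or characters
     such as quotes, =, < or > -->
<p>An unquoted attribute: <a href=/about>about</a>.
An attribute that needs quotes: <a href=/about class="external wide">about</a>.
```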

Except in exceptional cases, save documents in UTF-8.

Header information

Place information in the header in decreasing order of generality: as one reads further down the header markup, the tags should become more specific to the particular document. For example, the markup at the top of this document reads:

<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<link rel=stylesheet href="/simple.css">
<meta name=created content=2013-03-06>
<title>Style guide for hypertexts on dpk.io</title>

DOCTYPE should be in all documents; most, but not all, documents will be in English; most of those will be in UTF-8; some of them will use the simple.css stylesheet; this is the only document I have created today, but there could be more; and the title is obviously as specific to the document as the rest of the content.

There should always be a <html> tag with an appropriate lang attribute. Some stylesheets use CSS hyphenation, and some browsers don’t apply hyphenation properly without a properly–marked-up document language set.
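
The dependency can be sketched as follows, for a hypothetical test page. The hyphens property is standard CSS, but which languages a browser can hyphenate varies, and the inline style element stands in for whatever stylesheet the page really uses.

```html
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<style>
  /* Automatic hyphenation picks its dictionary from the lang attribute above */
  p { hyphens: auto; text-align: justify; max-width: 20em }
</style>
<title>Hypothetical hyphenation test</title>

<p>Incomprehensibilities and counterrevolutionaries notwithstanding,
internationalization remains an extraordinarily uncharacteristic undertaking.
```

Removing the lang attribute from this page is a quick way to see whether a given browser stops hyphenating.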

There should always be a <meta name=created> tag; this semi-formal standard (used by the BBC, among others) helps to keep track of the approximate age of documents.

Document structure

The title of the document should be marked up in a <h1> tag in a <header> immediately before the start of the document body. A subtitle may follow, in an <h2> tag; if you are not the author of a document, note that in an <h3>.

Start heading levels for headings inside the body of the document at <h1>. In general, heading levels lower than <h2> should be avoided; Feynman only needed two levels of hierarchy, after all.

At the end of the document, sign the author’s name and the date of authorship in a <p> in a <footer>; also note any other contributors to the article, and the publisher if not the same as the author. Include links to the authors’ personal web-pages, if available. Typically the signature should begin with an em dash. (Fixme: Perhaps instead add the em dash with a CSS :before rule?)
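
Put together, the structure might look like the sketch below. All names, the URL, and the headings are placeholders; the date is reused from this document's own header. The style rule is one possible answer to the Fixme, not a settled convention, and with it in place the dash must not also be typed into the markup.

```html
<style>
  /* \2014 is the escape for an em dash. :before is the older spelling;
     current CSS also accepts ::before. */
  footer p:before { content: "\2014" }
</style>

<header>
  <h1>Title of the document</h1>
  <h2>An optional subtitle</h2>
  <!-- Only when the publisher is not the author -->
  <h3>By A. N. Other</h3>
</header>

<!-- Body headings start again at h1 and stop at h2 -->
<h1>First section</h1>
<p>Body text.
<h2>A subsection</h2>
<p>More body text.

<footer>
  <p><a href="https://example.org/">A. N. Other</a>, 2013-03-06.
  Published by A. Publisher.
</footer>
```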

Using appropriate tags

In the wake of the XHTML/CSS revolution, many authors switched from using the <i> tag for italics to the <em> tag for emphasis. While the effort to ‘semantically’ mark up their documents was admirable, other tags exist for many of the purposes for which <em> has since been abused.

Briefly: when italicizing the title of a book, film, album, etc., use <cite> (a tag which has been misunderstood as the result of an error in the HTML 4.0 specification as meaning the name of an author, instead of the name of a work); when italicizing the first occurrence of a term which is to be defined, use <dfn>; when italicizing a variable in a mathematical expression, use <var>; and for all other cases of marking up text which is normally italicized, use <i>. These tags have all been in HTML since before 1.0, and are supported by just about every browser ever made, save for the really early ones.

For instance, consider this sentence: The film 2001: A Space Odyssey has a je ne sais quoi; the director, the person who is overall responsible for the film, is a master, clearly appreciative of the meaning of E = mc² to our modern hyper-cultural society.

This is correctly marked up as follows:

The film <cite>2001: A Space Odyssey</cite> has a <i lang=fr>je ne sais quoi</i>; the <dfn>director</dfn>, the person who is overall responsible for the film, is a master, clearly appreciative of the meaning of <var>E</var> = <var>m</var><var>c</var><sup>2</sup> to our modern hyper-cultural society.

Markup like this gives more flexibility when styling pages, since certain kinds of italicization can be typeset differently to others (e.g. using AMS Euler for variables; another typeface, or bold-face, for definitions; etc.).
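
A stylesheet taking advantage of this might contain rules along the following lines. This is only a sketch: the font family name is an assumption about how AMS Euler is installed, and the choice of bold-face for definitions is one option among many.

```html
<style>
  /* Titles of works and other conventional italics stay italic */
  cite, i { font-style: italic }

  /* Variables set upright in a mathematical face (family name assumed) */
  var { font-family: "AMS Euler", serif; font-style: normal }

  /* Defined terms in bold-face instead of italics */
  dfn { font-style: normal; font-weight: bold }
</style>
```

Had every one of these been marked up with <em>, a single rule would have had to serve all four purposes.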

Standard class names

sc
Small caps. This is a shortcut for having an attribute style="font-variant: small-caps;"; but for stylesheets whose fonts lack a small caps variant, text-transform: uppercase; or something else appropriate can be used.
pq
Pull quote. Usually for aside elements.
lower
Transform to lower-case.
upper
Transform to upper-case.
footnotes
A section, probably a div, set aside for footnotes. It will usually begin with a <hr> tag.
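
A stylesheet might implement these classes roughly as follows. Only the sc, lower, and upper rules follow directly from the descriptions above; the pull-quote and footnote styling is hypothetical, as is the sample markup.

```html
<style>
  .sc    { font-variant: small-caps }
  /* Possible fallback where the font lacks a small caps variant:
     .sc { text-transform: uppercase; font-size: smaller } */
  .lower { text-transform: lowercase }
  .upper { text-transform: uppercase }

  /* Hypothetical presentation for pull quotes and footnotes */
  aside.pq   { float: right; width: 40%; font-size: larger }
  .footnotes { font-size: smaller }
</style>

<!-- Lower-case source text, so that small-caps has something to act on -->
<p>A convention used by the <span class=sc>bbc</span>, among others.
<aside class=pq>Feynman only needed two levels of hierarchy.</aside>

<div class=footnotes>
  <hr>
  <p>1. A hypothetical footnote.
</div>
```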