April 3, 2008

Guide to Semantic Use of HTML Elements

This is part 2 of 2. Part 1 is Why use Semantic HTML?


This guide only deals with elements which have a specific, human-readable meaning. The semantics of elements such as link, which are not seen in normal browsing, have been left out, as have replaced elements like img or object. In some cases, I’ve also addressed specific attributes which are critical to providing semantic value to an element.

This is not a guide to the position of the W3C as expressed in the HTML, XHTML, or HTML 5 specifications. It is a practical-use guide which reflects my reasoned opinion concerning the best use of each element.

Core Block Elements

div
The div element represents a discrete section of a page which can be meaningfully separated from the content around it. It is commonly used to indicate a header region, footer, sidebar, or navigation region; its use extends equally to columns on a page or sections of an article. The element is also commonly nested in multiple layers to group lower-level sections together, such as a “content” section which groups a main article, comments on that article, and metadata about the article or author.
h1-h6
The six levels of headings are all used to introduce the sections of content they describe, whether those sections contain paragraphs (p), page divisions (div), or other content. They’re perhaps most accurately compared to the structure of an outline: h1 is the top-level heading element, and the only lower-level heading which should directly follow an h1 is h2. An h2, in turn, can be followed by another h2, if the two sections are equivalent and both fall under the preceding h1 topic; by an h3, if the following section is logically a child of the h2; or by another h1, if the following section is a new topic at the same level of specificity as the first heading. A common preference (although certainly not mandatory) is to use only a single first-level heading on any page and to require all subsequent headings to descend from it.
p
The paragraph element is the fundamental building block of prose text. It is also the most appropriate element for marking up a stanza of poetry or a similar discrete block of text. It differs from a div principally in that it is specifically intended to mark up text, whereas the div element is more broadly defined.
blockquote
This is a narrowly defined element which should be used to indicate a significant block of text quoted from a source outside the current document. It should always be paired with a cite element to identify the quoted source. It may also, optionally, use the cite attribute to give a URI for the quoted text.
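The block elements above can be seen working together in a minimal sketch of an article page. The headings follow the outline rules described for h1-h6, and the blockquote carries both a cite attribute and a paired cite element. All of the content, the id and class names, and the example.com URI are hypothetical placeholders, not taken from any real page:

```html
<!-- Hypothetical article page; id, class names and URI are placeholders -->
<div id="content">
  <h1>Guide to Semantic HTML</h1>

  <div class="article">
    <h2>Core Block Elements</h2>
    <p>Paragraphs carry the prose of each section.</p>

    <!-- h3 is a logical child of the h2 above -->
    <h3>Quoting a source</h3>
    <!-- The cite attribute holds a URI for the quoted text -->
    <blockquote cite="http://www.example.com/source-document">
      <p>A significant block of text quoted from outside the current source.</p>
    </blockquote>
    <!-- The cite element names the quoted source -->
    <p>From <cite>An Example Source Document</cite></p>

    <!-- A sibling h2: an equivalent section under the same h1 -->
    <h2>Supporting Inline Elements</h2>
    <p>This section sits at the same outline level as the first h2.</p>
  </div>
</div>
```

Skipping from the h1 straight to an h3 here would leave a gap in the outline, which is the situation the heading rules above are meant to avoid.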

Supporting Inline Semantic Elements

a
When accompanied by an href attribute, the anchor element indicates either an external resource (a resource other than the current document) accessible via hyperlink or an anchored location within the same document. With scripting, it can be used to perform more complex functions within the current page, but it should always keep a fallback behavior to retain its semantic value.
abbr
The abbreviation element generically indicates a shortened form of a longer term or phrase. Acronyms are a subset of abbreviations, although the lack of support for abbr in Internet Explorer frequently forces developers to ignore that relationship.
acronym
“Acronym” refers to a subset of abbreviations characterized by their formation from parts (letters or syllables) of the words they abbreviate. The precise definition is debated, but it is generally accepted that abbreviations formed by removing letters from a word are not acronyms.
em
Indicates emphasis. “Emphasis” is a general indication that the emphasized text is in some way more significant than the text surrounding it. Whether a piece of text should be emphasized or not is usually dictated by authorial preference.
strong
“Strong” is described officially as “Stronger Emphasis.” So, practically speaking, it’s an element you use in much the same scenario as you would use em: an authorially determined preference for emphasis.
address
According to the W3C, address indicates contact information relevant to a specific document or part of a document. In practical usage, it’s more commonly used to indicate any block of contact information. As a block-level element, it’s generally reserved for significant blocks of information, rather than being used to mark up a single e-mail address or telephone number.
cite
Citation is a fairly broad concept: a citation does not necessarily have to be associated with specific quoted material (although quoted material should always be cited). cite is associated with bibliographical information, personal quotations, or references to an external resource used in the research towards preparing a document.
code
Indicates a sample of programming code as a general rule. The W3C specifications are clear that this is intended to refer to computer code; and I haven’t yet come across a situation where I needed to post encryption information which was not computer code. ;)
dfn
This is one of the more difficult elements to define, which is ironic, given that it’s intended to represent the “defining instance” of a term. It is not intended to contain a definition; it merely encloses a term at the point in a document where that term is defined. It sounds very legalistic to me.
del
Represents information which has been deleted from a document. It should generally be used with date and time information indicating when the change was made, which can be included in the datetime attribute in the format datetime="YYYY-MM-DDThh:mm:ssTZD", where TZD is a time zone designator such as Z (UTC) or +hh:mm. See also ins, below.
samp
Sample output from programs or scripts. Differentiated from code in that the output of a program may not itself be code, but should still be indicated as an example of output.
span
A generic inline-level HTML element. It should not be concluded that span carries no semantic value; rather, it is available to be used when no other element provides a suitable meaning. It is preferable to use a generic element and define a meaning for it than to use an element with a pre-defined and inappropriate semantic meaning.
ins
The opposite of del, above. Represents inserted text following revisions.
q
Indicates a shorter, inline quotation. Unfortunately, support for the q element is minimal, and it cannot be readily recommended for any use.
kbd
Indicates text to be entered by the user. Rarely used, but useful in circumstances where you are demonstrating the use of a program, along with code and samp.
sub/sup
Subscripted and superscripted text can indicate footnote references, atom counts in chemical formulas (such as the subscripted 2 in H2O), valence numbers (such as the superscripted +3 in Fe+3), etc.
var
Along with code, samp, and kbd, the “variable” element indicates a variable (or program argument). It should be reasonably obvious at this point that this language was designed by programmers and not by librarians.
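A short sketch of several of the inline elements above in context. The sentences, dates, and addresses are invented placeholders. The datetime values use the HTML 4.01 form YYYY-MM-DDThh:mm:ssTZD, with Z standing for UTC, and the abbreviation and acronym follow the distinction drawn above (letters removed versus letters taken from each word):

```html
<!-- Placeholder content illustrating the inline elements above -->
<p>
  A <dfn>defining instance</dfn> is the point where a term is defined.
  <abbr title="Doctor">Dr.</abbr> is an abbreviation;
  <acronym title="North Atlantic Treaty Organization">NATO</acronym>
  is an acronym.
</p>

<p>Do <em>not</em> skip heading levels; this is <strong>important</strong>.</p>

<!-- Revisions, timestamped in UTC (Z) -->
<p>The meeting is on
  <del datetime="2008-04-01T09:00:00Z">Tuesday</del>
  <ins datetime="2008-04-01T09:00:00Z">Wednesday</ins>.</p>

<!-- Program-related elements: user input, program output, code, variable -->
<p>Type <kbd>ls</kbd> to list files; the output might include
  <samp>index.html</samp>. The statement <code>x = 5;</code>
  assigns 5 to the variable <var>x</var>.</p>

<!-- Subscript, superscript, and an anchor to a location in the same document -->
<p>Water is H<sub>2</sub>O; the ferric ion is Fe<sup>3+</sup>.<sup><a href="#note-1">1</a></sup></p>
<p id="note-1">1. An example footnote.</p>

<!-- A block of contact information -->
<address>
  Editor: <a href="mailto:editor@example.com">editor@example.com</a><br>
  123 Example Street, Example City
</address>
```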

List Elements

ul, ol, li
This is pretty straightforward: lists are used to represent grouped information best represented as a list. ul is unordered, and is generally rendered as a bulleted list. ol is ordered, and is generally rendered as a numbered list. It’s common to apply lists at a macro level, for instance to organize the fields of a form or, occasionally, an entire page, but in my opinion this kind of usage takes the semantic construct a bit too far.
dl, dt, dd
A definition list literally indicates a list of terms (dt) with their accompanying definitions (dd). Practically speaking, it’s reasonable to use the definition list format for any collection of data characterized by paired relationships, with one signifying item and at least one descriptive item. It’s perfectly reasonable to provide multiple definitions for a single term. Frequently asked questions pages are commonly assembled this way.
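A minimal sketch of a frequently-asked-questions page marked up as a definition list, including one term with two descriptions, followed by an ordered list where the sequence of items matters. The questions and answers are hypothetical:

```html
<!-- Hypothetical FAQ marked up as a definition list -->
<dl>
  <dt>What does a definition list contain?</dt>
  <dd>Terms (dt) paired with their descriptions (dd).</dd>

  <dt>Can a term have more than one description?</dt>
  <!-- One dt followed by two dd elements -->
  <dd>Yes; each description gets its own dd element.</dd>
  <dd>Alternative answers can be grouped under one question this way.</dd>
</dl>

<!-- Ordered list: the order of the items is part of the meaning -->
<ol>
  <li>Write the content.</li>
  <li>Choose the elements that describe it.</li>
  <li>Add presentation with CSS.</li>
</ol>
```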

Table Elements

table
Oft abused, the table is the best way of organizing and displaying a data matrix. Any kind of two-dimensionally represented data should be organized within a table.
thead
Defines a header region for a data table, which would normally contain the headers (th) for each column.
tfoot
Defines a footer region for a data table, which should contain information that refers back to the columns of data, such as column totals.
tbody
The content-bearing region of a table, which also includes any row headers.
caption
Briefly describes the table. This is essentially a heading for the table.
th
A heading for either a row or a column, to indicate the type of information within that row or column.
td
A data cell, whose content corresponds to both its row header and its column header.
Attribute: scope
Applied to th, scope indicates whether the heading information applies to a row (scope="row") or a column (scope="col"). The values rowgroup and colgroup extend a heading to a group of rows or columns, for tables which have been divided into multiple sections.
Attribute: headers
A much, much more complicated way of indicating relationships between data cells and their respective headers. It is necessary in complex tables where a given data cell relates to multiple row or column headers. If possible, just avoid creating tables which are that complex; they’re a headache.
Attribute: summary
Applied to the table element, the summary is a more extensive description of the table, intended to provide non-visual users with the equivalent of a “quick scan” of the table to best understand the purpose it serves.
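The table elements and attributes above, combined in one small data table. The figures and id values are invented for illustration. The Widgets row also carries a headers attribute, which is redundant in a table this simple because scope already covers it; it is shown only to illustrate the syntax. In HTML 4.01, tfoot must appear before tbody in the source, even though browsers render it at the bottom:

```html
<table summary="Units sold per quarter for two products. Products are rows, quarters are columns, and the footer row gives totals.">
  <caption>Units sold by quarter</caption>
  <thead>
    <tr>
      <th scope="col">Product</th>
      <th scope="col" id="q1">Q1</th>
      <th scope="col" id="q2">Q2</th>
    </tr>
  </thead>
  <!-- In HTML 4.01, tfoot precedes tbody in the source -->
  <tfoot>
    <tr>
      <th scope="row">Total</th>
      <td>220</td>
      <td>270</td>
    </tr>
  </tfoot>
  <tbody>
    <tr>
      <th scope="row" id="widgets">Widgets</th>
      <!-- headers lists the ids of the th cells this cell belongs to;
           redundant with scope here, shown for syntax only -->
      <td headers="widgets q1">120</td>
      <td headers="widgets q2">150</td>
    </tr>
    <tr>
      <th scope="row">Gadgets</th>
      <td>100</td>
      <td>120</td>
    </tr>
  </tbody>
</table>
```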

Separator and “Other” Elements

br
Generates a line break. The semantics of a line break are a commonly debated point; you can read my views in my article “Is a br tag semantic?”
hr
Separates two sections with a visible horizontal line. Although this element conveys no specific semantic meaning which is not conveyed by other elements, it has one advantage: it provides a visual separator between sections when styles are disabled, which no other element offers. I’m not aware of any advantages in other scenarios.

Discouraged (Presentational) Elements

These elements have not been deprecated, but they should generally only be used after careful consideration.

  • big
  • small
  • b
  • i
  • tt
  • pre

Is it semantic, or is it presentational? This can be a more difficult question than it initially appears. Take b. Presentationally, it renders text as bold. Semantically, it provides no emphasis or other specific meaning. Does this mean that it should never be used? Not necessarily. Although it’s difficult to describe scenarios in which these elements are useful, if you want bold text but do not want that text to receive additional emphasis, it makes more sense to use b than to use span and style it to be bold.

Regardless, these are not elements that should generally be used without careful consideration that they are, in fact, the best choice for the job. But it’s your call.
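One possible illustration of the b versus span comparison, assuming you want a run-in label set in bold without adding emphasis. The class name bold and its style rule are hypothetical:

```html
<!-- Assumes a hypothetical stylesheet rule: .bold { font-weight: bold; } -->

<!-- Bold label with no added emphasis, using b -->
<p><b>Ingredients:</b> flour, water, salt.</p>

<!-- The alternative: a generic span styled to look the same -->
<p><span class="bold">Ingredients:</span> flour, water, salt.</p>
```

Both render identically with styles on; the b version keeps the bold appearance when styles are disabled, and neither adds the emphasis that strong would.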

Deprecated Elements

  • applet
  • center
  • font
  • dir
  • isindex
  • menu
  • s
  • strike
  • u

Not all deprecated elements are created equal. I find it ironic that strike and u are set right alongside font and isindex. Thinking logically, strike and u are very much in the same vein as b and i. Presentational, but perhaps necessary in some contexts.

Nonetheless, there’s no way I’m going to recommend the use of deprecated elements. Find another way!

If you want to see these elements in action, you may find my semantic HTML graphing tool interesting.


Filed under Accessibility by Joe Dolson

Why use semantic HTML?

This is part 1 of 2. Part 2 is my Guide to the use of Semantic HTML Elements

I’ve seen a lot of articles discussing the importance of HTML and XHTML semantics. I’ve seen articles describing what it means for a document to be semantic. Most of these articles, however, don’t provide a serious overview of which HTML elements may actually be considered semantic, and what those semantic elements actually mean.

And, even more particularly, why it matters.

Semantics is an erudite area of study. Literally, semantics can be fairly defined as the study of meaning in communication. Communication can readily be extended to cover symbolic notations, representations of language, organization of language, body language and information structures. In developing a web page, we are organizing a means to communicate the content of that page: ideally, we are organizing the page in such a manner that it will be understood regardless of the method by which the page is accessed. It should be equally understandable whether seen, heard, or felt.

The semantics of HTML structure, then, are clearly an important part of web design. Sending mixed signals to the user agent or the user by using a blockquote purely for its native indentation is an abuse of semantics: even the visual effect depends on the assumption that user agents will consistently render a blockquote with an indent.

The problem is not simply that you’ve used a semantic element for presentational purposes; in fact, you’ve done more than that: you’ve presented a block of text which is not quoted material as if it were.

Semantic elements of HTML carry meaning regardless of your knowledge of that meaning. The result is that the misuse of an element creates the potential to mislead or confuse an end-user.

The most obvious examples in common use involve elements which carry semantic meaning and also receive a default presentation from the browser, used purely for that presentational style. The blockquote example above is not uncommon; similar cases include empty p elements used to create extra white space, and heading elements substituted for normal paragraphs as a questionable SEO technique.
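A sketch of the misuses just described next to one alternative. The class name indented and its CSS rule are hypothetical; any styling hook would do:

```html
<!-- Misuse: blockquote for indentation, empty p for white space -->
<blockquote><p>This text is not a quotation.</p></blockquote>
<p></p>

<!-- Misuse: a heading standing in for an ordinary paragraph -->
<h2>Our widgets are the best widgets available anywhere.</h2>

<!-- Alternative: plain paragraphs, with presentation moved to CSS,
     e.g. a hypothetical rule .indented { margin-left: 2em; margin-bottom: 2em; } -->
<p class="indented">This text is not a quotation.</p>
<p>Our widgets are the best widgets available anywhere.</p>
```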

Other examples worth mentioning include the use of empty anchor elements (anchors without an href) to trigger JavaScript events. In this case, the problem stems partly from the limited definition of the anchor element, but an empty anchor element should always be considered an error, as it results in a behavior-less anchor if JavaScript is not available.

Now, you may point to the following paragraph, from the HTML 4.01 specifications, as a response to my opinion:

Authors may also create an A element that specifies no anchors, i.e., that doesn’t specify href, name, or id. Values for these attributes may be set at a later time through scripts.

The fact that it is allowed by the specification does not make it a best practice. With all due respect to the W3C, this should not be permitted. For reference, the HTML 5 specification currently reads:

If the a element has no href attribute, then the element is a placeholder for where a link might otherwise have been placed, if it had been relevant.

In addition, although I won’t quote everything, the specification states that an anchor which does have the href attribute must specify a URI as the value of that attribute. It appears to essentially state that an anchor element should have no semantic meaning if the href attribute is not set and valid. But I could be wrong.
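A minimal sketch of the difference, assuming a hypothetical help.html page and hypothetical id values. The first anchor has no href and does nothing without JavaScript; the second is a working link which a script can enhance, so it keeps its meaning when scripting is unavailable:

```html
<!-- Problematic: no href, so without JavaScript this does nothing
     (showHelp is a hypothetical function) -->
<a onclick="showHelp()">Help</a>

<!-- Better: a real URI that works without scripting -->
<a href="help.html" id="help-link">Help</a>
<div id="help-panel" style="display: none">
  <p>Help content shown inline when scripting is available.</p>
</div>

<script type="text/javascript">
  // Progressive enhancement: show the help inline instead of navigating.
  // Without scripting, this never runs and the link simply loads help.html.
  var helpLink = document.getElementById("help-link");
  var helpPanel = document.getElementById("help-panel");
  helpLink.onclick = function () {
    helpPanel.style.display = "block";
    return false; // cancel the default navigation
  };
</script>
```

The script is placed after both elements so that getElementById can find them when it runs.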

The best means to avoid the misuse of elements is to have a clear understanding of when and why a given element should be used in web development. To hopefully expand on your knowledge in that respect, I’m attempting to provide a semantic guide to HTML elements for your reference and rich disagreement.

Be aware, however, that semantics are largely a matter of opinion. It’s not a question of blindly following the guidelines set by a group; it’s a question of interpreting those guidelines to the best of your ability and belief. This guide reflects how I think HTML elements should be used, and I welcome your opinions.



Filed under Semantics, Web standards by Joe Dolson

December 23, 2007

Supporting Standards that Support Accessibility

The justification that a web site is accessible because it “follows standards” contains a serious fallacy. Specifically, the assumption that standards support accessibility.

One root of current standard accessibility practice is conformance to the HTML or XHTML standards set by the World Wide Web Consortium (W3C). This is a fine practice, and certainly should be maintained. Using correct syntax and following a standardized method of communicating information is always a solid best practice. However, this should absolutely not be taken to mean that following these standards is the same as applying the principles of web accessibility.

Web standards only provide accessibility to the degree that they have been designed to do so, and the guiding principle behind standards development (excluding accessibility-specific standards, of course) has not generally been to support accessibility. Web standards have been designed purely to establish a fixed, correct method of using the underlying code, whether presentational (CSS), structural (XHTML) or behavioral (ECMAScript).

In many (most) cases, web standards do not in any way require best practices; they merely require conformance. Take HTML, for example. Web standards would permit the use of table elements for layout: the HTML 4.01 specification advises against layout tables, but a layout table still validates. Web standards also permit a variety of presentational elements, such as font, strike, or u. It all depends on which standard you have chosen to follow.

HTML5, most recently, is considering such contrarian steps as removing the requirement that images have alt attributes. This would make it possible for a valid HTML5 web site to radically fail basic accessibility guidelines. On the other hand, it may reduce the likelihood that some so-called “accessible” web sites will be littered with alt="this is a spacer graphic".
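A sketch of the alt practices at stake, using hypothetical image files. An empty alt attribute still satisfies a requirement that alt be present, while telling assistive technology to skip a purely decorative image:

```html
<!-- Noise: alt text that describes the layout, not the content -->
<img src="spacer.gif" alt="this is a spacer graphic" width="1" height="10">

<!-- Decorative image: empty alt, so screen readers skip it -->
<img src="spacer.gif" alt="" width="1" height="10">

<!-- Informative image: alt conveys the information the image carries -->
<img src="logo.png" alt="Example Widgets Inc.">
```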

Does this necessarily mean that the standard is wrong or right? No, not as such. Different standards support different needs; it is important to keep the purpose of each standard distinct. Conforming HTML is just that: Conforming HTML. It means nothing more.

Nonetheless, as an accessibility advocate, I feel that it’s important to support accessibility issues within the development of new standards. Taking the alt attribute issue in HTML5, for example, the lack of any perceived benefit to not requiring the attribute suggests to me that the better path would be to continue to require it. There are numerous important accessibility features which are not yet included in HTML5.

There seems to be a strong element of specious judgement: elements which are not supported by current user-agents are considered not to be needed. This seems a ridiculous expectation: after all, if unsupported elements aren’t needed, then why develop a new specification at all? What we’ve got must work just fine!

Practically speaking, user-agent support and developer use should both be only marginal issues when deciding which elements are most needed in a specification. The fact that elements are unused on either end is not a judgement on the value of those elements; it is merely a judgement on awareness of the elements, on the clarity of the existing specification, or on the complexity of the implementation.

Nobody (or almost nobody) uses the q inline element. Does this mean that the element isn’t valuable, and should be discarded? No. It means that Internet Explorer should add appropriate support for it. The same is true for accessibility issues. The standards should support them to their best abilities: if an element or attribute could hypothetically add to the accessibility of a site, then the fact that it is little used or poorly supported should be entirely irrelevant. Support should follow the standards; not the other way around.

At the root of things, my stance is that I am unwilling to support a standard which specifically excludes features needed in order to appropriately provide best-practice accessibility. HTML5 is still a long way from being done, and even further from being implemented (if it ever is), but the removal of such attributes as headers from table markup, the inclusion of defined non-semantic elements such as b (see note 1), and the “WYSIWYG exemption” on the font element strike me as decisions badly in need of reconsideration.

1. In point of fact, I can accept the inclusion of one inline non-semantic element (span) and one block level non-semantic element (div). I feel there’s ample justification to allow elements which are not specifically defined on the grounds that not all situations can possibly be covered by the specifications of the language. I see no justification, however, for the inclusion of additional explicitly non-semantic elements.


Filed under Accessibility, Semantics, Web standards by Joe Dolson
