Guide to Semantic Use of HTML Elements

April 3, 2008

This is part 2 of 2. Part 1 is “Why use Semantic HTML?”

This guide only deals with HTML4/XHTML elements which have a specific, human-readable meaning. I’ve left out the semantics of elements such as link, which are not seen in normal browsing, and also replaced elements like img or object. In some cases, I’ve also addressed specific attributes which are critical to giving an element its semantic value.

This is not a guide which represents the opinion of the W3C (World Wide Web Consortium) as expressed in the HTML, XHTML, or HTML 5 specifications. It is a practical-use guide which reflects my reasoned opinion concerning the best use of each element.

Core Block Elements

div: The div element represents a discrete section of a page which can be meaningfully divided from the content around it. It’s commonly used to indicate a header region, footer, sidebar, or navigation region, and it can equally be used to indicate columns on a page or sections of an article. The element is also commonly used in multiple layers to group lower-level sections together, such as a “content” section which groups a main article, comments on that article, and metadata about the article or author.
h1-h6: The six levels of headings are all used to introduce the sections of content (containing p (paragraphs), div (page divisions), or other content) which they describe. They’re perhaps most accurately compared to the levels of an outline. h1 is the top-level heading element. When an h1 section is subdivided, its subsections are introduced by h2 elements, and a level should never be skipped (for example, an h1 followed directly by an h3). An h2 can be followed by three kinds of heading. It can be followed by another h2 if the two sections are equivalent and both fall under the preceding h1 topic. It can be followed by an h3 if the following section is logically a child of the h2. It can be followed by another h1 if the following section is a new topic at the same level of specificity as the first heading. Paragraphs and other content can, of course, appear between any of these headings. A common preference (although certainly not mandatory) is to use only a single first-level heading on any page and to require all subsequent headings to descend from it.
p: The paragraph element is the fundamental building block of prose text. It is also the most appropriate element for marking up a stanza of poetry or a similar discrete block of text. It differs from a div principally in its purpose: p is specifically intended to indicate text regions, whereas the div element is more broadly specified.
blockquote: This is a very specific-use element. It should be used to indicate a significant block of text quoted from a source outside the current document. It should always be paired with a cite element to indicate the quoted source. It may also, optionally, use the cite attribute to contain a URI (Uniform Resource Identifier) for the quoted text.
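As a rough illustration, here is one way the core block elements above might combine on an article page. The id values, heading text, and quoted source (including the example.com URI) are hypothetical. The structure is a sketch, not the only correct arrangement.

```html
<!-- Hypothetical page structure; id values and the quoted source are illustrative only -->
<div id="content">
  <h1>Guide to Semantic Use of HTML Elements</h1>

  <div id="article">
    <h2>Core Block Elements</h2>
    <p>Prose text belongs in paragraphs.</p>

    <!-- h3 is a child section of the h2 above; it never follows an h1 directly -->
    <h3>Quotations</h3>
    <blockquote cite="http://example.com/quoted-article">
      <p>A significant block of text quoted from an outside source.</p>
    </blockquote>
    <p>Quoted from <cite>A Hypothetical Quoted Article</cite>.</p>

    <!-- A second h2: an equivalent section under the same h1 topic -->
    <h2>Supporting Inline Semantic Elements</h2>
    <p>More prose.</p>
  </div>

  <!-- Comments grouped with the article inside the "content" division -->
  <div id="comments">
    <h2>Comments</h2>
    <p>Reader comments would go here.</p>
  </div>
</div>
```

The quoted text is wrapped in a p because HTML 4.01 Strict allows only block-level content directly inside blockquote.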

Supporting Inline Semantic Elements

a: When accompanied by an href attribute, the anchor element indicates one of two targets. The target may be an external resource (a resource other than the current document) accessible via hyperlink. It may also be an anchored location within the same document. With scripting, it can perform more complex functions within the current page. It should always keep a fallback which works without scripting, however, so that it retains its semantic value.
abbr: The abbreviation element generically indicates a shortened form of a more extensive term or phrase. It is inclusive of an acronym, although the lack of support for abbr in Internet Explorer frequently forces developers to ignore that relationship.
acronym: “Acronym” refers to a subset of abbreviations characterized by their formation from parts (letters or syllables) of the words they abbreviate. The definition isn’t strictly settled, but it’s generally agreed that abbreviations formed by removing letters from a word are not acronyms.
em: Indicates emphasis. “Emphasis” is a general indication that the emphasized text is in some way more significant than the text surrounding it. Whether a piece of text should be emphasized or not is usually dictated by authorial preference.
strong: “Strong” is described officially as “Stronger Emphasis.” So, practically speaking, it’s an element you use in much the same scenario as you would use em: an authorially determined preference for emphasis.
address: According to the W3C, address indicates contact information relevant to a specific document or part of a document. In practical usage, it’s more commonly used to indicate any block of contact information. As a block-level element, it’s generally reserved for significant blocks of information. It is not normally used to mark up a single e-mail address or telephone number.
cite: A citation is fairly broad, and does not necessarily have to be associated with specific quoted information (although the reverse is not equally true). cite is associated with bibliographical information, personal quotations, or references to an external resource used in researching a document.
code: Indicates a sample of programming code as a general rule. The W3C specifications are clear that this is intended to refer to computer code; and I haven’t yet come across a situation where I needed to post encryption information which was not computer code. 😉
dfn: This is one of the more difficult elements to define, which is ironic, given that it’s intended to represent the “defining instance” of a term. It is not intended to contain a definition. Instead, it merely encloses a term at the point in a document where that term is defined. Sounds very much like legalese, to me.
del: Represents information which has been deleted from a document. It should generally be accompanied by the date and time the change was made. This information can be given in the datetime attribute in the format datetime="YYYY-MM-DDThh:mm:ssTZD". Here, TZD is a time zone designator: Z for UTC, or an offset such as -05:00. See also ins.
samp: Sample output from programs or scripts. Differentiated from code in that the output of a program may not itself be code, but should still be indicated as an example of output.
span: A generic inline-level HTML element. This doesn’t mean that span carries no semantic value. Rather, it is available for use when no other element provides a suitable meaning. It is preferable to use a generic element and define a meaning for it than to use an element whose predefined semantic meaning is inappropriate.
ins: The opposite of del, above. Represents inserted text following revisions.
q: Indicates a shorter, inline quotation. Unfortunately, support for the q element is minimal, and it cannot be readily recommended for any use.
kbd: Indicates text to be entered by the user. Rarely used, but useful in circumstances where you are demonstrating the use of a program, along with code and samp.
sub/sup: Subscripted and superscripted text can indicate footnote references, atom counts and charges (valence numbers) in chemical formulas (such as H₂O or Fe³⁺), and so on.
var: Along with code, samp, and kbd, the “variable” element indicates a variable (or program argument). It should be reasonably obvious at this point that this language was designed by programmers and not by librarians.
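Here is a brief sketch of several of these inline elements in context. Several details are hypothetical placeholders chosen for illustration: the #tables and #note-1 anchor targets, the gallery.html page, the openGallery() script function, the e-mail address, the phone number, and the dates.

```html
<!-- An external hyperlink, and a link to a (hypothetical) anchor in this document -->
<p>The <abbr title="World Wide Web Consortium">W3C</abbr> publishes the
  <a href="http://www.w3.org/TR/html401/">HTML 4.01 specification</a>;
  see also <a href="#tables">the table elements</a> on this page.</p>

<!-- Scripted link: openGallery() is hypothetical. If scripting is unavailable,
     the href still leads to the gallery page, so the link keeps its meaning. -->
<p><a href="gallery.html" onclick="openGallery(); return false;">View the gallery</a></p>

<!-- Defining instance of a term, an acronym, emphasis, and a citation -->
<p>A <dfn>defining instance</dfn> is the point where a term is defined.
  A blockquote's cite attribute holds a
  <acronym title="Uniform Resource Identifier">URI</acronym>.
  Please <em>don't</em> skip heading levels; <strong>it breaks the outline</strong>.
  See <cite>Why use Semantic HTML?</cite> for background.</p>

<!-- A revision: del and ins with HTML 4.01 datetime values in UTC (Z) -->
<p>Comments close on
  <del datetime="2008-04-03T09:00:00Z">April 10</del>
  <ins datetime="2008-04-03T09:00:00Z">April 17</ins>.</p>

<!-- User input, a placeholder argument, program output, and code -->
<p>At a shell prompt, type <kbd>ls -l <var>directory</var></kbd>;
  the output begins with a line such as <samp>total 8</samp>.
  In a shell script, the equivalent is <code>ls -l "$dir"</code>.</p>

<!-- Subscript, superscript, and a footnote reference -->
<p>Water is H<sub>2</sub>O; the ferric ion is Fe<sup>3+</sup>.<sup><a href="#note-1">1</a></sup></p>

<!-- A block of contact information -->
<address>
  Editor: <a href="mailto:editor@example.com">editor@example.com</a><br>
  Phone: 555-0100
</address>
```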

List Elements

ul, ol, li: This is pretty straightforward: lists are used to represent grouped information best represented as a list. ul is unordered, and is generally visually represented as a bulleted list. ol is ordered, and is generally visually represented as a numbered list. It’s common to attempt to apply lists at a significant macro level in organizing the elements in a form or, occasionally, within an entire page, but it’s my opinion that this kind of usage is taking the semantic construct a bit too far.
dl, dd, dt: A definition list literally indicates a list of terms (dt) with their accompanying definitions (dd). Practically speaking, it’s reasonable to use the definition list format for any collection of data characterized by paired relationships. Each pair has one signifying item and at least one descriptive item. It’s perfectly reasonable to provide multiple definitions for a single term. Frequently asked questions pages are commonly assembled this way.
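To illustrate, here is a sketch of an unordered list, an ordered list, and a definition list used as a small FAQ. The list items are hypothetical. The FAQ answers paraphrase points made elsewhere on this page. The second dt shows a single term with two definitions.

```html
<!-- Unordered: the order of these items carries no meaning -->
<ul>
  <li>code</li>
  <li>samp</li>
  <li>kbd</li>
  <li>var</li>
</ul>

<!-- Ordered: the sequence matters (hypothetical steps) -->
<ol>
  <li>Write the content.</li>
  <li>Mark it up with the most meaningful elements available.</li>
  <li>Apply presentation with CSS.</li>
</ol>

<!-- Definition list used for an FAQ -->
<dl>
  <dt>What is the dir element?</dt>
  <dd>A deprecated "directory list"; the HTML 4.01 specification recommends ul instead.</dd>

  <!-- One term, two definitions -->
  <dt>cite</dt>
  <dd>An element marking a citation or a reference to another source.</dd>
  <dd>An attribute of blockquote, q, del, and ins that holds the URI of a source.</dd>
</dl>
```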

Table Elements

table: Oft abused, the table is the best way of organizing and displaying a data matrix. Any kind of two-dimensionally represented data should be organized within a table.
thead: Defines a header region for a data table, which would normally contain the headers (th) for each column.
tfoot: Defines a footer region for a data table, which should include information referential to the columns of data.
tbody: The content-bearing region of a table; it also includes any row headers.
caption: Briefly describes the table. This is essentially a heading for the table.
th: A heading for either a row or a column, to indicate the type of information within that row or column.
td: A data cell, containing content which corresponds to both its row header and its column header.
Attribute: scope: Applied to th, it indicates whether the heading information applies to a row (scope="row") or a column (scope="col"). For tables which have been divided into multiple sections, it can also apply to a row group or column group (rowgroup, colgroup).
Attribute: headers: A much, much more complicated way of indicating relationships between data cells and their respective headers. Applied to a td, it lists the id values of every header cell that applies to that cell. It is necessary in complex tables where a given data cell relates to multiple row or column headers. If possible, just avoid creating tables which are that complex…they’re a headache.
Attribute: summary: Applied to the table element, the summary is a more extensive description of the table, intended to provide non-visual users with the equivalent of a “quick scan” of the table to best understand the purpose it serves.
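Here is a sketch of a small data table using these elements and attributes. Its contents summarize views expressed in this guide and its comments. The headers attribute is omitted because scope is sufficient for a table this simple.

```html
<table summary="Three presentational HTML elements, one per row; columns give each element's status in HTML 4.01 and this guide's view of it.">
  <caption>Presentational elements: status and use</caption>
  <thead>
    <tr>
      <th scope="col">Element</th>
      <th scope="col">HTML 4.01 status</th>
      <th scope="col">This guide's view</th>
    </tr>
  </thead>
  <!-- In HTML 4.01, tfoot comes before tbody in the source -->
  <tfoot>
    <tr>
      <td colspan="3">Status follows the HTML 4.01 specification; views are the author's opinion.</td>
    </tr>
  </tfoot>
  <tbody>
    <!-- Row headers live in tbody, marked with scope="row" -->
    <tr>
      <th scope="row">b</th>
      <td>Not deprecated</td>
      <td>Use only after careful consideration</td>
    </tr>
    <tr>
      <th scope="row">i</th>
      <td>Not deprecated</td>
      <td>Reasonable for conventionally italicized text, such as a book title</td>
    </tr>
    <tr>
      <th scope="row">strike</th>
      <td>Deprecated</td>
      <td>Presentational, but perhaps appropriate in some contexts</td>
    </tr>
  </tbody>
</table>
```

Browsers still render the tfoot rows at the bottom of the table, even though they come first in the source.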

Separator and “Other” Elements

br: Generates a line break. The semantics of a line break are a commonly debated point – you can read my views in my article “Is a br tag semantic?”
hr: Separates two sections with a visible horizontal line. Although this element conveys no specific semantic meaning which is not conveyed by other elements, it provides the advantage of a visual separator between sections when styles are disabled which is otherwise unavailable. I’m not aware of any advantages for other scenarios.

Discouraged (Presentational) Elements

These elements have not been deprecated, but should generally be used only after careful consideration.

big
small
b
i
tt
pre

Is it semantic, or is it presentational? This can be a more difficult question than it initially appears. Take b. Presentationally, it renders text as bold. Semantically, it provides no specific emphasis or other specific meaning. Does this mean that it should never be used? Not necessarily. It’s difficult to describe scenarios in which these elements are useful. Suppose, though, that you want bold text but don’t want that text to receive additional emphasis. In that case it makes more sense to use b than to use a span styled to be bold.

Regardless, these are not elements that should generally be used without careful consideration that they are, in fact, the best choice for the job. But it’s your call.
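As an illustration of that reasoning, here is a sketch comparing three options for a bold label. The “Tip” and “Warning” text is hypothetical.

```html
<!-- Bold for typographic convention only, with no added emphasis: b -->
<p><b>Tip:</b> headings should not skip levels.</p>

<!-- The same visual result with span needs extra styling and adds no meaning -->
<p><span style="font-weight: bold;">Tip:</span> headings should not skip levels.</p>

<!-- Genuine stronger emphasis: strong -->
<p><strong>Warning:</strong> deleted comments cannot be restored.</p>
```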

Deprecated Elements

applet
center
font
dir
isindex
menu
s
strike
u

Not all deprecated elements are created equal. I find it ironic that strike and u are set right alongside font and isindex. Thinking logically, strike and u are very much in the same vein as b and i. Presentational, but perhaps appropriate in some contexts.

27 Comments on “Guide to Semantic Use of HTML Elements”

Jorge
Thanks for the article. I’m not new to HTML, but I am new to manual coding, so this is very helpful!
February 13, 2012 at 10:24 am
Rita
Finally, a site that gives an understandable definition of semantic in relation to HTML. Thank you for clearing up the question in my mind.
July 1, 2010 at 11:42 am
Jason Grant
Oh and I also like the semantic graph tool. Would be nice if it could output in a JPG or something like that.
April 7, 2009 at 9:51 am
Jason Grant
Nice quick overview of the elements. I am aiming at providing a little more detailed insight into best practices and will be covering each tag and what I have seen people do and not do with it.
Semantic User Interfaces – The Best Practices
April 7, 2009 at 9:50 am
dani
That pragmatism, sometimes simple is not easier. Thanks, Joe.
March 26, 2009 at 7:49 pm
Joe Dolson
In normal language, that means that the page contains no content which is a child of only that div. On this page it’s the div with the id “outer.”
This div is there purely for visual purposes, and as such is non-semantic. However, within the limitations of HTML and CSS, sometimes that’s a valid choice. The challenge is over how many extra nested elements continue to be reasonable and valid. One is really just fine, in my view; a simple page structure is pretty effectively retained using the odd extra element for formatting.
The resolution is simply a matter of accepting a pragmatic perspective on semantic use of elements – sometimes there are design choices which are not available to you without introducing some degree of extra markup. The balance between pragmatism versus semantics is personal choice.
March 26, 2009 at 10:17 am
dani
Joe,
based on W3C semantic data extractor, this page/post has: 1 <div> with no additional content to their unique child
which div is it? do you need to fix it? how to resolve it?
March 26, 2009 at 3:44 am
Computer Guy
Nice guide to Semantic use of HTML elements. I am a big fan of clean HTML practices. I like to see websites that are well written using basic HTML and CSS. Since the invention of the blog and PHP, things have gotten a little messy on the average site.
December 30, 2008 at 10:29 pm
Joe Dolson
And people dealing with accessibility tend to use an even greater variety. SEO-focused developers tend to use elements which may be treated as having additional value by search engines; accessible web site developers are concentrating on adding anything which might be able to provide greater meaning or added value for the disabled.
Similar, but different.
May 24, 2008 at 8:59 am
Paris
It’s amazing how many people only use a mere 20% of all these tags. I mean, many times it’s easier to format a div element than to use an h1 element. People dealing with SEO, though, tend to use a greater variety of tags, since search engines interpret them differently.
May 24, 2008 at 3:21 am
Jason Marsh
Great reference, great article, thanks
May 21, 2008 at 10:26 am
Joe Dolson
Boy, I can’t imagine what your favorite deprecated element might have been. Could it be strike?
Thanks!
😉
April 13, 2008 at 2:49 pm
Elizabeth Able
Hi Joe. I ~~remembered~~ thought of ~~my favorite cat lover and accessibility fan~~ this post today when ~~playing with~~ using my favorite deprecated element. I’ll let you guess which one it is…
Thank you for continuing to blog about these terms. Semantics are an interesting subject matter, in that in isolation they can make for Very Dull reading, but they can also fire some Very Fascinating conceptual discussions that lead to a better web. More power to you.
April 13, 2008 at 2:20 pm
Joe Dolson
That’s probably the best, most concise definition of the dfn element that I’ve seen. What it is, really, is a definition of “defining instance” — something which is notably lacking in the specifications.
I can see your point on the headings. It could well be interpreted that way…I’ll have to rephrase that.
Anchors: yep, that’s true as well. I had a reason for phrasing it in that manner, which I’ve forgotten now – but I should, nonetheless, have included some mention of page anchors. More revisions, ho!
April 11, 2008 at 9:37 am
Stevie D
I think you’re assuming too much understanding of “outline”.
h1 is the top level heading element, and can be followed only by h2
[h1] can be followed by [h2], or [p], [div] or various other block-level elements. I know exactly what you mean, but I think the way you’ve written it, some people new to HTML might get the wrong idea.
When accompanied by an href attribute, the anchor element indicates an external resource (a resource other than the current document) accessible via hyperlink
Not true – a hyperlink can refer to a point within the current document.
This is one of the more difficult to define elements – which is ironic, given that it’s intended to represent the “defining instance” of a term.
It’s dead easy really. Like this…
“<p>The <dfn>dfn element</dfn> is used to mark a term that is defined in the following text.</p>”
I do use the dfn element – I have no idea if it is actually picked up by AT or search engines, but I use it anyway!
I agree that strike belongs in the same category as b and i, although it’s less common. I would keep u as very definitely deprecated, because the use of underlining (however it is achieved) should be reserved for hyperlinks only.
April 11, 2008 at 7:08 am
Joe Dolson
I agree — in fact, I’d go even further to say that there’s a fair amount of confusion in having any elements/attributes with the same names – cite, for example. It can make communication that much more difficult.
April 10, 2008 at 3:39 pm
Xslf
Got it, thanks!
It is confusing when there is a deprecated element and a very non-deprecated attribute with the same name…
April 10, 2008 at 3:37 pm
Joe Dolson
It’s the “directory list” element. As defined by the HTML 4.01 specification:
The DIR element was designed to be used for creating multicolumn directory lists. The MENU element was designed to be used for single column menu lists. Both elements have the same structure as UL, just different rendering. In practice, a user agent will render a DIR or MENU list exactly as a UL list.
We strongly recommend using UL instead of these elements.
It’s worth noting, of course, that this has never been an element in wide application.
April 10, 2008 at 9:41 am
Xslf
I’m a bit confused about “dir” that you listed above in the deprecated elements.
Which element is it?
I know the “dir” attribute, but that attribute is definitely not deprecated, and is very needed in order to display Right-To-Left text (Like Arabic or Hebrew) properly.
Can you please clarify?
Thanks!
April 10, 2008 at 2:49 am
Joe Dolson
It’s one of the difficulties with discussing semantics — they vary as much by context as they do by content type. Use of elements can be overlapped, of course, in contexts where more than one element applies to a given content object, but when speaking without a specific context as a focus it’s difficult to be absolutely clear about the issues. 😉
April 8, 2008 at 5:53 pm
Mike Cherim
You’re right. I misunderstood. Regarding that definition, that has been updated it seems from the last time I looked (few weeks ago). I like it 🙂
April 8, 2008 at 5:22 pm
Joe Dolson
I think you may be thinking of a different context for a book title than we are — I’m thinking in the context of a bibliographic reference, for which I certainly wouldn’t use h1. If I was using the book title as a heading for a review or summary, I might consider that.
As to the HTML 5 definition of i, as I read it it would actually encompass a book title when that title isn’t being used in a citation:
The i element represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, a ship name, or some other prose whose typical typographic presentation is italicized. (W3C HTML 5: i)
In my mind, the title of a book as a reference in context or a bibliographic entry would be equivalent, roughly, to a ship name — and certainly fits the phrase “some other prose whose typical typographic presentation is italicized.”
April 8, 2008 at 4:20 pm
Mike Cherim
I like HTML 5’s definition of the i element (language and thought). That’s how I use it. For a book title, even though the print reference is hard to equate, I’d use a styled heading. An h1 specifically.
April 8, 2008 at 3:51 pm
Joe Dolson
Yeah, I love the idea of the q element, but in practice I never bother using it. Sad, really.
And yes, I’d be inclined to say that a book title is an appropriate usage of i. It’s not truly an emphasized usage, but italics are a normative practice stylistically for a title reference.
April 7, 2008 at 4:11 pm
Dennis at Web Axe
Great article and reference, Joe. Would you say it’s alright to use the i tag for something such as a magazine or book title? I would say yes. Also, I really like the q tag; it’s a shame that IE does not render it correctly (with quotes).
April 7, 2008 at 4:01 pm
Joe Dolson
You know, you’re absolutely right — that’s totally mis-classified, and I can’t honestly say why I was thinking that way. Thanks! I’ll change it…
April 6, 2008 at 8:38 am
Sarah Bourne
I’m surprised that you have put h1-h6 with the inline elements. Not only are they considered block level elements in the HTML specs, but they are (IMHO) one of the most important elements semantically: they define the overall structure and organization of the page.
Otherwise … nice work!
April 4, 2008 at 2:41 pm