April 3, 2008
I’ve seen a lot of articles discussing the importance of HTML and XHTML semantics. I’ve seen articles describing what it means for a document to be semantic. Most of these articles, however, don’t provide a serious overview of which HTML elements may actually be considered semantic, or of what those semantic elements actually mean.
And, even more particularly, why it matters.
Semantics is an erudite area of study. Literally, semantics can be fairly defined as the study of meaning in communication. Communication can readily be extended to cover symbolic notations, representations of language, organization of language, body language and information structures. In developing a web page, we are organizing a means to communicate the content of that page: ideally, we are organizing the page in such a manner that it will be understood regardless of the method by which the page is accessed. It should be equally understandable whether seen, heard, or felt.
The semantics of HTML structure, then, are clearly an important part of web design. Sending mixed signals to the user agent or the user by using a blockquote purely for its native indentation is an abuse of semantics: even the visual impact is dependent on the assumption that user agents will consistently render a blockquote in an indented manner.
It’s not precisely an issue that you’ve used a semantic element for presentational means, because, in fact, you’ve done more than that: you’ve presented a block of text which is not quoted material as if it were.
Semantic elements of HTML carry meaning regardless of your knowledge of that meaning. The result is that the misuse of an element creates the potential to mislead or confuse an end-user.
The most obvious examples in common use are those which borrow an element with semantic meaning purely for the default presentation the browser gives it. The blockquote example above is not uncommon; similar cases are the use of empty p elements to create extra white space, or heading elements substituted for normal paragraphs as a questionable SEO technique.
Other examples which bear mentioning include the use of empty anchor elements to trigger JavaScript events. In this case, it’s partially a limitation of the identity of an anchor element, but an empty anchor element should always be considered an error, as it results in a behavior-less anchor if JavaScript is not available.
Now, you may point to the following paragraph, from the HTML 4.01 specifications, as a response to my opinion:
Authors may also create an A element that specifies no anchors, i.e., that doesn’t specify href, name, or id. Values for these attributes may be set at a later time through scripts.
The fact that it is allowed by the specification does not make it a best practice. With all due respect to the W3C, this should not be permitted. For reference, the HTML 5 specification currently reads:
If the a element has no href attribute, then the element is a placeholder for where a link might otherwise have been placed, if it had been relevant.
In addition, although I won’t quote everything, the specification states that an anchor which does have the href attribute must specify a URI as the value of that attribute. It appears to essentially state that an anchor element should have no semantic meaning if the href attribute is not set and valid. But I could be wrong.
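For anyone who wants to hunt down these anchors in their own markup, here is a minimal sketch in Python using only the standard library’s html.parser module. It is an illustration rather than anything taken from either specification: the class and function names are hypothetical, and it simply flags every a element that specifies none of href, name, or id.

```python
from html.parser import HTMLParser


class PlaceholderAnchorFinder(HTMLParser):
    """Collect <a> start tags that specify none of href, name, or id."""

    def __init__(self):
        super().__init__()
        self.placeholders = []  # (line, column) where each such <a> begins

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attribute_names = {name for name, _ in attrs}
        # An anchor with no href, name, or id "specifies no anchors".
        if not attribute_names & {"href", "name", "id"}:
            self.placeholders.append(self.getpos())


def find_placeholder_anchors(markup):
    """Return the positions of anchors that are neither links nor targets."""
    finder = PlaceholderAnchorFinder()
    finder.feed(markup)
    finder.close()
    return finder.placeholders


if __name__ == "__main__":
    sample = (
        '<p><a href="/guide">Guide</a></p>\n'
        '<p><a onclick="go()">Go</a></p>\n'
    )
    for line, column in find_placeholder_anchors(sample):
        print(f"anchor without href, name, or id at line {line}, column {column}")
```

Whether a flagged anchor is actually a problem is still a judgement call; the script only tells you where to look.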
The best means to avoid the misuse of elements is to have a clear understanding of when and why a given element should be used in web development. To hopefully expand on your knowledge in that respect, I’m attempting to provide a semantic guide to HTML elements for your reference and rich disagreement.
Be aware, however, that semantics are largely a matter of opinion. It’s not a question of blindly following the guidelines set by a group; it’s a question of interpreting those guidelines to the best of your ability and belief. This guide reflects how I think HTML elements should be used, and I welcome your opinions.
Other HTML Semantics Articles
February 11, 2008
The Wikipedia article on Standards in software contains a very good definition of standards, particularly as we might need to view them when talking about web standards:
- Standards (software)
- Software standards enable software to interoperate. Many things are (somewhat) arbitrary, so the important thing is that everyone agree on what they are. Software standards is one of the Unsolved problems in software engineering
On the whole, the article at Wikipedia is a good example of what isn’t so great about Wikipedia. It is poorly written and incomplete: the article is more a collection of notes in preparation for writing an article than it is a real document. Nonetheless, the above definition contains a gem of perception concerning exactly what it is that standards actually do. Standards enable software to interoperate. Standards increase the ability of various programs to cope with what is fed to them.
And, fundamentally, that’s all they do. Standards, by themselves, are not in any way equivalent to “appropriate” or “good.” Web standards enable one program to understand what has been notated in another program. An HTML document may be an incredibly simple and basically inert program, but it is essentially a software program.
But standards don’t dictate a lot about how that code is actually used, or what elements go into the program.
Let’s try a comparison to cooking. If you’re cooking, you’ll follow a recipe. The recipe dictates units (cups, teaspoons, liters, etc.). The recipe also dictates ingredients, sometimes with substitutions. However, some aspects of a recipe are actually pretty imprecise: cooking on “medium heat,” “beat until stiff peaks form” or “use one large egg” are all examples of specific directions which do not necessarily convey the information needed to perfect a recipe.
Furthermore, a recipe does not necessarily include every detail of creating the meal. It may leave out key pieces of information which it assumes you know: stirring the sauce, checking the internal temperature, or removing the pits. These are necessary aspects of preparing the meal right, but they require external knowledge to comprehend.
If you want to bake a cake, you do need to follow the recipe (mostly). If you substitute baking powder for flour, you will not end up with a particularly appetizing cake; but you’ll also have problems if you ONLY follow the recipe, and include the eggs with their shells.
Molly Holzschlag mentioned this idea recently, and I’ve certainly written about it before, but it’s a valuable point and worth reviewing.
Besides, I thought of this analogy to cooking the other day while I was making dinner, and just had to get it out of my system…
January 24, 2008
In October of 2006, I published a brief article about Marcel Salathé’s interesting Java Applet to generate node graphs of web page structure. In that article, I stated:
I’d love to be able to produce graphs where I chose the color coding pattern for particular tags. I could set all non-semantic tags to be bright red, to easily spot the condition of a site in that respect. I could focus my attentions on inline versus block elements, or I could differentiate between different levels of headings.
More recently, I received comments on that post from a visitor who thought my idea to change this was a good one; so, at long last, I’ve gotten around to doing it.
Semantic HTML Graphs
And here’s an example of output:

The graph pictured here is for Metrolinx, the Greater Toronto Transit Authority, and Joe Clark’s failed redesign of the year. It makes for a pretty interesting case study. I know the output is small; but bear with me.
In this graph, you can clearly see long strings of orange nodes, which indicate nested table elements. You can also see significant clusters of bright red nodes, indicating deprecated tags. Altogether, the site is a maze of primarily long wavelength colors. In general, in the color scheme I’ve set up, greater densities of long wavelength colors (red, orange, pink) show a dependence on tables for layout and on presentational elements. Short wavelength colors (blue, green) indicate more semantically meaningful structures.
I made a number of small changes to the script which I think add value. First, I added the ability to change the root node you’re mapping. I don’t know that this is incredibly valuable, but it does provide an interesting alternate piece of information. The node switching is limited; it will only check the first node specified of that particular content type.
The second change is to provide a variety of color schemes. The default is pretty complicated, although I drew the line well before attempting to provide a subtly different shade for every single element. I hope that the colors provided at least give you an idea of what you’re looking at, however. The alternate color schemes (two, at the moment) are much simpler: one which simply differentiates between allowed and deprecated elements and another which highlights all inline elements (a, dfn, samp, etc.).
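The simpler allowed-versus-deprecated scheme can also be approximated as plain text. What follows is a rough sketch in Python, not the applet’s Java code: the names are hypothetical, the parser is the standard library’s html.parser, and the DEPRECATED set is my own assumption, taken from the elements marked deprecated in HTML 4.01 rather than from whatever list the applet uses.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements marked deprecated in HTML 4.01. This list is an assumption made
# for illustration; it is not taken from the applet's source.
DEPRECATED = {
    "applet", "basefont", "center", "dir", "font",
    "isindex", "menu", "s", "strike", "u",
}


class ElementTally(HTMLParser):
    """Count start tags, split into 'deprecated' and 'allowed'."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        kind = "deprecated" if tag in DEPRECATED else "allowed"
        self.counts[kind] += 1


def tally_elements(markup):
    """Return a Counter with 'deprecated' and 'allowed' element counts."""
    tally = ElementTally()
    tally.feed(markup)
    tally.close()
    return tally.counts


if __name__ == "__main__":
    page = '<body><center><font size="2">Hi</font></center><p>Text</p></body>'
    counts = tally_elements(page)
    print(f"allowed: {counts['allowed']}, deprecated: {counts['deprecated']}")
```

A tally like this loses the shape of the document, which is the whole point of the graph, but it does give a number that doesn’t depend on seeing color.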
Now, I’ve never programmed in Java before, and although the changes I made to Salathé’s source code are relatively slight, it’s highly probable that there are bugs; and I’ve certainly not managed to remove any bugs from the original code.
The last thing I need to mention is concerning the accessibility of this applet. It’s just not accessible. In fact, I know little about how to make Java accessible in the first place; but even so, the entire concept of this applet is highly dependent on color. There can be no question that if you are color-blind or otherwise sight impaired this will be a problem. Additionally, there is absolutely no means present for any screen-reader to understand the input. I do hope to change this at a later date, and author a text-based output which will provide a separate, accessible interface with the information, but that just hasn’t happened yet.
Also worth looking at:
- Validation Graphs - a stand-alone Java application also based on the HTML graph script which spiders pages and checks them for validity.
- Web2DNA - same basic idea, different implementation.
December 23, 2007
The justification that a web site is accessible because it “follows standards” contains a serious fallacy: specifically, the assumption that standards support accessibility.
One root of current standard accessibility practice is conformance to the HTML or XHTML standards set by the World Wide Web Consortium (W3C). This is a fine practice, and certainly should be maintained. Using correct syntax and following a standardized method of communicating information is always a solid best practice. However, this should absolutely not be taken to mean that following these standards is the same as applying the principles of web accessibility.
Web standards only provide accessibility to the degree that they have been designed to do so, and the guiding principle behind standards development (excluding accessibility-specific standards, of course) has not generally been to support accessibility. Web standards have been designed purely to establish a set, correct method of using the underlying code, whether presentational (CSS), structural (XHTML) or behavioral (ECMAScript).
In many (most) cases, web standards do not in any way require best practices; they merely require conformance. Take HTML, for example. Web standards would permit the usage of table elements for layout, because they do not define semantic usage for the table element. Web standards also permit a variety of presentational elements, such as font, strike, or u. It all depends on what standard you have chosen to follow.
HTML5, most recently, is considering such contrarian steps as dropping the requirement that images carry an alt attribute. This makes it possible for a valid HTML5 web site to radically fail basic accessibility guidelines. On the other hand, it may reduce the likelihood that some so-called “accessible” web sites will be littered with alt="this is a spacer graphic".
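The gap between conformance and accessibility is easy to see if you try to automate the check. The sketch below is only an illustration, written in Python with the standard library’s html.parser and hypothetical names throughout: it can tell you which images have no alt attribute at all, but it can only list the alt text that is present, not decide whether that text is any good.

```python
from html.parser import HTMLParser


class ImageAltAudit(HTMLParser):
    """Separate images with no alt attribute from those that have one."""

    def __init__(self):
        super().__init__()
        self.missing = []  # src of each image with no alt attribute at all
        self.present = []  # (src, alt) for each image that has the attribute

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "")
        if "alt" in attributes:
            self.present.append((src, attributes["alt"]))
        else:
            self.missing.append(src)


def audit_images(markup):
    """Return (missing, present) lists for the img elements in markup."""
    audit = ImageAltAudit()
    audit.feed(markup)
    audit.close()
    return audit.missing, audit.present


if __name__ == "__main__":
    page = (
        '<img src="logo.png" alt="Acme">'
        '<img src="spacer.gif" alt="this is a spacer graphic">'
        '<img src="chart.png">'
    )
    missing, present = audit_images(page)
    print("no alt attribute:", missing)
    for src, alt in present:
        print(f"{src}: alt={alt!r} (a person still has to judge this)")
```

The spacer graphic passes the mechanical check, which is exactly the kind of “accessible” markup a requirement alone tends to produce.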
Does this necessarily mean that the standard is wrong or right? No, not as such. Different standards support different needs, and it is important to keep distinct the purpose of the standard. Conforming HTML is just that: Conforming HTML. It means nothing more.
Nonetheless, as an accessibility advocate, I feel that it’s important to support accessibility issues within the development of new standards. Taking the alt attribute issue in HTML5, for example, the lack of any perceived benefit to not requiring the attribute suggests to me that the better path would be to continue to require it. There are numerous examples of important accessibility aspects in HTML5 which are not yet included.
There seems to be a strong element of specious judgement: elements which are not supported by current user-agents are considered not to be needed. This seems a ridiculous expectation: after all, if unsupported elements aren’t needed, then why develop a new specification at all? What we’ve got must work just fine!
Practically speaking, user-agent support and developer use should both be only marginal issues when trying to decide what elements are most needed in a specification. The fact that elements are unused on either end is not a judgement on the value of those elements; it is merely a judgement on the awareness of them, on the clarity of the existing specification, or on the complexity of the implementation.
Nobody (or almost nobody) uses the q inline element. Does this mean that the element isn’t valuable, and should be discarded? No. It means that Internet Explorer should add appropriate support for it. The same is true for accessibility issues. The standards should support them to their best abilities: if an element or attribute could hypothetically add to the accessibility of a site, then the fact that it is little used or poorly supported should be entirely irrelevant. Support should follow the standards; not the other way around.
At the root of things, my stance is that I am unwilling to support a standard which specifically excludes features which are needed in order to appropriately provide best-practice accessibility. HTML5 is still a long way from being done, and even further from being implemented (if it ever is), but the removal of such attributes as headers from table markup, the inclusion of defined non-semantic elements such as b, and the “WYSIWYG exemption” on the font element strike me as decisions badly in need of reconsideration.
December 5, 2007
An interesting thought in indexing and handling page structure is the concept that different areas of a single page can be identified and considered independently from surrounding bodies of content. This particularly applies to specific and readily identifiable data-types, such as phone numbers, postal codes, or abbreviations; but can also be extended to include broader content labeling.
A well-structured XML document has an absolutely clear labeling system for data built into the structure. If you take any RSS feed, for example, the elements which identify <title>, <link> or <managingEditor> can’t readily be mistaken.
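To show how little guesswork that labeling leaves, here is a small sketch in Python using the standard library’s xml.etree.ElementTree. The feed content, addresses, and function name are invented for the example; only the element names come from the RSS format itself.

```python
import xml.etree.ElementTree as ET

# A made-up feed for illustration; only the element names are real RSS.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <managingEditor>editor@example.com (Example Editor)</managingEditor>
  </channel>
</rss>"""


def channel_fields(feed_xml):
    """Pull the labeled channel fields straight out by element name."""
    channel = ET.fromstring(feed_xml).find("channel")
    names = ("title", "link", "managingEditor")
    return {name: channel.findtext(name) for name in names}


if __name__ == "__main__":
    for name, value in channel_fields(FEED).items():
        print(f"{name}: {value}")
```

Each value is retrieved by asking for it by name; nothing has to be inferred from position or styling.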
A well-structured, semantically sensible XHTML or HTML document doesn’t offer nearly the same degree of data particulation. The higher level data elements can sometimes be fairly clear, as is the case with <address> or <cite> elements, but other potentially valuable elements end up providing relatively neutral value: <h2> or <div>.
Read more: Thoughts about Content Labeling and Data
September 20, 2007
Following up on tables and CSS, the grid model of layout execution is part of the CSS level 3 working draft. The specifications for the grid layout module being discussed were released on September 5, 2007.
This module describes integration of grid-based layout (similar to the grids traditionally used in books and newspapers) with CSS sizing and positioning. Document Abstract
Semantically, the grid layout system is a nice development: it is a system explicitly and exclusively designed for layout, which has no required HTML component. An excellent companion to the div element.
Read more: CSS3: On Grid Positioning and Layout
August 23, 2007
At Cre8asite Forums this week, a lengthy discussion on the ultimate value of pure CSS (Cascading Style Sheets) based layout over the use of tables has been taking place. Sometimes, living in the sheltered world of accessible and standards-based design, I can lose touch with the fact that many people out there simply don’t accept some of the same guidelines I work with every day, and that this does not, in any way, mean that they haven’t given the subject a fair shot. Very good arguments have been made to defend each side.
On the whole, I think this discussion is an old, worn-out subject: those who won’t use tables generally don’t use them out of principle, and those who do use them out of pragmatism and a justified awareness that principles don’t build websites. I want to review the question once more, however, ignoring the entire question of principle.
Read more: Why not tables? Is CSS really better?
Filed under Semantics, Web standards by Joe Dolson