In October of 2006, I published a brief article about Marcel Salathé’s interesting Java Applet to generate node graphs of web page structure. In that article, I stated:

I’d love to be able to produce graphs where I chose the color coding pattern for particular tags. I could set all non-semantic tags to be bright red, to easily spot the condition of a site in that respect. I could focus my attentions on inline versus block elements, or I could differentiate between different levels of headings.

More recently, I received comments on that post from a visitor who thought my idea to change this was a good one — so, at long last, I’ve gotten around to doing it.

[I unpublished this tool in 2013. Sorry!]

And here’s an example of output:

The graph pictured here is for Metrolinx, the Greater Toronto Transit Authority, and Joe Clark’s failed redesign of the year. It makes for a pretty interesting case study. I know the output is small, but bear with me.

In this graph, you can clearly see long strings of orange nodes, which indicate nested table elements. You can also see significant clusters of bright red nodes, indicating deprecated tags. Altogether, the site is a maze of primarily long-wavelength colors. In general, in the color scheme I’ve set up, greater densities of long-wavelength colors (red, orange, pink) show a dependence on tables for layout and on presentational elements. Short-wavelength colors (blue, green) indicate more semantically meaningful structures.
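To make the color logic concrete, here’s a minimal sketch of how tags might map to colors and how the "density" of long-wavelength nodes could be measured. This is not the applet’s actual code: the class name `TagColors`, the category names, the tag lists, and the hex values are all assumptions chosen for illustration. Only the broad idea (orange for table elements, bright red for deprecated tags, cooler colors for semantic structure) comes from the scheme described above.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: tag lists and hex colors are illustrative assumptions,
// not the values used by the real applet.
class TagColors {
    enum Category { TABLE, DEPRECATED, SEMANTIC, OTHER }

    static final Set<String> TABLE_TAGS = Set.of("table", "tbody", "tr", "td", "th");
    static final Set<String> DEPRECATED_TAGS = Set.of("font", "center", "u", "strike");
    static final Set<String> SEMANTIC_TAGS =
            Set.of("h1", "h2", "h3", "p", "ul", "ol", "li", "blockquote");

    static final Map<Category, String> COLORS = Map.of(
            Category.TABLE, "#FF8C00",      // orange
            Category.DEPRECATED, "#FF0000", // bright red
            Category.SEMANTIC, "#1E6FD9",   // blue
            Category.OTHER, "#808080");     // neutral gray

    // Classify a tag name, ignoring case.
    static Category categorize(String tag) {
        String t = tag.toLowerCase(Locale.ROOT);
        if (TABLE_TAGS.contains(t)) return Category.TABLE;
        if (DEPRECATED_TAGS.contains(t)) return Category.DEPRECATED;
        if (SEMANTIC_TAGS.contains(t)) return Category.SEMANTIC;
        return Category.OTHER;
    }

    static String colorFor(String tag) {
        return COLORS.get(categorize(tag));
    }

    // Fraction of nodes drawn in long-wavelength colors (table or deprecated).
    static double warmFraction(List<String> tags) {
        if (tags.isEmpty()) return 0.0;
        long warm = tags.stream()
                .map(TagColors::categorize)
                .filter(c -> c == Category.TABLE || c == Category.DEPRECATED)
                .count();
        return (double) warm / tags.size();
    }
}
```

A page whose `warmFraction` is high would look like the red-and-orange maze described above; a low value suggests mostly semantic markup.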

I made a number of small changes to the script which I think add value. First, I added the ability to change the root node you’re mapping. I don’t know that this is incredibly valuable, but it does provide an interesting alternate piece of information. The node switching is limited: it only checks the first node of the type you specify.
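The root-node limitation can be illustrated with a small sketch. It assumes "first" means first in document order (a depth-first, pre-order walk), which is a guess about the applet’s behavior. The `RootPicker` class and its `Node` type are hypothetical stand-ins, not part of the original source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of choosing a new root for the graph.
// Assumption: "first" means first in document order (depth-first, pre-order).
class RootPicker {
    static final class Node {
        final String tag;
        final List<Node> children = new ArrayList<>();

        Node(String tag) { this.tag = tag; }

        // Append a child and return this node so calls can be chained.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Return the first node with the given tag name, searching from `start`.
    // Any later matches are never considered, which mirrors the limitation.
    static Optional<Node> firstOfType(Node start, String tag) {
        if (start.tag.equalsIgnoreCase(tag)) return Optional.of(start);
        for (Node child : start.children) {
            Optional<Node> hit = firstOfType(child, tag);
            if (hit.isPresent()) return hit;
        }
        return Optional.empty();
    }

    // Count the nodes in the subtree that would be graphed from `root`.
    static int size(Node root) {
        int count = 1;
        for (Node child : root.children) count += size(child);
        return count;
    }
}
```

With a page containing two `ul` elements, asking for `ul` as the root would graph only the subtree under the first one; the second list never gets a turn.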

The second change is to provide a variety of color schemes. The default is pretty complicated, although I drew the line well before attempting to provide a subtly different shade for every single element. I hope the colors provided at least give you an idea of what you’re looking at. The alternate color schemes (two, at the moment) are much simpler: one that simply differentiates between allowed and deprecated elements, and another that highlights all inline elements (a, dfn, samp, etc.).
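The two simpler schemes both boil down to "highlight one set of tags, draw everything else in a base color." Here’s a rough sketch of that idea as a Java enum. The name `ColorScheme`, the hex colors, and the exact tag sets are assumptions for illustration (the sets are partial lists of HTML 4.01 deprecated and inline elements); the real applet may organize this quite differently.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: each scheme highlights one set of tags and draws
// every other tag in a base color. Colors and tag sets are assumptions.
enum ColorScheme {
    // Deprecated elements in red, allowed elements in green.
    DEPRECATION(
            Set.of("applet", "basefont", "center", "dir", "font",
                   "isindex", "menu", "s", "strike", "u"),
            "#FF0000", "#2E8B57"),

    // Inline elements in blue, everything else in light gray.
    INLINE(
            Set.of("a", "abbr", "acronym", "cite", "code", "dfn", "em",
                   "kbd", "q", "samp", "span", "strong", "sub", "sup", "var"),
            "#1E6FD9", "#C0C0C0");

    private final Set<String> highlighted;
    private final String highlightColor;
    private final String baseColor;

    ColorScheme(Set<String> highlighted, String highlightColor, String baseColor) {
        this.highlighted = highlighted;
        this.highlightColor = highlightColor;
        this.baseColor = baseColor;
    }

    // Pick the node color for a tag name, ignoring case.
    String colorFor(String tag) {
        return highlighted.contains(tag.toLowerCase(Locale.ROOT))
                ? highlightColor
                : baseColor;
    }
}
```

Adding a third scheme in this shape would mean adding one more enum constant with its own tag set and pair of colors, which keeps the simple schemes cheap to extend.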

Now, I’ve never programmed in Java before, and although the changes I made to Sala’s source code are relatively slight, it’s highly probable that there are bugs, and I’ve certainly not managed to remove any bugs from the original code.

The last thing I need to mention concerns the accessibility of this applet. It’s just not accessible. In fact, I know little about how to make Java accessible in the first place; but even so, the entire concept of this applet is highly dependent on color. There can be no question that if you are color-blind or otherwise sight-impaired this will be a problem. Additionally, there is absolutely no means for any screen reader to interpret the output. I do hope to change this at a later date, and author a text-based output which will provide a separate, accessible interface to the information, but that just hasn’t happened yet.

Also worth looking at:

  • Validation Graphs – a stand-alone Java application, also based on the HTML graph script, which spiders pages and checks them for validity.
  • Web2DNA – same basic idea, different implementation.