April 18, 2007

I’ve been thinking about the language of personal names recently. Personal names often don’t follow the pronunciation rules of their locality. My name is “Joe.” That’s an easy one, but you can’t always be sure. My partner is named “Janna,” pronounced “Yahn-uh.” In this country, it’s the very rare exception when a stranger pronounces her name correctly.

It’s embarrassing to mispronounce somebody’s name. Outside of a personal introduction, however, it can be very difficult to know how to pronounce personal names because of these numerous deviations from the rules of pronunciation. (Which, in English, are already pretty nasty.)

But using language markup on a personal name to signal its pronunciation may be somewhat deceptive or inaccurate.

First, there’s the semantic content of the personal name: does the name really represent a portion of another language because of its pronunciation? Then, there’s the choice of language: “J” is pronounced as “Y” in several languages. When there’s a clear intent, due to family origin and known parental choice, I suppose you could choose fairly easily. However, the number of people you know well enough to select the correct language on the basis of their family origin and how their names were chosen is probably pretty small.

In fact, I think it’s fair to say that for 99% of the names I reference on the web, I cannot say with absolute certainty that I know how to pronounce them. If I were to attempt it, I would not be able to mark an appropriate language for most names.

Nonetheless, I find it frustrating to be unable to pronounce something consistently and correctly, and would certainly not mind if there were some meaningful technique to mark pronunciation. There is, of course, the Pronunciation Lexicon Specification from the W3C’s Voice Browser Working Group. To the best of my knowledge, however, it does not apply to XHTML or HTML.

Anyhow, this is a question which struck me as interesting. Correct pronunciation in screen readers is one of the reasons for using language markup for a block of text. The same logic extended to personal names makes for a curious anomaly.
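
For illustration, here is a minimal sketch of what language markup on a personal name might look like. The choice of `lang="de"` is purely an assumption made for the example; nothing above says which language “Janna” actually comes from, and that uncertainty is exactly the problem.

```html
<!-- Hypothetical example: lang="de" is a guess, not a known fact about the name. -->
<p lang="en">
  My partner is named <span lang="de">Janna</span>.
</p>
```

A screen reader that honors the attribute may switch pronunciation rules for the marked span, which helps only if the guessed language is the right one.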

