Following the publication of a brief article on Search results design by Adaptive Path, I decided that revising my database search script was a valuable goal. Specifically, meeting the checklist in that article was probably not a bad idea!
It’s not that the previous version was terrible, but I knew perfectly well that it could be much better.
The additions to the script are pretty straightforward:
Additions:
- Added: Made row highlighting available in both tabular and list-based search results.
- Added: Search terms are now highlighted in search results.
- Added: The default sort is now to order results by query relevance.
- Added: Paginated navigation of search results is now available.
- Added: Translation base file [English], so translating the script is easier.
- Added: Basic Spellchecking [English]
- Added: Default stylesheet
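As an illustration of the search-term highlighting listed above, here is a minimal sketch in Python. It is not the script's own code: the function name `highlight_terms` and the choice of a `<strong>` wrapper are assumptions made for the example. It matches whole words only, ignores case, and HTML-escapes everything it outputs.

```python
import html
import re


def highlight_terms(text, terms, tag="strong"):
    """Wrap each whole-word occurrence of a search term in an HTML tag."""
    # Longest terms first, so a longer term wins when two terms overlap.
    words = sorted(
        (re.escape(t.strip()) for t in terms if t.strip()),
        key=len,
        reverse=True,
    )
    if not words:
        return html.escape(text)
    pattern = re.compile(r"\b(?:" + "|".join(words) + r")\b", re.IGNORECASE)

    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text and the matched text separately, so the
        # wrapper tag is the only markup in the result.
        pieces.append(html.escape(text[last:match.start()]))
        pieces.append(f"<{tag}>{html.escape(match.group(0))}</{tag}>")
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)
```

Calling `highlight_terms("Search results & search design", ["search"])` wraps both occurrences of the word and leaves "research" in other text alone, because of the word-boundary match.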
Changes:
- Changed: Text excerpts are now truncated at word boundaries, rather than in the middle of words.
- Changed: separated results template information into external include files for easier upgrading or modification.
- Changed: Included the search form as part of the script so that search terms could be automatically returned to the search input.
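The word-boundary truncation mentioned in the changes can be sketched roughly like this. Again, this is an illustrative Python version under my own assumptions (the name `truncate_at_word` and the trailing `...` are hypothetical), not the code that ships in the package.

```python
def truncate_at_word(text, limit, ellipsis="..."):
    """Cut text to at most `limit` characters (before the ellipsis)
    without splitting a word."""
    if len(text) <= limit:
        return text
    # Look one character past the limit, so a word that ends exactly
    # at the limit is kept whole.
    cut = text[:limit + 1]
    space = cut.rfind(" ")
    if space <= 0:
        # One very long word: fall back to a hard cut.
        return text[:limit] + ellipsis
    return cut[:space].rstrip() + ellipsis
```

With a limit of 12, "The quick brown fox jumps" becomes "The quick..." rather than "The quick br...".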
The spell checking is the most exciting addition in my view. It’s hardly complete, but it’s based on a list of 4,068 common misspellings available from Wikipedia. This addition has significantly bulked up the total download size, since I’m including the spell-checking database as part of the download, but I think it adds a lot of value to the script.
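A list-based spell check of this kind boils down to a lookup table. The sketch below shows the general idea in Python; the three sample entries and the name `suggest_query` are placeholders for illustration, and the real list is of course far longer.

```python
import re

# Tiny hypothetical sample; a real list would hold thousands of entries.
COMMON_MISSPELLINGS = {
    "recieve": "receive",
    "seperate": "separate",
    "definately": "definitely",
}


def suggest_query(query, misspellings=COMMON_MISSPELLINGS):
    """Return a corrected query, or None when no known misspelling is found."""
    changed = False

    def fix(match):
        nonlocal changed
        word = match.group(0)
        correction = misspellings.get(word.lower())
        if correction is None:
            return word  # unknown to the list, so leave it untouched
        changed = True
        return correction

    corrected = re.sub(r"[A-Za-z']+", fix, query)
    return corrected if changed else None
```

A lookup like this only catches misspellings that are on the list. A correctly spelled word used in the wrong place passes straight through.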
I’ve also added a translation base file to the package, to make it a bit easier for users of the script to port it to their own languages. Unfortunately, I haven’t yet had time to seriously work on the internationalization of the search script itself, so (to be entirely frank) this is an area for which the script isn’t really well suited at this time.
Internationalization is next on the list, however. It’s a high priority at this point, since internationalization ranks as one of the most reported problems with the script.
With spell-checking in mind, I think it’s appropriate to provide a healthy reminder of the limitations of spellcheck:
Candidate for a Pullet Surprise
by Mark Eckman and Jerrold H. Zar
I have a spelling checker,
It came with my PC.
It plane lee marks four my revue
Miss steaks aye can knot sea.
Eye ran this poem threw it,
Your sure reel glad two no.
Its vary polished in it’s weigh.
My checker tolled me sew.
A checker is a bless sing,
It freeze yew lodes of thyme.
It helps me right awl stiles two reed,
And aides me when eye rime.
Each frays come posed up on my screen
Eye trussed too bee a joule.
The checker pours o’er every word
To cheque sum spelling rule.
Bee fore a veiling checker’s
Hour spelling mite decline,
And if we’re lacks oar have a laps,
We wood bee maid too wine.
Butt now bee cause my spelling
Is checked with such grate flare,
Their are know fault’s with in my cite,
Of nun eye am a wear.
Now spelling does knot phase me,
It does knot bring a tier.
My pay purrs awl due glad den
With wrapped word’s fare as hear.
To rite with care is quite a feet
Of witch won should bee proud,
And wee mussed dew the best wee can,
Sew flaw’s are knot aloud.
Sow ewe can sea why aye dew prays
Such soft wear four pea seas,
And why eye brake in two averse
Buy righting want too pleas.
The whole world of spam is an accessibility nightmare. The concept behind web accessibility is to ensure that users can access the complete functionality of your web site — but how do you cope with the fact that spambots will happily take advantage of any hole you leave?
Comment forms, contact pages, email addresses and enrollment forms. All methods of giving critical access to previously unidentified users — and all in positions where you just need to find that crucial differentiation between real people and robots.
When you’re talking about functionality which is locked behind a log-in form, there’s not really a huge amount of trouble in defining the security/accessibility conundrum. Require a good, secure password and you’re pretty safe. People with disabilities, for the most part, can use a password field just as effectively as anybody else. Once you’re behind that iron curtain, you can usually stop worrying about the distinction: everybody who has access to your private functionality is a known user. They’ve identified themselves, provided credentials which grant them a certain degree of access, and you can stop worrying about them.
But your front door can be a big problem.
You need to create a doorway which will allow visitors you don’t already know to reach you. They need to be able to contact you in order to initiate business, or enroll in your program, or at least create an account with your site. It’s therefore absolutely critical that you create a form which can be accessed by anybody.
But you still only want people using your form. Robot visitors rarely pay the enrollment fee, so they’re not exactly welcome visitors in every area of your site. You certainly don’t want to be thanking them for contacting you with an offer to enlarge your anatomy!
Spam protection and accessibility have inherent conflicts of interest: the former goal attempts to prevent a form from being used, the latter promotes it. The two goals aren’t actually antithetical to each other, but getting them to work together does require a detailed understanding of what the issues are.
Stopping the Robots
One of the most common solutions to the spam problem is to present a problem which a computer can’t solve. The most obvious solutions (pictures of animals, pictures of people, etc.) are inherently flawed because they require specific pieces of information in order to solve. They’ll require correct spelling in the correct language with knowledge of the subject depicted. Although most visitors may be able to identify an elephant, some visitors will inevitably (and correctly) identify it as an elefant.
Presumed knowledge is a barrier to both humans and computers.
This is what has led to the numerous garishly blurred and colored text images you’ve undoubtedly had to interpret. Computers can use character recognition to examine images and identify the text, so the presentation is warped to decrease the likelihood of recognition. Of course, this also decreases the likelihood that humans will be able to read the image. Humans with disabilities? No chance. Either you include an alt attribute, making the solution trivial for a computer, or you leave it out — making the solution impossible for somebody with a visual disability.
Thus was born the audio CAPTCHA. However, audio CAPTCHA requires specific technology — an audio format must be chosen, and an audio player provided. Additionally, computers are capable of recognizing audio excerpts in much the same way they can recognize images. As a result, the audio output is distorted. I’ve listened to audio CAPTCHAs, and all I can say is that I hope others have better luck than I do. I’ve never passed one.
And, of course, neither of these methods will provide access for anybody who is both hearing and visually impaired.
There are numerous other examples of attempts at accessible CAPTCHAs. Most of them depend on the fact that while robots may be text-aware, they are not necessarily capable of following instructions provided in text. These include simple question & answer bot-blocking techniques like:
- Write “human” in the field below.
- What is 3 + 4?
- Is fire hot or cold?
These simple questions can slow spam — these can be considered generic spam prevention methods. They will stop almost all spam which is not specifically targeted at the form. However, if any programmer decides that they want to write a bot to attack your site, it is a trivial problem. Simply put, these kinds of questions generate security through obscurity.
A second class of bot-blocking techniques is found in more complex question & answer sets:
- Write “red” in the 2nd text field on the left.
- Enter your name in the 3rd row, 2nd column.
These programmatically variable questions may also slow a bot, but they can be incredibly challenging, if not impossible, for a human visitor who is not using a visual browser with an output equivalent to the instructions.
Tricking the Robots
Now, robots aren’t terribly intelligent. Usually, their decision making skills are fairly limited. As such, it’s not terribly difficult to simply deceive them. These methods may have some effectiveness at slowing down bots:
- Required selections on option menus. Not that a specific option is required — just anything available in the menu.
- Honeypots: fields which should not be filled in, but probably will be by your average bot in its quest to cover all its options.
- Limited length fields: if you set this client-side, using the HTML maxlength attribute, a bot can easily limit its own input. However, if you set it server-side (at a safe margin for real users) you can stop a few bots which get over-eager.
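To make the three checks above concrete, here is a minimal server-side sketch in Python. Everything named in it is an assumption for illustration: the field names `website`, `topic` and `message`, the menu values, and the 5000 character limit. It takes the submitted form as a plain dictionary and returns the reasons, if any, that the submission looks automated.

```python
MAX_MESSAGE_LENGTH = 5000  # hypothetical limit, with a wide margin for real users
TOPIC_OPTIONS = {"sales", "support", "other"}  # hypothetical menu values


def looks_like_bot(form):
    """Return a list of reasons a submitted form (a dict of strings) looks automated."""
    reasons = []
    # Honeypot: people are told to leave this field empty, so any value is suspect.
    if form.get("website", "").strip():
        reasons.append("honeypot filled")
    # Required menu selection: any listed option is fine, a missing or unknown one is not.
    if form.get("topic") not in TOPIC_OPTIONS:
        reasons.append("no valid menu option")
    # Length limit enforced on the server, not through maxlength in the HTML.
    if len(form.get("message", "")) > MAX_MESSAGE_LENGTH:
        reasons.append("message too long")
    return reasons
```

An empty list means the submission passed all three checks. If you use a honeypot, give it a text label telling people to leave it blank, so that visitors who can't see your styling aren't caught by it.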
Mike Cherim has valuable tips on these techniques in his article Protecting Forms from Spam ‘Bots, so I’m not going to elaborate on these points excessively. Again, however, these are all valuable methods within the “security through obscurity” school of protection — no serious protection against a motivated spammer.
Mike’s secure and accessible contact form makes use of a wide variety of techniques and provides thorough accessibility, so if you’re looking for a simple contact form which will block generic spam, it’s a great option.
Behavior Detection
This is a complicated area, which I’m not going to delve into in any significant detail. Primarily because I’m not really qualified. However, it’s an important category of spam control, so it’s worth an overview.
The principle of behavior detection is based on one core observation: bots don’t behave like people. People are, for the most part, a complex blend of random behavior and systematic exploration. Bots are generally much more absolute. When you observe a web site “user” visit every single navigable page of your site at 30 second intervals, that user is clearly not human.
Although the actual interpretation is significantly more complicated, the challenge is simple: look for patterns. If a user’s time on a site matches a mathematical pattern, that’s a signal. The Bad Behavior package works (at least partially) on this general logic: search for indications about the user or user-agent and identify signals which suggest non-human activity.
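As a toy illustration of "look for patterns", the sketch below flags a visitor whose requests arrive at almost perfectly regular intervals. This is my own simplified example, not how Bad Behavior or any other package actually works, and the thresholds (`min_requests`, `max_variation`) are arbitrary assumptions.

```python
from statistics import mean, pstdev


def looks_scripted(timestamps, min_requests=5, max_variation=0.05):
    """Flag a visitor whose page requests arrive at near-constant intervals.

    timestamps: request times in seconds, in ascending order.
    """
    if len(timestamps) < min_requests:
        return False  # too little data to judge
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    average = mean(gaps)
    if average <= 0:
        return True  # every request arrived in the same instant
    # Coefficient of variation: spread of the gaps relative to their average.
    return pstdev(gaps) / average <= max_variation
```

A visitor requesting a page every 30 seconds on the dot is flagged, while someone who reads one page for four seconds and the next for two minutes is not. A real system would combine many signals rather than rely on one.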
Requiring Specific Capabilities
Some spam solutions choose to require specific capabilities from the visitor before allowing them to make contact. The WordPress comment spam plugin WP-Spamfree takes this strategy. The first layer of protection for this plugin is to require that any visitor trying to submit a comment have JavaScript support and cookies enabled.
Immediately, this strategy eliminates the vast majority of bots — and a small minority of humans.
Conclusion
I’m not aware that there’s any solution which has 100% success at differentiating humans from bots. Any barrier put in place to spam will also create a barrier for somebody. However, this is a decision that must be made for any site: when you’re receiving thousands of spam messages a day through an insecure contact form, is it better to stop the occasional human or massively reduce your daily spam-killing time commitment?
Ultimately, there isn’t a real answer. Spam is too great of an issue to simply ignore. However, any time you create a CAPTCHA — of any sort — just remember this: provide an alternative. If you provide a phone number to those who have failed your little test, they may be able to reach you. If somebody needs to reach you, make it possible: even if they’ll have to write you a letter in order to post a comment on your blog.
This new book from Packt Publishing & Nirav Mehta is a quick and effective introduction to developing websites specifically targeted at mobile device users. I say “users” for a reason: one of the strongest advantages of the book is a strong focus on considering your user and their needs as a key element of mobile web development.
My overall reaction to this book was positive. It covers a wide variety of key issues for mobile web programming in an easily understood manner. The book is targeted primarily at developers who already have some experience at web development and design, so it doesn’t delve into any serious detail when it comes to server-side programming or HTML coding, but instead makes a point of emphasizing places where the mobile web is different from internet interaction on a desktop device.
Mehta goes out of his way on many occasions to emphasize the serious importance of considering who (and what!) will be using your mobile web application.
“Any website accessed from a mobile device is mobile web - whether it’s been tailored to work on a mobile or not!” Mobile Web Development, Nirav Mehta, page 10
The book covers a wide range of issues, from developing for mobile devices using a “lowest common denominator” plan to implementing highly dynamic mobile applications which adapt automatically to the device currently in use. The text is easy to understand and follows a logical progression, starting with the mobile web development practices which are most similar to the development of standard web applications before moving into the areas which are very specifically targeted towards mobile devices.
This isn’t to say that the book doesn’t have a few flaws. I identified three areas where I really would have liked to have seen better work.
Editing
In general, the copy editing on this text was pretty poor. The editing improved as I got further into the book (or I became more oblivious to it), but the introductory chapters had a lot of problems. There weren’t a lot of typos, but the grammar was noticeably lacking. The book is rife with sentences like this:
“We will need a recharge of patience if we wanted to watch a movie preview on low speed mobile networks.”
I’m not a member of the grammar police, but I’m certainly sympathetic. Professionally published books simply shouldn’t contain the kinds of errors found in this book.
Code Examples
The author talks about following web standards as a critical element of mobile web development. That’s great. It is, however, a serious pet peeve of mine to see code examples which don’t reflect the text of the book. The very first code example in the book is this:
<link rel="stylesheet" type="text/css" media="handheld" href="mobile.css">
The text preceding it states “Here’s how you can add an alternative stylesheet link in your XHTML page.” I see a problem here. Yes, the author does explain at a later point in the book that all XHTML elements must be closed: but it’s a simple fact of life that most people referencing this book will be far more likely to simply reference the code as is. This is simply a mistake; but it’s not one that should have made it through a review of the book.
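The difference is easy to demonstrate. XHTML has to be well-formed XML, so a quick check with an XML parser will reject the tag as printed and accept a self-closed version. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is my own, and note that it only tests well-formedness, not validity against any particular DocType.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, which XHTML markup must."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# The example as printed in the book: the element is never closed.
as_printed = '<link rel="stylesheet" type="text/css" media="handheld" href="mobile.css">'
# The same element, self-closed as XHTML requires.
self_closed = '<link rel="stylesheet" type="text/css" media="handheld" href="mobile.css" />'

print(is_well_formed(as_printed))
print(is_well_formed(self_closed))
```

The first call reports a failure because the parser reaches the end of the input with the element still open.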
I’ll admit that I haven’t gone through and checked the validation of every code example. Most of them seemed solid and accurate. There are definitely examples which wouldn’t be valid under the XHTML DocType, but I’m not adept enough with XHTML-MP to know off-hand if the same is true within the mobile profile DocType.
Appendices
Simply put, there aren’t any. There were numerous points in the book where I thought to myself that an appendix would be great. A list of resources cited by topic, a section summarizing the syntax of VXML, tables showing the differences between XHTML and XHTML-MP or between CSS and WCSS. These kinds of resources would have been tremendous benefits to the overall reference value of the text.
Overall
This is a worthwhile book. Even though I wouldn’t recommend trusting the code examples, the truth is that you should never simply take code examples as written: you learn best by taking an example and re-purposing it for your own needs. Mobile Web Development will introduce you to the key issues for mobile web programming and design in a manner which can give you a quick start on mobile web application development.
It didn’t quite take 2 years, at least. But very, very close. The last release of this script was May 25th, 2006, so I made it just under the wire. But this is also a bit more than just a script update. In fact, this is a complete overhaul. I’m not certain that there’s actually a single line in the poll script which is the same as the previous version.
This was necessary; because the previous version was, in a word, pretty lousy. It may have acted as a decent jumping off point for some beginning programmers to code their own; but, on the whole, it was not a sophisticated script, and it was nothing like developer-friendly.
This new version, leveraging the power of Google’s Chart API and some clever scripting by Christian Heilmann, provides a better end result with fewer potential problems for the user.
I’ve vastly increased the flexibility of the script, which once could only support a fixed 2 to 5 options in a single question, to provide support for any number of questions with any number of options, customizable at the question level. I don’t anticipate that anybody will be authoring 100 question polls with this…but they could, in theory.
And, to cap it all off, I’ve added an administrative interface which allows users to perform most of their basic management needs without needing to crawl into the database. Hopefully, it won’t prove to be too buggy.
Are there likely to be bugs in this? Yes! So, if you download this and try it out, please let me know what you notice. I caught quite a few; but I think it’s safe to say that there are a few left in there.
And by “a few,” I mean “actually, there could be a lot of ‘em.”
Check it out or download the package now.
Here’s the first clue: it’s not creating a pixel-perfect replication of your ideal version of a site in all browsers.
In fact, cross-browser compatibility ultimately has very little to do with what a web site looks like, and a lot more to do with how it functions. It also has relatively little to do with browsers, and perhaps could better be explained as multiple user-agent compatibility.
“Compatibility” (in this context) is not a term which means “looks and behaves identically”; instead, it may be better described as “performs equivalently under alternative conditions.” But developers and designers tend to most immediately seize upon appearance as the guiding line for cross-browser compatibility.
Of course, let’s be honest: there are a lot of very good reasons for this. Completely disregarding what we may know about the behavior of a site, clients tend to be very visually oriented. They pop their new site open at home one day during development and notice a whole variety of differences which they’re suddenly concerned about. If you’re lucky, they’re opening up Internet Explorer 6 after you’ve gone through the painstaking process of correcting its inability to cope with standards-compliant code, rather than before you’ve gotten around to it. That can be awkward…
Another good reason is that despite what I’ve stated above, making the design behave more-or-less identically between different browsers is actually quite desirable. From a usability perspective, a seamless change in interactivity between different user-agents is very desirable. If you’ve ever tried to guide somebody through using a website which delivers a different experience to their browser than to yours, you are intimately familiar with one reason it’s a very bad idea.
But the absolute key to cross-browser compatibility is simply functionality. A lack of cross-browser compatibility doesn’t mean that something looks different; it means that it doesn’t work.
And a good thing, too. Otherwise, compatibility would be pretty well impossible between desktop browsers and mobile browsers.
With web design, it’s occasionally entirely possible to make two browsers render a design exactly the same…if you assume certain factors will remain constant, such as the user settings described in my last post. If any of those have been changed, everything pretty well goes out the window. As desirable as it is to make your designs look as similar as possible between the various desktop browsers, it always has to be acknowledged that there are limits.
There’s nothing at all that you can do to actually guarantee the same view for everybody; instead, you need to guarantee an equivalent view for everybody. Equivalent in that they will be able to get the same information and use the functions of the site to perform the same actions.
I received an interesting comment from my contact form the other day. I don’t need to respond to it, as the sender left a clearly false email address as their response address, but I do feel that it poses an interesting question for me.
This is the message in its entirety:
Dude how come your example poll have this amazing format, cool graphics, and when I download and installed yours, looks like SHIT?
I mean, thanks for doing this and all that, but come on, you are showing a FALSE example on your web site.
The example poll referred to is this: an example installation of a free MySQL/PHP polling script available on my website. Now, I find it hard to believe that anybody actually thinks of that example as having “this amazing format, cool graphics,” but that’s not really the point: the question is what a “free script” should be expected to include.
This person obviously expected a fully-realized, designed installation. What I provided was nothing but raw HTML and PHP scripting. No “out of the box” styling at all. This is what I generally desire out of a script: if it has a few hooks for CSS and semantic code, that’s great!
I can certainly agree that if you want something you offer to really become a major player in the world of popular downloads, you’ll need to put in a fair amount of work in providing easy template styling, etc. But I hardly think that should be expected for a simple web site add-on.
It leaves me curious: when you download a script, what do you expect of it?
Without both, it’s very difficult to have a successful online business. Unusable web sites have an incredible ability to generate a lack of trust in the business. As soon as one feature fails to work correctly, or doesn’t behave as you expect, there’s an immediate connection made:
“If they can’t get this right, what else might they have problems with?”
Will they lose your financial data? Will they ship you the right product? Will they bill you the right amount of shipping? What are they going to do with your private information?
It’s hard to fully trust a website which gets in your way when you’re trying to perform basic tasks. The above questions may come up as reactions to pretty severe site problems, such as incorrect product data or frightening error messages, such as this one:
“You cannot do that. This action is being recorded.”
Yikes! Not really an ideal situation. Now, having written error messages before, I can imagine what was meant, which might be better stated like this:
“You may not perform that action. We have logged the error and will work to take care of any problems!”
There are a couple of important differences between those statements.
First, there’s the tense of the statement: we are currently recording vs. we have recorded. The first leaves an ongoing implication that your actions are being monitored which may be a bit disturbing.
Second, we have the indication of what has been recorded. In the first case, it sounds like the system is recording your actions. The second message clearly states that the information recorded was the error which occurred, and assures you that the problem will be worked on.
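In code, that separation might look something like the following Python sketch: the technical details go to a log the visitor never sees, and the visitor gets the calmer wording. The logger name `shop` and the function `handle_forbidden_action` are hypothetical, chosen only for the example.

```python
import logging

logger = logging.getLogger("shop")  # hypothetical logger name

USER_MESSAGE = (
    "You may not perform that action. We have logged the error "
    "and will work to take care of any problems!"
)


def handle_forbidden_action(user_id, action):
    """Record the technical details privately and return a calm message for the visitor."""
    # The log gets the specifics; the visitor only hears what happened
    # and what will be done about it.
    logger.warning("Blocked action %r for user %r", action, user_id)
    return USER_MESSAGE
```

The point of the split is that the wording shown on screen can be reviewed and rewritten without touching what gets recorded for debugging.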
Maintaining trust in your application depends on good data, clear and non-threatening error messages, and clear task pathways. If your task paths aren’t clear, you may lose users due to sheer confusion. If you aren’t checking your data and perfecting your error messages (and all other responses, of course!) you may lose the visitor’s trust that you’ve really got their needs in mind.