The whole world of spam is an accessibility nightmare. The concept behind web accessibility is to ensure that users can access the complete functionality of your web site — but how do you cope with the fact that spambots will happily take advantage of any hole you leave?
Comment forms, contact pages, email addresses, enrollment forms: all of them grant critical access to previously unidentified users, and all of them sit exactly where you need to draw that crucial distinction between real people and robots.
When you’re talking about functionality which is locked behind a log-in form, there’s not really a huge amount of trouble in defining the security/accessibility conundrum. Require a good, secure password and you’re pretty safe. People with disabilities, for the most part, can use a password field just as effectively as anybody else. Once you’re behind that iron curtain, you can usually stop worrying about the distinction: everybody who has access to your private functionality is a known user. They’ve identified themselves, provided credentials which grant them a certain degree of access, and you can stop worrying about them.
But your front door can be a big problem.
You need to create a doorway which will allow visitors you don’t already know to reach you. They need to be able to contact you in order to initiate business, or enroll in your program, or at least create an account with your site. It’s therefore absolutely critical that you create a form which can be accessed by anybody.
But you still only want people using your form. Robot visitors rarely pay the enrollment fee, so they’re not exactly welcome visitors in every area of your site. You certainly don’t want to be thanking them for contacting you with an offer to enlarge your anatomy!
Spam protection and accessibility have an inherent conflict of interest: the former tries to prevent a form from being used, the latter promotes its use. The two goals aren’t actually irreconcilable, but getting them to work together does require a detailed understanding of the issues involved.
Stopping the Robots
One of the most common solutions to the spam problem is to present a problem which a computer can’t solve. The most obvious challenges (pictures of animals, pictures of people, and so on) are inherently flawed because they require specific pieces of information to solve: correct spelling, in the correct language, with knowledge of the subject depicted. Although most visitors may be able to identify an elephant, some visitors will inevitably (and correctly) identify it as an elefant.
Presumed knowledge is a barrier to both humans and computers.
This is what has led to the numerous garishly blurred and colored text images you’ve undoubtedly had to interpret. Computers can use character recognition to examine images and identify the text, so the presentation is warped to decrease the likelihood of recognition. Of course, this also decreases the likelihood that humans will be able to read the image. Humans with disabilities? No chance. Either you include an alt attribute, making the solution trivial for a computer, or you leave it out, making the solution impossible for somebody with a visual disability.
Thus was born the audio CAPTCHA. However, audio CAPTCHA requires specific technology — an audio format must be chosen, and an audio player provided. Additionally, computers are capable of recognizing audio excerpts in much the same way they can recognize images. As a result, the audio output is distorted. I’ve listened to audio CAPTCHAs, and all I can say is that I hope others have better luck than I do. I’ve never passed one.
And, of course, neither of these methods will provide access for anybody who is both hearing and visually impaired.
There are numerous other examples of attempts at accessible CAPTCHAs. Most of them depend on the fact that while robots may be text-aware, they are not necessarily capable of following instructions provided in text. Simple question & answer bot-blocking techniques like:
- Write “human” in the field below.
- What is 3 + 4?
- Is fire hot or cold?
These simple questions can slow spam, and can be considered generic prevention: they will stop almost all spam which is not specifically targeted at your form. However, if a programmer decides to write a bot aimed at your site, defeating them is a trivial problem. Simply put, these kinds of questions provide security through obscurity.
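As a sketch, the server-side half of such a question-and-answer check might look like the following. The question pool and the accepted answers are invented here for illustration; note the check is deliberately forgiving about case and whitespace, since "Seven" and "7" are both correct human answers:

```python
import random

# Hypothetical challenge pool: each question maps to a set of accepted answers.
CHALLENGES = {
    'Write "human" in the field below.': {"human"},
    "What is 3 + 4?": {"7", "seven"},
    "Is fire hot or cold?": {"hot"},
}

def pick_challenge():
    """Return a (question, accepted_answers) pair to embed in the form."""
    question = random.choice(list(CHALLENGES))
    return question, CHALLENGES[question]

def answer_is_human(answer, accepted):
    """Case-insensitive, whitespace-tolerant check of the visitor's reply."""
    return answer.strip().lower() in accepted

print(answer_is_human("  Seven ", {"7", "seven"}))  # True
```

The leniency matters for exactly the accessibility reasons above: the goal is to reject bots, not visitors who type an equally valid form of the answer.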
A second class of bot-blocking techniques are found in more complex question & answer sets:
- Write “red” in the 2nd text field on the left.
- Enter your name in the 3rd row, 2nd column.
These programmatically variable questions may also slow a bot, but they can be incredibly challenging, if not impossible, for a human visitor who is not using a visual browser whose output matches the instructions, whether because they’re viewing a responsive site on a mobile device or using a screen reader, where “left” has no meaning.
Tricking the Robots
Now, robots aren’t terribly intelligent. Usually, their decision making skills are fairly limited. As such, it’s not terribly difficult to simply deceive them. These methods may have some effectiveness at slowing down bots:
- Required selections on option menus. Not that a specific option is required — just anything available in the menu.
- Honeypots — fields which should not be filled in, but probably will be by the average bot in its quest to cover all its options.
- Limited-length fields — if you set the limit only client-side, using the HTML maxlength attribute, a bot can easily trim its own input to comply. However, if you also enforce it server-side (with a safe margin for real users), you can stop the few bots which get over-eager.
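The honeypot and server-side length checks above can be sketched as a single validation step. This is a minimal illustration under assumed names: the honeypot field name ("website") and the length limits are placeholders, not recommendations:

```python
# Assumed server-side limits, set with a generous margin over any client-side
# maxlength so that no real visitor ever trips them.
MAX_LENGTHS = {"name": 100, "email": 254, "message": 5000}

# A field hidden from all users (e.g. via CSS display: none) that an
# indiscriminate bot will nonetheless fill in.
HONEYPOT_FIELD = "website"

def looks_like_bot(form):
    """Return True if a submitted form (a dict of field name -> value)
    trips either the honeypot or a server-side length limit."""
    if form.get(HONEYPOT_FIELD, "").strip():
        return True  # only a bot "sees" and fills the hidden field
    for field, limit in MAX_LENGTHS.items():
        if len(form.get(field, "")) > limit:
            return True  # over-eager bot ignored the client-side limit
    return False
```

If the honeypot is hidden with CSS rather than omitted from the accessibility tree, label it clearly (e.g. "leave this field empty") so screen reader users who do encounter it aren’t caught.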
Mike Cherim has valuable tips on these techniques in his article Protecting Forms from Spam ‘Bots, so I’m not going to elaborate on these points excessively. Again, however, these are all valuable methods within the “security through obscurity” school of protection — no serious protection against a motivated spammer.
Detecting Behavior

This is a complicated area, which I’m not going to delve into in any significant detail, primarily because I’m not really qualified. However, it’s an important category of spam control, so it’s worth an overview.
The principle of behavior detection is based on one core observation: bots don’t behave like people. People are, for the most part, a complex blend of random behavior and systematic exploration. Bots are generally much more absolute. When you observe a web site “user” visit every single navigable page of your site at 30 second intervals, that user is clearly not human.
Although the actual interpretation is significantly more complicated, the challenge is simple: look for patterns. If a user’s time on a site matches a mathematical pattern, that’s a signal. The Bad Behavior package works (at least partially) on this general logic: search for indications about the user or user-agent and identify signals which suggest non-human activity.
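The 30-second-interval observation can be made concrete with a tiny sketch: a visitor whose request gaps are almost perfectly uniform is behaving like a script. The tolerance threshold here is an assumed value for illustration, not a tuned one, and real packages weigh many more signals than timing alone:

```python
from statistics import pstdev

def requests_look_scripted(timestamps, tolerance=1.0):
    """Flag a visitor whose page requests arrive at near-constant intervals.

    `timestamps` are request times in seconds. Human browsing produces
    widely varying gaps; a bot fetching a page every 30 seconds produces
    nearly identical ones, so the spread of the intervals collapses.
    """
    if len(timestamps) < 4:
        return False  # too little data to judge
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return pstdev(intervals) < tolerance

# A "user" hitting a page every 30 seconds, on the dot:
print(requests_look_scripted([0, 30, 60, 90, 120]))  # True
```

The appeal of this approach for accessibility is that it asks nothing of the visitor at all: no puzzle, no image, no audio, just observation.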
Requiring Specific Capabilities
Requiring a specific capability of the visitor’s browser, such as running JavaScript or accepting cookies, immediately eliminates the vast majority of bots, and a small minority of humans.
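One common shape for a capability requirement is a token that only a client which actually executes the page will submit. The sketch below is one possible implementation under stated assumptions, not a description of any particular package: the server embeds the token in an inline script, the script copies it into a hidden input on load, and clients that never run JavaScript submit the field empty:

```python
import hashlib
import hmac

SECRET = b"change-me"  # assumed server-side secret; never sent to the client raw

def issue_token(session_id):
    """Token the server embeds in an inline <script>; the script writes it
    into a hidden input when the page loads."""
    return hmac.new(SECRET, session_id.encode(), hashlib.sha256).hexdigest()

def capability_check_passed(session_id, submitted):
    """True only if the submitted hidden field matches the issued token."""
    return hmac.compare_digest(issue_token(session_id), submitted)
```

The accessibility cost is exactly the "small minority of humans" above: visitors with scripting disabled fail the check through no fault of their own, which is why the fallback contact route discussed below matters.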
I’m not aware of any solution which has 100% success at differentiating humans from bots. Any barrier you put in the way of spam will also be a barrier for somebody. This is a decision every site must make: when you’re receiving thousands of spam messages a day through an insecure contact form, is it better to stop the occasional human, or to massively reduce your daily spam-killing time commitment?
Ultimately, there isn’t a tidy answer. Spam is too great an issue to simply ignore. However, any time you create a CAPTCHA of any sort, remember this: provide an alternative. If you provide a phone number to those who have failed your little test, they may still be able to reach you. If somebody needs to reach you, make it possible, even if they’ll have to write you a letter in order to post a comment on your blog.
You always need to ask yourself: who should shoulder the burden? Is it your responsibility, or your visitor’s?