Spam vs. Accessibility

The whole world of spam is an accessibility nightmare. The concept behind web accessibility is to ensure that users can access the complete functionality of your web site — but how do you cope with the fact that spambots will happily take advantage of any hole you leave?

Comment forms, contact pages, email addresses and enrollment forms are all ways of giving critical access to previously unidentified users, and all of them sit exactly where you need to find that crucial difference between real people and robots.

When you’re talking about functionality that is locked behind a log-in form, defining the security/accessibility conundrum isn’t much trouble. Require a good, secure password and you’re pretty safe. People with disabilities, for the most part, can use a password field just as effectively as anybody else. Once you’re behind that iron curtain, you can usually stop worrying about the distinction: everybody who has access to your private functionality is a known user who has identified themselves and provided credentials granting a certain degree of access.

But your front door can be a big problem.

You need to create a doorway which will allow visitors you don’t already know to reach you. They need to be able to contact you in order to initiate business, or enroll in your program, or at least create an account with your site. It’s therefore absolutely critical that you create a form which can be accessed by anybody.

But you still only want people using your form. Robot visitors rarely pay the enrollment fee, so they’re not exactly welcome visitors in every area of your site. You certainly don’t want to be thanking them for contacting you with an offer to enlarge your anatomy!

Spam protection and accessibility have an inherent conflict of interest: the former tries to prevent a form from being used, while the latter promotes its use. The two goals aren’t actually incompatible, but getting them to work together requires a detailed understanding of the issues involved.

Stopping the Robots

One of the most common solutions to the spam problem is to present a problem which a computer can’t solve. The most obvious solutions (pictures of animals, pictures of people, etc.) are inherently flawed because they require specific pieces of information to solve: correct spelling, in the correct language, with knowledge of the subject depicted. Although most visitors may be able to identify an elephant, some visitors will inevitably (and, in their own language, correctly) identify it as an elefant.

Presumed knowledge is a barrier to both humans and computers.

This is what has led to the numerous garishly blurred and colored text images you’ve undoubtedly had to interpret. Computers can use character recognition to examine images and identify the text, so the presentation is warped to decrease the likelihood of recognition. Of course, this also decreases the likelihood that humans will be able to read the image. Humans with disabilities? No chance. Either you include an alt attribute, making the solution trivial for a computer, or you leave it out — making the solution impossible for somebody with a visual disability.

Thus was born the audio CAPTCHA. However, audio CAPTCHA requires specific technology — an audio format must be chosen, and an audio player provided. Additionally, computers are capable of recognizing audio excerpts in much the same way they can recognize images. As a result, the audio output is distorted. I’ve listened to audio CAPTCHAs, and all I can say is that I hope others have better luck than I do. I’ve never passed one.

And, of course, neither of these methods will provide access for anybody who is both hearing and visually impaired.

There are numerous other attempts at accessible CAPTCHAs. Most of them depend on the fact that while robots may be text-aware, they can’t necessarily follow instructions provided in text. Simple question-and-answer bot-blocking techniques include:

  1. Write “human” in the field below.
  2. What is 3 + 4?
  3. Is fire hot or cold?

These simple questions can slow spam, and are best considered generic spam prevention methods: they will stop almost all spam which is not specifically targeted at your form. However, if a programmer decides to write a bot to attack your site, defeating them is trivial. Simply put, these kinds of questions provide security through obscurity.
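As a rough illustration of the arithmetic question above, here is a minimal Python sketch of a stateless “What is 3 + 4?” check. It assumes a server-side secret (the hypothetical `SECRET_KEY`) and uses an HMAC so the expected answer can travel in hidden form fields without being readable by the visitor. The function names are illustrative, not taken from any particular library.

```python
import hashlib
import hmac
import random
import secrets
from typing import Optional, Tuple

# Hypothetical server-side secret; in practice, load a fixed key from configuration.
SECRET_KEY = secrets.token_bytes(32)


def _sign(answer: str, nonce: str) -> str:
    """Bind an answer to a one-off nonce so the server keeps no per-form state."""
    message = f"{nonce}:{answer}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_question(rng: Optional[random.Random] = None) -> Tuple[str, str, str]:
    """Return (question text, nonce, signature); nonce and signature go in hidden fields."""
    if rng is None:
        rng = random.Random()
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    nonce = secrets.token_hex(8)
    return f"What is {a} + {b}?", nonce, _sign(str(a + b), nonce)


def check_answer(user_input: str, nonce: str, signature: str) -> bool:
    """True if the visitor's answer matches the signed expected answer."""
    expected = _sign(user_input.strip(), nonce)
    # Compare bytes so unexpected non-ASCII input can't raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Note how little this protects against a targeted bot: the question text itself contains everything needed to compute the answer, which is exactly the security-through-obscurity weakness described above. A real deployment would also expire or track used nonces so that a solved challenge can’t simply be replayed.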

A second class of bot-blocking techniques uses more complex question-and-answer sets:

  1. Write “red” in the 2nd text field on the left.
  2. Enter your name in the 3rd row, 2nd column.

These programmatically variable questions may also slow a bot, but they can be incredibly challenging, if not impossible, for a human visitor who is not using a visual browser whose output matches the layout the instructions assume.

Tricking the Robots

Robots aren’t terribly intelligent, and their decision-making skills are usually fairly limited. As such, it’s not difficult to simply deceive them. These methods may have some effectiveness at slowing down bots:

  1. Required selections on option menus. No specific option is required: just any of the choices available in the menu.
  2. Honeypots: fields which should not be filled in, but probably will be by your average bot in its quest to cover all its options.
  3. Limited-length fields: if you set the limit client-side, using the HTML maxlength attribute, a bot can easily limit its own input. However, if you set it server-side (at a safe margin for real users), you can stop a few bots which get over-eager.
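The three tricks above can be combined into a single server-side check. This is a sketch under stated assumptions: the field names (`topic`, `website`, `name`, `message`), the option list and the length limits are all hypothetical. If you use a honeypot, give it a visible-to-assistive-technology label such as “Leave this field empty” so screen-reader users who encounter it know not to fill it in.

```python
from typing import Dict, List

# Hypothetical form configuration; adapt the names and limits to your own form.
HONEYPOT_FIELD = "website"                    # hidden from sighted users with CSS
TOPIC_FIELD = "topic"                         # <select> whose first option is a blank prompt
TOPIC_OPTIONS = {"sales", "support", "other"}
MAX_LENGTHS = {"name": 200, "message": 5000}  # generous limits, enforced only here


def spam_signals(form: Dict[str, str]) -> List[str]:
    """Return the reasons a submission looks automated; an empty list means it passed."""
    reasons = []
    # Required selection: any listed option is fine, but the blank default is not.
    if form.get(TOPIC_FIELD, "") not in TOPIC_OPTIONS:
        reasons.append("no valid topic selected")
    # Honeypot: a human should leave this empty; a form-filling bot often won't.
    if form.get(HONEYPOT_FIELD, "").strip():
        reasons.append("honeypot field filled in")
    # Length limits checked on the server, with no maxlength attribute in the
    # HTML for a bot to read and obey.
    for name, limit in MAX_LENGTHS.items():
        if len(form.get(name, "")) > limit:
            reasons.append(f"{name} exceeds {limit} characters")
    return reasons
```

Whether a submission with one signal is rejected outright or merely held for review is a policy choice; holding it is kinder to the occasional human who trips a check.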

Mike Cherim has valuable tips on these techniques in his article Protecting Forms from Spam ‘Bots, so I’m not going to elaborate on these points excessively. Again, however, these are all methods within the “security through obscurity” school of protection: useful, but no serious protection against a motivated spammer.

Mike’s secure and accessible contact form makes use of a wide variety of techniques and provides thorough accessibility, so if you’re looking for a simple contact form which will block generic spam, it’s a great option.

Behavior Detection

This is a complicated area, and I’m not going to delve into it in any significant detail, primarily because I’m not really qualified. However, it’s an important category of spam control, so it’s worth an overview.

The principle of behavior detection is based on one core observation: bots don’t behave like people. People are, for the most part, a complex blend of random behavior and systematic exploration. Bots are generally much more absolute. When you observe a web site “user” visit every single navigable page of your site at 30 second intervals, that user is clearly not human.

Although the actual interpretation is significantly more complicated, the challenge is simple: look for patterns. If a user’s activity on a site matches a mathematical pattern (page requests at perfectly regular intervals, for instance), that’s a signal. The Bad Behavior package works (at least partially) on this general logic: search for indications about the user or user-agent and identify signals which suggest non-human activity.
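To make the “look for patterns” idea concrete, here is a hedged sketch of one very simple signal: request timing that is too regular to be human. The thresholds are arbitrary assumptions, and this does not reflect how the Bad Behavior package itself is implemented; real systems weigh many signals together.

```python
from statistics import mean, pstdev
from typing import Sequence


def looks_mechanical(timestamps: Sequence[float], min_requests: int = 5,
                     max_cv: float = 0.1) -> bool:
    """Flag a visitor whose page requests arrive at near-constant intervals.

    timestamps   -- request times in seconds, in ascending order
    min_requests -- below this many requests there is too little data to judge
    max_cv       -- assumed threshold on the coefficient of variation
                    (standard deviation / mean) of the gaps between requests
    """
    if len(timestamps) < min_requests:
        return False
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    average = mean(gaps)
    if average <= 0:
        return True  # a burst of simultaneous requests is not human browsing
    return pstdev(gaps) / average < max_cv
```

For the example in the text, a “user” fetching a page every 30 seconds produces identical gaps and a coefficient of variation of zero, so `looks_mechanical([0, 30, 60, 90, 120, 150])` returns `True`, while irregular human-like timing does not trip it.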

Requiring Specific Capabilities

Some spam solutions choose to require specific capabilities from visitors before allowing them to make contact. The WordPress comment spam plugin WP-Spamfree takes this strategy: its first layer of protection requires that any visitor trying to submit a comment have JavaScript and cookies enabled.

Immediately, this strategy eliminates the vast majority of bots — and a small minority of humans.
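One possible way to enforce a “JavaScript and cookies” requirement is sketched below; it is an assumption for illustration, not how WP-Spamfree actually works. The server sets a random token in a cookie and also embeds it in a small inline script (not shown) that copies it into a hidden form field. On submission, the cookie and the field must match: a client without cookies sends no cookie value, and a client that never ran the script sends an empty field. The cookie and field names are hypothetical.

```python
import hmac
import secrets
from typing import Dict

COOKIE_NAME = "form_token"   # hypothetical cookie set when the form page is served
JS_FIELD = "js_token"        # hypothetical hidden field that the inline script fills in


def issue_token() -> str:
    """Create a per-visit token for the Set-Cookie header and the page's inline script."""
    return secrets.token_urlsafe(16)


def passes_capability_check(cookies: Dict[str, str], form: Dict[str, str]) -> bool:
    """True only if the browser returned the cookie and the script copied the token."""
    cookie_value = cookies.get(COOKIE_NAME, "")
    field_value = form.get(JS_FIELD, "")
    if not cookie_value:
        return False  # cookies disabled (or a bot that ignores Set-Cookie)
    if not field_value:
        return False  # the script never ran, so JavaScript is unavailable
    # Constant-time comparison on bytes, so odd input can't raise TypeError.
    return hmac.compare_digest(cookie_value.encode("utf-8"), field_value.encode("utf-8"))
```

A bot that parses the page’s script could still extract the token, so this too filters generic bots rather than determined ones, and it shuts out exactly the small minority of humans without JavaScript or cookies.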

Conclusion

I’m not aware of any solution which differentiates humans from bots with 100% success. Any barrier put in place to stop spam will also create a barrier for somebody. However, this is a decision that must be made for any site: when you’re receiving thousands of spam messages a day through an unprotected contact form, is it better to stop the occasional human or to massively reduce the time you spend deleting spam every day?

Ultimately, there isn’t a real answer. Spam is too great an issue to simply ignore. However, any time you create a CAPTCHA of any sort, remember this: provide an alternative. If you provide a phone number to those who have failed your little test, they may still be able to reach you. If somebody needs to reach you, make it possible, even if they’ll have to write you a letter in order to post a comment on your blog.

A useful CAPTCHA from reCAPTCHA

Just wanted to add, since I didn’t state it explicitly, that I’m not trying to claim that the accessibility of this particular CAPTCHA is all that fantastic. It’s pretty good, but there are serious problems. I’m just saying that it’s a neat idea. ;)

In case you don’t already know, “CAPTCHA” is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart.” From an accessibility perspective, CAPTCHAs tend to have significant problems, and I’m not going to claim that this one is perfect. However, it is very thoughtfully done, and has a very interesting additional feature which I appreciated.

I ran across this via StumbleUpon. Unusually, rather than finding it while busily stumbling around, I became aware of it because I was trying to create a new account. The interesting CAPTCHA is called “reCAPTCHA.”

Specifically, the concept behind it (explained thoroughly on the reCAPTCHA site) is to get something of value out of the text users type into CAPTCHAs.

Most spam protection systems are based on nonsense words, random strings of letters, or obscured text. Anything, fundamentally, which might be difficult for a computer to identify.

What the folks at reCAPTCHA observed was that scanning old books produces a wealth of obscured text which computers can’t easily read. To solve this problem, they combined the needs of a CAPTCHA with their scanning process to create a service which helps them identify these unknown words.

Obviously, there’s an immediate problem: if the computer has already failed to identify the text, how do you test whether a human has read it correctly? Simply speaking, you don’t.

Instead, reCAPTCHA presents the user with two words: one the system knows, and one it doesn’t. The known word is the Turing test; the user’s reading of the unknown word gives the system a human transcription of a word its OCR couldn’t identify.

From reCAPTCHA.com:

About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into “reading” books.

[…]

But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
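The scheme in the quote can be modelled in a few lines. This is a toy Python sketch, not reCAPTCHA’s implementation or API: image identifiers stand in for scanned words, answers are normalised by trimming and lowercasing, and the number of agreeing readers needed (`votes_needed`) is an assumed parameter.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WordPair:
    known_image: str    # id of a scanned word whose text is already verified
    unknown_image: str  # id of a scanned word that OCR could not read


class ToyTranscriber:
    """Toy model of the two-word scheme: pass the known word, and your
    reading of the unknown word counts as one vote."""

    def __init__(self, known: Dict[str, str], votes_needed: int = 3):
        self.known = known                    # image id -> verified text
        self.votes_needed = votes_needed      # assumed agreement threshold
        self.votes: Dict[str, Counter] = defaultdict(Counter)

    @staticmethod
    def _normalise(text: str) -> str:
        return text.strip().lower()

    def submit(self, pair: WordPair, known_answer: str, unknown_answer: str) -> bool:
        """Return True if the visitor passed the test (the known word matched)."""
        if self._normalise(known_answer) != self._normalise(self.known[pair.known_image]):
            return False  # failed the Turing test, so ignore the other answer too
        self.votes[pair.unknown_image][self._normalise(unknown_answer)] += 1
        return True

    def transcription(self, image_id: str) -> Optional[str]:
        """The agreed reading once enough visitors concur, otherwise None."""
        tally = self.votes.get(image_id)
        if not tally:
            return None
        text, count = tally.most_common(1)[0]
        return text if count >= self.votes_needed else None
```

In this model, a reading that reaches the threshold could later be promoted into `known`, letting that word serve as the checked half of future challenges.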

The CAPTCHA itself is delivered via JavaScript or an iframe. When JavaScript is unavailable, a perfectly usable fallback is provided. reCAPTCHA also provides an audio alternative which, I’ll confess, I found very difficult. I’d need to see some kind of user test results, however, to really know how difficult the audio version is overall. In general, as CAPTCHA technology goes, this is an admirable project: not only because they have taken a reasonably conscientious path in preparing the interface, but simply because it’s a very good idea.

I’ll confess it’s unlikely that I’ll implement it. The fact that it’s delivered via an iframe, and the simple nature of a CAPTCHA, go against my general preferences in web development. However, should I be in a situation where I need to implement one, this will certainly be a strong candidate! (And an even stronger one if they fix their accessibility issues.)
