Spam vs. Accessibility

June 24, 2008

The whole world of spam is an accessibility nightmare. The concept behind web accessibility is to ensure that users can access the complete functionality of your web site — but how do you cope with the fact that spambots will happily take advantage of any hole you leave?

Comment forms, contact pages, email addresses and enrollment forms. All methods of giving critical access to previously unidentified users — and all in positions where you just need to find that crucial differentiation between real people and robots.

When you’re talking about functionality which is locked behind a log-in form, there’s not really a huge amount of trouble in defining the security/accessibility conundrum. Require a good, secure password and you’re pretty safe. People with disabilities, for the most part, can use a password field just as effectively as anybody else. Once you’re behind that iron curtain, you can usually stop worrying about the distinction: everybody who has access to your private functionality is a known user. They’ve identified themselves, provided credentials which grant them a certain degree of access, and you can stop worrying about them.

But your front door can be a big problem.

You need to create a doorway which will allow visitors you don’t already know to reach you. They need to be able to contact you in order to initiate business, or enroll in your program, or at least create an account with your site. It’s therefore absolutely critical that you create a form which can be accessed by anybody.

But you still only want people using your form. Robot visitors rarely pay the enrollment fee, so they’re not exactly welcome visitors in every area of your site. You certainly don’t want to be thanking them for contacting you with an offer to enlarge your anatomy!

Spam protection and accessibility have inherent conflicts of interest: the former goal attempts to prevent a form from being used, the latter promotes it. The two goals aren’t actually antithetical to each other, but getting them to work together does require a detailed understanding of what the issues are.

Stopping the Robots

One of the most common solutions to the spam problem is to present a problem which a computer can’t solve. The most obvious challenges (pictures of animals, pictures of people, etc.) are inherently flawed because they require specific pieces of information to solve. They’ll require correct spelling in the correct language with knowledge of the subject depicted. Although most visitors may be able to identify an elephant, some visitors will inevitably (and correctly) identify it as an elefant.

Presumed knowledge is a barrier to both humans and computers.

This is what has led to the numerous garishly blurred and colored text images you’ve undoubtedly had to interpret. Computers can use character recognition to examine images and identify the text, so the presentation is warped to decrease the likelihood of recognition. Of course, this also decreases the likelihood that humans will be able to read the image. Humans with disabilities? No chance. Either you include an alt attribute, making the solution trivial for a computer, or you leave it out — making the solution impossible for somebody with a visual disability.

Thus was born the audio CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). However, audio CAPTCHA requires specific technology — an audio format must be chosen, and an audio player provided. Additionally, computers are capable of recognizing audio excerpts in much the same way they can recognize images. As a result, the audio output is distorted. I’ve listened to audio CAPTCHAs, and all I can say is that I hope others have better luck than I do. I’ve never passed one.

And, of course, neither of these methods will provide access for anybody who is both hearing and visually impaired.

There are numerous other examples of attempts at accessible CAPTCHAs. Most of them depend on the fact that while robots may be text-aware, they are not necessarily capable of following instructions provided in text. The simplest are question & answer bot-blocking techniques like:

Write “human” in the field below.
What is 3 + 4?
Is fire hot or cold?

These simple questions can slow spam and can be considered generic spam prevention methods. They will stop almost all spam which is not specifically targeted at the form. However, if any programmer decides to write a bot to attack your site, defeating them is a trivial problem. Simply put, these kinds of questions provide security through obscurity.
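As a rough illustration of how little is involved on the server side, here is a minimal sketch of a question & answer check in Python. The challenge IDs, the CHALLENGES table, and the check_answer function are all hypothetical names invented for this example; the accepted answers are my own guesses at what a forgiving form might allow.

```python
# Hypothetical question pool: each ID maps to the question text and
# the set of answers we are willing to accept for it.
CHALLENGES = {
    "write_human": ('Write "human" in the field below.', {"human"}),
    "sum": ("What is 3 + 4?", {"7", "seven"}),
    "fire": ("Is fire hot or cold?", {"hot"}),
}


def check_answer(challenge_id: str, submitted: str) -> bool:
    """Return True if the submitted text matches an accepted answer."""
    if challenge_id not in CHALLENGES:
        return False
    _question, accepted = CHALLENGES[challenge_id]
    # Be forgiving about case, surrounding whitespace, and stray
    # quotation marks or periods, so real people are not rejected
    # over trivia.
    normalized = submitted.strip().lower().strip(".\"'")
    return normalized in accepted
```

Accepting both "7" and "seven" is a deliberate choice here: the more forgiving the matching, the fewer humans get turned away. It also shows why a targeted bot has no trouble, since the whole answer table fits in a few lines.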

A second class of bot-blocking techniques is found in more complex question & answer sets:

Write “red” in the 2nd text field on the left.
Enter your name in the 3rd row, 2nd column.

These programmatically variable questions may also slow a bot, but they can be incredibly challenging, if not impossible, for a human visitor who is not using a visual browser with output matching the instructions, whether because they’re using a responsive site on a mobile device or a screen reader where “left” has no meaning.

Tricking the Robots

Now, robots aren’t terribly intelligent. Usually, their decision making skills are fairly limited. As such, it’s not terribly difficult to simply deceive them. These methods may have some effectiveness at slowing down bots:

Required selections on option menus. Not that a specific option is required — just anything available in the menu.
Honeypots: fields which should not be filled in, but probably will be by your average bot in its quest to cover all its options.
Limited length fields: if you set this client-side, using the HTML (HyperText Markup Language) maxlength attribute, a bot can easily limit its own input. However, if you set it server-side (at a safe margin for real users) you can stop a few bots which get over-eager.
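The three checks above can be sketched together as one server-side function. This is only an illustration under my own assumptions: the field names ("website", "topic", "message"), the menu values, and the 5000-character limit are all hypothetical and not taken from any particular form.

```python
# Assumed values for illustration only.
MAX_MESSAGE_LENGTH = 5000                      # generous margin for real users
TOPIC_OPTIONS = {"sales", "support", "other"}  # values offered in the menu


def looks_like_bot(form: dict) -> bool:
    """Apply the three deception checks to a submitted form."""
    # Honeypot: "website" is a field real visitors are asked to leave
    # empty. Anything in it suggests a bot filling in every input.
    if form.get("website", "").strip():
        return True
    # Required selection: any option from the menu will do, but the
    # value must be one the menu actually offered.
    if form.get("topic") not in TOPIC_OPTIONS:
        return True
    # Length limit enforced on the server, where a bot cannot read it
    # out of the markup the way it can read maxlength.
    if len(form.get("message", "")) > MAX_MESSAGE_LENGTH:
        return True
    return False
```

One caution on the honeypot: a field hidden visually may still be announced by a screen reader, so it is worth giving it a label that tells people to leave it blank.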

Mike Cherim has valuable tips on these techniques in his article Protecting Forms from Spam ‘Bots, so I’m not going to elaborate on these points excessively. Again, however, these are all valuable methods within the “security through obscurity” school of protection — no serious protection against a motivated spammer.

Behavior Detection

This is a complicated area, which I’m not going to delve into in any significant detail. Primarily because I’m not really qualified. However, it’s an important category of spam control, so it’s worth an overview.

The principle of behavior detection is based on one core observation: bots don’t behave like people. People are, for the most part, a complex blend of random behavior and systematic exploration. Bots are generally much more absolute. When you observe a web site “user” visit every single navigable page of your site at 30-second intervals, that user is clearly not human.

Although the actual interpretation is significantly more complicated, the challenge is simple: look for patterns. If a user’s time on a site matches a mathematical pattern, that’s a signal. The Bad Behavior package works (at least partially) on this general logic: search for indications about the user or user-agent and identify signals which suggest non-human activity.
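To make the “mathematical pattern” idea concrete, here is a minimal sketch that flags a visitor whose page requests arrive at nearly constant intervals. To be clear, this is not how Bad Behavior works; it is just one simple signal of my own choosing, and the function name and both thresholds are assumptions.

```python
from statistics import mean, pstdev


def looks_scheduled(timestamps, min_requests=5, max_variation=0.05):
    """Flag request times (in seconds) that are suspiciously regular.

    Measures how much the gaps between requests vary relative to the
    average gap. People produce widely varying gaps; a bot on a timer
    produces almost none.
    """
    if len(timestamps) < min_requests:
        return False  # too little data to judge
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average == 0:
        return True  # every request arrived at the same instant
    # Coefficient of variation: spread of the gaps divided by their mean.
    return pstdev(gaps) / average <= max_variation
```

A single signal like this should only ever raise suspicion, not block anyone on its own. A real system would combine many such signals before acting.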

Requiring Specific Capabilities

Some spam solutions make the choice that they will require specific capabilities from the visitor in order to allow them to make contact. The WordPress comment spam plugin WP-Spamfree takes this strategy. The first layer of protection for this plugin is to require that any visitor trying to submit a comment have both JavaScript and cookies enabled.

Immediately, this strategy eliminates the vast majority of bots — and a small minority of humans.
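One way such a requirement could be enforced, sketched here purely as an assumption and not as WP-Spamfree’s actual method: the server embeds a signed token in the page, a script on the page copies it into both a cookie and a hidden form field, and the server accepts the submission only if both come back and match. All of the function names below are hypothetical.

```python
import hashlib
import hmac
import secrets

# Per-process secret used to sign tokens (an assumption for this sketch;
# a real site would need a key that survives restarts).
SECRET_KEY = secrets.token_bytes(32)


def _sign(nonce: str) -> str:
    return hmac.new(SECRET_KEY, nonce.encode(), hashlib.sha256).hexdigest()


def issue_token() -> str:
    """Create the token the page script copies into a cookie and a field."""
    nonce = secrets.token_hex(16)
    return f"{nonce}.{_sign(nonce)}"


def token_is_valid(token: str) -> bool:
    """Check that the token carries a signature made with our key."""
    nonce, _, signature = token.partition(".")
    return hmac.compare_digest(signature.encode(), _sign(nonce).encode())


def passes_capability_check(cookie_value, field_value) -> bool:
    """Accept only if the cookie and the field both carry the same valid token."""
    if not cookie_value or not field_value:
        return False  # the script did not run, or cookies are disabled
    if not hmac.compare_digest(cookie_value.encode(), field_value.encode()):
        return False
    return token_is_valid(field_value)
```

A visitor without JavaScript never gets the cookie or the field filled in, so the check fails for them exactly as it does for a simple bot. That is the small minority of humans this approach turns away.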

Conclusion

I’m not aware that there’s any solution which has 100% success at differentiating humans from bots. Any barrier put in place to spam will also create a barrier for somebody. However, this is a decision that must be made for any site: when you’re receiving thousands of spam messages a day through an insecure contact form, is it better to stop the occasional human or massively reduce your daily spam-killing time commitment?

Ultimately, there isn’t a real answer. Spam is too great of an issue to simply ignore. However, any time you create a CAPTCHA — of any sort — just remember this: provide an alternative. If you provide a phone number to those who have failed your little test, they may be able to reach you. If somebody needs to reach you, make it possible: even if they’ll have to write you a letter in order to post a comment on your blog.

You always need to ask yourself: who should shoulder the burden? Is it your responsibility, or your visitor’s?

19 Comments on “Spam vs. Accessibility”

TYLER
; December 6, 2008 at 8:59 am
“CAPTCHA is a pain, as someone with good (but not perfect) eyesight I find them really irritating”…
http://corlive.com does a good job – it only asks about “2+2” – and somehow it works well (stops bots)
Michelle
; August 14, 2008 at 12:46 pm
I just read on web accessibility and I found it very interesting. I really love the ideas because it is very useful and helpful. Keep it up Joe.
Jamie
; July 22, 2008 at 12:26 pm
This post goes right along with your article I just read on web accessibility. Great article. Great ideas I had never thought about such as making a web page readable for people who are color blind.
An idea though, is that now that Firefox has a new version, you may also want to test your websites using that browser as well.
iheni
; July 17, 2008 at 2:48 am
Hi Folks, something that may be of interest and has been discussed quite a bit on various blogs over the last couple of weeks is WebVisum, http://webvisum.com/.
It is a plug-in for Firefox that allows a screen reader user to add useful data to a page by tagging links, form fields, images and so on. What’s really interesting is that there is a means to crack CAPTCHA in it. There’s an interview with Marc Dohnal, the initiator of WebVisum, on Scripting Enabled by Christian Heilmann http://scriptingenabled.org/2008/07/interview-with-webvisumcom-crowdsourced-accessibility/.
Be interested to hear what people think.
Joe Dolson
; July 9, 2008 at 6:06 pm
That would be great — looking forward to seeing that addition in the future!
Michael Hampton
; July 9, 2008 at 1:58 pm
That’s actually a great idea. I’ll see if I can provide alternate contact methods defined by the administrator in a near-future version.
Richard Morton
; July 9, 2008 at 7:35 am
It amuses me every time I look at the GAWDS (Guild of Accessible Web Designers) site. In their comments forms they ask “Are you human or robot?” with an instruction to answer in one word. My answer would be “yes” but I haven’t tried that yet (who knows it might even work). Seriously though a question like that or one of the many others around could present a real difficulty to someone with a form of dyslexia or an autistic spectrum disorder and probably other learning disabilities.
Captcha is a pain, as someone with good (but not perfect) eyesight I find them really irritating but they are probably a necessary evil. Audio can help some people (but not all), a phone number or email address still ends up discriminating against disabled users because who wants to wait even one minute to be able to post a form.
Joe Dolson
; July 8, 2008 at 10:30 am
Really? Well, I’ll admit that it’s been a long time since I’ve actually seen a Bad Behavior error message. I don’t remember that being present, but I’ll certainly bow to your superior knowledge on the subject!
What I’d prefer, however, is actually being able to easily customize the contact information provided. I don’t want my email address exposed — I’d rather provide some other information which will help the user get in touch with me. Even if I did provide an email address, I wouldn’t want it to be my administrative address, since that’s an important address for me: I’d rather be able to set up an ad hoc address.
Michael Hampton
; July 7, 2008 at 8:15 pm
Strange. By default the error message provides the site administrator’s email address (as defined in Settings, General in WordPress). Are you saying that’s not sufficient?
As for the technically confusing nature of the text itself, I’ve heard that complaint before, and it’s not specific to accessibility. Technical stuff is very hard to convey clearly to a non-technical audience, especially when it’s just interrupted something they were doing. I’ll certainly accept suggestions for rewrites. 🙂
Joe Dolson
; July 7, 2008 at 7:37 pm
The problem I’ve encountered with Bad Behavior (although I absolutely agree that it’s a great package) is that while the text presented is easily rendered, it’s not always useful. It would be nice if it was more easily user-configurable (through a configuration document or, in the case of the WordPress plugin, through the administrative interface.)
The error messages describe the reason for the error, which is great, but don’t actually provide any alternative means to contact the site author. Unless you know an alternate route to the site or know the person personally, it can be very difficult to get the problem dealt with or retrieve the information you need.
If there was an addendum to the error response where the author could provide alternative contact information, this would be a helpful workaround. Of course, it couldn’t be an email address or contact form! 😉
Thanks for stopping by, Michael!
Michael Hampton
; July 7, 2008 at 7:23 pm
As the author of Bad Behavior, I can say that I’ve gone through an inordinate amount of trouble to ensure that no human is ever blocked.
This is of course impossible, since there’s going to be some oddball out there who has a very weird setup and winds up matching the profile of a spammer. For these rare people I provide an explanation of what happened and how they may be able to fix it on their own. Out of countless millions of page views over countless thousands of sites, this number comes to maybe 30 to 50 a day. Virtually all of them are able to solve the problem on their own. Those who can’t solve it on their own are directed to email the site administrator (who is then directed to email me). I get maybe one such email a week.
With respect to accessible web sites, I don’t think Bad Behavior has any particular issues. The text it presents to blocked users is strict XHTML (eXtensible HyperText Markup Language - HTML reformulated as XML (eXtensible Markup Language)) and any user agent should be able to render it in whatever manner is required.
Dean
; July 4, 2008 at 1:32 am
While I think you can limit your incoming spam, stopping it completely is hard, and as soon as you think you have, someone will outsmart you and work around it. I still wonder, though, who these people are who buy Viagra and stop-smoking pills online; someone must be, to warrant the hundreds of messages I alone get each day.
Mike Cherim
; July 2, 2008 at 6:09 pm
Thanks Christopher.
Joe Dolson
; July 2, 2008 at 3:40 pm
This came forcibly to mind for me recently, when helping a client set up an account with YouTube. Between the two of us, we failed their CAPTCHA three times in a row before successfully creating his account.
It was truly ridiculous…
Of course, what’s REALLY reprehensible isn’t the use of CAPTCHA — it’s the systematic abuse which has caused people to find them necessary. Spam is the true villain in the equation.
Christopher M. Kelly
; July 2, 2008 at 2:31 pm
To Mike Cherim: just FYI, yes, computer-to-braille devices do exist. They’re often referred to as “refreshable braille displays.” They often use the same software interface as a screen reader app, like JAWS or Window Eyes, but send the information to a hardware device with little pins that move up and down creating the braille version of what the user is reading. Very expensive stuff, but mandatory for computer users who are deaf-blind. See this on Wikipedia: http://en.wikipedia.org/wiki/Refreshable_Braille_display
At Joe: Good article! I’m not blind and I hate CAPTCHAs. Any technology that prohibits a person with a disability from using a site is reprehensible. Kudos on this one.
Joe Dolson
; June 28, 2008 at 9:00 am
“There is no such thing as accessible to all.”
Yes, and sometimes it’s necessary to say it…
“Excellent article, Joe. Also, thank you for the kind promotion and link love.”
Well, the fact that I’ve never received any automated spam from either of the last two versions of your contact form makes it pretty plain to me: you deserve some thanks!
Mike Cherim
; June 27, 2008 at 11:16 pm
As iheni wrote, good assessment, Joe. It illuminates a very real conundrum for which the only best “solution” will be a compromise. As you wrote, though:
“And, of course, neither of these methods will provide access for anybody who is both hearing and visually impaired.”
There is no such thing as accessible to all. A person who is both visually and aurally impaired is going to have one hell of a time using the web. The senses of smell and taste are out, this would leave only a device that would process the page and output physical Braille. (Does such a thing exist? Perhaps two devices — web-to-audio, audio-to-Braille — used jointly?) Sorry, I digress. I guess this just illuminates the fact web accessibility itself will always be a Swiss cheese of compromise once function, style, and interactivity are added to a static web page.
Excellent article, Joe. Also, thank you for the kind promotion and link love.
Joe Dolson
; June 25, 2008 at 8:51 am
So would I — in fact, what would really be excellent would be having an accessibility person and a security person sit down together to knock around the issues. Most of the time, neither party is really qualified to appraise the problems on the other side; but it would be a great opportunity for collaborative thinking.
Nice analogy. 😉
iheni
; June 25, 2008 at 3:56 am
Really good article Joe and a good assessment of where we are at with spam and accessibility. I’ve felt strongly for a while that security should not be put on the shoulders of the user and that CAPTCHAs (audio or otherwise) are like unfriendly bouncers working the door http://www.rnib.org.uk/wacblog/images/captcha-if-youre-names-not-down-youre-not-coming-in
What’s interesting to me is that we don’t hear many security people wading into the debate and I’d love to hear more from their side.