Testability Costs Too Much – A List Apart

You’d be forgiven for not recognizing the term “testability,” despite its central importance to the W3C’s new Web Content Accessibility Guidelines (WCAG 2.0). There’s little mention of testability in WCAG 2.0 documents—and given the verbosity of the guidelines, the absence of information about testability seems almost purposeful. Indeed, testability is one of WCAG 2.0’s big secrets: while most of the public complaints about WCAG 2.0 have been about technology neutrality, jargon, and the lack of attention to people with cognitive disabilities, the underlying cause behind these issues—testability—has taken a back seat.


So what is testability, and why does it matter? Before we can answer that, we need to go back to the beginning.

The World Wide Web Consortium (W3C) formed the Web Content Accessibility Guidelines Working Group in late 1998. In May 1999, the Working Group released a set of accessibility development and content guidelines, called the Web Content Accessibility Guidelines (WCAG 1.0). Almost immediately, the Working Group began working on the second version of the guidelines, WCAG 2.0. I joined the Working Group in May of 2000 as an Invited Expert and was active in the Working Group, with two notable absences, until August of 2006. Based on my experience as a member of the Working Group—and of the larger accessibility community—I believe that many of the problems associated with WCAG 2.0 can be attributed to testability.

So once again, what is testability, exactly? Although testability is mentioned in the abstract of the recent WCAG 2.0 working draft documents and expanded upon in the “Conformance” section, the full definition sits not in the glossary but in the Requirements for WCAG 2.0 Checklists and Techniques, dated 7 February 2003. This document contains the only definition of testability as it applies to WCAG 2.0. Here’s that definition:

Definition: Testable: Either Machine Testable or Reliably Human Testable.

Definition: Machine Testable: There is a known algorithm (regardless of whether that algorithm is known to be implemented in tools) that will determine, with complete reliability, whether the technique has been implemented or not. Probabilistic algorithms are not sufficient.

Definition: Reliably Human Testable: The technique can be tested by human inspection and it is believed that at least 80% of knowledgeable human evaluators would agree on the conclusion. The use of probabilistic machine algorithms may facilitate the human testing process but this does not make it machine testable.

Definition: Not Reliably Testable: The technique is subject to human inspection but it is not believed that at least 80% of knowledgeable human evaluators would agree on the conclusion.

In simpler terms, the Web Content Accessibility Guidelines Working Group defines a testable success criterion as one that is:

machine-testable or
“reliably human testable”—which means that eight out of ten human testers must agree on whether the site passes or fails each success criterion.
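To make the 80% threshold concrete, here is a minimal Python sketch. The function names and the list of pass/fail verdicts are hypothetical illustrations; the Requirements document states the threshold only in prose and does not specify any procedure for polling evaluators.

```python
from collections import Counter

# Threshold taken from the "Reliably Human Testable" definition:
# at least 80% of knowledgeable evaluators must agree on the conclusion.
AGREEMENT_THRESHOLD = 0.8

def agreement_share(verdicts):
    """Return the share of evaluators who gave the most common verdict."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    most_common_count = Counter(verdicts).most_common(1)[0][1]
    return most_common_count / len(verdicts)

def is_reliably_human_testable(verdicts):
    """True if the most common verdict reaches the 80% agreement threshold."""
    return agreement_share(verdicts) >= AGREEMENT_THRESHOLD

# Ten hypothetical evaluators judging one success criterion.
clear_case = ["pass"] * 8 + ["fail"] * 2
split_case = ["pass"] * 5 + ["fail"] * 5
print(is_reliably_human_testable(clear_case))  # True
print(is_reliably_human_testable(split_case))  # False
```

A split like the second case, where evaluators divide evenly, is the kind of disagreement that would make a criterion “not reliably testable” under this definition.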

Testability first entered the scene in 2000 as a response to criticism directed at WCAG 1.0—specifically, that some of the guidelines were being ignored because they were too broad or vague. One example is the WCAG 1.0 checkpoint, “Use clear and simple language.” The general consensus was that this checkpoint was open to interpretation and people didn’t know how to comply. Cue testability.

At first glance, testability seems not only reasonable but integral to the development of a successful WCAG 2.0: how else will developers know they have complied with an accessibility requirement? But clear guidance does not depend on testability. The WCAG Samurai Errata are but one example of a set of guidelines that don’t rely on testability yet still give developers clear instructions on how to comply with relevant requirements. And after all, the Working Group was created to write success criteria that assist people with disabilities. Success criteria that are integral to helping people with disabilities use the web are being outlawed because of WCAG 2.0’s testability requirement, even though their validity as success criteria is not in question. Once an insistence on testability begins outlawing otherwise useful success criteria, it needs to be reconsidered.

The problem with testability is that even the most reasonable of success criteria can be non-testable—and if a success criterion is not considered testable, it isn’t included in WCAG 2.0. Whether the criterion is an otherwise useful technique that improves accessibility is now irrelevant to whether it gets included in WCAG 2.0. Due to the testability requirement, many useful success criteria have been removed from WCAG 2.0, and others watered down.

For example, it has been argued that the accessibility specialist’s old faithful, the alt attribute for images, fails the testability requirement, and the tangled logic required to make it seem testable has weakened the guideline. I lodged a comment on the Last Call Working Draft about Success Criterion 1.1.1, which reads:

For all non-text content that is used to convey information, text alternatives identify the non-text content and convey the same information. (Emphasis added).

In my comment, I argued that a machine can never test whether an alt attribute conveys the same information as an image, and that eight out of ten human testers could not agree whether the text conveys the same information. I gave the following example:

…in the Live in Victoria site (www.liveinvictoria.vic.gov.au) there is an image under the heading “Business Migrants”. When I worked on this site, several people said this image should have a null ALT attribute as it conveyed no information. Several other people suggested ALT attributes of “A couple of business migrants chatting at work” or “Guys chatting at work”.

Whereas the ALT attribute that I recommended was “There is a wealth of opportunities for Business Migrants in Victoria”.

Although I received a roundabout response from the Working Group on my comment¹, their public online comment tracker, dated 12 January 2007, proves more insightful:

With regard to 1.1 the success criteria do not require that ALT text provided by different people be the same. In fact the sufficient techniques only require that ALT text be present that can be construed to be ALT text. The requirment [sic] is for alt text to be present. Since the quality of the alt text can not be measured, there is no specific criterion for quality. (Emphasis added.)
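The gap between presence and quality can be shown with a minimal Python sketch built on the standard library’s `html.parser`. This is a hypothetical presence check, not a tool published by the Working Group, and the file names in the sample markup are invented: it flags an `img` element only when the `alt` attribute is missing entirely, so `alt="image"` passes just as a meaningful description would. Nothing in such an algorithm can decide whether the text conveys the same information as the image.

```python
from html.parser import HTMLParser

class AltPresenceChecker(HTMLParser):
    """Collects the src of every <img> that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also reach this method,
        # because HTMLParser.handle_startendtag calls handle_starttag by default.
        # Tag and attribute names arrive lowercased.
        if tag == "img":
            attr_names = {name for name, _ in attrs}
            if "alt" not in attr_names:
                self.missing_alt.append(dict(attrs).get("src"))

def images_missing_alt(html):
    """Return the src of each image lacking an alt attribute (presence only)."""
    checker = AltPresenceChecker()
    checker.feed(html)
    checker.close()
    return checker.missing_alt

# Hypothetical page: a meaningless alt, a null alt, and no alt at all.
page = '''
<img src="migrants.jpg" alt="image">
<img src="spacer.gif" alt="">
<img src="logo.png">
'''
print(images_missing_alt(page))  # ['logo.png']
```

Only the image with no `alt` attribute is reported; the check is fully reliable for presence and says nothing about whether “image” is an adequate alternative.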

It looks like insistence on testability has brought us back to the good old days of alt="image", except that we now have no guidelines to point to when we tell developers that the description is wrong. To be fair, the Working Group has tried to get around this particular problem by adding a few clauses to Success Criterion 1.1.1, for example to allow ornamental images to have null ALT attributes. However, this clause itself seems inherently untestable:

…if non-text content is pure decoration, or used only for visual formatting, or if it is not presented to users, then it is implemented such that it can be ignored by assistive technology.

What kind of assistive technology? What versions? Ignored as a default, or only if the user chooses to ignore it?

There are many instances in WCAG2 where success criteria are actually not testable, and the Working Group knows it. In Bugzilla, the Working Group’s issue tracking system, there is an issue lodged by three Working Group members that reads: “In particular, the current wording [of WCAG2] does not seem testable. Words such as, ‘key,’ ‘consistent,’ ‘predictable,’ ‘inconsistent,’ and ‘unpredictable’ are subjective.” Yet these terms have been used throughout WCAG2; there’s even an entire guideline that rests on one of these subjective, non-testable terms:

Make Web pages appear and operate in predictable ways.

Where possible, the Working Group has tried to narrowly define success criteria to make them testable: success criterion 1.1.1, with four sub-sections, is equivalent to WCAG1 Checkpoint 1.1: “Provide a text equivalent for every non-text element.” But this just highlights another problem with testability—it increases the complexity of the success criteria. Because WCAG2 is technology-neutral, the guidelines have to be testable in a technology-neutral way, a situation that produces lengthy and jargon-heavy guidelines. In contrast, the WCAG Samurai Errata are an example of the type of guidelines that can be developed without the constraint of testability (and technology neutrality).

Cognitive disabilities neglected

One criticism of the first version of WCAG was that most of the cognitive-disability–related checkpoints were relegated to Level AAA, a level rarely attempted. Only one checkpoint dedicated to the needs of people with cognitive disabilities was in the minimum level (Checkpoint 14.1: “Ensure language is clear and simple”). However, with the introduction of testability, this checkpoint was removed from WCAG2 in April 2004. It was this checkpoint that initially piqued my interest in testability, and when it became clear that it was being removed, not because it wasn’t a valid checkpoint but simply because it wasn’t testable, I proposed the removal of testability. As a member (in good standing) of the W3C Web Content Accessibility Guidelines Working Group, I argued during the April 22, 2004 teleconference that:

if you lock out guidelines [because] we can not define them in a testable manner, then we run the risk of locking out guidelines that people find useful and that increase the accessibility of content … [non-testable guidelines] should not be relegated to highest level (3) because we can not define them in a certain way … they should be defined in way that is most assistive to people with disabilities.
—http://www.w3.org/2004/04/22-wai-wcag-irc.html

At that point, the fate of testability (which wasn’t yet applied to all success criteria) was put to a vote, and many people voted against its removal. During the same teleconference, the Working Group held another vote to decide whether testability should be a required characteristic of all success criteria, and I was the only person on the Working Group who voted against this change. A week later, my status as a member (in good standing) was revoked due to “non-participation.” At the next teleconference, the inclusion of testability was passed unanimously.

Many other techniques to assist people with cognitive disabilities, from error prevention to summary information, have also been deleted from WCAG 2.0 or moved to Level AAA or to the advisory techniques. In fact, WCAG 2.0 does so little to assist people with cognitive disabilities that a formal objection was lodged (which I co-signed) and a task force was created to discuss the matter. Unfortunately, the Working Group’s main response to the formal objection has been to preface WCAG 2.0 with a statement declaring that the guidelines are not sufficient to assist people with cognitive disabilities. It is troubling that a set of guidelines aimed at assisting people with disabilities should entirely neglect the large number of web users with cognitive disabilities.

Still, it’s not really a surprise. The Working Group minutes are littered with various comments warning against using testability for the entirety of WCAG 2.0—something it was never originally intended for. Even the group’s chair, Gregg Vanderheiden, said “if we require a test for every checkpoint, life will be difficult in the realm of cognitive accessibility.” And that’s precisely what has come to pass.

Calling for an end to testability

There are many reasons why testability was introduced and remains a tenet of WCAG 2.0. Some of these reasons may be valid and important—the W3C is seeking ISO certification, the WAI want WCAG 2.0 enshrined in law—but none should be allowed to draw attention away from the core goal of directly improving the ability of people with disabilities to access websites.

When success criteria are removed because they are not testable, even if they are otherwise valid and useful, the Working Group has lost its way, and we need to guide it back to the right path. The Working Group has made significant changes to WCAG 2.0 since the Last Call Working Draft; among other things, baseline, one of the most contentious issues, has been significantly modified. The Working Group needs to go one step further and remove testability, or it risks alienating both developers and accessibility specialists. With the publication of the WCAG Samurai Errata, the web community finally has a choice, and if WCAG 2.0 continues to be unworkable, developers will simply turn to another set of guidelines.

The Working Group has asked for comments on their latest WCAG 2.0 Working Draft by June 29, 2007. Now is the time to call for the removal of testability.