"The transition strategy from Hypertext Mark-Up Language (HTML), as it is practiced today, to HTML based on Extensible Mark-Up Language (XML) in the future is difficult," wrote Tim Berners-Lee, Director of the World Wide Web Consortium (W3C), in September 1998 (Berners-Lee 1998).
The vision of that transition brings together a number of distinct issues, from the purely computationally efficient, to the ethically and morally virtuous. There has been a good deal of excellent work on the importance of standards (Hanseth 2000, Brunsson 2002, Schmidt 1998, Grindley 1995). What the concept entails, in a nutshell, is a development no less simple – and no less fundamental – than the standardisation, in the industrial revolution, of the length and width of nails and screws. This put an end to the bespoke smelting of every screw and nail in every machine available to industry, and made an invaluable contribution to the efficiency and productivity of every industrial endeavour (Keep 2005). The establishing of strict global standards for web coding is no less simple – and no less fundamental – an exercise for the information age.
Now, web development is in many respects not an unusual skill, in that it ranges from the hobbyist, through the cottage industry, to the blue chip and the public professional. What is unusual is that the platform upon which this range of skills is presented is the same. To make use of the dramatic analogy often used by sociologists (Goffman 1990; Butler 1993) – and indeed by the sociologists of technology, in particular actor-network theorists (Law 1992; Latour 1993) – it is as if the children’s living-room Christmas play for the grandparents, the school gym-hall drama, the amateur village hall pantomime, the touring small-scale theatre production, and the grand opera, were all to appear one after another on the same stage. To impose standards upon all of these diverse levels of skill, in order better to control and improve the quality of service provided by the stage, is a tall order. To ensure, additionally, that an induction loop is provided in the theatre for the hard of hearing, and that auditory commentaries are available for the blind, is both morally imperative, and extremely difficult to ask of those at the hobbyist end of the spectrum. Yet it is perhaps precisely because of this range of skill levels sharing the same platform, that there is also apparent, on the web, a great range of compliance with the latest standards for code languages.
Just as the British Standard Whitworth System, in fixing a set of standards for the threads of screw bolts, wrested control of the workplace from the artisans who used to handcraft each screw and bolt for each new machine, and passed that control to the capitalist entrepreneurs who could now for the first time simply buy a packet of standard screws, so too, the W3C standards for web coding wrested control of the web, in the mid-1990s, from the likes of Microsoft and Netscape (Phillips 1998), who wanted to define HTML for their own proprietary purposes. It is noteworthy that the transfer of control from artisans to entrepreneurs created by the Whitworth System is not replicated in this case; the centralisation implied by the transition to XML is from the entrepreneurs to a non-proprietary, non-profit-making global standards body. Yet – as the research undertaken by this project underlines – web developers, the artisans of the web, still persist in refusing to adopt the latest versions of XHTML in their practice.
The World Wide Web Consortium, established by Berners-Lee in 1994, is a non-profit-making, academic body. It is an international consortium where Member organizations, a full-time staff, and the public work together to develop Web standards. Its mission is: "To lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web." (W3C 2005) In the political climate of global capitalism, however, the W3C is a cautious organisation. It publishes Formal Recommendations, rather than standards. It does not engage in any direct lobbying of the industry concerning compliance. Indeed its victory over Microsoft and Netscape in the Browser Wars of the mid-to-late 1990s was something achieved not through open conflict between the W3C and the makers of browser software, but rather through the vigorous lobbying of external organisations such as the Web Standards Project (WaSP), “formed in 1998 with the goal of promoting core web standards and encouraging browser makers to do the same, thereby ensuring simple, affordable access for all,” (WaSP 1998; Zeldman 2003) – and enabling web developers to avoid the increasingly necessary expense of creating multiple versions of their websites individually tailored to increasingly different browsers. Nonetheless, it is clear that the standardisation of code languages implicit in a W3C Formal Recommendation carries with it the intent of those who contributed to its making, and the W3C is an inherently non-proprietary, public sector body for whom the interests of private commercial enterprises will at best be secondary. Ultimately, indeed, a W3C Formal Recommendation, seen in this light, can only serve one master, the Director of the W3C, inventor of the web, Tim Berners-Lee.
The vision of the transition, moreover, derives from the same source. XML is at the heart of Berners-Lee’s vision for the Semantic Web (Berners-Lee 1998): his wish, through the universal application of rigorously quality processed international standards for code languages, to see machines talking to one another on our behalf. The Semantic Web of the future promises to bring us intelligent search engines able to supply paragraphs of detail in answer to our queries, plucked from websites relevant to the topic, rather than merely a list of possible web addresses where the answers might be obtained. XML, as a development from HTML, is crucial to this project, and is a great deal stricter, requiring far greater rigour from both the web developer and the browser. The imposition of code standards upon the world wide web, in pursuit of this vision, only incidentally wrests control of the future of the web from those corporations who would wish it to conform to their own proprietary needs (Phillips 1998). In short, the evolution of standards for the web, unlike the simpler example of the British Standard Whitworth screw thread, is a very heterogeneous network of very complex relations between an inventor seeking the next level of his invention, corporations seeking market dominance, and advanced web developers seeking a level playing field in the browser market to facilitate cross-browser coding.
The story of HTML is somewhat chequered. In its earliest days it was a new tool created by Tim Berners-Lee at the European Organization for Nuclear Research (CERN) laboratories in Switzerland to assist in data sharing between the computers at the centre. Based upon Standard Generalized Mark-Up Language (SGML), it was a miniature, simplified version of that highly complex language. But Berners-Lee soon had other plans for it. Taken up by the World Wide Web Consortium (W3C) – the body established by Berners-Lee in 1994 to try to marshal the phenomenal growth of the web his mark-up language had spawned – HTML was to undergo a profound reinvention. Web pages, originally merely text with the odd image added to spice things up, increasingly became, during the mid-1990s, a ‘virtual’ extension of the already mature desk-top-publishing revolution, which had seen the printing industry massively computerised over a very short period of time. HTML 3, a formal recommendation of the W3C in the mid-1990s, contained a wide range of new visual formatting properties, in response to the increasing interest in what could be achieved presentationally on the web.
There were essentially three main players in this online development: Netscape, Microsoft, and the W3C. While Netscape and Microsoft vied for control of the web with their own proprietary, unwieldy new versions of HTML, and other minor players busied themselves with ever more complex and cumbersome plug-ins that visitors to websites were increasingly encouraged to download and install into their browsers, the W3C began creating a new foundational language for the future of the web, Extensible Mark-up Language (XML), and a new presentational language, Cascading Style Sheets (CSS).
The W3C’s new versions of HTML, following HTML 3, lifted the language from its SGML origins and shifted it across to this new, XML foundation, first through the publication of HTML 4, and then XHTML. Both these new kinds of HTML, published in the late 1990s, came in two flavours: Strict, and Transitional. The former flavour had stripped out all of the visual formatting and presentational elements introduced in HTML 3, paring the language down to a more robust version of the earlier, more structural HTML 2. Visual formatting was now to be achieved exclusively through the use of the new W3C technology, Cascading Style Sheets (CSS). The Transitional flavour of these new versions of HTML allowed web designers to continue using older, HTML 3 visual formatting code until such time as the makers of browsers had caught up, and were properly supporting the use of CSS. The Transitional DTD thus included “presentation attributes and elements that W3C expects to phase out as support for style sheets matures,” and the admonishment that, “Authors should use the Strict DTD when possible, but may use the Transitional DTD when support for presentation attributes and elements is required.” (W3C 1999) The differences between HTML 4 and XHTML 1.0 were minor, consisting mainly in some more rigorous rule-based practices in the latter than in the former, geared toward making the code more XML friendly. Finally, in the summer of 2001, XHTML 1.1 was published, with no Transitional version.
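Which flavour a page claims to follow is visible in the public identifier of its DOCTYPE declaration. The following is a minimal Python sketch, not a validator: it only reads the declared identifier and looks it up in a small table. The function and class names (`declared_flavour`, `DoctypeFinder`) are hypothetical, chosen for illustration, and the table covers only the W3C DTDs discussed in this section.

```python
import re
from html.parser import HTMLParser

# Public identifiers of the W3C DTDs discussed in this section.
KNOWN_DTDS = {
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD XHTML 1.1//EN": "XHTML 1.1",
}

# Matches e.g.: DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "..."
PUBLIC_ID = re.compile(r'DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


class DoctypeFinder(HTMLParser):
    """Records the public identifier of the first DOCTYPE declaration seen."""

    def __init__(self):
        super().__init__()
        self.public_id = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">" of the declaration.
        match = PUBLIC_ID.match(decl)
        if match and self.public_id is None:
            self.public_id = match.group(1)


def declared_flavour(markup):
    """Return a label for the DTD a document declares (it is not validated)."""
    finder = DoctypeFinder()
    finder.feed(markup)
    finder.close()
    if finder.public_id is None:
        return "no DOCTYPE declared"
    return KNOWN_DTDS.get(finder.public_id, "unrecognised DTD")
```

A declaration only states the author's intent. Whether the page actually conforms to the declared DTD is a separate question that requires a validating parser.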
Steven Pemberton, Chair of HTML and Forms Working Groups at the W3C, when asked about the Transitional versions of HTML, in the course of an email correspondence with the Principal Investigator on this project during February 2005, said, “As far as I am concerned the phase-out is more or less complete.” Asked for a direct quote regarding what kind of HTML to use, he replied: "people should be using strict DTDs and validating against them."
But of course this is far from the whole story. User Agents – the browsers through which web pages are viewed – had of course to change and develop with this transition. "HTML browsers accept any input, correct or incorrect, and try to make something sensible of it," as the W3C’s FAQ page on XHTML explains. "This error-correction makes browsers very hard to write, especially if all browsers are expected to do the same thing. It has also meant that huge numbers of HTML documents are incorrect, because since they display OK in the browser, the author isn't aware of the errors. This makes it incredibly difficult to write new web user agents since documents claiming to be HTML are often so poor." (W3C 2004) As things stand, however, at the time of writing, the browsers used by the vast majority of people worldwide are by and large XHTML compatible, and fully capable of supporting style sheets, making the continued use of a Transitional DTD quite unnecessary.
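The contrast the FAQ draws between tolerant HTML handling and the rigour of XML can be illustrated with a short Python sketch. It uses the standard library's `html.parser`, which tokenises whatever it is given without complaint, as a stand-in for a forgiving browser (it does not reproduce a browser's error-correction), and `xml.etree.ElementTree`, which rejects markup that is not well-formed. The function names and the two sample strings are assumptions made for illustration, and the XML check tests well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects start tags in document order; never raises on sloppy markup."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def lenient_tags(markup):
    """Tokenise markup the forgiving way and report the start tags found."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


def is_well_formed_xml(markup):
    """Return True only if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Illustrative samples: the first never closes its <b> element.
SLOPPY = "<p>Unclosed <b>bold text</p>"
TIDY = "<p>Closed <b>bold text</b></p>"
```

Under these assumptions, `lenient_tags(SLOPPY)` reports both tags as though nothing were wrong, while `is_well_formed_xml(SLOPPY)` returns `False`. The author of the sloppy page gets no warning from the tolerant path, which is the situation the FAQ describes.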
| 2008 | IE7 | IE6 | IE5 | Fx | Moz | S | O |
|---|---|---|---|---|---|---|---|
| April | 24.9% | 28.9% | 1.0% | 39.1% | 1.0% | 2.2% | 1.4% |
Of the above browsers, support for XHTML and CSS is excellent in Internet Explorer 7 (IE7), Firefox (Fx), Mozilla (Moz), Safari (S) and Opera (O). Internet Explorer 6 (IE6) and Internet Explorer 5 (IE5) have problems with some style sheet positioning. Thus the overwhelming majority of people accessing W3Schools do so with an XHTML compatible browser fully supporting style sheets. Browsers are, of course, free, and the tiny percentage of users still using an older browser can easily be guided to where they can update their software. IE5 is, in any case, not so bad in its support for CSS as browsers such as Netscape 4.x, now hardly in use at all.
Parallel with the development and publication of XHTML, the W3C undertook an exercise entitled the Web Accessibility Initiative (WAI), which in 1999 published its Web Content Accessibility Guidelines (WCAG). As part of the initiative, alongside stripping out the visual formatting from HTML, new elements and attributes were introduced into the code to help make it more accessible to disabled people. Thus HTML 4 and XHTML 1.0, published the same year, contained these elements in both Strict and Transitional flavours, as does XHTML 1.1. The WAI also published, in the following years, the Authoring Tool Accessibility Guidelines (ATAG), and User Agent Accessibility Guidelines (UAAG). It is these standards for those making websites, the software tools many use to make them, and the browsers through which they are accessed, that have since 1999 been increasingly accepted by governments in numerous countries, as the de facto global standards for web accessibility. The battles between Netscape and Microsoft came to an end, and the makers of browsers now pride themselves on their support for and compliance with the standards set by the W3C.
The WCAG provide a set of guidelines for creating web pages that are accessible to all, regardless of sensory, physical, or cognitive ability. To provide web developers with a graded approach to the implementation of accessibility, three ‘levels’ have been defined: Level A, Level AA and Level AAA. Of particular note are three Guidelines included as of Level AA priority: 3.2, 11.1 and 11.2. Guideline 3.2 of the WCAG states: “Create documents that validate to published formal grammars”. Guideline 11.1 states “Use W3C technologies when they are available and appropriate for a task and use the latest versions when supported.” In a climate where nearly all browsers support the latest versions of HTML and CSS, it would seem that the WCAG are expressly recommending that this is the way webpages should be made. The fact that Guideline 11.2 states “Avoid deprecated features of W3C technologies” would suggest that it is the Strict DTD of HTML 4.01 or XHTML 1.0 that should be used, in any case, if the latest version, XHTML 1.1, is not used.
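As a rough illustration of what checking a page against Guideline 11.2 might involve, the Python sketch below lists the deprecated presentational elements a page uses. It is a hedged example, not a conformance tool: the set of elements is a small illustrative subset of those deprecated in HTML 4.01, deprecated attributes are not examined, and the names `DEPRECATED_ELEMENTS` and `deprecated_elements_used` are hypothetical.

```python
from html.parser import HTMLParser

# Illustrative subset of elements deprecated in HTML 4.01 in favour of CSS.
DEPRECATED_ELEMENTS = {"basefont", "center", "font", "s", "strike", "u"}


class DeprecatedElementFinder(HTMLParser):
    """Records each deprecated element in the order it is opened."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in DEPRECATED_ELEMENTS:
            self.found.append(tag)


def deprecated_elements_used(markup):
    """Return the deprecated elements found in markup, in document order."""
    finder = DeprecatedElementFinder()
    finder.feed(markup)
    finder.close()
    return finder.found
```

A page for which this returns an empty list is not thereby shown to satisfy the Strict DTD. Only validation against the DTD itself, as Guideline 3.2 requires, can establish that.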
Amongst those responsible for the creation of the WCAG 2.0, there is ongoing discussion about the relationship between accessibility and validity. “People agree that validity is a good first step towards accessibility and that validity does not guarantee accessibility,” opens the summary at the W3C website. Essentially, there are those who feel that XHTML code that validates against the Document Type Definition laid down by the W3C is essential, and should be a Level A priority, and those who feel that accessibility is the highest priority, and that the recommendations may at times be behind advances in making pages accessible – in short that invalid code may at times be more accessible than valid code.
In summary, validity is, at the very least, an important part of what makes a webpage accessible. Legislation and Directives in Europe, Australia and the United States aimed at preventing discrimination against, and promoting equality of opportunity for, disabled people, have made the construction of websites in compliance with the WCAG 1.0 a legal requirement. Most governmental directives specify Level AA as the minimum requirement, and Level AA, through Guideline 3.2, includes valid code.
In the final analysis, to return to the theatrical analogy with which we began this section, it is clear that the children’s living-room Christmas play for the grandparents will likely never reach the standards required of the Grand Opera. But the standards of professionalism set by those at the top of the profession will inevitably impact upon those below, with the inevitable implication that the onus is upon those web developers responsible for the public sector and blue chip private sector websites to improve their own standards, if the laudable vision of the semantic web is ever to be realised.