The web was initially developed based on three architectural components: transport (provided by HTTP), addressing (provided by URLs) and data formats (provided by HTML). As the web grew, limitations in the original architecture became apparent. This talk gives an overview of the development of new web standards, including XML, HTTP/NG and URNs, and describes the development of a new architectural component: metadata.
The World Wide Web was originally based on three key standards: HTML, the HyperText Markup Language, which provided the data format for native resources on the Web; HTTP, the transfer protocol for the Web; and URLs, the addressing mechanism for locating Web resources. Since the early 1990s, when the Web was first developed, these underlying standards have evolved continually, and a number of new Web standards are under development. This paper summarises these developments, especially those coordinated by the World Wide Web Consortium (W3C).
Information providers on the web will be familiar with HTML. Experienced information providers will also be familiar with HTML's deficiencies: the difficulty of controlling the appearance of web pages, proprietary HTML extensions and the browser wars, the difficulty of reusing information stored in HTML, and the difficulty of maintaining large websites.
The HTML 4.0 recommendation [1] primarily addresses deficiencies in HTML 3.2's accessibility support (e.g. improving access to web sites by people with disabilities by providing hints to voice browsers for the visually impaired). In addition it provides better integration with style sheets (described below). Web authors expecting a range of new tags in HTML 4.0 to provide more control over the appearance of HTML documents will be disappointed: the intention is for HTML to define the structure of a document, with style sheets describing how that structure is to be displayed.
Cascading style sheets help to address some of the difficulties mentioned above. CSS 2.0 [2] provides comprehensive control over the appearance of HTML documents. Use of external style sheet files can also help with the maintenance of web sites: the separation of content from appearance means that the look-and-feel of a web site can be maintained without editing the files containing the content, and a single corporate style sheet file (or a small, manageable number of them) can be edited to change a website's design.
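As a simple illustration of this separation (the file names and rules below are invented for the purpose of the example), an HTML document declares only its structure and a reference to an external style sheet, while the style sheet controls presentation:

    <!-- report.html: structure only, no presentational markup -->
    <link rel="stylesheet" type="text/css" href="corporate.css">
    <h1>Annual Report</h1>
    <p class="summary">A summary of the year's activities.</p>

    /* corporate.css: one file controls the look-and-feel of the site */
    h1        { font-family: Arial, sans-serif; color: navy; }
    p.summary { font-style: italic; margin-left: 2em; }

Editing corporate.css changes the appearance of every page which refers to it, without touching the content files.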
The development of a Document Object Model (DOM) [3] for HTML will enable interactive web sites to be developed more easily. The release of the DOM recommendation should help to avoid the problems caused by the differing implementations of client-side scripting provided by the mainstream browser vendors. Browser vendors will be encouraged to support the DOM due to its architectural strengths, which have been discussed thoroughly within the W3C DOM Working Group.
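As a sketch of what this makes possible (the element and variable names are invented, and only DOM Level 1 interfaces are used), a client-side script can modify a page after it has loaded:

    // Append a new item to the first list in the document.
    var list = document.getElementsByTagName("UL").item(0);  // first <UL> in the page
    var item = document.createElement("LI");                 // create a new list item
    item.appendChild(document.createTextNode("HTML 4.0 released"));
    list.appendChild(item);                                  // the page updates in place

Because these interfaces are defined by the DOM recommendation rather than by an individual vendor, the same script should behave identically in any conforming browser.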
Although HTML 4.0, CSS 2.0 and DOM 1.0 provide the underlying standards for the development of attractive, maintainable and interactive websites, they do not address HTML's limited support for structured and reusable documents. XML (Extensible Markup Language) [4] has been designed to enable arbitrary document structures to be defined.
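For example, an organisation could define markup which reflects the structure of its own documents, rather than forcing them into HTML's fixed set of elements (the element names below are invented for illustration):

    <?xml version="1.0"?>
    <report>
      <title>Web Standards Review</title>
      <author>A. N. Other</author>
      <section>
        <heading>HTML Developments</heading>
        <para>HTML 4.0 improves support for accessibility.</para>
      </section>
    </report>

Because the markup describes what the information is, rather than how it should look, the same document can be reused: rendered for the web, loaded into a database, or repurposed for print.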
Although end users of the web appreciate the web's hyperlinking mechanism, the hypertext community have criticised the web's limited hyperlinking functionality. Providers of large web sites are also becoming aware of the difficulties of maintaining hyperlinks which are embedded in HTML documents.
The development of XML provided an opportunity for the web's hyperlinking deficiencies to be addressed. XLink [5] provides additional hyperlinking functionality, including links that lead users to multiple destinations, bidirectional links and links with special behaviours. In addition, external link databases will ease the maintenance of hyperlinks.
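XLink is still a working draft and its syntax may well change, but a link offering multiple destinations might look something like the sketch below (the element names are invented; the linking attributes follow the style of the current draft):

    <further-reading xml:link="extended">
      <locator xml:link="locator" href="intro.xml"   role="introduction"/>
      <locator xml:link="locator" href="details.xml" role="full-text"/>
    </further-reading>

Because such a link need not be stored inside the documents it connects, it can be held in an external link database and maintained centrally.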
XPointer [6] addresses HTML's limited support for pointers into documents. With XPointer it will be possible to link to any portion of an XML document, even if the author has not provided an internal anchor, including spans of content rather than just single points.
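Again the syntax is still a working draft and liable to change, but a pointer into a document might take a form such as the following (the file and ID names are invented):

    report.xml#id(standards).child(3,para)

i.e. the third para element within the element whose ID is standards, addressed without the author of report.xml having provided an explicit anchor at that point.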
Version 1.0 of HTTP (the HyperText Transfer Protocol) [7] suffered from design flaws and implementation problems. Many of these problems have been addressed by HTTP/1.1, which adds support for virtual hosts and improved support for caching. However, HTTP/1.1 is not sufficiently flexible or extensible to support the development of tightly-integrated web applications. HTTP/NG [8] is a radical redesign using object-oriented technologies which aims to address these concerns.
Most experienced web users will have encountered the dreaded 404 error message indicating that a resource has not been found. URLs such as the fictitious http://www.bristol-poly.ac.uk/depts/music/ are liable to change due to changes in the name of the organisation, internal reorganisation or reorganisation of the underlying web directory structure.
URNs (Uniform Resource Names) [9] have been proposed as a solution to some of the deficiencies of URLs. Other alternatives include DOIs (Digital Object Identifiers) [10] and PURLs (Persistent URLs) [11]. However, wide-scale deployment of these technologies does not appear likely in the near future, in part due to the organisational, as opposed to technical, requirements of their deployment. The pragmatic solution is to recognise that URLs don't break - people break them - and that URLs should be designed to have a long life-span.
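In practice this means choosing URLs which do not encode transient details such as departmental structure, and keeping old URLs working when reorganisation does happen. With the Apache server, for example, a redirect directive (the paths below continue the fictitious example above) can map an old URL to its new home:

    # Keep the pre-reorganisation URL working after the move
    Redirect permanent /depts/music/ http://www.bristol-poly.ac.uk/music/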
Metadata can be regarded as the missing architectural component from the initial implementation of the web. During the mid-1990s there were several web developments, such as resource discovery, web site mapping and digital signatures, which were all aspects of metadata.
In order to coordinate these metadata developments the W3C set up a Metadata Coordination Group [12], which developed RDF, the Resource Description Framework [13]. RDF provides a general framework for the deployment of metadata applications, and is now being used for a number of them, such as Dublin Core metadata and digital signatures.
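As an illustration, a Dublin Core description of a web page can be expressed in RDF's XML syntax along the following lines (the resource URL and field values are invented, and the namespace URIs are those used in the current drafts, so they may change):

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://www.example.ac.uk/papers/web-futures/">
        <dc:title>Recent and Future Web Developments</dc:title>
        <dc:creator>A. N. Other</dc:creator>
        <dc:date>1998-09-01</dc:date>
      </rdf:Description>
    </rdf:RDF>

The same framework, a set of statements about resources, can equally carry digital signature or rating information.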
How are the new developments mentioned in this article to be deployed in a world in which a wide range of browser versions and hardware platforms are in use? We may find that the benefits provided by browsers which implement new standards are so attractive that users deploy new browser versions very quickly. On the other hand, we may find that upgrading is too expensive. Although web protocols and data formats are intended to be backwards compatible and to degrade gracefully, in practice (due partly to software bugs, but also to protocol deficiencies) this is not the case. Transparent Content Negotiation [14] has been proposed as a protocol solution to the deployment of new data formats, but unfortunately it is not widely deployed.
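Under Transparent Content Negotiation a client declares that it is willing to negotiate, and the server either chooses the best variant of a resource or returns a list of the alternatives. An outline exchange, based on the TCN proposal (RFC 2295), might look as follows (the resource names are invented):

    GET /paper HTTP/1.1
    Host: www.example.org
    Negotiate: trans

    HTTP/1.1 300 Multiple Choices
    TCN: list
    Alternates: {"paper.html" 1.0 {type text/html}},
                {"paper.xml" 0.9 {type text/xml}}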
Increasingly, web server management applications and toolkits are providing support for browser-agent negotiation, in which the server examines the browser's identity and serves content appropriate to it. Browser-agent negotiation is used, for example, by W3C's CSS Gallery [15].
Browser-agent negotiation, however, is an application-level solution rather than part of the underlying web protocols. A recent W3C Note on CC/PP [16] describes an RDF application for defining browser functionality in a machine-understandable way.
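A CC/PP profile might describe a device's capabilities along the lines of the sketch below; the property vocabulary and its namespace are purely illustrative:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:prf="http://www.example.org/profile-vocabulary#">
      <rdf:Description rdf:about="HardwarePlatform">
        <prf:ScreenSize>320x240</prf:ScreenSize>
        <prf:ColourCapable>No</prf:ColourCapable>
      </rdf:Description>
    </rdf:RDF>

A browser or proxy could send such a profile with its requests, allowing the server to tailor its response to the capabilities of the device.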
In addition to these protocol developments, it may also be possible to deploy new technologies through the use of proxy intermediaries. For example, HTTP/NG could be introduced by deploying HTTP/NG caches at a national or institutional level until HTTP/NG-aware browsers are widely available.
Other URLs relevant to this paper are listed below.