It is important that HTML resources comply with the HTML standard. Unfortunately, in many instances this is not the case. This is owing to limitations of HTML authoring and conversion tools, a lack of awareness of the importance of HTML compliance, and the efforts Web browsers make to render non-compliant resources anyway. As a result, large numbers of HTML pages on Web sites often fail to comply with HTML standards, and the extent of the problem may only become apparent when HTML validation tools are run across the Web site.
If large numbers of HTML pages are found to be non-compliant, it can be difficult to know how to address the problem, given the potentially significant resource implications this may involve.
One possible solution would be to run a tool such as Tidy [1], which seeks to repair non-compliant pages automatically. However, in certain circumstances an automated repair could result in significant changes to the look-and-feel of the resource. Also, use of Tidy may not be appropriate if server-side technologies are used rather than simple serving of static HTML files.
This case study describes an alternative approach, based on use of W3C's Web Log Validator Tool.
W3C's Log Validator Tool [2] processes a Web site's server log file. The pages referenced in the log entries are validated, and the most popular pages that do not comply with the HTML standard are listed.
The Web Log Validator Tool has been installed on the UKOLN Web site. The tool has been configured to process resources in the QA Focus area (i.e. resources within the http://www.ukoln.ac.uk/qa-focus/ area).
The tool has been configured to run automatically once a month, and the findings are held on the QA Focus Web site [3]. An example of the output is shown in Figure 1.
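The internals of the W3C tool are not described in this case study, but the basic idea (count how often each page in a given area is requested, then report the most popular pages that fail validation) can be illustrated with a short, hypothetical Python sketch. It assumes the server writes Apache/NCSA Common Log Format, and it treats validation as a caller-supplied `is_valid` function, which stands in for a real validator such as the W3C Markup Validation Service. The function name, the `/qa-focus/` prefix default and the limit of 10 are illustrative assumptions, not the Log Validator's actual code or configuration.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# Assumed Common Log Format, e.g.:
# 1.2.3.4 - - [01/Jan/2004:00:00:00 +0000] "GET /qa-focus/a.html HTTP/1.1" 200 100
CLF_PATTERN = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def popular_invalid_pages(
    log_lines: Iterable[str],
    is_valid: Callable[[str], bool],  # hypothetical stand-in for a real validator
    prefix: str = "/qa-focus/",
    limit: int = 10,
) -> List[Tuple[str, int]]:
    """Return up to `limit` (path, hits) pairs for the most requested pages
    under `prefix` that `is_valid` reports as non-compliant."""
    hits: Counter = Counter()
    for line in log_lines:
        m = CLF_PATTERN.match(line)
        if not m:
            continue  # skip lines that are not in the assumed log format
        if m.group("method") != "GET" or m.group("status") != "200":
            continue  # only count successful page retrievals
        path = m.group("path").split("?", 1)[0]  # ignore query strings
        if path.startswith(prefix) and (path.endswith("/") or path.endswith(".html")):
            hits[path] += 1

    result: List[Tuple[str, int]] = []
    for path, count in hits.most_common():  # most popular pages first
        if len(result) >= limit:
            break
        if not is_valid(path):
            result.append((path, count))
    return result
```

Validating pages in popularity order means the editor's attention goes first to the non-compliant pages that most visitors actually see.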
Figure 1: Output From The Web Log Validator Tool
When the tool is run, an email is sent to the Web site editor and the findings are examined. We have a policy that we will seek to fix HTML errors reported by this tool.
This approach is a pragmatic one. It helps us to prioritise the resources to fix by listing the most popular pages which are non-compliant. Since only 10 non-compliant pages are listed, it should be a relatively simple process to fix these resources. In addition, if the errors reflect errors in the underlying template, we will be in a position to make changes to the template in order to ensure that new pages are not created containing the same problems.
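One way to spot errors that probably come from a shared template is to look for error messages that recur across most of the reported pages. The following Python sketch illustrates that heuristic; it is an assumption of ours rather than a feature of the Log Validator. It assumes the validator's messages have already been collected into a mapping from page path to a list of error strings. The function name and the 50% default threshold are hypothetical choices.

```python
from collections import Counter
from typing import Dict, List


def likely_template_errors(
    errors_by_page: Dict[str, List[str]],
    min_share: float = 0.5,  # hypothetical threshold: fraction of pages affected
) -> List[str]:
    """Return error messages that occur on at least `min_share` of the
    reported pages; such widespread errors are candidates for a template fix."""
    if not errors_by_page:
        return []
    pages_with: Counter = Counter()
    for messages in errors_by_page.values():
        for msg in set(messages):  # count each message at most once per page
            pages_with[msg] += 1
    threshold = min_share * len(errors_by_page)
    return sorted(msg for msg, n in pages_with.items() if n >= threshold)
```

An error that appears on only one page is more likely a local editing slip, whereas one shared by most pages points to the template. Fixing the template once then prevents the error from appearing in new pages.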
We have internal procedures for checking that HTML pages are compliant. However, as these procedures are either dependent on manual use (checking pages after creation or updating) or run periodically (periodic checks across the Web site), it is useful to make use of this automated approach as an additional tool.
Ideally this tool would be deployed from the launch of the Web site, in order to ensure best practices were implemented from the start.
Brian Kelly
UKOLN
University of Bath
BATH
UK
BA2 7AY
Email: B.Kelly AT ukoln.ac.uk