A Strategy For Fixing Non-Compliant HTML Pages On The QA Focus Web Site

Problem Being Addressed

It is important that HTML resources comply with the HTML standard. Unfortunately in many instances this is not the case, due to limitations of HTML authoring and conversion tools, a lack of awareness of the importance of HTML compliance and the attempts made by Web browsers to render non-compliant resources. This often results in large numbers of HTML pages on Web sites not complying with HTML standards. An awareness of the situation may be obtained only when HTML validation tools are run across the Web site.

If large numbers of HTML pages are found to be non-compliant, it can be difficult to know what to do to address this problem, given the potentially significant resources implications this may involve.

One possible solution could be to run a tool such as Tidy [1] which will seek to automatically repair non-compliant pages. However, in certain circumstances an automated repair could results in significant changes to the look-and-feel of the resource. Also use of Tidy may not be appropriate if server-side technologies are used, as opposed to simple serving of HTML files.

This case study describes an alternative approach, based on use of W3C's Web Log Validator Tool.

Solution Used

W3C's Log Validator Tool

W3C's Log Validator Tool [2] processes a Web site's server log file. The entries are validated and the most popular pages which do not comply with the HTML standard are listed.

Use On The QA Focus Web Site

The Web Log Validator Tool has been installed on the UKOLN Web site. The tool has been configured to process resources on the QA Focus area (i.e. resources within the http://www.ukoln.ac.uk/qa-focus/ area.

The tool has been configured to run automatically once a month and the findings held on the QA Focus Web site [3]. An example of the output is shown in Figure 1.

Figure 1: Output From The Web Log Validator Tool

When the tool is run an email is sent to the Web site editor and the findings are examined. We have a policy that we will seek to fix HTML errors which are reported by this tool.

This approach is a pragmatic one. It helps us to prioritise the resources to fix by listed the most popular pages which are non-compliant. Since only 10 non-compliant pages are listed it should be a relatively simple process to fix these resources. In addition if the errors reflect errors in the underlying template, we will be in a position to make changes to the template, in order to ensure that new pages are not created containing the same problems.

Lessons Learnt

We have internal procedures for checking that HTML pages are compliant. However as these procedures are either dependent on manual use (checking pages after creation or updating) or run periodically (periodic checks across the Web site) it is useful to make use of this automated approach as an additional tool.

Ideally this tool would be deployed from the launch of the Web site, in order to ensure best practices were implemented from the start.

References

HTML Tidy Project Page,
<http://tidy.sourceforge.net/>
Log Validator Tool,
<http://www.w3.org/QA/Tools/LogValidator/>
QA Focus Statistics: Non-Compliant HTML Pages, QA Focus
<http://www.ukoln.ac.uk/qa-focus/statistics/web-log-validator/>

Contact Details

Brian Kelly
UKOLN
University of Bath
BATH
UK
BA2 7AY

Email: B.Kelly AT ukoln.ac.uk

QA Focus Comments

For QA Focus use.