DCMI XML Schemas UKOLN

Notes on parser behaviour



The schemas

Base schemas

  • Schema: dc.xsd
    Target XML Namespace: http://purl.org/dc/elements/1.1/
  • Schema: dcterms.xsd
    Target XML Namespace: http://purl.org/dc/terms/
  • Schema: dcmitype.xsd
    Target XML Namespace: http://purl.org/dc/dcmitype/
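
As an illustration of how the three base schemas and their target namespaces fit together, the following is a minimal sketch of a wrapper schema that imports them. The target namespace http://example.org/record/ and the element name "record" are hypothetical and are not taken from the DCMI schemas; the schemaLocation values assume the three files sit in the same directory as the wrapper.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical wrapper schema: the target namespace and the element
     name "record" are illustrative assumptions only. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/record/"
           elementFormDefault="qualified">

  <!-- Each import pairs a target namespace with the schema file defining it -->
  <xs:import namespace="http://purl.org/dc/elements/1.1/"
             schemaLocation="dc.xsd"/>
  <xs:import namespace="http://purl.org/dc/terms/"
             schemaLocation="dcterms.xsd"/>
  <xs:import namespace="http://purl.org/dc/dcmitype/"
             schemaLocation="dcmitype.xsd"/>

  <!-- A container that accepts any globally declared element from the
       dc or dcterms namespaces, validated strictly -->
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="http://purl.org/dc/elements/1.1/ http://purl.org/dc/terms/"
                processContents="strict"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The wildcard keeps the sketch independent of any particular element names declared in the base schemas.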

Container schemas

Sample application schemas

The parsers/validators

Results

testsimpledc.xml

Parser   Result                                   Messages
XSV      Schema and instance accepted as valid
Xerces   Schema and instance accepted as valid
MSXML4   Schema and instance accepted as valid

testqualifieddc.xml

Parser   Result                                   Messages
XSV      Schema and instance accepted as valid
Xerces   Schema dcterms.xsd rejected as invalid   [Error] dcterms.xsd:nnn:nn: src-ct.2: Complex Type Definition Representation Error for type 'xxxx'. When simpleContent is used, the base type must be a complexType whose content type is simple, or, only if extension is specified, a simple type.
                                                  (where 'xxxx' is the name of a complexType corresponding to one of the encoding schemes.)
MSXML4   Schema and instance accepted as valid

The "dc:SimpleLiteral" problem

The schema dc.xsd defines a base complexType called SimpleLiteral:

  <xs:complexType name="SimpleLiteral">
   <xs:complexContent mixed="true">
    <xs:restriction base="xs:anyType">
     <xs:sequence>
      <xs:any processContents="lax" minOccurs="0" maxOccurs="0"/>
     </xs:sequence>
     <xs:attribute ref="xml:lang" use="optional"/>
    </xs:restriction>
   </xs:complexContent>
  </xs:complexType>
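
To make the effect of this definition concrete, here is a small sketch of instance content that a SimpleLiteral-typed element would and would not accept. It assumes that dc:title is declared in dc.xsd with the SimpleLiteral type; the unqualified "record" wrapper is hypothetical and is there only to give the fragment a single root.

```xml
<!-- Hypothetical fragment: "record" is an illustrative wrapper, and
     dc:title is assumed to be declared with type SimpleLiteral. -->
<record xmlns:dc="http://purl.org/dc/elements/1.1/">

  <!-- Character data with the optional xml:lang attribute -->
  <dc:title xml:lang="en">Notes on parser behaviour</dc:title>

  <!-- xml:lang may be omitted -->
  <dc:title>Notes on parser behaviour</dc:title>

  <!-- Not allowed: child elements, because the xs:any particle
       has maxOccurs="0". The next line is commented out for that reason. -->
  <!-- <dc:title>Notes on <em>parser</em> behaviour</dc:title> -->

</record>
```

The mixed="true" setting is what permits the character data; the zero-occurrence wildcard is what rules out markup inside it.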
    

Encoding schemes are represented as complexTypes derived from the SimpleLiteral complexType. For example, the complexType corresponding to the encoding scheme for "W3CDTF" is as follows:

  <xs:complexType name="W3CDTF">
   <xs:simpleContent>
    <xs:restriction base="dc:SimpleLiteral">
     <xs:simpleType>
      <xs:union memberTypes="xs:gYear xs:gYearMonth xs:date xs:dateTime"/>
     </xs:simpleType>
     <xs:attribute ref="xml:lang" use="prohibited"/>
    </xs:restriction>
   </xs:simpleContent>
  </xs:complexType>
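
The following sketch shows values that match each member of the union above. It makes several assumptions not stated in these notes: that the W3CDTF type lives in the dcterms namespace, that an element named dcterms:modified is declared in dcterms.xsd, and that the encoding scheme is selected in the instance with xsi:type. The "record" wrapper is hypothetical.

```xml
<!-- Hypothetical fragment: the wrapper element, the dcterms:modified
     element and the use of xsi:type are assumptions for illustration. -->
<record xmlns:dcterms="http://purl.org/dc/terms/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <!-- Matches xs:gYear -->
  <dcterms:modified xsi:type="dcterms:W3CDTF">2002</dcterms:modified>

  <!-- Matches xs:gYearMonth -->
  <dcterms:modified xsi:type="dcterms:W3CDTF">2002-10</dcterms:modified>

  <!-- Matches xs:date -->
  <dcterms:modified xsi:type="dcterms:W3CDTF">2002-10-05</dcterms:modified>

  <!-- Matches xs:dateTime -->
  <dcterms:modified xsi:type="dcterms:W3CDTF">2002-10-05T14:30:00Z</dcterms:modified>

</record>
```

None of the elements carries xml:lang, since the W3CDTF type prohibits that attribute.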
  

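This derivation of a complexType with simpleContent by restriction of a base complexType with complexContent is valid under section 3.4.6 of XML Schema Part 1: Structures, specifically clause 5.1.2 of "Schema Component Constraint: Derivation Valid (Restriction, Complex)". That clause permits the derivation when the base type definition is mixed and has an emptiable particle. SimpleLiteral satisfies both conditions: its complexContent is declared mixed="true", and its content model is a sequence containing only an xs:any with minOccurs="0", so it can match empty content.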

This was confirmed by Henry Thompson; see, for example:
http://www.w3.org/2001/05/xmlschema-rec-comments#pfiSimpleContent
http://lists.w3.org/Archives/Public/xmlschema-dev/2002Oct/0005.html
http://lists.w3.org/Archives/Public/xmlschema-dev/2002Oct/0008.html

Conclusion: Xerces appears to be behaving incorrectly in rejecting this derivation.