Rachel Heery, Paul Miller, Tony Gill, Dave Beckett
These are our personal views of the workshop and its outcomes
and are in no way meant to represent or pre-empt the official
workshop report which will be drawn up by Stu Weibel (OCLC). Discussion
of the workshop and information on the official report will be
on the meta2 mailing list.
Main items on agenda:
Extensibility
Qualifiers and element sub-structure
Element set refinements: coverage, relation, rights
Related developments: PICS
Attendees:
22 North Americans
20 Australians
16 Europeans
7 Asians
Made up of:
25 Librarians
25 Networking
15 Content specialists
by John Kunze and Carl Lagoze
Goals for DC:
Extensibility means using extra semantics with existing element
sets. The motivation is that local communities need extra elements,
but these communities should be able to build on the DC effort
rather than be forced to re-invent a record structure. They should be
able to add extensions of local significance.
Sets of elements are defined by:
data structure
vocabulary
vocabulary plus rules
Sets will interact and different sets will evolve separately. Issues include:
overlap
inheritance
conflict
Two models for extensibility were identified:
containment: DC exists as part of a wider element set
co-existing element sets: DC exists as a separate set
Three kinds of extensions were identified:
local elements (create new elements for local use)
qualifiers (use qualifier mechanism)
alternate element definitions (have alternate definitions for the same element)
Required for:
Extensions should be allowed, following the W3C-agreed syntax. For example, if Harvard wanted to introduce new elements they would be named HVD.<element name>, and UK-specific elements would be named UK.<element name>.
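As a minimal sketch of what this naming convention implies for software reading the records, the Python below splits a NAME value into its element-set prefix and its element name. The function names, the prefix table and the example elements (HVD.shelfmark, UK.localauthority) are hypothetical; the only assumption about the syntax is that the prefix is everything before the first dot.

```python
# Hypothetical table of element-set prefixes. HVD and UK are the report's
# illustrative examples, not registered prefixes.
KNOWN_SETS = {
    "DC": "Dublin Core",
    "HVD": "Harvard local extensions",
    "UK": "UK-specific extensions",
}


def split_element_name(name: str) -> tuple[str, str]:
    """Split 'PREFIX.element' into (prefix, element).

    Assumes the prefix is everything before the first dot; the rest
    (including any further dotted parts) is kept as the element name.
    """
    prefix, sep, element = name.partition(".")
    if not sep or not element:
        raise ValueError(f"no element-set prefix in {name!r}")
    return prefix, element


def describe(name: str) -> str:
    """Label an element with the set its prefix belongs to."""
    prefix, element = split_element_name(name)
    owner = KNOWN_SETS.get(prefix, "unknown element set")
    return f"{element} ({owner})"


if __name__ == "__main__":
    for name in ["DC.title", "HVD.shelfmark", "UK.localauthority"]:
        print(name, "->", describe(name))
```

Splitting only at the first dot means a prefix never swallows later qualifiers, so a name such as DC.coverage.x still resolves to the DC set.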
The aim was to identify the major issues of contention in the area of element
sub-structure. The issues included:
It is important to distinguish the two sorts of qualifier:
scheme= the name of a scheme, and
type= a refinement of the element definition
There was general agreement that using qualifiers (in particular
type=) might all too easily result in the creation of new elements.
Although local practice might need new elements, there is a clear
tension between use of qualifiers and interoperability. It was
accepted that qualifiers had been introduced to reach consensus,
but that we need to be upfront about the detrimental effect on
interoperability. One of the main divisions among participants
was between the minimalists, who wanted no qualifiers, and the 'extenders',
who were happy to use qualifiers. There were not many implementors
among the minimalists, and it was accepted that people would use
some qualifiers in real life.
There was a suggestion that sub-elements be permitted only for data on which a researcher "may wish to search": date created, date modified etc. Telephone numbers and the like are not normally searched on and would be better dealt with as non-DC elements (extensions).
There was general agreement that the DC record must be meaningful
without inclusion of qualifiers. This principle would seem in
general to permit use of scheme= qualification, as this helps consistency
and the automated interpretation of values. For example, if one
had the following
Subject scheme=DDC content=310.1
then, even ignoring the qualifier, one would interpret the number in this field as some sort of classification, though one would not be able to say which scheme. Similarly, in the case of
Subject scheme=LCSH content=Australian mammals
one would find the terms useful even without knowing that they belonged to a controlled vocabulary.
(Although this principle was accepted, its full implications
were contradicted by many suggestions and proposed practices from
the floor.)
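The principle that a record must stay meaningful without its qualifiers can be made concrete with a small data model. This is a sketch only: the `DCValue` class and the `unqualified` helper are hypothetical names, and the qualifiers are modelled as optional fields so that a client ignoring them still sees plain element/value pairs such as the DDC and LCSH examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DCValue:
    """One element value plus its optional qualifiers (hypothetical model)."""
    element: str                   # e.g. "Subject"
    content: str                   # e.g. "310.1"
    scheme: Optional[str] = None   # scheme= qualifier, e.g. "DDC" or "LCSH"
    type_: Optional[str] = None    # type= qualifier (refinement of the element)
    lang: Optional[str] = None     # language qualifier


def unqualified(record: list[DCValue]) -> list[tuple[str, str]]:
    """Return the record as seen by a client that ignores every qualifier."""
    return [(v.element, v.content) for v in record]


record = [
    DCValue("Subject", "310.1", scheme="DDC"),
    DCValue("Subject", "Australian mammals", scheme="LCSH"),
]

print(unqualified(record))
# [('Subject', '310.1'), ('Subject', 'Australian mammals')]
```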
It was agreed that the qualifier ROLE would be dropped: it
was introduced by mistake and is in effect synonymous with TYPE.
It was agreed that each element could (and some would say should)
be qualified by language.
It was agreed that schemes must help in the interpretation of values and must be based on external standards.
It was agreed as a principle that types should narrow the semantics
of an element. There was some feeling that this would help to
ensure the element could still be usefully interpreted without
the qualifier. It was acknowledged that it was difficult to agree
whether a particular qualifier narrows or extends semantics, but
there was agreement that it would be useful to keep to the spirit
of this principle.
The aim was to pin down the DC syntax. The discussion took into account a proposal for changing HTML guidelines which is on the table at W3C: the Web Collections syntax.
A flavour of this proposal was outlined. It is designed to allow the relations between different web pages to be
identified. It was presented as a mechanism supported by large
vendors, and was roughly as follows:
<DATA PROFILE=http://www.DC.definition>
  <INFO NAME= LANG= SCHEME= CONTENT=>
  <DATA>
    <INFO NAME= CONTENT=>
  </DATA>
  <INFO NAME= CONTENT=>
</DATA>
This would enable a cleaner expression of DC and would allow simpler statements of types, schemes and language. The DATA wrapper links to the DC definition, and allows "DC." to be dropped from each element name. How ordering would be handled was not clear.
Also, how are external schemes LINKed? The <DATA> </DATA> tags may be nested, so an element using IMT, for example, would sit inside a second <DATA> </DATA> pair, with the PROFILE being the IMT RFC. The 'higher' DC profile is simply inherited.
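One way to read the nesting rule is as a stack of profiles: each INFO is interpreted against every PROFILE that encloses it, innermost last, so the inner IMT block still inherits DC. The sketch below models that reading only. The classes, the `resolve` function and the "IMT-RFC" placeholder are hypothetical, and the proposal's real processing rules were not settled at the workshop.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Info:
    name: str     # element name without the "DC." prefix
    content: str


@dataclass
class Data:
    profile: Optional[str] = None   # None: add nothing, just inherit
    children: list[Union["Info", "Data"]] = field(default_factory=list)


def resolve(block: Data, inherited: tuple = ()):
    """Yield (profiles, name, content) for each INFO, outermost profile first."""
    profiles = inherited + ((block.profile,) if block.profile else ())
    for child in block.children:
        if isinstance(child, Data):
            yield from resolve(child, profiles)   # nested DATA inherits
        else:
            yield profiles, child.name, child.content


doc = Data(
    profile="http://www.DC.definition",   # placeholder URL from the report
    children=[
        Info("title", "Workshop report"),
        Data(profile="IMT-RFC", children=[Info("format", "text/html")]),
    ],
)

for profiles, name, content in resolve(doc):
    print(profiles, name, content)
```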
The general feeling was that this proposal, or something similar, was likely to be accepted by W3C, and that the DC community should lobby W3C to ensure that our requirements were taken into account. Caveat: there was an undercurrent of opinion that this proposal was being hyped and that in reality it might not be endorsed by W3C as quickly as suggested.
Other discussion on syntax assumed that any encoding recommended now should be viewed as an interim measure. Some people suggested it wasn't worth creating metadata until the W3C pronouncement came to pass, but this caused near apoplexy among Northern Europeans!
There were two proposals that Misha Wolf (with Liam Quinn, Eric Miller and Dave Beckett) presented:
a) <META NAME=DC.author CONTENT="(TYPE=email)foo@bar.co.uk">
This is current practice amongst early implementors.
b) <META NAME=DC.author.email CONTENT="foo@bar.co.uk">
This was labelled the dot mechanism. People thought this syntax reflected the fact that adding a type qualifier in effect creates a new element.
The consensus was that b) should be accepted: the type is appended to the element name.
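A small converter shows how records already written in the a) style could be migrated to the dot mechanism. This is a sketch under the assumption that the type always appears as a single leading "(TYPE=...)" prefix in CONTENT, as in the example above; `to_dot_notation` and `meta_tag` are hypothetical helpers.

```python
import re

# A leading "(TYPE=...)" qualifier inside CONTENT, as in proposal a).
TYPE_PREFIX = re.compile(r"^\(TYPE=([^)]+)\)")


def to_dot_notation(name: str, content: str) -> tuple[str, str]:
    """Rewrite an a)-style NAME/CONTENT pair into the b) dot mechanism.

    If CONTENT starts with (TYPE=...), the type is removed from the value
    and appended to the element name; otherwise the pair is returned as-is.
    """
    match = TYPE_PREFIX.match(content)
    if not match:
        return name, content
    return f"{name}.{match.group(1)}", content[match.end():]


def meta_tag(name: str, content: str) -> str:
    """Format a NAME/CONTENT pair as an HTML META tag."""
    return f'<META NAME={name} CONTENT="{content}">'


name, content = to_dot_notation("DC.author", "(TYPE=email)foo@bar.co.uk")
print(meta_tag(name, content))
# <META NAME=DC.author.email CONTENT="foo@bar.co.uk">
```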
a)
<META NAME=DC.subject CONTENT="(SCHEME=XYZ)(LANGUAGE=en)blah">
b)
<META NAME=DC.subject SCHEME=XYZ LANGUAGE=en CONTENT="blah">
This proposal (b) was thought to be moving towards the Info/Data
Web Collections syntax and was considered cleaner, but it is not
compliant with current HTML.
There was strong disagreement, so the recommendation is that both
a) and b) are options that can be used, with the type (if present)
appended to the element name in the NAME attribute (as shown in
4.2).
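Because both forms are permitted, a generator may need to produce either. In the hedged sketch below, `encode` is a hypothetical helper that always appends the type to the NAME and places scheme and language either as CONTENT prefixes (option a) or as attributes (option b). The attribute names are copied from the examples above rather than from any finalised specification.

```python
from typing import Optional


def encode(element: str, value: str, type_: Optional[str] = None,
           scheme: Optional[str] = None, lang: Optional[str] = None,
           style: str = "a") -> str:
    """Build a DC META tag in either recommended style.

    The type (if present) is always appended to the element name.
    Style "a" puts scheme/language in CONTENT prefixes; style "b" puts
    them in attributes (not valid in current HTML, as noted above).
    """
    name = f"DC.{element}" + (f".{type_}" if type_ else "")
    if style == "a":
        prefix = ""
        if scheme:
            prefix += f"(SCHEME={scheme})"
        if lang:
            prefix += f"(LANGUAGE={lang})"
        return f'<META NAME={name} CONTENT="{prefix}{value}">'
    if style == "b":
        attrs = ""
        if scheme:
            attrs += f" SCHEME={scheme}"
        if lang:
            attrs += f" LANGUAGE={lang}"
        return f'<META NAME={name}{attrs} CONTENT="{value}">'
    raise ValueError(f"unknown style {style!r}")


print(encode("subject", "blah", scheme="XYZ", lang="en", style="a"))
print(encode("subject", "blah", scheme="XYZ", lang="en", style="b"))
```

The two calls reproduce the a) and b) examples above; passing `type_` would add it to the NAME in both styles.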
This group took the newly modified dot notation (mentioned above) and used it to greatly extend the power of COVERAGE. The previous TYPEs 'spatial' and 'temporal' have been dropped altogether and replaced by:
DC.coverage.x
DC.coverage.y
DC.coverage.z
DC.coverage.t
DC.coverage.poly
DC.coverage.line
x, y, z and t can each be further qualified by .min or .max, followed by a SINGLE value.
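To illustrate how the .min/.max qualifiers combine into a bounding box, the sketch below emits one META tag per bound and tests whether a point falls inside. It assumes x is longitude and y is latitude in decimal degrees, which the report does not state; the helper names and the box itself are illustrative only.

```python
def coverage_tags(bounds: dict[str, float], scheme: str = "DD") -> list[str]:
    """One META tag per bound (DC.coverage.x.min etc.), each a SINGLE value."""
    return [
        f'<META NAME=DC.coverage.{key} SCHEME={scheme} CONTENT="{value}">'
        for key, value in bounds.items()
    ]


def contains(bounds: dict[str, float], x: float, y: float) -> bool:
    """True if (x, y) lies within the x/y min/max bounds, inclusive."""
    return (bounds["x.min"] <= x <= bounds["x.max"]
            and bounds["y.min"] <= y <= bounds["y.max"])


# Illustrative box, assuming x = longitude and y = latitude (decimal degrees).
box = {"x.min": 112.9, "x.max": 153.7, "y.min": -43.7, "y.max": -10.6}

for tag in coverage_tags(box):
    print(tag)

print(contains(box, 149.13, -35.28))  # True: a point near Canberra
```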
SCHEMEs are used as normal to define the various datum values etc. They include OSGB (of course!), and the (fairly senseless, really) 'LATLONG' has been replaced by DMS (latitude and longitude expressed as degrees, minutes and seconds) and DD (decimal degrees).
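The DMS and DD schemes carry the same information in different notations. A minimal conversion sketch follows, assuming the common convention that southern latitudes and western longitudes are negative in decimal degrees (the report does not spell out a sign convention); `dms_to_dd` is a hypothetical helper name.

```python
def dms_to_dd(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.

    hemisphere is "N", "S", "E" or "W"; south and west become negative.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    dd = abs(degrees) + minutes / 60 + seconds / 3600
    hemi = hemisphere.upper()
    if hemi in ("S", "W"):
        dd = -dd
    elif hemi not in ("N", "E"):
        raise ValueError(f"unknown hemisphere {hemisphere!r}")
    return dd


print(round(dms_to_dd(35, 18, 0, "S"), 4))  # -35.3
```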
(Draft Paper available from Tom Baker)
Submit the present draft RFC, and submit the syntax appendix as a separate
RFC so that it can evolve on its own path. (Tony Gill, Dave Beckett
and Paul Miller will contribute to this work.)
Draft further RFCs on:
Working groups on
This was attended by 280+ people from all over Australia. There were a lot of government people (it was in Canberra!) and higher education people. This was the largest number who had ever attended a meeting at the National Library: metadata is just that popular...
Presentation by Philip DesAutels (W3C) at the National Library seminar
Philip suggested PICS as the future infrastructure for associating metadata with Internet content. He promoted it as a simple architecture for describing and transporting metadata. It would support various metadata schemes, with distributed schema registration. This distribution could be at the level of user, server, service provider, firewall, search service etc. This might move filtering from the browser to search engines, proxies or firewalls. When questioned, Philip stated that the location of 'the registry' relative to the user or server could significantly affect performance.
Rachel Heery gave a summary of some of themes related to metadata emerging in the projects with which UKOLN is involved: ROADS, NewsAgent, DESIRE and BIBLINK.
Stu Weibel on DC, Carl Lagoze on re-thinking metadata, John Perkins
on CIMI, Eliot Christian on GILS, Renato Iannella on Australian
metadata projects, Rebecca Guenther on USMARC.