SWORD APP evaluation
From DigiRepWiki
The Standards
- http://atompub.org/rfc4287.html (ATOM)
- http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-14.html (APP - AtomPub)
Evaluation of the Atom Publishing Protocol (APP - AtomPub) and Atom Syndication Format (ATOM) against SWORD parameters for repository deposit
TO EVALUATE: the extensions mechanism; whether APP will be constraining due to its application-specific nature and/or too in-depth for our purposes, requiring unnecessary implementation; mediated deposit possibilities; extensibility; namespaces; service description; Atom tools; whether new headers need to be defined for Atom or HTTP; how does Atom handle metadata?
SWORD Parameter | APP / ATOM | Possible extensions | Notes / questions
ExplainRequest | GET to Service Document | - | Conditional GET to return a particular set of information in the service document (e.g. related to authentication)?
--onBehalfOf TargetUser | no | | Is it possible to include this in a GET? Where does authentication fit in here?
ExplainResponse | Service Document | - |
-Wrapper | <app:service> | - |
--ServerLevel | no | <sword:level> (0 or 1) |
--Version | no | - | Not required if using APP
-Repository | <app:workspace> | - | workspace = repository
--ID (M) | <atom:title> | <dc:identifier> or <baseURL> (http://www.openarchives.org/OAI/2.0/provenance.xsd) | <atom:title> is mandatory in Atom (a human-readable name for the workspace); <app:workspace> can be extended, and <dc:identifier> or <baseURL> from the OAI-PMH provenance schema could be used to identify the repository if necessary.
--Policy | no | <dcterms:accrualPolicy> | <app:workspace> could be extended with <dcterms:accrualPolicy>, containing text and/or a URI for a policy statement.
--VerboseSupported | no | <sword:verboseSupport> (true or false) |
--NoOpSupported | no | <sword:noOpSupport> (true or false) |
--ChecksumTypeSupported | no | <sword:checksumType> (true or false) | Recommend MD5?
--MediationAllowed (true or false) | no | <sword:mediation> (true or false) |
-Collections | <app:collection> | |
--ID | <app:collection href="atomURI"> | | The collection URI is captured as the mandatory href attribute of the <app:collection> element; is this enough? An additional <dc:identifier> could be added as an extension if necessary.
--Name | <atom:title> | | <atom:title> is mandatory in Atom
--Description | no | <dcterms:abstract> | <app:collection> could be extended with a <dcterms:abstract> element
--Default | no | <sword:defaultCollection> (true or false) | Could the presence of only one collection indicate the default? Is an extension necessary?
--DescribeFormat | <app:accept> | <sword:format> | Specifies a comma-separated list of media ranges; is this enough, or do we need a vocabulary of formats and an element extension to <app:collection>? Do we need to distinguish between different XML documents (DIDL, MODS, IMS etc.)?
---FormatID | media type | <dc:format> (with vocab) | As MIME media type only
---FormatDescription | no | <sword:formatDescription> | Possible extension to allow more detailed description/identification of accepted formats (see note above); could be extended to support namespace and schema (see OAI-PMH)
--TreatmentDescription | no | <sword:treatment> |
--CollectionPolicy | no | <dcterms:accrualPolicy> | <app:collection> could be extended with <dcterms:accrualPolicy>, containing text and/or a URI for a policy statement.
Deposit | POST to URI of Collection | |
--TargetCollection | Collection URI | |
--Format | Content-Type in POST; <atom:content type=""> | <dc:format> (using vocab) | MIME media type in either case; <atom:content> can also contain the content itself (e.g. XML). If extending with <sword:format> elements, some kind of description of what a zip file contains might be useful?
--TransactionID | <atom:id> | | There is some confusion between Atom and APP regarding <atom:id>. Atom defines it as a 'permanent universally unique identifier for an entry or feed', whereas APP states that 'The Entry created and returned by the Collection might not match the Entry POSTed by the client. A server MAY change the values of various elements in the Entry, such as the atom:id, atom:updated and atom:author values'. This requires clarification in the SWORD profile.
--Verbose | no | <sword:verbose> (true or false) |
--NoOp | no | <sword:noOp> (true or false) |
--Checksum | no | <sword:checksum> | Or use the Content-MD5 HTTP header value?
--ChecksumType | no | <sword:checksumType> | Recommend MD5?
--TargetOwner | <atom:author> | | Possible FOAF extensions for a username (for both author and contributor). atom:contributor could be used for the depositor, with atom:author for the target 'owner' (will this always be the 'author'? Can we assume/profile our use of author/contributor in this way?); or we might extend this with dcterms:mediator?
Receipt | HTTP response: 201 Created, with Location: Member Entry URI | |
-Wrapper | no | | Not used; the response would be an HTTP response
--ServiceLevel (0 or 1) | no | | Is this necessary here?
--Version | no | | Not necessary if using APP
-Receipt | Atom Entry | |
--TransactionID | <atom:id> | | See notes above about the confusion regarding <atom:id>
--IdentifierURI (M) | Location: (MemberURI) in response | | In APP, the URI of the Media Link Entry is mandatory in the response (as Location:)
---ObjectURL | <link rel="edit-media" href=""> <atom:content type="" src=""> | | URI for the media resource; these do not have to be the same.
---DisplayURL | <link rel="edit" href=""> (MemberURI) | | URI for the Media Link Entry
--DepositStatus (M) | HTTP status codes returned in the response | |
---Accepted | 201 Created | | 202 Accepted could be used for cases where there will be a delay in processing
---Rejected | 415 Unsupported Media Type | |
---Error | HTTP 4xx or 5xx codes (see below) | |
--ErrorCode | 4xx or 5xx codes | | Do we need to specify SWORD-specific error codes returned as XML (see OAI-PMH), or are those returned in HTTP responses sufficient?
---ErrorContent | 404 Not Found | | For cases where the server cannot access the material to be deposited
---ErrorParse | no | | Is this necessary?
---ErrorChecksumMismatch | no | | Could this be included in the Atom entry? Would there be an Atom entry if the deposit had failed? If the Content-MD5 HTTP header was used, how would a mismatch be identified?
---ErrorUnknownChecksumType | no | | See above
---ErrorBadRequest | 400 Bad Request | |
---ErrorTargetUserUnknown | 401 Unauthorized or 407 Proxy Authentication Required | |
---ErrorMediationNotAllowed | 403 Forbidden | |
--ErrorDescription (M) | yes | |
--TreatmentDescription (M) | no | <sword:treatment> |
--FormatHandled | Content-Type (in <atom:entry> and response) | <dc:format> (from vocab) |
--VerboseDescription | no | <dc:description> |
--NoOp (true or false) | no | <sword:noOp> (true or false) |
--Checksum | no | <sword:checksum> |
--ChecksumType | no | <sword:checksumType> | Do we need to support multiple checksum types, or is MD5 enough?
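To make the Explain part of the table concrete, here is a minimal Python sketch of a service document carrying the proposed extensions, built with the standard library's ElementTree. It assumes the Atom and APP namespace URIs of the published specifications (RFC 4287 and RFC 5023, which may differ from the draft evaluated here) and follows RFC 5023 in using one <app:accept> per media range. SWORD_NS is a hypothetical placeholder, since no SWORD schema exists yet, and the function name is illustrative only.

```python
import xml.etree.ElementTree as ET

# Atom and APP URIs are those of RFC 4287 and RFC 5023.
# SWORD_NS is a hypothetical placeholder: no SWORD namespace has been defined yet.
ATOM_NS = "http://www.w3.org/2005/Atom"
APP_NS = "http://www.w3.org/2007/app"
DCTERMS_NS = "http://purl.org/dc/terms/"
SWORD_NS = "http://example.org/sword-placeholder/"

for prefix, uri in (("atom", ATOM_NS), ("app", APP_NS),
                    ("dcterms", DCTERMS_NS), ("sword", SWORD_NS)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def build_service_document(repo_title, level, collections,
                           verbose=False, no_op=False, mediation=False):
    """Return an <app:service> element describing a single repository.

    `collections` is a list of dicts with keys "href" and "title" (both
    mandatory in APP) and optional "accept", "abstract" and "policy".
    """
    service = ET.Element(q(APP_NS, "service"))
    # ServerLevel: proposed <sword:level> extension (0 or 1)
    ET.SubElement(service, q(SWORD_NS, "level")).text = str(level)

    # Repository = workspace; atom:title is mandatory
    workspace = ET.SubElement(service, q(APP_NS, "workspace"))
    ET.SubElement(workspace, q(ATOM_NS, "title")).text = repo_title
    # Proposed capability flags, serialised as "true" or "false"
    for tag, flag in (("verboseSupport", verbose), ("noOpSupport", no_op),
                      ("mediation", mediation)):
        ET.SubElement(workspace, q(SWORD_NS, tag)).text = str(flag).lower()

    for c in collections:
        # The collection URI is the mandatory href attribute
        coll = ET.SubElement(workspace, q(APP_NS, "collection"), href=c["href"])
        ET.SubElement(coll, q(ATOM_NS, "title")).text = c["title"]
        # DescribeFormat: one <app:accept> per media range
        for media_range in c.get("accept", []):
            ET.SubElement(coll, q(APP_NS, "accept")).text = media_range
        if "abstract" in c:
            ET.SubElement(coll, q(DCTERMS_NS, "abstract")).text = c["abstract"]
        if "policy" in c:
            ET.SubElement(coll, q(DCTERMS_NS, "accrualPolicy")).text = c["policy"]
    return service


doc = build_service_document(
    "My Repository", 0,
    [{"href": "http://www.myrepository.ac.uk/atom/geography-collection",
      "title": "Geography",
      "accept": ["application/zip", "application/pdf"]}])
print(ET.tostring(doc, encoding="unicode"))
```

Because the extensions live in their own namespace, a generic APP client should be able to skip them as foreign markup while a SWORD-aware client reads them.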
SWORD use of APP
- 5. Protocol Operations
- 5.1 Retrieving a Service Document USED
- 5.2 Listing Collection Members NOT USED
- 5.3 Creating a Resource USED
- 5.4 Editing a Resource NOT USED
- 5.4.1 Retrieving a Resource NOT USED
- 5.4.2 Updating a Resource NOT USED
- 5.4.3 Deleting a Resource NOT USED
- 5.5 Use of HTTP Response codes USED
- 6. Atom Publishing Protocol Documents
- 6.1 Document Types
- 6.2 Document Extensibility USED
- 7. Category Documents NOT USED
- 8. Service Documents USED
- 8.1 Workspaces USED
- 8.3 Element Definitions
- 8.3.1 The "app:service" Element USED
- 8.3.2 The "app:workspace" Element USED
- 8.3.3 The "app:collection" Element USED
- 8.3.4 The "app:accept" Element USED
- 8.3.5 The "app:categories" Element NOT USED
- 9. Creating and Editing Resources USED
- 9.1 Member URIs USED
- 9.2 Creating resources with POST USED
- 9.3 Updating Resources with PUT NOT USED
- 9.4 Deleting Resources with DELETE NOT USED
- 9.5 Caching and entity tags NOT USED?
- 9.6 Media Resources and Media Link Entries USED
- 9.7 The Slug: Header NOT USED
- 10. Listing Collections NOT USED
- 10.1 Collection partial lists
- 10.2 The "app:edited" Element
- 11. Atom Format Link Relation Extensions
- 11.1 The "edit" Link Relation USED
- 11.2 The "edit-media" Link Relation USED
- 12. The Atom Format Type Parameter USED
- 13. Atom Publishing Controls NOT USED
- 13.1 The "app:control" Element NOT USED
- 13.1.1 The "app:draft" Element NOT USED
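Of the sections marked USED, 5.5 (HTTP response codes) and 11.1/11.2 (edit and edit-media links) are what a client needs in order to read a deposit receipt. A minimal Python sketch, assuming the status-code mapping proposed in the evaluation table (not a settled profile); the function names and the URIs in the sample receipt are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# HTTP status code -> SWORD DepositStatus/ErrorCode name, as proposed in the
# evaluation table; an assumption pending the SWORD profile.
STATUS_MAP = {
    201: "Accepted",
    202: "Accepted",                  # accepted, processing delayed
    400: "ErrorBadRequest",
    401: "ErrorTargetUserUnknown",
    403: "ErrorMediationNotAllowed",
    404: "ErrorContent",
    407: "ErrorTargetUserUnknown",
    415: "Rejected",
}


def deposit_status(code):
    """Translate an HTTP status code into a SWORD status name."""
    if code in STATUS_MAP:
        return STATUS_MAP[code]
    if 400 <= code <= 599:
        return "Error"                # any other 4xx or 5xx
    return "Unknown"


def receipt_links(entry_xml):
    """Pull the identifiers SWORD needs out of the returned Atom entry."""
    entry = ET.fromstring(entry_xml)
    result = {"id": entry.findtext(f"{{{ATOM_NS}}}id")}   # TransactionID
    for link in entry.findall(f"{{{ATOM_NS}}}link"):
        rel = link.get("rel")
        if rel == "edit":              # DisplayURL: the Media Link Entry
            result["edit"] = link.get("href")
        elif rel == "edit-media":      # ObjectURL: the media resource
            result["edit-media"] = link.get("href")
    return result


# Hypothetical receipt returned with a 201 Created response
receipt = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:uuid:60a76c80-d399-11d9-b93c-0003939e0af6</id>
  <link rel="edit" href="http://www.myrepository.ac.uk/atom/geography-collection/1"/>
  <link rel="edit-media" href="http://www.myrepository.ac.uk/atom/geography-collection/1.zip"/>
</entry>"""
print(deposit_status(201), receipt_links(receipt))
```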
APP and ATOM support for additional parameters
- use of <atom:generator> within <atom:source> to identify the source repository/service making the deposit; i.e. to provide provenance information, could be extended with oai-pmh provenance elements (see http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm)
- <app:control>: a structured extension for publishing control; its <app:draft> element (a request by the client to control the visibility of a Member Resource) could be used to ask for deposits to be non-public (e.g. for embargoed material)
- Listing collections offers a facility for listing members of repository collections using <atom:feed> documents. This is out of scope for the SWORD project but might be worthy of further investigation, alongside oai-pmh sets and sitemaps.org
- Atom's support for additional <link rel=""> values offers potential for identifying related objects
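The first two ideas can be sketched as an Atom entry that carries provenance in <atom:source>/<atom:generator> and, optionally, an <app:control>/<app:draft> request. A minimal Python sketch, assuming the RFC 5023 APP namespace; the function name, identifiers and generator URI are hypothetical. As noted above, app:draft is only a request by the client, so on its own it would not guarantee an embargo.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
APP_NS = "http://www.w3.org/2007/app"   # RFC 5023 namespace (assumed)


def q(ns, tag):
    """Qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def build_entry(entry_id, title, author, generator_uri, generator_name,
                embargoed=False):
    """Build an Atom entry carrying provenance and, optionally, a draft flag."""
    entry = ET.Element(q(ATOM_NS, "entry"))
    ET.SubElement(entry, q(ATOM_NS, "id")).text = entry_id
    ET.SubElement(entry, q(ATOM_NS, "title")).text = title
    ET.SubElement(entry, q(ATOM_NS, "updated")).text = (
        datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    author_el = ET.SubElement(entry, q(ATOM_NS, "author"))
    ET.SubElement(author_el, q(ATOM_NS, "name")).text = author

    # Provenance: <atom:generator> inside <atom:source> names the depositing service
    source = ET.SubElement(entry, q(ATOM_NS, "source"))
    ET.SubElement(source, q(ATOM_NS, "generator"),
                  uri=generator_uri).text = generator_name

    # Publishing control: <app:draft>yes</app:draft> asks for the Member not to be public
    if embargoed:
        control = ET.SubElement(entry, q(APP_NS, "control"))
        ET.SubElement(control, q(APP_NS, "draft")).text = "yes"
    return entry


entry = build_entry("urn:uuid:60a76c80-d399-11d9-b93c-0003939e0af6",
                    "Field notes", "lcarr",
                    "http://www.example.org/depositing-repository",
                    "Example depositing repository", embargoed=True)
print(ET.tostring(entry, encoding="unicode"))
```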
Issues, in- and out-of-scope
- Versioning, adding new 'expressions' to an existing deposit, duplication
- Identifiers, different servers assigning multiple identifiers; tracking provenance with a client ID, maintaining that ID
- Formats, identifying the different types of packaging standard used
- Mediation
- Listing Collections, mandatory in ATOM
- Authentication, must support http https
Metadata, files and packages
Four scenarios for <content>
- POST media-file (single file), with metadata embedded in <content> element as structured xml, e.g. epdcx, oai_dc
- POST media-file (single file), with metadata embedded within the object (e.g. PDF)
- POST media-file (package or zip), which contains the metadata and objects; the src attribute of <content> identifies
- POST xml package, which contains structured xml for both metadata and object
There is a challenge here in knowing which of these scenarios a given deposit represents.
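One way a server might tell these scenarios apart is to check the declared Content-Type first and, as a fallback, inspect the first bytes of the body. A minimal Python sketch of that fallback only, assuming the standard ZIP (PK\x03\x04) and PDF (%PDF-) signatures and treating an Atom entry root element as "metadata in <content>"; the scenario labels and function name are hypothetical, and this is a heuristic rather than a proposed rule.

```python
import xml.etree.ElementTree as ET

ATOM_ENTRY = "{http://www.w3.org/2005/Atom}entry"


def classify_payload(data: bytes) -> str:
    """Guess which deposit scenario a POSTed body represents (a heuristic)."""
    if data.startswith(b"PK\x03\x04"):          # ZIP local file header
        return "package"
    if data.startswith(b"%PDF-"):               # PDF header: metadata may be embedded
        return "single file, embedded metadata"
    body = data.lstrip()
    if body.startswith(b"<"):
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            return "unknown"
        if root.tag == ATOM_ENTRY:              # metadata carried in <content>
            return "atom entry"
        return "xml package"                    # any other XML document
    return "unknown"


print(classify_payload(b"%PDF-1.4\n"))
print(classify_payload(b'<entry xmlns="http://www.w3.org/2005/Atom"/>'))
```

Magic bytes cannot distinguish, say, a zip that follows one packaging standard from another; that would still need the kind of format vocabulary discussed in the table.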
Reflections and recommendation
APP supports deposit of files (media) and is agnostic about content types. It is easily extensible, and 'foreign markup' should not break processing. It also supports collections and encourages repositories to expose information about their collections in a standard way.
Start implementing based on the SWORD profile of APP, initial focus on level:0 (mandatory elements), moving to level:1 and extending the SWORD APP profile as necessary.
Proposed SWORD profile of APP / ATOM
Need to identify what elements are used and how, and what explicitly aren't. Recommendations might include metadata format (e.g. epdcx and/or simple DC) and recommended format types. We might also want to specify server/client requirements and create a (small) SWORD schema for extension elements.
Examples
To add.
Explain URLs
GET service document:
- simple user - http://www.myrepository.ac.uk/atom/servicedocument
- mediated user - http://www.myrepository.ac.uk/atom/servicedocument?onBehalfOf=lcarr
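The two Explain URLs differ only in the onBehalfOf query parameter. A minimal Python sketch of building them with the standard library, assuming the parameter name used in the examples above; the function name is hypothetical, and authentication (still an open question in the table) is not addressed.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def explain_url(service_document_url, on_behalf_of=None):
    """Return the service document URL, adding onBehalfOf for mediated users."""
    if not on_behalf_of:
        return service_document_url
    parts = urlsplit(service_document_url)
    param = "onBehalfOf=" + quote(on_behalf_of, safe="")
    # Append to any existing query string rather than replacing it
    query = f"{parts.query}&{param}" if parts.query else param
    return urlunsplit(parts._replace(query=query))


base = "http://www.myrepository.ac.uk/atom/servicedocument"
print(explain_url(base))           # simple user
print(explain_url(base, "lcarr"))  # mediated user
```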
Deposit URLs
POST binary to:
- simple user - http://www.myrepository.ac.uk/atom/geography-collection
- mediated user - http://www.myrepository.ac.uk/atom/geography-collection?onBehalfOf=lcarr
- mediated user (alternative) - http://www.myrepository.ac.uk/atom/geography-collection/lcarr
- mediated user (alternative) - http://www.myrepository.ac.uk/atom/users/lcarr/geography-collection
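A minimal Python sketch of the deposit step using only the standard library: it builds, but does not send, a POST of a binary payload to a collection URI. It assumes the Content-MD5 header is used for the checksum (one of the options raised in the table) and covers the query-string and collection/user forms listed above; the function name and the style parameter are hypothetical, and authentication is left out.

```python
import base64
import hashlib
import urllib.request
from urllib.parse import quote


def build_deposit(collection_uri, payload, content_type,
                  on_behalf_of=None, style="query"):
    """Build (without sending) an APP POST of a binary payload to a collection.

    style selects between two of the mediated URL patterns:
    "query" (?onBehalfOf=user) or "path" (collection/user).
    """
    url = collection_uri
    if on_behalf_of:
        user = quote(on_behalf_of, safe="")
        if style == "query":
            url = f"{collection_uri}?onBehalfOf={user}"
        elif style == "path":
            url = f"{collection_uri.rstrip('/')}/{user}"
        else:
            raise ValueError(f"unknown style: {style}")
    # Content-MD5 (RFC 1864) is the base64 of the raw 16-byte digest, not the hex form
    checksum = base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")
    headers = {"Content-Type": content_type, "Content-MD5": checksum}
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


req = build_deposit("http://www.myrepository.ac.uk/atom/geography-collection",
                    b"hello", "application/zip", on_behalf_of="lcarr")
print(req.get_method(), req.full_url)
print(req.header_items())
```

Sending it would be a matter of passing req to urllib.request.urlopen; a 201 response would then carry the Location header and the receipt entry.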
Tools
See http://bitworking.org/projects/apptestclient/
Java client/server library for APP:
https://rome.dev.java.net/apidocs/subprojects/propono/0.4/overview-summary.html