SWORD3 discussion paper
From DigiRepWiki
This page summarises suggestions for future work on SWORD. Much of this information has been gathered from SWORD developers at the SWORD 2 kickoff meeting and ongoing discussions - attribution is given wherever possible.
Notes from Deposit Show and Tell Meeting 12th October 2009
Thanks to Julie Allinson for these notes
Technical Developments
SWORD extensions - Richard Jones
onBehalfOf - would be useful for update/delete. half page spec
updating implementations to catch-up with update/delete
pure MIME solution to multipart content deposit using multipart/related. RJ does this in 2 parts, with Atom as the manifest, but this breaks down if depositing an Atom document
Package types registry
RJ - translating between package types is more important
Make the current test implementations truly interoperable
treatment is 'prose'
what projects - packaging - to provide an answer to SWORD, rather than SWORD providing the answer - can PRONOM be that registry; or set up an IETF registry - needs to be user-led; people want to be told what to do
- sword extensions
SWORDCamp
A week of development, update original demonstrators, others update their clients/code. Possibly get sponsorship.
Agree package standard in advance. Use the week to implement. Could be timed to coincide with Open Repositories 2010. Make sure the right people get to the event.
Jim's point about URIs, this isn't a problem if posting binary files
RJ - two-part POST: the first part is an XML document describing the deposit (this is an Atom feed); the second part is the binary content
Content of 'onBehalfOf' - useful but not as useful as it could be. RJ - sometimes needs some additional info about the user, e.g. LDAP. OAuth - needs considering.
deposit to collections - the user shouldn't need to know; collections introduce complexity. DSpace can be configured to expose 'items' as SWORD endpoints, so it is possible to deposit to an item
IETF - a lot of hassle; is it worth it? probably not.
From UKOLN SWORD3 proposal
Advocacy, uptake and community support
- Write/commission a reflective piece on why SWORD has been a success, get it peer reviewed by those closely involved, and ensure lessons feed into future SWORD and other JISC work
- Devise a model for supporting SWORD including developing a website knowledge base and enhanced documentation including a technical primer for SWORD implementers.
- Increase SWORD uptake by marketing and promoting SWORD both nationally and internationally at conferences and workshops.
- Develop additional SWORD usage and implementation case studies e.g. based on Microsoft uptake
- Hold SWORD promotional event with show-and-tells, demonstrations, and technical workshops
- Feed into and off of the developer community work
Ongoing development work
- Support the maintenance and development of the SWORD application profile
- Update SWORD demonstrator repositories and clients based on profile developments.
- Explore lightweight extensions to the SWORD outputs
- Continue to develop synergies with other projects and development activity in the area
- Tie in with repository handshake strand of international repositories workshop (internal sponsor: Neil Jacobs)
Prototyping SWORD package types registry and SWORD repository registry
- Develop a prototype and operational methodology for a SWORD package types registry.
- Develop a ‘SWORD enabled’ repositories registry prototype and populate for use in promoting SWORD uptake.
- Explore adding ‘SWORD enabled’ information to existing repository registries such as OpenDOAR.
SWORD standardisation
- Investigate the process for standardising the SWORD profile with a number of standards bodies including NISO, CEN, and others.
- Assess and report on the resource implications. Evaluate cost/benefit and make recommendations.
- Consider alternatives to formal standardisation.
From 'SWORD Futures' Workshop at Open Repositories 2009
Would like to see docx as a package format within the SWORD types list as docx is a format which can act as a 'zip' format - Pablo Fernicola.
Nature would like to deposit submitted papers as an added author service; they'd like a switching junction to do this. Repository Junction are already talking to them and additionally have their EM-Loader project.
There was more discussion about SWORD potential for supporting simple/complex deposit tools, such as desktop or folder sync tools, scholarly workbench apps etc. Reporting/buffer tools could be used to add a checking step.
X-HEADERS
IETF - clear instructions that they shouldn't be used; but there are implementation implications of a change in the profile now (plus, it wouldn’t be backward compatible). One way to make the change would be to run with equivalent non-X- headers alongside the X- ones for a few years and then phase out the X- ones.
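The phase-out idea above can be sketched as a small compatibility layer. This is only an illustration, assuming a Python server or client: the function names are made up here, and the header pairs in `LEGACY_NAMES` are assumptions to be checked against whichever version of the profile is in use.

```python
from typing import Dict, Mapping, Optional

# Assumed pairs of unprefixed header name -> legacy X- name.
# These pairs are illustrative only and are not taken from the profile.
LEGACY_NAMES = {
    "On-Behalf-Of": "X-On-Behalf-Of",
    "Packaging": "X-Packaging",
    "No-Op": "X-No-Op",
    "Verbose": "X-Verbose",
}


def read_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Return the value for `name`, falling back to its legacy X- equivalent."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    if name.lower() in lowered:
        return lowered[name.lower()]
    legacy = LEGACY_NAMES.get(name)
    if legacy is not None:
        return lowered.get(legacy.lower())
    return None


def write_headers(values: Mapping[str, str], include_legacy: bool = True) -> Dict[str, str]:
    """Emit each header under its new name and, during the transition, its X- name."""
    out: Dict[str, str] = {}
    for name, value in values.items():
        out[name] = value
        if include_legacy and name in LEGACY_NAMES:
            out[LEGACY_NAMES[name]] = value
    return out
```

Setting `include_legacy=False` at the end of the transition period would stop the X- forms being sent, while `read_header` could keep accepting them from older clients for longer.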
Documentation
- SWORD Primer
- basic technical discovery document
- registry of implementations (inc. publishers)
Development ideas
- SWORD in Zotero, contact Zotero team?
- SWORD in other authoring applications, apple, open office?
Extending SWORD
David [Carson?] pointed to the lack of funder information in the deposit process. It would be possible to include a grant namespace within the ATOM document: arXiv have customised with their own extensions, and ATOMPUB supports this. PubMed Central (UK) might be interested in discussions around this. Standardising this through SWORD could be for a future version of SWORD. The multipart spec (RFC 2387) could be used on top of ATOMPUB, with an ATOM document and further files.
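As a rough illustration of the two ideas above, the sketch below builds an Atom entry carrying funder details in a foreign namespace and wraps it with a file as a multipart/related body. The grant namespace URI and its element names (`award`, `funder`, `identifier`) are invented for this example and are not defined by SWORD, AtomPub or arXiv. The function names are also hypothetical.

```python
import xml.etree.ElementTree as ET
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical namespace for funder information (an assumption, not a standard).
GRANT_NS = "http://example.org/ns/grant"


def build_entry(title: str, funder: str, grant_id: str) -> bytes:
    """Build an Atom entry with funder details as foreign-namespace markup."""
    ET.register_namespace("atom", ATOM_NS)
    ET.register_namespace("grant", GRANT_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    award = ET.SubElement(entry, f"{{{GRANT_NS}}}award")
    ET.SubElement(award, f"{{{GRANT_NS}}}funder").text = funder
    ET.SubElement(award, f"{{{GRANT_NS}}}identifier").text = grant_id
    return ET.tostring(entry, encoding="utf-8")


def build_multipart(entry_xml: bytes, payload: bytes, filename: str) -> MIMEMultipart:
    """Wrap the Atom entry (root part) and one file as multipart/related."""
    # RFC 2387: the `type` parameter names the media type of the root part,
    # which by default is the first part of the message.
    message = MIMEMultipart("related", type="application/atom+xml")
    atom_part = MIMEApplication(entry_xml, "atom+xml")
    atom_part.add_header("Content-Disposition", "attachment", name="atom")
    file_part = MIMEApplication(payload, "octet-stream")
    file_part.add_header("Content-Disposition", "attachment", filename=filename)
    message.attach(atom_part)
    message.attach(file_part)
    return message
```

A client would serialise the resulting message as the request body. How a server should interpret the grant elements would still need to be agreed.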
Package Formats
The problem: MIME doesn't help beyond file format
- how do we keep a list of package formats?
- what is a good type? is METS sufficient? is a generic METS type dangerous? do we need sub-types for METS profiles?
The IETF standard way is to have a curated list (current SWORD TYPES). New submissions to an email address + list, one week to discuss, then default accept.
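The curated-list process just described (submit, one week of discussion, then accept by default) could be modelled roughly as follows. This is a toy in-memory sketch, not a proposal for the actual mechanism. The class names and the treatment of objections (an objection simply holds a submission in the pending state) are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Optional

DISCUSSION_PERIOD = timedelta(days=7)  # "one week to discuss"


@dataclass
class Submission:
    identifier: str
    submitted_on: date
    objection: Optional[str] = None  # assumed: set if the list raises a blocking issue


@dataclass
class TypeRegistry:
    accepted: Dict[str, Submission] = field(default_factory=dict)
    pending: Dict[str, Submission] = field(default_factory=dict)

    def submit(self, identifier: str, today: date) -> None:
        """Record a new package type proposal sent to the list."""
        self.pending[identifier] = Submission(identifier, today)

    def object_to(self, identifier: str, reason: str) -> None:
        """Note an objection raised during discussion."""
        self.pending[identifier].objection = reason

    def review(self, today: date) -> List[str]:
        """Accept pending submissions whose discussion week passed without objection."""
        accepted_now = []
        for identifier, sub in list(self.pending.items()):
            if sub.objection is None and today - sub.submitted_on >= DISCUSSION_PERIOD:
                self.accepted[identifier] = self.pending.pop(identifier)
                accepted_now.append(identifier)
        return accepted_now
```

The only point of the model is that acceptance is the default outcome: a submission needs no positive vote, just the absence of an objection once the week is up.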
Simple answer - maintain current SWORD TYPES at UKOLN; Jim to write the submission mechanism into current document. Put out a call for comment with a total amnesty on types (all will be added) for two months. Then tidy up and publish a new version of the document, and start the process proper.
but ...
Rob Sanderson - registries for formats/schemas are relevant to a number of communities: SWORD, ERH, unAPI, SRU, OpenURL, Jangle.
OpenURL have a registry, run by OCLC, but there's a heavyweight submission process.
Seems sensible to engage different communities and see if we can come up with a joint solution, including whether OpenURL will relax their process.
SWORD Standardisation - IETF?
- extra level of proof-reading + verification
- makes sure it fits with other standards
- possible problems in deviating from ATOMPUB
- means SWORD is taken into account if other standards change
- would cause a lot of work
- there's also a perception/marketing aspect
might be worth waiting until there's a need.
?local standards process - unAPI is a possible model.
Ideas/suggestions carried over from SWORD phase 1 and 2
SWORD Standardisation
Establish SWORD as NISO Deposit Protocol / IETF standard (Jim Downing) re. email from Dorothea Salo:
On 05/03/2008, Dorothea Salo <dsalo@library.wisc.edu> wrote: <http://www.niso.org/news/newsline/NISONewsline-Mar2008.htm#Story2> "The group proposed that a NISO working group develop a common deposit mechanism "tool" that would allow institutional repositories to capture objects as close to their creation point as possible. The capture of these objects should be part of a larger context that will allow for their exposure across a variety of domains, such as journals, subject matter repositories, and course management systems."
we could investigate what it would take to put SWORD forward as an IETF standard. (Jim Downing)
Advocacy/Uptake
Work with Nature Publishing Group to use SWORD for deposit into PMC and UKPMC (and others). We started working on getting this together as part of SWORD2 but didn't manage to get everyone together within the time frame of the project. (Grace Baynes from NPG).
Work on getting more exposure and uptake in AU and US/CA (David Flanders, all at SWORD 2 meet).
"How do we convince the world (esp. N America) that it's possible to create a useful standard without assembling an international committee of experts?
I think this comes down to usage, and I was persuaded by Les's three "most wanted" targets for SWORD: Word, arXiv and PMC. MS have already implemented SWORD in the author add on for Word, and are making very encouraging noises about working on any developments to the spec. Simeon is already involved at arXiv, and if he could push his implementation into production that would be a major coup. It would be good to find a way into PMC, or concentrate on UKPMC for now. The recent announcement by Nature might also be worth pursuing.
As a slightly separate thought, it might be good marketing to formalize the work of Microsoft, arXiv, PMC, Nature etc by asking them to be some kind of project partners whereby they commit to implement the SWORD2 protocol modifications in their software and produce demos." (Jim Downing)
Technical Developments
Workflow
Improved integration with more complex workflows
I imagine that there are many repositories that have workflows at least as complex as arXiv's (see #Workflow). Some way for the submitting client to indicate a callback mechanism better than email may well be broadly appropriate. Specification of such a mechanism would first require an overview of the methods that would work with a range of repository systems. (from arXiv case study)
While the scope of SWORD is understandably limited to defining a deposit API, the reality is that developers adopting it will want and need to use it within their business workflows. This would include deposits of temporary working documents and behind-the-scenes automated document archiving flows. It would seem useful to conduct one or two small-scale workflow projects where SWORD is used throughout various phases of the workflow, or for more advanced deposits (for example, are structured rather than textual responses to a deposit likely to be a future requirement). Alternatively, external projects could be identified and then asked to report back their experience in the form of an implementation report to inform later phases of SWORD. While external projects may not complete within the SWORD project, the identification of projects and commitment by SWORD to collate and summarise the outcomes in the form of recommendations for future work may be a satisfactory outcome. (Scott Yeadon)
Enabling workflow ... for example, it might be able to expose a "status" interface for every item in an archive (Richard Jones)
The Deposit
Taking SWORD "beyond packages"
The development of the OAI-ORE (Open Archives Initiative - Object Re-use and Exchange) complex object description standard raises some interesting questions for deposit. For example, ORE could be used to allow incremental construction of a complex object on the server side of a deposit protocol. This scenario has a number of benefits and potential use cases; as trivial examples:
- Large complex objects could be built in multiple sessions
- Per-file validation could feed back immediately, rather than waiting until the package is complete
Most of the SWORD constraints and extensions would be useful in this scenario. It would be interesting to see it investigated further. (SPECTRa case study)
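The incremental scenario above can be pictured with a small server-side model. This is a hypothetical sketch only: it does not use the ORE vocabulary or any ORE serialisation, and the class names and validation rules (an allowed-types list and a non-empty check) are assumptions chosen to show per-file feedback.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileResult:
    name: str
    accepted: bool
    message: str


@dataclass
class IncrementalDeposit:
    """A complex object that grows across sessions (hypothetical model)."""
    accepted_types: List[str]
    resources: Dict[str, bytes] = field(default_factory=dict)
    complete: bool = False

    def add_file(self, name: str, mime_type: str, content: bytes) -> FileResult:
        """Validate one file immediately and report back before anything else is sent."""
        if self.complete:
            return FileResult(name, False, "deposit already completed")
        if mime_type not in self.accepted_types:
            return FileResult(name, False, f"unsupported type {mime_type}")
        if not content:
            return FileResult(name, False, "empty file")
        self.resources[name] = content
        return FileResult(name, True, "stored")

    def finish(self) -> List[str]:
        """Close the object and list the aggregated files."""
        self.complete = True
        return sorted(self.resources)
```

Because the object persists between calls, a client could add files in separate sessions and learn about a rejected file straight away, rather than after uploading a whole package.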
Testing and integration with OAI-ORE more generally
Depositing non-binary objects (Paul Hart) ... such as URIs to externally referenced content. This might be enabled by allowing the deposit of ATOM documents.
How can we deal with package descriptions better? At the moment, the free text description, which is effectively a MIME type, is already causing problems now that I'm also trying to ingest other package formats (Richard Jones)
Fine-grained deposit ... like being able to add a single file to an object, or even manipulating metadata (although we don't want to just be DAV) (Richard Jones)
Accept ... more capacity to describe what sorts of things the deposit targets can accept. So, a DSpace specific example would be that some deposit targets can only accept collections of objects rather than individual objects, whereas some other targets will /not/ take collections. I'm not sure how you'd actually go about expressing this. (Richard Jones)
Extending to full APP implementation
- Update (Claire Knowles/Robin Taylor, Antony Corfield)
- Delete (Claire Knowles/Robin Taylor, Antony Corfield)
- Categories (Antony Corfield)
- Accepting ATOM documents
I'm not sure the Retrieve is the business of a deposit tool, and I'm not 100% that it should be able to do Delete either. So that's a kind of "things I don't think SWORD should worry about". Update, on the other hand, could be useful. (Richard Jones)
I reckon a good next step would be to work on user support; for example with a set of experiments around the edges of SWORD applicability (e.g. SWORD + ORE, SWORD with update etc etc). (Jim Downing)
The Service Document
I'd like to address some of the scalability issues with service documents by introducing hierarchies of service documents (i.e. so that you can request of a deposit target any other deposit targets underneath it). (Richard Jones)
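One way to picture the hierarchy idea is a client that walks nested service documents on demand. The sketch below is an assumption-laden illustration: a service document is reduced to a dictionary with `collections` and `children` keys (both names invented here), and `fetch` stands in for whatever retrieves and parses a real service document.

```python
from typing import Callable, Dict, Iterator, List, Optional, Set

# Simplified stand-in for a parsed service document:
# {"collections": [deposit URLs], "children": [URLs of nested service documents]}
ServiceDoc = Dict[str, List[str]]


def walk_targets(
    url: str,
    fetch: Callable[[str], ServiceDoc],
    seen: Optional[Set[str]] = None,
) -> Iterator[str]:
    """Yield deposit targets depth-first, following nested service documents."""
    seen = set() if seen is None else seen
    if url in seen:
        return  # guard against service documents that point back at each other
    seen.add(url)
    doc = fetch(url)
    yield from doc.get("collections", [])
    for child in doc.get("children", []):
        yield from walk_targets(child, fetch, seen)
```

Because the function is a generator, a client only fetches a child service document when it actually iterates that far, which is the scalability benefit the hierarchy is meant to give.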
The ATOM response
The process of ingesting an item into an archive can produce new complex objects /inside/ the archive. I'd like a way for the SWORD response to be able to describe the object that has just been created. (Richard Jones)
Other
Creating collections.
Investigating OpenAuth and OpenID as a replacement for mediated deposit.
Multiple deposit to PMC and arXiv. (Les Carr at SWORD2 meet).
Deposit from institution's client (Les Carr at SWORD2 meet)
Other suggestions
- Ongoing maintenance of SWORD libraries
- The example classes provided with the SWORD Java common library are of more use in production environments than the authors anticipated. If possible, effort should be expended in making the tools more robust and putting a sustainability mechanism in place.
- support model for installations and demos
- arXiv integration