Bucharest Workshop
These are the results of the "Work Group 2 (Talia)" discussions from the COST workshop held in Bucharest on 24-26 May 2007.
Participants
- Stefan Gradmann
- Michele Barbera
- Michele Nucci
- Luca Guidi
- Daniel Hahn
Basic assumptions
We made some very basic assumptions that are reflected in the design of the system:
- Once published, a document will never change.
- A published document may be blocked from access, but is never deleted from the system.
- Identifiers in the Talia system are URIs/URLs
- Scholarly content will always remain available for at least read-only access
- It might be necessary to block access to certain manuscripts in the future, for legal reasons.
Summary of results
- We will look into external (interoperability) standards, to see if they should be used in Talia
- We will also refer to such standards in our work
- We should attempt to publish our results to the community
- Stefan will make further research results available to the group
- Identifiers will be URIs of the form <domain>/<uid>, where the <domain> part identifies the node. The <uid> is a locally unique string, controlled by the node. It should be machine-created.
- The system will never know about any semantics within the identifiers.
- The <domain> part refers to an actual domain. If it cannot be resolved for any reason, it may be redirected through a central service. The "official" name always remains unchanged.
- We will look for a mechanism to describe microstructures of different MIME types. (Luca Guidi)
- Authentication of persons will be done through OpenID or a similar scheme. We will provide our own identity server for users who do not otherwise have an OpenID identity.
- Authentication between hosts will be done through PGP or ActiveResource (built-in), on a Trackback-like protocol.
- Authorization of operations will be done against user roles at model level. We will protect the transitions between workflow steps, and each model modification will cause such a transition.
- All documents will be integrity protected by a signed hash, which is calculated over the combination of the document and its identifier.
- We will have a modular standard workflow, where steps may be rearranged and disabled. The workflow will not be more customizable than that.
- We need to write a working group report
- The Talia group needs to prioritize the ontology types. Maybe this can be done at the Bergen workshop.
- We should also attempt to identify the most common document types to be used by the Talia partners.
- We should identify the types of microstructures the researchers want to refer to.
- There may be additional stuff in the Scholarsource project.
Reference to external models
Stefan suggests that, in building the Talia architecture, we should research and refer to external reference models. He introduced several existing standards; the presentation was distributed within the group.
General interoperability questions
- On which level does the interoperation take place?
- Who are the interoperating entities? (e.g. libraries, museums, ...)
- Which objects will be exposed?
Dublin Core/DCMI Abstract Model
Stefan presented the Dublin Core Element Set and the abstract model. We agreed that it may be a good idea to include these element types within Talia.
DELOS
The DELOS digital library manifesto and reference architecture.
5S
5S provides a formal theory for digital libraries. The ontology should be researched and may contain ideas for Talia functionality. (On the slides, go to the "Classified View" slide for a good overview.)
DRIVER/Pathways/OREI
These are models for interoperating libraries/federations of libraries. Pathways has a data model quite similar to the Talia one. (Note the lineage in that model; it doesn't exist in Talia!)
JCR (JSR 170/JSR 283)
These specify a very complex Java API for repositories. They may be consulted for useful approaches. A complete implementation of this functionality, however, is neither feasible nor called for within Talia.
Results
We agreed we would look into the standards in more detail, in order to see how/if they can and should be applied to Talia. We also agreed that we need to refer to those standards within our work. We should also make our own work known to the community.
There is no one responsible for this task yet!
Identifiers
Here, too, Stefan presented a number of standards in the field, e.g. the DINI and DOI standards. We discussed which kind of identifier would be applicable for Talia.
Problems/Questions mentioned during the discussion:
- Will versioning be handled by the system?
- Should we use a central service to resolve identifiers?
- How should identifiers describe the microstructure of documents?
- Will the document integrity be protected?
- Talia's identifiers are URLs; can they be safe from change?
Social Aspects
We acknowledged that this problem is largely social in nature. The success of any identifier scheme depends on the willingness of the people to actually use it. We can only provide a solution to the technical part.
Problem of Granularity/Document types
We agreed that Talia should be open to as many document formats as possible. We discussed at length how to describe microstructures within the different document types. One approach would add a "descriptive" URL part to the identifier. Another would create a new identifier that refers to a microstructure description.
Results
We agreed on using URLs as identifiers. The domain part of the URL will identify the Talia node. The rest of the URL will be a locally unique identifier string for the resource.
The local host controls the format of the identifiers. The system will only require that they are unique within the local host, and also that they should be machine-created.
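The identifier scheme agreed on above can be sketched roughly as follows. This is only an illustration, not a format the group decided on: the `http` scheme, the use of a random UUID as the machine-created string, and the names `mint_identifier` and `split_identifier` are all assumptions.

```python
import uuid
from typing import Tuple
from urllib.parse import urlsplit


def mint_identifier(node_domain: str) -> str:
    """Return a new identifier URL of the form http://<domain>/<uid>.

    Assumption: a random UUID serves as the machine-created, locally
    unique string; a node is free to choose any other format.
    """
    uid = uuid.uuid4().hex
    return f"http://{node_domain}/{uid}"


def split_identifier(identifier: str) -> Tuple[str, str]:
    """Split an identifier into (node domain, local uid).

    The uid is treated as an opaque string; nothing is read into it.
    """
    parts = urlsplit(identifier)
    return parts.netloc, parts.path.lstrip("/")


identifier = mint_identifier("talia.example.org")
domain, uid = split_identifier(identifier)
```

Only the domain is interpreted (it names the node); the rest of the URL is passed around unchanged, which is also what allows a node to keep legacy identifiers.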
In the case of a Talia node losing its domain name, there will be a central resolver service. If no valid document can be retrieved from a URL, the resolver service will redirect the request to the new URL. However, the original URL will still be the official identifier of the document.
This also implies that the documents will be integrity protected. See below.
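The resolver fallback might look like the sketch below. Everything in it is hypothetical: the `MOVED_NODES` table, the function names, and the in-memory `STORE` that stands in for real HTTP retrieval. It only illustrates the order of operations: try the official URL first, ask the resolver only when that fails, and keep the official identifier unchanged.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table kept by the central resolver: old node domain -> new domain.
MOVED_NODES = {"old-node.example.org": "new-node.example.net"}


def redirect_target(identifier: str) -> Optional[str]:
    """Return the URL the resolver would redirect to, or None if unknown."""
    parts = urlsplit(identifier)
    new_domain = MOVED_NODES.get(parts.netloc)
    if new_domain is None:
        return None
    return urlunsplit(parts._replace(netloc=new_domain))


def locate(
    identifier: str, fetch: Callable[[str], Optional[bytes]]
) -> Tuple[Optional[str], Optional[bytes]]:
    """Fetch from the official URL; fall back to the resolver's redirect."""
    document = fetch(identifier)
    if document is not None:
        return identifier, document
    target = redirect_target(identifier)
    if target is None:
        return None, None
    return target, fetch(target)


# Stand-in for the network: only the new location serves the document.
STORE = {"http://new-node.example.net/4f2a": b"manuscript"}
official = "http://old-node.example.org/4f2a"
location, document = locate(official, STORE.get)
```

Here `official` stays the identifier that is cited and signed; `location` is merely where the bytes were found.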
Finally, the system will not know about any semantics within the identifiers. Legacy identifiers are supported, and the node may create new identifiers in any way it wishes.
Luca Guidi will research the different ways to describe document microstructures for various MIME types.
Documents for which no microstructure description exists can only be referred to at document level.
Document protection
We will protect each document by a signed hash which is calculated over the document and its identifier. The hash values will be made public.
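A minimal sketch of such a hash is given below. The notes do not name a hash algorithm or a signature format, so SHA-256 and the length-prefixed encoding are assumptions. The standard library has no public-key signatures, so an HMAC is used purely as a stand-in for the signing step; a real node would presumably sign the digest with PGP or a similar public-key scheme so that anyone can verify it.

```python
import hashlib
import hmac


def document_digest(identifier: str, document: bytes) -> str:
    """SHA-256 over the identifier and the document.

    The identifier is length-prefixed so that the boundary between
    identifier and document bytes is unambiguous.
    """
    id_bytes = identifier.encode("utf-8")
    h = hashlib.sha256()
    h.update(len(id_bytes).to_bytes(4, "big"))
    h.update(id_bytes)
    h.update(document)
    return h.hexdigest()


def sign_digest(digest: str, node_key: bytes) -> str:
    """Stand-in signature: an HMAC over the digest (NOT a public-key signature)."""
    return hmac.new(node_key, digest.encode("ascii"), hashlib.sha256).hexdigest()


def verify(identifier: str, document: bytes, signature: str, node_key: bytes) -> bool:
    """Recompute the digest and compare signatures in constant time."""
    expected = sign_digest(document_digest(identifier, document), node_key)
    return hmac.compare_digest(expected, signature)


key = b"hypothetical-node-key"
ident = "http://talia.example.org/4f2a"
signature = sign_digest(document_digest(ident, b"manuscript text"), key)
```

Because the identifier is part of the hashed input, the same bytes published under a different identifier will not verify, which ties each document to its official URL.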
Workflows
We agreed to have a look at standard approaches, like the GAP workflow, for ideas. We assume that there are four broad types of workflows:
- Open
- Single-blind
- Double-blind
- Public post-publication review
Michele suggested that we also look at the workflows of existing CMS systems.
Michele reported that although HyperJournal contains a graphical workflow editing engine, it is almost never used.
Results
We agreed that we will create a modular workflow with predefined steps. The steps may be disabled and rearranged, but it will not be necessary to use a workflow description language or a higher level engine.
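One way to picture a workflow of this kind is a fixed set of steps plus an ordering and a set of disabled steps. The sketch below is an assumption throughout: the step names and the `Workflow` class are invented for illustration and were not defined at the workshop.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical predefined steps; the real set was not fixed at the workshop.
PREDEFINED_STEPS = ("submission", "review", "revision", "publication")


@dataclass
class Workflow:
    order: List[str] = field(default_factory=lambda: list(PREDEFINED_STEPS))
    disabled: Set[str] = field(default_factory=set)

    def disable(self, step: str) -> None:
        """Switch off one of the predefined steps."""
        if step not in PREDEFINED_STEPS:
            raise ValueError(f"unknown step: {step}")
        self.disabled.add(step)

    def rearrange(self, new_order: List[str]) -> None:
        """Reorder the steps; new steps cannot be introduced this way."""
        if sorted(new_order) != sorted(PREDEFINED_STEPS):
            raise ValueError("new order must be a permutation of the predefined steps")
        self.order = list(new_order)

    def active_steps(self) -> List[str]:
        return [s for s in self.order if s not in self.disabled]

    def next_step(self, current: str) -> Optional[str]:
        """Return the step after `current`, or None at the end of the workflow."""
        steps = self.active_steps()
        i = steps.index(current)  # raises ValueError if `current` is not active
        return steps[i + 1] if i + 1 < len(steps) else None


wf = Workflow()
wf.disable("revision")
```

Customization is limited to `disable` and `rearrange`, which is the point of the decision: no description language or separate engine is needed to express it.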
Identity management
We discussed several approaches to identity management and the point of authorization in the system. Stefan argued for a higher-level authorization system, while Daniel thought authorization at Model level would be easiest to secure.
We also discussed authentication methods for persons, as well as methods for inter-node requests.
Results
User authentication will be done by OpenID/Shibboleth. The permissions will be set on the user stubs at the local node. Permissions will be valid within one node only. We will run an OpenID server for the federation, which can be used by users that do not have an OpenID already. (Michele got the source code for one: vIdentity).
Requests between two nodes will probably use a Trackback-like protocol, which is protected using ActiveResource authentication or PGP. Luca Guidi will research the ActiveResource authentication.
User authorization will be grouped by roles. The authorization protects the workflow changes of a document; any document modification will cause a status change. This will protect the document against modification.
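The idea of guarding workflow transitions by role, and routing every modification through such a transition, can be sketched as follows. The role names, status names, and transition table are hypothetical and would be defined per node, since permissions are valid within one node only.

```python
from typing import Iterable

# Hypothetical per-node table: (current status, new status) -> roles allowed.
ALLOWED_TRANSITIONS = {
    ("draft", "submitted"): {"author"},
    ("submitted", "draft"): {"author"},
    ("submitted", "published"): {"editor"},
}


class NotAuthorized(Exception):
    pass


class Document:
    def __init__(self, content: str) -> None:
        self.content = content
        self.status = "draft"

    def transition(self, user_roles: Iterable[str], new_status: str) -> None:
        """Change status only if one of the user's roles permits the transition."""
        allowed = ALLOWED_TRANSITIONS.get((self.status, new_status), set())
        if not allowed & set(user_roles):
            raise NotAuthorized(f"{self.status} -> {new_status} is not permitted")
        self.status = new_status

    def modify(self, user_roles: Iterable[str], new_content: str, new_status: str) -> None:
        """Every modification goes through a guarded status change first."""
        self.transition(user_roles, new_status)
        self.content = new_content


doc = Document("v1")
doc.transition({"author"}, "submitted")
doc.transition({"editor"}, "published")
```

In this table no transition leads out of `published`, so a published document cannot be modified by anyone, which matches the basic assumption that a document never changes once published.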
In future versions of Talia, it may also be necessary to block read access to certain documents, which means implementing some form of DRM scheme.
Vocabulary
During the meeting it was noted that some words have different meanings in different fields (e.g. computing and the humanities), which causes misunderstandings. We will collect those terms in the project wiki and try to explain them. Before project meetings, the terms will be "promoted" to the official project page.
Attachments
- SubGroup2-20070524-26.pdf (100.1 KB), added by gradmann: Stefan Gradmann's result presentation given during the meeting, just for the record.
