RDF integration in Talia
It turns out that most "semantic" digital libraries use some form of RDF triplet store for relations between objects. Some (e.g. Bricks and JeromeDL) incorporate the RDF directly. Fedora, on the other hand, stores RDF triplets with each object and combines those in an "additional" triplet store. There is also some experience in the Fedora project with the performance and behaviour of different RDF storage solutions.
From a conceptual point of view, Talia relations describe triplets anyway, giving an easy mapping.
It also appears that Talia needs to be "RDF-aware" to some extent. There are different paths we could take:
Original approach: "Talia native store"
This approach would continue to use the original model (TaliaSource and TaliaRelation) and try to map it to RDF. It could work like this:
- Creating RDF triplets: Trivial, since each relation can simply be output as a "triplet". We could also easily create a "virtual" URL for the relation types to be referenced in RDF triplets.
- Creating an ontology description: This shouldn't be very hard either. The "description" of the types could be exported as an ontology.
- Consuming RDF (if necessary): This would require some work, but it should be possible. However, only data that complies with the library description can be imported.
- Importing ontologies (if necessary): This would be somewhat difficult, since a mapping from the ontology description (e.g. OWL) to the internal format would have to be found.
- Reasoning: The amount of "reasoning" needed would be limited. However, everything needs to be built from scratch.
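The first and third steps above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not Talia code: the base URL `http://talia.example.org/relations/`, the helper names, and the relation types `cites` and `translates` are all hypothetical. The sketch writes a relation as an N-Triples line and, when reading a line back, accepts it only if its predicate matches a relation type the library description knows. Literal objects and blank nodes are deliberately left out.

```ruby
# Build the "virtual" URL for a relation type.
# The base URL is an assumption made for this sketch.
def relation_type_url(type, base: 'http://talia.example.org/relations/')
  base + type
end

# Serialize one relation (source URL, relation type, target URL)
# as a single N-Triples line.
def relation_to_ntriple(source, type, target)
  "<#{source}> <#{relation_type_url(type)}> <#{target}> ."
end

# Parse an N-Triples line made of three IRIs. Returns [source, type, target]
# if the predicate is a known relation type, otherwise nil.
def ntriple_to_relation(line, known_types)
  match = line.strip.match(/\A<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\z/)
  return nil unless match

  source, predicate, target = match.captures
  type = known_types.find { |t| relation_type_url(t) == predicate }
  type && [source, type, target]
end

line = relation_to_ntriple('http://talia.example.org/sources/a',
                           'cites',
                           'http://talia.example.org/sources/b')
puts line
p ntriple_to_relation(line, %w[cites translates])
```

Rejecting unknown predicates on import mirrors the restriction noted above: data that does not fit the library description has nowhere to go in the native model.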
Pros
- Full control over the storage
- SQL backends are very mature
- All parts can be tweaked
- "Description language" can be tailored to our needs
- No additional complexity is added
Cons
- Others report that the approach does not scale as well as a "proper" RDF store
- No semantic features for "free"
- Standards compliance requires additional work; if we want to add functionality, that may be an issue
Alternative: Use RDF triplet store in the backend
In this approach we would rather use an existing RDF store as the backend to store the relations. Instead of our own syntax, we would access the data with ActiveRDF.
- ActiveRDF provides the syntax for describing the contents
- Ontologies would be described using RDFS and not OWL, as one of the requirements is support for ordered lists, which are not available in OWL.
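To make the ordered-list requirement concrete, here is a minimal sketch of one way an ordered list can be written as RDF: an `rdf:Seq` container, whose membership properties `rdf:_1`, `rdf:_2`, ... carry the position of each member. Whether Talia would use `rdf:Seq` or another construct is an assumption, and the helper name and example URLs are hypothetical. The sketch uses plain Ruby and does not touch ActiveRDF.

```ruby
# Produce N-Triples lines describing an ordered list as an rdf:Seq.
# list_uri and member_uris are plain URL strings.
def ordered_list_triples(list_uri, member_uris)
  rdf = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  triples = ["<#{list_uri}> <#{rdf}type> <#{rdf}Seq> ."]
  member_uris.each_with_index do |member, index|
    # rdf:_1, rdf:_2, ... encode the position of each member (1-based).
    triples << "<#{list_uri}> <#{rdf}_#{index + 1}> <#{member}> ."
  end
  triples
end

puts ordered_list_triples(
  'http://talia.example.org/lists/chapters',
  ['http://talia.example.org/sources/ch1',
   'http://talia.example.org/sources/ch2']
)
```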
Pros
- Standard compliance is automatic
- No additional work to map to existing formats
- Keeping up with the evolution of the standards is the job of the backend
- Interaction with the DBin solution would be easier
Cons
- Describing and accessing the data may be more complex
- [NOTE]: In reality, accessing/modifying the data is not complex, especially using ActiveRDF.
- ActiveRDF is still somewhat experimental, and we may have to add functionality on our own.
- [NOTE]: I think the current features of ActiveRDF will be sufficient for what we will have to support/implement in Talia.
- ActiveRDF stores are known to be a bit slow
- [NOTE]: It depends on the usage (for this, see the ActiveRDF FAQ).
- RDF stores usually run in a Tomcat/Java environment, which adds more complexity to the system and the setup.
- [NOTE]: It depends on which RDF triple store is used.
- Direct APIs are currently in Java, and using them with Rails currently requires running the whole system in a JRuby environment.
- The system will only be as good as the backend store
External Ontologies
We should be able to work with external ontologies, even if not all functions of the description language are supported. First test: FOAF ontology?
Other triple stores
During the OS workshop, Sindre (WAB) told me about AllegroGraph, a (commercial) RDF triple store. A free version, probably limited, can be downloaded, and it may be worth a try.
