To import the data for the Discovery Sites, see DiscoverySiteImport.

Talia Data Import

Talia imports data from XML files. There are several ways to get your data into the system: you can use Talia's simple native XML format, or an existing reader for one of the other supported formats. If you need to handle another XML import format, you can also write your own XML reader.

Basic Import Tutorial

You can easily import data in the Talia format. To try it out, first install and configure Talia. Then clone the demo data from GitHub:

git clone git://github.com/net7/talia_demo_data.git

Or, if you don't have git, you can check the downloads page and download a zip or tarball of the demo data. Once you have the data, you can import it from the command line:

rake talia_core:xml_import xml=talia_demo_data/lucca/demo.xml

You should see a progress bar, and once the import is finished the data should be available on your installation's home page.

Advanced importing

All imports from the command line are done using the 'talia_core:xml_import' task. This task accepts a number of options which can be found in the rdoc documentation.

Using importers for other formats

If you have XML data in other formats, you will need an 'importer' class for that format. Importers are Ruby classes that describe how the XML data structure should be converted to Talia data. To use an existing importer class, you can pass the importer on the command line. Talia will load that class and use it to import the data from the file that you've given:

rake talia_core:xml_import xml=talia_demo_data/lucca/my_custom_data.xml importer=MyImporterClass
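
The way a class name given on the command line can be turned into a usable class may be sketched as follows. This is only an illustration under stated assumptions: the body of MyImporterClass and the helper importer_class_for are hypothetical, and Talia's own lookup code and importer interface are not reproduced here. The sketch relies on the fact that rake exposes name=value arguments through ENV.

```ruby
# Hypothetical stand-in for an importer class. The real base class and
# interface are defined by Talia and are not shown here.
class MyImporterClass
  def self.format_name
    'my custom format'
  end
end

# Hypothetical helper: turn the value of importer=... into a class object.
def importer_class_for(name)
  klass = Object.const_get(name) # raises NameError for unknown names
  raise ArgumentError, "#{name} is not a class" unless klass.is_a?(Class)
  klass
end

# rake passes name=value arguments to tasks as environment variables.
# The default is used here only so that the sketch runs on its own.
importer_name = ENV.fetch('importer', 'MyImporterClass')
puts importer_class_for(importer_name).format_name
```

Resolving the name at run time means that a misspelled class name fails immediately with a NameError, before any data is read.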

Import sources

The import accepts both file names and web URLs for the data.

Handling of data files

The import data may contain references to data files. These may be file names, URLs or paths. A path is interpreted as a web path on the server that the import XML came from, or as a file name if the import file came from the file system.
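
The resolution rule above can be sketched in a few lines of Ruby. This is a minimal sketch, not Talia's actual code: the helper resolve_data_reference is hypothetical, and it assumes that file names are resolved relative to the directory of the import file, which the text above does not state explicitly.

```ruby
require 'uri'

# Hypothetical helper, not part of Talia: resolve a data-file reference
# found in an import document against the place the document came from.
def resolve_data_reference(reference, import_source)
  # A complete URL is used as it is.
  return reference if reference =~ %r{\Ahttps?://}i

  if import_source =~ %r{\Ahttps?://}i
    # The import XML came from the web: treat the reference as a web
    # path on the same server.
    URI.join(import_source, reference).to_s
  else
    # The import XML came from the file system: treat the reference as
    # a file name. Assumption: relative to the import file's directory.
    File.expand_path(reference, File.dirname(import_source))
  end
end

puts resolve_data_reference('images/page1.jpg', 'http://example.org/data/demo.xml')
puts resolve_data_reference('/files/page1.jpg', 'http://example.org/data/demo.xml')
puts resolve_data_reference('images/page1.jpg', '/tmp/talia_demo_data/lucca/demo.xml')
```

The host example.org and the file names are placeholders chosen for the example.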

Talia will try to automatically detect the MIME type of a data file, using either the file extension or the MIME type supplied by the web server. Depending on the MIME type, Talia may use different kinds of data records or even a custom import routine.

For example, if Talia is configured to use the IIP server, image files will automatically be converted to pyramid files for IIP. If not, they will just be imported as plain 'ImageData' records. If you need to configure the mapping between the MIME types and the import classes or actions, you can do so in a Rails initializer using the MimeMapping class.
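
The detection and mapping steps might look roughly like the sketch below. It deliberately does not use the MimeMapping class, whose API is not described here. The lookup table, the helper names and the returned symbols are all hypothetical, and the sketch assumes that a MIME type supplied by the web server takes precedence over the file extension.

```ruby
# Hypothetical lookup table; Talia's real MimeMapping class is configured
# differently and is not reproduced here.
EXTENSION_TYPES = {
  '.jpg'  => 'image/jpeg',
  '.jpeg' => 'image/jpeg',
  '.png'  => 'image/png',
  '.xml'  => 'text/xml',
  '.txt'  => 'text/plain'
}.freeze

# Detect the MIME type from the server-supplied type if there is one
# (assumed to win), otherwise from the file extension.
def detect_mime_type(location, server_content_type = nil)
  if server_content_type
    # Drop parameters such as "; charset=utf-8".
    return server_content_type.split(';').first.strip.downcase
  end
  EXTENSION_TYPES.fetch(File.extname(location).downcase, 'application/octet-stream')
end

# Pick an illustrative kind of data record for a MIME type. Images become
# IIP pyramid files only when the IIP server is enabled.
def record_kind_for(mime_type, iip_enabled: false)
  if mime_type.start_with?('image/')
    iip_enabled ? :iip_pyramid : :image_data
  else
    :generic_data
  end
end

mime = detect_mime_type('scan.JPG')
puts "#{mime} -> #{record_kind_for(mime, iip_enabled: true)}"
```

Keeping detection separate from the mapping means the mapping can be reconfigured without touching the detection logic, which is the role the text above assigns to the initializer.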

RDF Importers

Talia now provides a mechanism to import RDF data using RDF.rb. At the moment you can use TaliaCore::ActiveSourceParts::Rdf::NtriplesReader and TaliaCore::ActiveSourceParts::Rdf::RdfxmlReader to import the ntriples or rdf/xml format, respectively.

You need to install at least the RDF.rb gem. On its own, it will only give you ntriples support, but it is the base for everything else:

gem install rdf

Note/Dependency: The rdf/xml reader uses the raptor library to parse the format. This works fine with JRuby, but you need to have both the libraptor library and the rdf-raptor gem installed.

Libraptor from source

If you are on MacOS, or if your Linux/Unix distribution doesn't come with an installable libraptor (or redland, or raptor) package, you can download the source and then do the usual:

./configure
make
make install

MacOS Notes

  • The libraptor version from MacPorts will not work out of the box unless you explicitly tell the system to load libraries from /opt/local/lib. It is better to install the library to the default location using the procedure above.
  • JRuby must be able to open the library. The fallback solution of using the command line tool will not work with the current rdf-raptor gem; this may be fixed in the future.

Libraptor from your distribution

Many Linux distributions come with an installable libraptor package; if yours does, just use that one. The package might also be called "raptor", or be included in a "redland" package.

Raptor gem

Just do

gem install rdf-raptor

after you have installed the libraptor library.