Technical details on the data classes
The general behavior of the Data class is described in the functional specification.
To add a new data type to the system, create a subclass of Data that implements at least the following methods:
- mime_type - returns the MIME type of the data as a string
- size - returns the size of the object in bytes
- position - returns the current position of the read cursor
- get_byte - returns the next byte from the object, or nil at the end of the stream
- all_bytes - returns all bytes in the object as an array
- seek - moves the read cursor to a given position
- reset - resets the cursor to its initial state
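As an illustration of the required interface, here is a minimal sketch of a data type that keeps its bytes in memory. The module TaliaSketch, the empty stub Data class, the class name StringData and the fixed MIME type are all hypothetical; the module only exists so the stub does not clash with the built-in Data class of Ruby 3.2 and later.

```ruby
module TaliaSketch
  # Empty stand-in for the real Data base class (hypothetical).
  class Data; end

  # Hypothetical data type whose content lives in memory.
  class StringData < Data
    attr_reader :position

    def initialize(content)
      @bytes = content.b.bytes # binary copy of the content as Integers (0..255)
      @position = 0
    end

    def mime_type
      'text/plain' # assumption: a fixed type, just for this sketch
    end

    def size
      @bytes.size
    end

    # Next byte, or nil once the cursor has reached the end of the stream.
    def get_byte
      byte = @bytes[@position]
      @position += 1 unless byte.nil?
      byte
    end

    def all_bytes
      @bytes.dup # a copy, so callers cannot modify the internal buffer
    end

    # Moves the read cursor to an absolute position in 0..size.
    def seek(new_position)
      unless new_position.between?(0, size)
        raise ArgumentError, "position #{new_position} out of range"
      end
      @position = new_position
    end

    def reset
      @position = 0
    end
  end
end
```

For example, TaliaSketch::StringData.new('abc').get_byte returns 97, the byte value of "a".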
For file-based storage, the byte-access methods can directly wrap Ruby's IO/File classes.
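A file-backed version can delegate almost every byte-access method to a File opened in binary mode, since Ruby's IO already tracks a read position. The class name FileData and the application/octet-stream MIME type are illustrative assumptions, and the class is shown without its Data superclass to keep the sketch self-contained.

```ruby
# Hypothetical file-backed data type; in the real system it would subclass Data.
class FileData
  def initialize(path)
    @path = path
    @io = File.open(path, 'rb') # binary mode: no newline or encoding conversion
  end

  def mime_type
    'application/octet-stream' # assumption: generic binary type
  end

  def size
    @io.size
  end

  def position
    @io.pos
  end

  # IO#getbyte returns the next byte as an Integer, or nil at end of file.
  def get_byte
    @io.getbyte
  end

  # Reads the whole file without disturbing the read cursor.
  def all_bytes
    File.binread(@path).bytes
  end

  def seek(new_position)
    @io.seek(new_position, IO::SEEK_SET)
  end

  def reset
    @io.rewind
  end

  def close
    @io.close
  end
end
```

Reading all_bytes through File.binread rather than the open handle leaves the cursor where it was.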
The base class automatically provides the following methods:
- checksum - (still to be done)
- each_byte - an iterator that calls a block on each byte in the object
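The base-class iterator can be written once in terms of the subclass methods reset and get_byte. This sketch assumes that each_byte always starts from the beginning of the object and leaves the cursor at the end; the real implementation may instead iterate from the current position. TaliaSketch and ArrayData are hypothetical names, and checksum is left out because it is not specified yet.

```ruby
module TaliaSketch
  class Data
    # Calls the block with each byte of the object, in order.
    # Assumption: iteration always restarts at the beginning via #reset.
    # Without a block, returns an Enumerator (e.g. data.each_byte.to_a).
    def each_byte
      return enum_for(:each_byte) unless block_given?

      reset
      while (byte = get_byte) # byte 0 is truthy in Ruby; only nil stops the loop
        yield byte
      end
      self
    end
  end

  # Minimal hypothetical subclass: only what each_byte relies on.
  class ArrayData < Data
    def initialize(bytes)
      @bytes = bytes
      @position = 0
    end

    def get_byte
      byte = @bytes[@position]
      @position += 1 unless byte.nil?
      byte
    end

    def reset
      @position = 0
    end
  end
end
```

With this in place, TaliaSketch::ArrayData.new([0, 7, 255]).each_byte.to_a returns [0, 7, 255].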
Database table for Data objects
The data objects will be stored in the database table data_records. The DataRecord class uses single-table inheritance, and the SourceRecord class will declare has_many :data_records. The table contains the following fields (plus the id field created by Rails):
- source_record_id - foreign key for the has_many relationship from the source record table, enforced by a database constraint.
- type - the type of the data. This string names the subclass of Data that is responsible for handling this data object; single-table inheritance takes care of instantiating the correct subclass.
- location - a string that describes where the binary data is stored. This may be a file name, a database record id, etc.; its interpretation is up to the specific subclass of Data.
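Conceptually, single-table inheritance turns each row into an instance of the class named in its type column. The plain-Ruby sketch below only illustrates that idea, with a hash standing in for a database row; ActiveRecord performs this step internally, and the names TaliaSketch, from_row, ImageData and XmlData are hypothetical.

```ruby
module TaliaSketch
  class Data
    attr_reader :location

    def initialize(location)
      @location = location # interpretation is left to each subclass
    end

    # Builds an object of the class named in the row's "type" column.
    def self.from_row(row)
      klass = TaliaSketch.const_get(row.fetch('type'))
      unless klass.is_a?(Class) && klass < Data
        raise TypeError, "#{klass} is not a subclass of Data"
      end
      klass.new(row['location'])
    end
  end

  # Hypothetical data types
  class ImageData < Data; end
  class XmlData < Data; end
end
```

Rejecting classes outside the Data hierarchy keeps an arbitrary type string from instantiating unrelated classes.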
Data storage
Data can be stored anywhere, as long as the data object can provide the binary data. There are, however, some obvious ways to store it:
File-based storage
For normal file-based storage, data files should be stored in $RAILS_ROOT/data. This directory will be available in the Data class as `
However, every data class should store its files in a subdirectory named after the class itself. This avoids collisions between different data types. The method #data_directory always returns the correct path to the data directory of the current class.
If this mechanism is used, the location field in the database only needs to contain the filename.
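A sketch of the per-class directory convention, assuming the data root is a data directory under the current working directory (a stand-in for $RAILS_ROOT/data) and that the subdirectory uses the unqualified class name as is. The helpers data_root and file_path, and the classes TaliaSketch and ImageData, are hypothetical.

```ruby
module TaliaSketch
  class Data
    # Hypothetical stand-in for $RAILS_ROOT/data.
    def self.data_root
      File.join(Dir.pwd, 'data')
    end

    # One subdirectory per class, named after the class itself.
    def self.data_directory
      File.join(data_root, name.split('::').last)
    end

    attr_reader :location

    def initialize(location)
      @location = location # just the file name, not a full path
    end

    def data_directory
      self.class.data_directory
    end

    # Full path = class data directory + file name stored in the location field.
    def file_path
      File.join(data_directory, location)
    end
  end

  class ImageData < Data; end # hypothetical data type
end
```

For example, TaliaSketch::ImageData.new('scan.png').file_path ends in data/ImageData/scan.png.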
The above will be implemented in a FileStore mixin, which should provide:
- The methods mentioned above
- A create_from_file method to create a new object from an existing file
- The mapping from the File/IO stream to the byte-access methods of the data object
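One possible shape for the FileStore mixin, sketched under several assumptions: create_from_file copies the source file into the class's data directory and keeps only the file name as the location; the including class supplies a data_directory class method, a location reader and an initializer taking the location; and the byte-access methods are mapped onto a lazily opened binary File. The helpers file_path, io and close are hypothetical.

```ruby
require 'fileutils'

module FileStore
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Copies an existing file into this class's data directory and returns a
    # new object whose location is only the file name (assumed convention).
    def create_from_file(source_path)
      FileUtils.mkdir_p(data_directory)
      name = File.basename(source_path)
      FileUtils.cp(source_path, File.join(data_directory, name))
      new(name)
    end
  end

  # Assumes the including class defines #location and .data_directory.
  def file_path
    File.join(self.class.data_directory, location)
  end

  # Byte-access methods mapped onto the underlying File.
  def get_byte
    io.getbyte
  end

  def position
    io.pos
  end

  def seek(new_position)
    io.seek(new_position, IO::SEEK_SET)
  end

  def reset
    io.rewind
  end

  def size
    File.size(file_path)
  end

  def close
    @io&.close
    @io = nil
  end

  private

  # Opens the file in binary mode on first use.
  def io
    @io ||= File.open(file_path, 'rb')
  end
end
```

A data class would use it with include FileStore and then only define data_directory, a location reader and an initializer that takes the location.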
Database storage
To avoid creating too many tables, a single table called data_store will be created. This table contains the following fields:
- data_record_id - a database-enforced relation to the data_records table. This is redundant with the location field in the data_records table, but it helps to protect integrity at the database level.
- data_bytes - a binary (byte) field that contains the actual data
Note: Storing the data in a byte field is not always optimal, but if we allowed the database to handle it as text (with character encodings), we could not compute the checksum correctly.
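A small illustration of the problem: the same text stored under two character encodings has different bytes, so any checksum computed over the bytes changes. MD5 from Ruby's standard library is used here only as an example; the checksum algorithm is not specified.

```ruby
require 'digest/md5'

text   = 'café'
utf8   = text.encode('UTF-8')
latin1 = text.encode('ISO-8859-1')

# Same characters, different byte sequences ("é" is 2 bytes in UTF-8, 1 in Latin-1).
utf8_bytes   = utf8.bytes
latin1_bytes = latin1.bytes

# A checksum computed over the bytes therefore differs as well.
utf8_sum   = Digest::MD5.hexdigest(utf8)
latin1_sum = Digest::MD5.hexdigest(latin1)

puts utf8_bytes.size         # 5
puts latin1_bytes.size       # 4
puts utf8_sum == latin1_sum  # false
```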
It is highly recommended to use the database store only for small items.
Note: We could create an abstract DbData subclass which provides all the logic for database storage.
Location of the data type classes
The files for the data type classes will be stored in talia_core/lib/talia_core/data_types, and the Talia loader will automatically load all classes from that directory.
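How the Talia loader does this is not specified here; a plain-Ruby approximation is to require every .rb file in the directory in a stable order. The method name load_data_types is hypothetical.

```ruby
# Requires every Ruby file in the given directory, in alphabetical order so
# that loading is deterministic. Returns the list of files that were found.
def load_data_types(directory)
  files = Dir.glob(File.join(directory, '*.rb')).sort
  files.each { |file| require File.expand_path(file) }
  files
end
```

Assuming it is run from the project root, it would be called as load_data_types('talia_core/lib/talia_core/data_types'). Since require skips files that are already loaded, calling it twice is harmless.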
