This page is outdated
Some other pages about feeding data into eXist and searching it are outdated as well.
See the new pages:
http://trac.talia.discovery-project.eu/wiki/ExistFeedingSpecs
http://trac.talia.discovery-project.eu/wiki/ExistSearchRequest
http://trac.talia.discovery-project.eu/wiki/ExistNormalSearchResult
http://trac.talia.discovery-project.eu/wiki/ExistMacrocontributionSearchResult
http://trac.talia.discovery-project.eu/wiki/ExistMediaSearchResult
XML Format for documents in eXist
(see also #585 and ExistSearchDescription)
Original Format
The main structure (the full format below wraps it in an extra "document" element, for historical reasons) is:
<version> <metadata> ... <critical_edition> ... </critical_edition> </metadata> <data> ... html ... </data> </version>
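For illustration, here is a minimal sketch of that nesting built with Python's standard xml.etree.ElementTree. It includes the outer "document" wrapper used by the full format. The helper name build_skeleton is hypothetical, and every element is left empty.

```python
import xml.etree.ElementTree as ET


def build_skeleton() -> ET.Element:
    """Return an empty document > version > (metadata, data) tree.

    Hypothetical helper: only the nesting described on this page is
    reproduced; ids, titles, critical edition values and html are left out.
    """
    document = ET.Element("document")             # outer wrapper, kept for historical reasons
    version = ET.SubElement(document, "version")  # one displayed version
    metadata = ET.SubElement(version, "metadata")
    ET.SubElement(metadata, "critical_edition")   # the full format has one per critical edition
    ET.SubElement(version, "data")                # cleaned xhtml or plain text
    return document


if __name__ == "__main__":
    print(ET.tostring(build_skeleton(), encoding="unicode"))
```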
Here's the full version:
<?xml version='1.0' encoding='utf-8'?>
<document>
<!--
many contributions in hyper
have several displayed versions,
e.g., linear layer 1 vs. diplomatic layer 1.
the eXist db contains one XML document
for each displayed version.
the "version" element represents that version.
there are historical reasons for the extra
"document" element around the "version" element
-->
<version>
<metadata>
<!--
it's useful to have a human-recognizable id
for the documents stored in an eXist db.
the id is a string
made from the siglum,
the version type and the layer number
-->
<id>...</id>
<!-- contribution type, e.g. "essays" -->
<type>...</type>
<siglum>...</siglum>
<authors>
<author>
<lastname>...</lastname>
<firstname>...</firstname>
</author>
...
</authors>
<title>...</title>
<!--
a "standard" title, given here if there is
no title in the PostgreSQL db.
not used for search
-->
<standard_title>...</standard_title>
<!-- ISO 639-1 two-letter language code -->
<language>...</language>
<!-- ISO 8601 date format, e.g. "2007-11-27" -->
<date>...</date>
<!--
critical edition data:
one element per critical edition
that the contribution belongs to
-->
<critical_edition>
<!-- siglum of the critical edition -->
<siglum>...</siglum>
<!-- name of the critical edition -->
<name>...</name>
<!-- siglum of the work -->
<work_siglum>...</work_siglum>
<!-- name of the work -->
<work_name>...</work_name>
<!-- name of the related material -->
<related_material_name>...</related_material_name>
<!--
the related material's position within the work
(not within the critical edition)
-->
<position_within_work>...</position_within_work>
<!-- siglum of the related material -->
<related_material_siglum>...</related_material_siglum>
<!--
a hierarchical position string,
made up of the work siglum, an underscore,
and the position within the work (5 digits, zero-filled),
e.g. "WS_00013"
-->
<position>...</position>
<!--
an element to allow an "All" search.
similar to the "position" element:
made up of "all", an underscore,
and the position within the work (5 digits, zero-filled),
e.g. "all_00013"
-->
<all_position>...</all_position>
</critical_edition>
</metadata>
<!--
the "data" element contains the html
of the contribution version.
the html has been "cleaned" and is well-formed xhtml.
if no html is available, plain text is stored instead
(or nothing, if there is no text)
-->
<data>...</data>
</version>
</document>
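The "position" and "all_position" values can be derived mechanically from the work siglum and the position within the work. Here is a minimal Python sketch. The function names are hypothetical. Rejecting positions outside 0..99999 is an assumption based on the 5-digit format.

```python
def make_position(work_siglum: str, position_within_work: int) -> str:
    """Value for <position>: work siglum, underscore, position zero-filled to 5 digits."""
    # Assumption: positions are non-negative and fit into the 5-digit format.
    if not 0 <= position_within_work <= 99999:
        raise ValueError(f"position out of 5-digit range: {position_within_work}")
    return f"{work_siglum}_{position_within_work:05d}"


def make_all_position(position_within_work: int) -> str:
    """Value for <all_position>: same as <position>, with "all" in place of the work siglum."""
    return make_position("all", position_within_work)


if __name__ == "__main__":
    print(make_position("WS", 13))  # WS_00013
    print(make_all_position(13))    # all_00013
```

These calls reproduce the examples in the comments ("WS_00013" and "all_00013"). Zero filling has a likely side benefit: within one work, sorting these strings alphabetically gives the same order as sorting the positions numerically.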

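This is a sketch of how the "data" element might be filled under the rules in the comment above: cleaned xhtml if available, otherwise plain text, otherwise nothing. The helper fill_data is hypothetical. It assumes the cleaned html is passed in as a string fragment and contains no HTML-only named entities such as &nbsp;, which an XML parser would reject.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def fill_data(data: ET.Element, html: Optional[str], text: Optional[str]) -> None:
    """Fill an empty <data> element following the rules on this page.

    Hypothetical helper: html is a string fragment of cleaned xhtml,
    text is the plain-text fallback; both may be None or empty.
    """
    if html:
        # Wrap the fragment so that several top-level elements also parse.
        # ET.ParseError here means the html is not well-formed xhtml.
        wrapper = ET.fromstring(f"<data>{html}</data>")
        data.text = wrapper.text
        data.extend(list(wrapper))
    elif text:
        # Stored as character data; ElementTree escapes <, > and & on output.
        data.text = text
    # Otherwise nothing is stored and <data/> stays empty.


if __name__ == "__main__":
    d = ET.Element("data")
    fill_data(d, "<p>first</p><p>second</p>", None)
    print(ET.tostring(d, encoding="unicode"))
```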