EVT was originally created as part of the Digital Vercelli Book project (http://vbd.humnet.unipi.it/), so it was developed to handle the XML encoding of the texts prepared for that project, which follow the TEI P5 parallel transcription method (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html#PH-bov). With this method, information about the scan, and possibly the coordinates of sensitive areas, is kept separate from the transcription and aligned with it by means of linking attributes.
As the TEI Guidelines explain, however, a scholar can choose to emphasize the importance of the physical surface and encode words and other written traces as subcomponents of the XML elements representing the physical surface carrying them, rather than independently of them. This kind of encoding scheme is known as embedded transcription (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html#PHZLAB). Thanks to support from EADH (see EADH Small Grant: Call for Proposals at http://www.eadh.org/support/eadh-small-grants-call-proposals), this feature was added to the EVT software; development took place between May and July 2014.
Main changes to the original software
The main changes concern the identification and splitting of the text into separate folios and the creation of the structure required by the image-text linking tool.
Since the Vercelli Book transcription was completely encoded according to the parallel transcription method, we used different texts in order to have proper examples of embedded transcription; in particular, we used the TEI examples available in the Guidelines (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html#PHZLAB) and the encoded text of the Slovenian «Tri Pridige O Jeziku (three sermons on language)» (http://nl.ijs.si/e-zrc/slomsek/index-en.html).
First of all, we added automatic detection of the encoding scheme used in the text being transformed (Parallel Transcription, PT, or Embedded Transcription, ET): the identification is based on the absence or presence of the <sourceDoc> element, which is only used in ET.
If the system finds at least one <sourceDoc> element, the text is treated as encoded in ET: each <sourceDoc> is handled as a separate document, and each <surface> element, whether it is a child of <sourceDoc> or of a <surfaceGrp> element, is used to generate a single textual fragment.
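The detection rule can be illustrated with a minimal sketch. EVT itself performs this step in XSLT; the Python below only mirrors the logic described above, and the function name detect_scheme is a hypothetical one chosen for this example.

```python
import xml.etree.ElementTree as ET

# TEI namespace in ElementTree's "{uri}" notation.
TEI = "{http://www.tei-c.org/ns/1.0}"


def detect_scheme(root):
    """Return "ET" if the document holds at least one <sourceDoc>, else "PT".

    Hypothetical helper: it only mirrors the rule stated in the text and is
    not taken from the EVT code base.
    """
    # iter() also visits the root itself, so a bare <sourceDoc> root counts.
    found = next(root.iter(TEI + "sourceDoc"), None)
    return "ET" if found is not None else "PT"


embedded = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<sourceDoc><surface/></sourceDoc></TEI>"
)
parallel = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<facsimile><surface/></facsimile><text/></TEI>"
)
print(detect_scheme(embedded), detect_scheme(parallel))  # prints: ET PT
```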
We decided to treat the <surfaceGrp> element as a generic division in the code that produces no particular output in the interface. Moreover, although <surfaceGrp> elements can be nested to any depth, the software currently supports only two levels of nesting.
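As a rough illustration of how surfaces could be gathered into fragments, the sketch below treats <surfaceGrp> as a transparent container and stops descending after two levels. It is written in Python rather than in the XSLT used by EVT. The names collect_surfaces and split_into_fragments are hypothetical, and silently skipping deeper groups is an assumption: the text does not say what EVT does with them.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
MAX_GROUP_DEPTH = 2  # levels of <surfaceGrp> nesting supported, per the text


def collect_surfaces(parent, depth=0):
    """Return the <surface> elements under parent, in document order.

    <surfaceGrp> produces no output of its own. Groups nested deeper than
    MAX_GROUP_DEPTH are skipped (an assumption made for this sketch).
    """
    surfaces = []
    for child in parent:
        if child.tag == TEI + "surface":
            surfaces.append(child)
        elif child.tag == TEI + "surfaceGrp" and depth < MAX_GROUP_DEPTH:
            surfaces.extend(collect_surfaces(child, depth + 1))
    return surfaces


def split_into_fragments(root):
    """Return one list of surface ids for each <sourceDoc> in the file."""
    return [
        [surface.get(XML_ID) for surface in collect_surfaces(source_doc)]
        for source_doc in root.iter(TEI + "sourceDoc")
    ]


sample = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><sourceDoc>'
    '<surface xml:id="s1"/>'
    '<surfaceGrp><surface xml:id="s2"/>'
    '<surfaceGrp><surface xml:id="s3"/>'
    '<surfaceGrp><surface xml:id="s4"/></surfaceGrp>'
    "</surfaceGrp></surfaceGrp>"
    "</sourceDoc></TEI>"
)
print(split_into_fragments(sample))  # prints: [['s1', 's2', 's3']]
```

In this sample, s4 sits inside a third level of <surfaceGrp> and is therefore left out.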
The most important element after <sourceDoc> and <surface> is the <zone> element. This will be used to create the elements required for the activation of the image-text linking tool and of the hotspot tool.
A <zone> can be an empty node linked to one or more textual nodes, making use of the <line> element, or it can contain the text directly, without any further sub-elements. We handle both cases, so the XSLT transformation for the image-text linking tool is activated by:
- a <zone> element that contains some text and has the spatial coordinates attributes @ulx, @uly, @lrx and @lry. In this case, the sensitive area of the image identified by these coordinates will highlight all the text nested in the <zone>, even if it is spread over several lines.
- an empty <zone> element that has the spatial coordinates attributes and a reference to the particular <line> element it is linked to. Similarly to the previous case, each sensitive area of the image identified by the coordinates will highlight the text inside the element linked to the particular <zone>.
When the <zone> is missing the spatial coordinates attributes, the image-text linking tool will not work, but the corresponding text (whether it is inside or outside the <zone> itself) will be rendered in the interface and nested in a dedicated HTML container (<div class="edition_level-Zone">, where edition_level stands for the name of the edition level), so that the user can visually distinguish the different <zone> elements; the specific class (one for each edition level configured) makes it easy to customize how the <zone> is displayed in the browser.
If, on the other hand, the <zone> element is an empty node and the reference between it and the textual node is missing or broken, the text will still appear on the page, but the image-text linking tool will not work for it, even if the <zone> has the spatial coordinates attributes.
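The cases above can be summarised in a short sketch, again in Python rather than in EVT's own XSLT. The text does not name the attribute an empty <zone> uses to point at its <line>, so @corresp is assumed here purely for illustration; linking_status and the other helper names are hypothetical as well.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
COORDS = ("ulx", "uly", "lrx", "lry")


def has_coords(zone):
    """True when all four spatial coordinate attributes are present."""
    return all(zone.get(name) is not None for name in COORDS)


def text_of(element):
    """All text nested in the element, however many lines it spans."""
    return "".join(element.itertext()).strip()


def linked_line(zone, surface, ref_attr="corresp"):
    """Return the <line> an empty zone points to, or None if the link is broken.

    ref_attr is an assumption of this sketch, not a documented EVT setting.
    """
    ref = zone.get(ref_attr, "")
    if not ref.startswith("#"):
        return None
    for line in surface.iter(TEI + "line"):
        if line.get(XML_ID) == ref[1:]:
            return line
    return None


def linking_status(zone, surface):
    """Return (linking_enabled, text tied to the zone)."""
    text = text_of(zone)
    if text:  # the zone contains its text directly
        return has_coords(zone), text
    line = linked_line(zone, surface)
    if line is None:  # missing or broken reference: no linking at all
        return False, ""
    return has_coords(zone), text_of(line)


surface = ET.fromstring(
    '<surface xmlns="http://www.tei-c.org/ns/1.0">'
    '<zone ulx="0" uly="0" lrx="10" lry="10">Some text</zone>'
    '<zone ulx="0" uly="10" lrx="10" lry="20" corresp="#l1"/>'
    "<zone>No coordinates</zone>"
    '<zone ulx="0" uly="20" lrx="10" lry="30" corresp="#missing"/>'
    '<line xml:id="l1">Linked line</line>'
    "</surface>"
)
for zone in surface.findall(TEI + "zone"):
    print(linking_status(zone, surface))
```

The four zones in the sample correspond, in order, to the four situations described: text with coordinates, an empty zone with a valid reference, a zone without coordinates, and an empty zone with a broken reference.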
As mentioned above, in some cases the <zone> element is used to generate a hotspot, that is, a sensitive area on the image directly linked to an HTML pop-up window. We decided to treat as a hotspot:
- every <zone> that has the spatial coordinates attributes and is nested inside another <zone>; in this case the textual box will contain the text of the innermost zone;
- every <zone> (even if a direct child of <surface>) that contains a <graphic> element with a @url attribute; in this case the textual box will contain the image referenced by the <graphic> element and the text inside the <zone> itself (if present).
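The two hotspot rules can be sketched as follows. This is a Python illustration of the stated criteria, not EVT's XSLT. The name find_hotspots is hypothetical, and the sketch assumes that "nested" means a direct child of another <zone> and that the <graphic> is a direct child of the zone.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
COORDS = ("ulx", "uly", "lrx", "lry")


def has_coords(zone):
    """True when all four spatial coordinate attributes are present."""
    return all(zone.get(name) is not None for name in COORDS)


def find_hotspots(surface):
    """List the zones of a surface that qualify as hotspots."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in surface.iter() for child in parent}
    hotspots = []
    for zone in surface.iter(TEI + "zone"):
        parent = parent_of.get(zone)
        nested = parent is not None and parent.tag == TEI + "zone"
        images = [g.get("url") for g in zone.findall(TEI + "graphic") if g.get("url")]
        if (nested and has_coords(zone)) or images:
            hotspots.append(
                {
                    "id": zone.get(XML_ID),
                    "text": "".join(zone.itertext()).strip(),
                    "images": images,
                }
            )
    return hotspots


surface = ET.fromstring(
    '<surface xmlns="http://www.tei-c.org/ns/1.0">'
    '<zone xml:id="outer" ulx="0" uly="0" lrx="100" lry="100">'
    '<zone xml:id="inner" ulx="10" uly="10" lrx="20" lry="20">note</zone>'
    '<zone xml:id="nocoords">plain</zone>'
    "</zone>"
    '<zone xml:id="img"><graphic url="detail.jpg"/>caption</zone>'
    "</surface>"
)
print([h["id"] for h in find_hotspots(surface)])  # prints: ['inner', 'img']
```

Here "inner" qualifies under the first rule and "img" under the second, while "nocoords" is nested but lacks coordinates and "outer" is an ordinary zone.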
All hotspots that were handled by means of the @rendition attribute in texts encoded in PT will likewise work with texts encoded in ET.
The remarkable variety of encodings possible with the Embedded Transcription method has made the task of supporting it more complicated than we expected. As is clear from the section above, at least in this phase of development our support is somewhat “prescriptive”, in the sense that not every possible encoding is supported. This means that text encoded according to “reasonable” principles will very likely work, while in other cases it may or may not work and may therefore require some modifications to the encoded text. Since this is the first version of EVT supporting the Embedded Transcription method, we expect further improvements on the basis of users’ feedback: as always, feel free to contact us with your remarks, suggestions and feature requests. We have contacted the authors of the «Tri Pridige O Jeziku (three sermons on language)» edition mentioned above, and will experiment together with them with the goal of fine-tuning ET support in EVT.
EVT Project firstname.lastname@example.org
Roberto Rosselli Del Turco email@example.com
List of participants
Chiara Di Pietro (firstname.lastname@example.org)
Julia Kenny (email@example.com)
Raffaele Masotti (firstname.lastname@example.org)
Digital Vercelli Book project: http://vbd.humnet.unipi.it/.
EADH Supported activities and reports: http://www.eadh.org/support/supported-activities-and-reports [full report submitted to EADH].
Edition Visualization Technology: http://sourceforge.net/projects/evt-project/.
Digital Vercelli Book beta version using EVT: http://vbd.humnet.unipi.it/beta.
TEI P5 Parallel Transcription: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html#PH-bov.
TEI P5 Embedded Transcription: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/
Tri Pridige O Jeziku (three sermons on language), http://nl.ijs.si/e-zrc/slomsek/index-en.html.
Post based on the final report by Chiara Di Pietro and Julia Kenny, revised and modified by R. Rosselli Del Turco.