Span Relation Annotation

Span relations are a stand-off extension of relation annotation that allows for more complex relations, such as word alignments that are many-to-one, one-to-many or many-to-many. One of their uses is the alignment of multiple translations of (parts of) a text.

Specification

Annotation Category:
	Higher-order Annotation
Declaration:	`<spanrelation-annotation set="...">` (note: set is optional for this annotation type; if you declare this annotation type to be setless you cannot assign classes)
Version History:
	since v0.8, renamed from complexalignment in v2.0
Element:	`<spanrelation>`
API Class:	`SpanRelation` (FoLiApy API Reference)
Required Attributes:

Optional Attributes:
	`xml:id` – The ID of the element; this has to be unique in the entire document or collection of documents (corpus). All identifiers in FoLiA are of the XML NCName datatype, which roughly means it is a unique string that has to start with a letter (not a number or symbol), may contain numbers, but may never contain colons or spaces. FoLiA does not define any naming convention for IDs.
	`set` – The set of the element, ideally a URI linking to a set definition (see Set Definitions (Vocabulary)) or otherwise a uniquely identifying string. The `set` must also be referred to in the Annotation Declarations for this annotation type.
	`class` – The class of the annotation, i.e. the annotation tag in the vocabulary defined by `set`.
	`processor` – This refers to the ID of a processor in the Provenance Data. The processor in turn defines exactly who or what was the annotator of the annotation.
	`annotator` – This is an older alternative to the `processor` attribute, without support for full provenance. The annotator attribute simply refers to the name or ID of the system or human annotator that made the annotation.
	`annotatortype` – This is an older alternative to the `processor` attribute, without support for full provenance. It is used together with `annotator` and specifies the type of the annotator, either `manual` for human annotators or `auto` for automated systems.
	`confidence` – A floating point value between zero and one; expresses the confidence the annotator places in the annotation.
	`datetime` – The date and time when this annotation was recorded; the format is `YYYY-MM-DDThh:mm:ss` (note the literal T in the middle to separate date from time), as per the XSD Datetime data type.
	`n` – A number in a sequence, corresponding to a number in the original document, for example chapter numbers, section numbers, list item numbers. This does not have to be an actual number; other sequence identifiers are also possible (think alphanumeric characters or roman numerals).
	`src` – Points to a file or full URL of a sound or video file. This attribute is inheritable.
	`begintime` – A timestamp in `HH:MM:SS.MMM` format, indicating the begin time of the speech. If a sound clip is specified (`src`), the timestamp refers to a location in the sound clip.
	`endtime` – A timestamp in `HH:MM:SS.MMM` format, indicating the end time of the speech. If a sound clip is specified (`src`), the timestamp refers to a location in the sound clip.
	`speaker` – A string identifying the speaker. This attribute is inheritable. Multiple speakers are not allowed; simply do not specify a speaker on a certain level if you are unable to link the speech to a specific (single) speaker.
	`tag` – Contains a space-separated list of processing tags associated with the element. A processing tag carries arbitrary user-defined information that may aid in processing a document. It may carry cues on how a specific tool should treat a specific element. The tag vocabulary is specific to the tool that processes the document. Tags carry no intrinsic meaning for the data representation and should not be used except to inform/aid processors in their task. Processors are encouraged to clean up the tags they use. Ideally, published FoLiA documents at the end of a processing pipeline carry no further tags. For encoding actual data, use `class` and optionally features instead.
Accepted Data:	`<comment>` (Comment Annotation), `<desc>` (Description Annotation), `<metric>` (Metric Annotation), `<relation>` (Relation Annotation)
Valid Context:	`<spanrelations>` (Span Relation Annotation)

Explanation & Examples

Please ensure you are familiar with Relation Annotation first, as this is an extension for that annotation type.

Note

In versions of FoLiA prior to 2.0, this annotation type was called complex alignments.

Under span relations we count more complex relations such as many-to-one, one-to-many and many-to-many relations between arbitrary elements. The element <spanrelation> behaves similarly to a span annotation element, operating in a stand-off fashion. It groups <relation> elements together, effectively creating a many-to-many relation. All this takes place within the <spanrelations> annotation layer.

Consider the following example:

<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" xmlns:xlink="http://www.w3.org/1999/xlink" version="2.0" xml:id="example-english">
  <metadata>
      <annotations>
          <token-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/tokconfig-eng.foliaset.ttl">
             <annotator processor="p1" />
          </token-annotation>
          <text-annotation>
             <annotator processor="p1" />
          </text-annotation>
          <sentence-annotation>
             <annotator processor="p1" />
          </sentence-annotation>
          <relation-annotation set="ad-hoc-translation-set">
             <annotator processor="p1" />
          </relation-annotation>
          <spanrelation-annotation>
             <annotator processor="p1" />
          </spanrelation-annotation>
      </annotations>
      <provenance>
         <processor xml:id="p1" name="proycon" type="manual" />
      </provenance>
  </metadata>
  <text xml:id="example-english.text">
    <s xml:id="example-english.p.1.s.1">
      <t>The Dalai Lama greeted him.</t>
      <w xml:id="example-english.p.1.s.1.w.1"><t>The</t></w>
      <w xml:id="example-english.p.1.s.1.w.2"><t>Dalai</t></w>
      <w xml:id="example-english.p.1.s.1.w.3"><t>Lama</t></w>
      <w xml:id="example-english.p.1.s.1.w.4"><t>greeted</t></w>
      <w xml:id="example-english.p.1.s.1.w.5" space="no"><t>him</t></w>
      <w xml:id="example-english.p.1.s.1.w.6"><t>.</t></w>
      <spanrelations>
        <spanrelation>
          <relation class="original">
            <xref id="example-english.p.1.s.1.w.2" t="Dalai" type="w"/>
            <xref id="example-english.p.1.s.1.w.3" t="Lama" type="w"/>
          </relation>
          <relation class="french" xlink:href="doc-french.xml" xlink:type="simple">
            <xref id="example-french.p.1.s.1.w.2" t="Dalai" type="w"/>
            <xref id="example-french.p.1.s.1.w.3" t="Lama" type="w"/>
          </relation>
        </spanrelation>
      </spanrelations>
    </s>
  </text>
</FoLiA>

Here <xref> is used instead of the <wref> element we know from Span Annotation because, despite the similarities, relations are technically not span annotation elements. You can in fact relate to anything that can carry an ID, within the same document and across multiple documents. Moreover, the notion of relations is not limited to just words, and it can be used for more than specifying translations.
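To make the grouping concrete, the following is a minimal sketch that reads the span relation from the example above and lists what each <relation> refers to. It uses only Python's standard `xml.etree.ElementTree` rather than the FoLiApy API; the `SAMPLE` string is a trimmed copy of the example, and the helper name `collect_xrefs` is hypothetical.

```python
import xml.etree.ElementTree as ET

# FoLiA namespace as declared on the root element of the example.
NS = {"f": "http://ilk.uvt.nl/folia"}

# Trimmed copy of the example sentence: only the span relation layer is kept.
SAMPLE = """\
<s xmlns="http://ilk.uvt.nl/folia" xmlns:xlink="http://www.w3.org/1999/xlink"
   xml:id="example-english.p.1.s.1">
  <spanrelations>
    <spanrelation>
      <relation class="original">
        <xref id="example-english.p.1.s.1.w.2" t="Dalai" type="w"/>
        <xref id="example-english.p.1.s.1.w.3" t="Lama" type="w"/>
      </relation>
      <relation class="french" xlink:href="doc-french.xml" xlink:type="simple">
        <xref id="example-french.p.1.s.1.w.2" t="Dalai" type="w"/>
        <xref id="example-french.p.1.s.1.w.3" t="Lama" type="w"/>
      </relation>
    </spanrelation>
  </spanrelations>
</s>
"""


def collect_xrefs(sentence):
    """Return one dict per <spanrelation>: relation class -> [(type, id), ...]."""
    result = []
    for spanrel in sentence.iterfind("f:spanrelations/f:spanrelation", NS):
        members = {}
        for relation in spanrel.iterfind("f:relation", NS):
            members[relation.get("class")] = [
                (xref.get("type"), xref.get("id"))
                for xref in relation.iterfind("f:xref", NS)
            ]
        result.append(members)
    return result


groups = collect_xrefs(ET.fromstring(SAMPLE))
for relation_class, refs in groups[0].items():
    print(relation_class, refs)
```

Each <xref> is reported with its `type`, which is what allows a relation to point at elements other than words.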

The first <relation> element has no xlink reference, and therefore simply refers to the current document. The second relation element links to the foreign document. This notation is powerful as it allows you to specify a large number of relations in a concise manner. Consider the next example in which we added German and Italian, effectively specifying what can be perceived as 16 relationships over four different documents:

<s xml:id="example-english.p.1.s.1">
  <t>The Dalai Lama greeted him.</t>
  <w xml:id="example-english.p.1.s.1.w.1"><t>The</t></w>
  <w xml:id="example-english.p.1.s.1.w.2"><t>Dalai</t></w>
  <w xml:id="example-english.p.1.s.1.w.3"><t>Lama</t></w>
  <w xml:id="example-english.p.1.s.1.w.4"><t>greeted</t></w>
  <w xml:id="example-english.p.1.s.1.w.5" space="no"><t>him</t></w>
  <w xml:id="example-english.p.1.s.1.w.6"><t>.</t></w>
  <spanrelations>
    <spanrelation>
      <relation class="english-translation">
        <xref id="example-english.p.1.s.1.w.2" t="Dalai" type="w"/>
        <xref id="example-english.p.1.s.1.w.3" t="Lama" type="w"/>
      </relation>
      <relation class="french-translation"
       xlink:href="doc-french.xml"
       xlink:type="simple">
        <xref id="example-french.p.1.s.1.w.2" t="Dalai" type="w"/>
        <xref id="example-french.p.1.s.1.w.3" t="Lama" type="w"/>
      </relation>
      <relation class="german-translation"
       xlink:href="doc-german.xml"
       xlink:type="simple">
        <xref id="example-german.p.1.s.1.w.2" t="Dalai" type="w"/>
        <xref id="example-german.p.1.s.1.w.3" t="Lama" type="w"/>
      </relation>
      <relation class="italian-translation"
       xlink:href="doc-italian.xml"
       xlink:type="simple">
        <xref id="example-italian.p.1.s.1.w.2" t="Dalai" type="w"/>
        <xref id="example-italian.p.1.s.1.w.3" t="Lama" type="w"/>
      </relation>
    </spanrelation>
  </spanrelations>
</s>
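The sketch below shows how the target document of each relation could be resolved and how the figure of 16 can be arrived at. It assumes that the count refers to every ordered pair of the four grouped relations, including each relation paired with itself (4 × 4); that reading is an assumption, not something the specification states. It uses Python's standard `xml.etree.ElementTree`; the helper `target_documents` and the placeholder label for the current document are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

NS = {"f": "http://ilk.uvt.nl/folia"}
# ElementTree exposes namespaced attributes in {uri}name form.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# The <spanrelation> from the example above, copied on its own.
SAMPLE = """\
<spanrelation xmlns="http://ilk.uvt.nl/folia"
              xmlns:xlink="http://www.w3.org/1999/xlink">
  <relation class="english-translation">
    <xref id="example-english.p.1.s.1.w.2" t="Dalai" type="w"/>
    <xref id="example-english.p.1.s.1.w.3" t="Lama" type="w"/>
  </relation>
  <relation class="french-translation"
            xlink:href="doc-french.xml" xlink:type="simple">
    <xref id="example-french.p.1.s.1.w.2" t="Dalai" type="w"/>
    <xref id="example-french.p.1.s.1.w.3" t="Lama" type="w"/>
  </relation>
  <relation class="german-translation"
            xlink:href="doc-german.xml" xlink:type="simple">
    <xref id="example-german.p.1.s.1.w.2" t="Dalai" type="w"/>
    <xref id="example-german.p.1.s.1.w.3" t="Lama" type="w"/>
  </relation>
  <relation class="italian-translation"
            xlink:href="doc-italian.xml" xlink:type="simple">
    <xref id="example-italian.p.1.s.1.w.2" t="Dalai" type="w"/>
    <xref id="example-italian.p.1.s.1.w.3" t="Lama" type="w"/>
  </relation>
</spanrelation>
"""


def target_documents(spanrelation, current="(current document)"):
    """Map each relation class to the document its xrefs point into.

    A relation without xlink:href refers to the current document.
    """
    return {
        relation.get("class"): relation.get(XLINK_HREF, current)
        for relation in spanrelation.iterfind("f:relation", NS)
    }


targets = target_documents(ET.fromstring(SAMPLE))
# Assumed reading of "16": all ordered pairs, self-pairs included.
pairs = list(itertools.product(targets, repeat=2))
print(len(targets), "relations,", len(pairs), "ordered pairs")
```

If self-pairs are excluded, `itertools.permutations(targets, 2)` yields 12 ordered pairs instead.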

Now you can even envision a FoLiA document that does not hold actual content, but acts merely as a container for all relations between, for example, different translations of a document. This allows all relations to be contained in a single document rather than having to be made explicit in each language version.

The <spanrelation> element itself may also take a set, which is independent of the set used by the <relation> elements it groups. The two annotation types therefore also have two separate declarations.
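As an illustration of the two separate declarations, the sketch below looks up the declared set for each annotation type in the metadata of the first example. There the relation annotation declares `ad-hoc-translation-set` while the span relation annotation is declared without a set. The `SAMPLE` string is a trimmed copy of that metadata, and the helper `declared_sets` is a hypothetical name; only Python's standard library is used.

```python
import xml.etree.ElementTree as ET

NS = {"f": "http://ilk.uvt.nl/folia"}

# Trimmed copy of the declarations from the first example.
SAMPLE = """\
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example-english">
  <metadata>
    <annotations>
      <relation-annotation set="ad-hoc-translation-set">
        <annotator processor="p1" />
      </relation-annotation>
      <spanrelation-annotation>
        <annotator processor="p1" />
      </spanrelation-annotation>
    </annotations>
  </metadata>
</FoLiA>
"""


def declared_sets(root, annotation_type):
    """Return the set of each declaration of this type (None means setless)."""
    path = f"f:metadata/f:annotations/f:{annotation_type}-annotation"
    return [declaration.get("set") for declaration in root.iterfind(path, NS)]


root = ET.fromstring(SAMPLE)
print("relation:", declared_sets(root, "relation"))
print("spanrelation:", declared_sets(root, "spanrelation"))
```

Because the span relation declaration in that example is setless, its <spanrelation> elements carry no class, while the grouped <relation> elements do.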