Relation Annotation

FoLiA provides a facility to relate arbitrary parts of your document to other parts of your document, to parts of other FoLiA documents, or even to external resources in other formats. It thus allows linking resources together. Within this context, the <xref> element is used to refer to the linked FoLiA elements.

Specification

Annotation Category:
	Higher-order Annotation
Declaration:	`<relation-annotation set="...">` (note: `set` is optional for this annotation type; if you declare this annotation type to be setless you cannot assign classes)
Version History:
	Revised since v0.8, renamed from alignment in v2.0
Element:	`<relation>`
API Class:	`Relation` (FoLiApy API Reference)
Required Attributes:

Optional Attributes:
	`xml:id` – The ID of the element; this has to be unique in the entire document or collection of documents (corpus). All identifiers in FoLiA are of the XML NCName datatype, which roughly means it is a unique string that has to start with a letter (not a number or symbol), may contain numbers, but may never contain colons or spaces. FoLiA does not define any naming convention for IDs.
	`set` – The set of the element, ideally a URI linking to a set definition (see Set Definitions (Vocabulary)) or otherwise a uniquely identifying string. The `set` must also be referred to in the Annotation Declarations for this annotation type.
	`class` – The class of the annotation, i.e. the annotation tag in the vocabulary defined by `set`.
	`processor` – This refers to the ID of a processor in the Provenance Data. The processor in turn defines exactly who or what was the annotator of the annotation.
	`annotator` – This is an older alternative to the `processor` attribute, without support for full provenance. The annotator attribute simply refers to the name or ID of the system or human annotator that made the annotation.
	`annotatortype` – This is an older alternative to the `processor` attribute, without support for full provenance. It is used together with `annotator` and specifies the type of the annotator, either `manual` for human annotators or `auto` for automated systems.
	`confidence` – A floating point value between zero and one; expresses the confidence the annotator places in its annotation.
	`datetime` – The date and time when this annotation was recorded; the format is `YYYY-MM-DDThh:mm:ss` (note the literal T in the middle to separate date from time), as per the XSD Datetime data type.
	`n` – A number in a sequence, corresponding to a number in the original document, for example chapter numbers, section numbers, list item numbers. This does not have to be an actual number; other sequence identifiers are also possible (think alphanumeric characters or roman numerals).
	`src` – Points to a file or full URL of a sound or video file. This attribute is inheritable.
	`begintime` – A timestamp in `HH:MM:SS.MMM` format, indicating the begin time of the speech. If a sound clip is specified (`src`), the timestamp refers to a location in the sound clip.
	`endtime` – A timestamp in `HH:MM:SS.MMM` format, indicating the end time of the speech. If a sound clip is specified (`src`), the timestamp refers to a location in the sound clip.
	`speaker` – A string identifying the speaker. This attribute is inheritable. Multiple speakers are not allowed; simply do not specify a speaker on a certain level if you are unable to link the speech to a specific (single) speaker.
	`tag` – Contains a space-separated list of processing tags associated with the element. A processing tag carries arbitrary user-defined information that may aid in processing a document. It may carry cues on how a specific tool should treat a specific element. The tag vocabulary is specific to the tool that processes the document. Tags carry no intrinsic meaning for the data representation and should not be used except to inform/aid processors in their task. Processors are encouraged to clean up the tags they use. Ideally, published FoLiA documents at the end of a processing pipeline carry no further tags. For encoding actual data, use `class` and optionally features instead.
	`xlink:href` – Turns this element into a hyperlink to the specified URL.
	`xlink:type` – The type of link (you’ll want to use `simple` in almost all cases).
Accepted Data:	`<comment>` (Comment Annotation), `<desc>` (Description Annotation), `<metric>` (Metric Annotation)
Valid Context:	`<chunk>` (Chunking), `<coreferencechain>` (Coreference Annotation), `<coreferencelink>` (Coreference Annotation), `<def>` (Definition Annotation), `<dependency>` (Dependency Annotation), `<div>` (Division Annotation), `<entity>` (Entity Annotation), `<entry>` (Entry Annotation), `<event>` (Event Annotation), `<ex>` (Example Annotation), `<figure>` (Figure Annotation), `<head>` (Head Annotation), `<hiddenw>` (Hidden Token Annotation), `<br>` (Linebreak), `<list>` (List Annotation), `<modality>` (Modality Annotation), `<morpheme>` (Morphological Annotation), `<note>` (Note Annotation), `<observation>` (Observation Annotation), `<p>` (Paragraph Annotation), `<part>` (Part Annotation), `<phoneme>` (Phonological Annotation), `<predicate>` (Predicate Annotation), `<quote>` (Quote Annotation), `<ref>` (Reference Annotation), `<semrole>` (Semantic Role Annotation), `<s>` (Sentence Annotation), `<sentiment>` (Sentiment Annotation), `<spanrelation>` (Span Relation Annotation), `<statement>` (Statement Annotation), `<str>` (String Annotation), `<su>` (Syntactic Annotation), `<table>` (Table Annotation), `<term>` (Term Annotation), `<timesegment>` (Time Segmentation), `<utt>` (Utterance Annotation), `<whitespace>` (Vertical Whitespace), `<w>` (Token Annotation)
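Some of the constraints above can be checked mechanically. The following is a minimal sketch that uses only Python's standard `xml.etree.ElementTree`. It is not part of FoLiA or FoLiApy, and the function name `check_relations` is hypothetical. It assumes a single `<relation-annotation>` declaration and enforces three rules: no classes when the declaration is setless, `confidence` must lie between 0 and 1, and `datetime` must match the format given above. The `datetime` check parses strictly as `YYYY-MM-DDThh:mm:ss`. As a result, values with fractional seconds or a time zone, which the XSD dateTime type permits, would also be flagged.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

FOLIA_NS = "http://ilk.uvt.nl/folia"
NS = {"f": FOLIA_NS}

def check_relations(root: ET.Element) -> list:
    """Hypothetical helper: list problems with <relation> elements in a parsed FoLiA document."""
    problems = []
    decl = root.find("f:metadata/f:annotations/f:relation-annotation", NS)
    if decl is None:
        problems.append("relation-annotation is not declared")
    # Without a declared set the annotation type is setless, so no classes are allowed.
    setless = decl is None or decl.get("set") is None
    for rel in root.iter(f"{{{FOLIA_NS}}}relation"):
        if setless and rel.get("class") is not None:
            problems.append(f"class {rel.get('class')!r} used, but no set is declared")
        conf = rel.get("confidence")
        if conf is not None:
            try:
                ok = 0.0 <= float(conf) <= 1.0
            except ValueError:
                ok = False
            if not ok:
                problems.append(f"confidence {conf!r} is not a number between 0 and 1")
        stamp = rel.get("datetime")
        if stamp is not None:
            try:
                datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")  # strict YYYY-MM-DDThh:mm:ss
            except ValueError:
                problems.append(f"datetime {stamp!r} is not YYYY-MM-DDThh:mm:ss")
    return problems

# Usage: a setless declaration combined with a classed, out-of-range relation.
doc = ET.fromstring(
    '<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="x">'
    "<metadata><annotations><relation-annotation /></annotations></metadata>"
    '<text xml:id="x.text"><relation class="foo" confidence="1.5"'
    ' datetime="2019-13-01T00:00:00" /></text></FoLiA>'
)
for problem in check_relations(doc):
    print(problem)
```

A real validator would also need to check the other attributes and the valid contexts listed above. This sketch only covers the rules it names.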

Explanation

Note

In versions of FoLiA prior to 2.0, this annotation type was called alignment.

FoLiA provides a facility to link parts of your document with other parts of your document, or even with parts of other FoLiA documents or external resources. These are called relations and are implemented using the <relation> element. Within this context, the <xref> element is used to cross-link to the related FoLiA elements.

Consider the following two aligned sentences, taken from excerpts of two separate FoLiA documents in different languages:

<s xml:id="doc-english.p.1.s.1">
  <t>The Dalai Lama greeted him.</t>
  <relation class="french-translation" xlink:href="doc-french.xml"
    xlink:type="simple">
    <xref id="doc-french.p.1.s.1" t="Le Dalai Lama le saluait."
      type="s" />
  </relation>
</s>

<s xml:id="doc-french.p.1.s.1">
  <t>Le Dalai Lama le saluait.</t>
  <relation class="english-translation" xlink:href="doc-english.xml"
    xlink:type="simple">
    <xref id="doc-english.p.1.s.1" t="The Dalai Lama greeted him."
      type="s" />
  </relation>
  <relation class="dutch-translation" xlink:href="doc-dutch.xml"
    xlink:type="simple">
    <xref id="doc-dutch.p.1.s.1" t="De Dalai Lama begroette hem."
      type="s" />
  </relation>
</s>

It is the job of the <relation> element to point to the relevant resource, whereas the <xref> element points to a specific location inside the referenced resource. The xlink:href attribute is used to link to the target document, if any. If the relation targets elements within the same document, the attribute should simply be omitted. The type attribute on <xref> specifies the type of element the relation points to, i.e. its value is equal to the tag name of the target element. The t attribute on the <xref> element is optional. This redundancy is added simply to facilitate the job of limited FoLiA parsers, and it provides a quick reference to the target text for both parsers and human readers.

Although each relation in the above example contains a single reference (<xref>), it is permitted to specify multiple references within one <relation> block, effectively referring to a span of multiple elements at the target.
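As a minimal sketch of how a consumer might read these attributes, the following uses only Python's standard `xml.etree.ElementTree` (not FoLiApy). The names `XRef` and `read_relation` are hypothetical. A missing `xlink:href` is reported as `None`, meaning that the targets are in the same document. All `<xref>` children are returned in order, so a relation with several references yields the whole target span. Namespace declarations are added to the XML because the excerpts above omit them.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional, Tuple

FOLIA_NS = "http://ilk.uvt.nl/folia"
XLINK_NS = "http://www.w3.org/1999/xlink"

class XRef(NamedTuple):
    ref_id: str
    ref_type: Optional[str]  # tag name of the target element, e.g. "s" or "w"
    text: Optional[str]      # the optional t attribute (target text, for convenience)

def read_relation(rel: ET.Element) -> Tuple[Optional[str], List[XRef]]:
    """Return (target document, or None if same-document, references in document order)."""
    target = rel.get(f"{{{XLINK_NS}}}href")
    refs = [XRef(x.get("id"), x.get("type"), x.get("t"))
            for x in rel.findall(f"{{{FOLIA_NS}}}xref")]
    return target, refs

# Usage: a French sentence modelled on the example above, with namespaces declared.
sentence = ET.fromstring("""<s xmlns="http://ilk.uvt.nl/folia"
   xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="doc-french.p.1.s.1">
  <t>Le Dalai Lama le saluait.</t>
  <relation class="english-translation" xlink:href="doc-english.xml" xlink:type="simple">
    <xref id="doc-english.p.1.s.1" t="The Dalai Lama greeted him." type="s" />
  </relation>
  <relation class="dutch-translation" xlink:href="doc-dutch.xml" xlink:type="simple">
    <xref id="doc-dutch.p.1.s.1" t="De Dalai Lama begroette hem." type="s" />
  </relation>
</s>""")
for rel in sentence.findall(f"{{{FOLIA_NS}}}relation"):
    print(rel.get("class"), read_relation(rel))

# A hypothetical same-document relation whose two references form a span.
span_rel = ET.fromstring("""<relation xmlns="http://ilk.uvt.nl/folia">
  <xref id="w.2" type="w" t="Dalai" />
  <xref id="w.3" type="w" t="Lama" />
</relation>""")
print(read_relation(span_rel))
```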

By default, relations are between FoLiA documents. It is, however, also possible to point to resources in other formats. This has to be made explicit using the format attribute on the <relation> element. The value of the format attribute is a MIME type and defaults to text/folia+xml (naming follows RFC 3023). The following example aligns a section (<div>) with the original HTML document from which the FoLiA document was derived, in which the section is marked by an HTML anchor (<a>) tag.

<div class="section">
  <t>lorem ipsum etc.</t>
  <relation class="original" xlink:href="http://somewhere/original.html"
    xlink:type="simple" format="text/html">
    <xref id="section2" type="a" />
  </relation>
</div>
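Because format has a default, code that follows a relation should apply that default before deciding how to interpret the <xref> identifiers. The following is a minimal sketch, again using Python's standard `xml.etree.ElementTree`, with hypothetical function names. It assumes that only FoLiA targets have <xref> identifiers that can be looked up as FoLiA element IDs. For other formats, such as the HTML anchor above, the identifier is left to format-specific handling.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

FOLIA_NS = "http://ilk.uvt.nl/folia"
XLINK_NS = "http://www.w3.org/1999/xlink"
DEFAULT_FORMAT = "text/folia+xml"  # documented default for the format attribute

def relation_target(rel: ET.Element) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (xlink:href, MIME type) with the default format applied."""
    return rel.get(f"{{{XLINK_NS}}}href"), rel.get("format", DEFAULT_FORMAT)

def is_folia_target(rel: ET.Element) -> bool:
    # Only FoLiA targets have <xref> ids that can be looked up as FoLiA xml:id values.
    return relation_target(rel)[1] == DEFAULT_FORMAT

div = ET.fromstring("""<div xmlns="http://ilk.uvt.nl/folia"
     xmlns:xlink="http://www.w3.org/1999/xlink" class="section">
  <t>lorem ipsum etc.</t>
  <relation class="original" xlink:href="http://somewhere/original.html"
            xlink:type="simple" format="text/html">
    <xref id="section2" type="a" />
  </relation>
</div>""")
rel = div.find(f"{{{FOLIA_NS}}}relation")
print(relation_target(rel), is_folia_target(rel))
# ('http://somewhere/original.html', 'text/html') False
```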

Translations

Relation Annotation and Span Relation Annotation are excellent tools for specifying translations. For situations in which relations seem overkill, a simple multi-document mechanism is available. This mechanism is based purely on convention: it assumes that structural elements that are translations of one another simply share the same ID. This approach is quite feasible when used on higher-level structural elements, such as divisions, paragraphs, events or entries.
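Since this convention relies only on shared identifiers, pairing translated elements amounts to a join on xml:id. The sketch below is an illustration using Python's standard `xml.etree.ElementTree`, not a FoLiA tool. The function `align_by_id` and the two miniature documents are hypothetical, and the documents omit the metadata that a real FoLiA document requires.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

def align_by_id(source: ET.Element, target: ET.Element, tag: str = "p"):
    """Pair elements of one structural type that share the same xml:id in both documents."""
    def index(root):
        return {el.get(XML_ID): el for el in root.iter(f"{{{FOLIA_NS}}}{tag}") if el.get(XML_ID)}
    src, tgt = index(source), index(target)
    return [(key, src[key], tgt[key]) for key in src if key in tgt]

english = ET.fromstring("""<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="doc">
  <text xml:id="doc.text"><p xml:id="doc.p.1"><t>The Dalai Lama greeted him.</t></p></text>
</FoLiA>""")
french = ET.fromstring("""<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="doc">
  <text xml:id="doc.text"><p xml:id="doc.p.1"><t>Le Dalai Lama le saluait.</t></p></text>
</FoLiA>""")

t_tag = f"{{{FOLIA_NS}}}t"
for key, en, fr in align_by_id(english, french):
    print(key, en.find(t_tag).text, "|", fr.find(t_tag).text)
```

Elements whose ID appears in only one of the two documents are simply left unpaired. This is one reason the convention works best on coarse units such as divisions and paragraphs.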

Example

The following example shows Entity Annotation with relations to Wikipedia.

<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" xmlns:xlink="http://www.w3.org/1999/xlink" version="2.0" xml:id="example">
  <metadata>
      <annotations>
          <token-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/tokconfig-eng.foliaset.ttl" format="text/turtle">
             <annotator processor="p1" />
          </token-annotation>
          <text-annotation>
             <annotator processor="p1" />
          </text-annotation>
          <sentence-annotation>
             <annotator processor="p1" />
          </sentence-annotation>
          <paragraph-annotation>
             <annotator processor="p1" />
          </paragraph-annotation>
          <entity-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/namedentities.foliaset.ttl" format="text/turtle">
             <annotator processor="p1" />
          </entity-annotation>
          <relation-annotation set="adhoc">
             <annotator processor="p1" />
          </relation-annotation>
      </annotations>
      <provenance>
         <processor xml:id="p1" name="proycon" type="manual" />
      </provenance>
  </metadata>
  <text xml:id="example.text">
    <p xml:id="example.p.1">
      <s xml:id="example.p.1.s.1">
         <t>The Dalai Lama currently lives in Dharamsala in India.</t>
         <w xml:id="example.p.1.s.1.w.1" class="WORD">
            <t>The</t>
         </w>
         <w xml:id="example.p.1.s.1.w.2" class="WORD">
            <t>Dalai</t>
         </w>
         <w xml:id="example.p.1.s.1.w.3" class="WORD">
            <t>Lama</t>
         </w>
         <w xml:id="example.p.1.s.1.w.4" class="WORD">
            <t>currently</t>
         </w>
         <w xml:id="example.p.1.s.1.w.5" class="WORD">
            <t>lives</t>
         </w>
         <w xml:id="example.p.1.s.1.w.6" class="WORD">
            <t>in</t>
         </w>
         <w xml:id="example.p.1.s.1.w.7" class="WORD">
            <t>Dharamsala</t>
         </w>
         <w xml:id="example.p.1.s.1.w.8" class="WORD">
            <t>in</t>
         </w>
         <w xml:id="example.p.1.s.1.w.9" class="WORD" space="no">
            <t>India</t>
         </w>
         <w xml:id="example.p.1.s.1.w.10" class="PUNCTUATION">
            <t>.</t>
         </w>
         <entities>
             <entity xml:id="example.p.1.s.1.entity.1" class="per">
                 <relation class="wikipedia" xlink:href="https://en.wikipedia.org/wiki/Dalai_Lama" xlink:type="simple" format="text/html" />
                 <wref id="example.p.1.s.1.w.2" t="Dalai" />
                 <wref id="example.p.1.s.1.w.3" t="Lama" />
             </entity>
             <entity xml:id="example.p.1.s.1.entity.2" class="loc.city">
                 <relation class="wikipedia" xlink:href="https://en.wikipedia.org/wiki/Dharamsala" xlink:type="simple" format="text/html" />
                 <wref id="example.p.1.s.1.w.7" t="Dharamsala" />
             </entity>
             <entity xml:id="example.p.1.s.1.entity.3" class="loc.country">
                 <relation class="wikipedia" xlink:href="https://en.wikipedia.org/wiki/India" xlink:type="simple" format="text/html" />
                 <wref id="example.p.1.s.1.w.9" t="India" />
             </entity>
         </entities>
      </s>
    </p>
  </text>
</FoLiA>
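To consume such a document, a tool might collect each entity together with its words and its linked URL. The following sketch uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the example above. Metadata and most tokens are removed, so the copy is not a complete FoLiA document. The names `entity_links` and `q` are hypothetical. The sketch reads word text from the `<w>` elements that each `<wref>` points to, rather than relying on the optional t attribute.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def q(tag: str) -> str:
    return f"{{{FOLIA_NS}}}{tag}"  # qualified FoLiA tag name for ElementTree

def entity_links(root: ET.Element) -> dict:
    """Map entity xml:id -> (class, word texts, linked URL) for entities carrying a relation."""
    words = {w.get(XML_ID): w.find(q("t")).text for w in root.iter(q("w"))}
    links = {}
    for entity in root.iter(q("entity")):
        rel = entity.find(q("relation"))
        if rel is None:
            continue
        span = [words[ref.get("id")] for ref in entity.findall(q("wref"))]
        links[entity.get(XML_ID)] = (entity.get("class"), span, rel.get(f"{{{XLINK_NS}}}href"))
    return links

doc = ET.fromstring("""<FoLiA xmlns="http://ilk.uvt.nl/folia"
    xmlns:xlink="http://www.w3.org/1999/xlink" version="2.0" xml:id="example">
  <text xml:id="example.text"><p xml:id="example.p.1"><s xml:id="example.p.1.s.1">
    <w xml:id="example.p.1.s.1.w.2" class="WORD"><t>Dalai</t></w>
    <w xml:id="example.p.1.s.1.w.3" class="WORD"><t>Lama</t></w>
    <w xml:id="example.p.1.s.1.w.7" class="WORD"><t>Dharamsala</t></w>
    <entities>
      <entity xml:id="example.p.1.s.1.entity.1" class="per">
        <relation class="wikipedia" xlink:href="https://en.wikipedia.org/wiki/Dalai_Lama"
                  xlink:type="simple" format="text/html" />
        <wref id="example.p.1.s.1.w.2" t="Dalai" />
        <wref id="example.p.1.s.1.w.3" t="Lama" />
      </entity>
      <entity xml:id="example.p.1.s.1.entity.2" class="loc.city">
        <relation class="wikipedia" xlink:href="https://en.wikipedia.org/wiki/Dharamsala"
                  xlink:type="simple" format="text/html" />
        <wref id="example.p.1.s.1.w.7" t="Dharamsala" />
      </entity>
    </entities>
  </s></p></text>
</FoLiA>""")

for entity_id, (cls, span, url) in entity_links(doc).items():
    print(entity_id, cls, " ".join(span), url)
```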

The following example shows relations within strings in a document (see also String Annotation):

<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <metadata>
      <annotations>
          <text-annotation>
             <annotator processor="p1" />
          </text-annotation>
          <paragraph-annotation>
             <annotator processor="p1" />
          </paragraph-annotation>
          <string-annotation>
             <annotator processor="p1" />
          </string-annotation>
          <relation-annotation>
             <annotator processor="p1" />
          </relation-annotation>
      </annotations>
      <provenance>
         <processor xml:id="p1" name="proycon" type="manual" />
      </provenance>
  </metadata>
  <text xml:id="example.text">
     <p xml:id="example.p.1">
        <t><t-str id="example.p.1.str.1">Hello.</t-str> This is a sentence. Bye!</t>
        <t class="ocroutput"><t-str id="example.p.1.str.2">Hell0</t-str> Th1s iz a sentence, Bye1</t>

        <str xml:id="example.p.1.str.1">
            <t offset="0">Hello.</t>
            <relation>
                <xref id="example.p.1.str.2" type="str" />
            </relation>
        </str>

        <str xml:id="example.p.1.str.2">
            <t class="ocroutput" offset="0">Hell0</t>
            <relation>
                <xref id="example.p.1.str.1" type="str" />
            </relation>
        </str>
     </p>
  </text>
</FoLiA>
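Because these relations carry no xlink:href, their targets can be looked up by xml:id within the same document. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function `resolve_local` is hypothetical, and the document is trimmed to the two `<str>` elements, with metadata and the paragraph text omitted. Relations that point to other resources are skipped, since resolving them would require loading the target.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def resolve_local(root: ET.Element) -> list:
    """Return (source id, target id, target element or None) for same-document relations."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    resolved = []
    for holder in root.iter():
        for rel in holder.findall(f"{{{FOLIA_NS}}}relation"):
            if rel.get(f"{{{XLINK_NS}}}href") is not None:
                continue  # target lives in another resource; not resolvable here
            for xref in rel.findall(f"{{{FOLIA_NS}}}xref"):
                resolved.append((holder.get(XML_ID), xref.get("id"), by_id.get(xref.get("id"))))
    return resolved

doc = ET.fromstring("""<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <text xml:id="example.text">
    <p xml:id="example.p.1">
      <str xml:id="example.p.1.str.1">
        <t offset="0">Hello.</t>
        <relation><xref id="example.p.1.str.2" type="str" /></relation>
      </str>
      <str xml:id="example.p.1.str.2">
        <t class="ocroutput" offset="0">Hell0</t>
        <relation><xref id="example.p.1.str.1" type="str" /></relation>
      </str>
    </p>
  </text>
</FoLiA>""")

for source_id, target_id, target in resolve_local(doc):
    text = target.find(f"{{{FOLIA_NS}}}t").text if target is not None else None
    print(source_id, "->", target_id, text)
```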