Span Annotation

This category encompasses (linguistic) annotation types that span one or more structural elements. Examples are (Named) Entities or Multi-word Expressions, Dependency Relations, and many others. FoLiA implements these as a stand-off layer that refers back to the structural elements (often words/tokens). The layer itself is embedded in a structural level of a wider scope (such as a sentence).

Span annotation elements are always embedded in a layer element: an element that groups together the span annotations of a particular annotation type and set. Each annotation type has its own layer element, and the layer elements themselves are embedded, inline, in a structural element. So, if you want to do named entity annotation (a form of span annotation) over words, then after defining the words you can embed a layer (<entities>) containing the span annotation elements (<entity> in this example), which refer back to the words. Such a reference is made with the wref element.

Consider the following example:

<s xml:id="example.p.1.s.1">
  <t>The Dalai Lama greeted him.</t>
  <w xml:id="example.p.1.s.1.w.1"><t>The</t></w>
  <w xml:id="example.p.1.s.1.w.2"><t>Dalai</t></w>
  <w xml:id="example.p.1.s.1.w.3"><t>Lama</t></w>
  <w xml:id="example.p.1.s.1.w.4"><t>greeted</t></w>
  <w xml:id="example.p.1.s.1.w.5"><t>him</t></w>
  <w xml:id="example.p.1.s.1.w.6"><t>.</t></w>
  <entities>
    <entity xml:id="example.p.1.s.1.entity.1" class="per">
        <wref id="example.p.1.s.1.w.2" t="Dalai" />
        <wref id="example.p.1.s.1.w.3" t="Lama" />
    </entity>
  </entities>
</s>

The next sentence may in turn have an <entities> layer as well. The design principle behind this is to keep information, even when it concerns span annotations, as local as possible rather than spread out over the document. This makes the job easier for streaming parsers and for humans looking at the raw XML. This is a convention that most FoLiA libraries adhere to, not a strict requirement: it is still valid to place your layer at any higher structural level, as long as all the elements you refer to are within its scope and are defined prior to the layer itself.
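The reference mechanism in the example above can be illustrated with a minimal sketch that uses only Python's standard xml.etree.ElementTree module rather than a FoLiA library. The helper name resolve_entities is hypothetical, and the fragment is parsed without the FoLiA namespace, exactly as it is printed above.

```python
import xml.etree.ElementTree as ET

# The sentence from the example above (a bare fragment without XML namespace).
SENTENCE = """
<s xml:id="example.p.1.s.1">
  <t>The Dalai Lama greeted him.</t>
  <w xml:id="example.p.1.s.1.w.1"><t>The</t></w>
  <w xml:id="example.p.1.s.1.w.2"><t>Dalai</t></w>
  <w xml:id="example.p.1.s.1.w.3"><t>Lama</t></w>
  <w xml:id="example.p.1.s.1.w.4"><t>greeted</t></w>
  <w xml:id="example.p.1.s.1.w.5"><t>him</t></w>
  <w xml:id="example.p.1.s.1.w.6"><t>.</t></w>
  <entities>
    <entity xml:id="example.p.1.s.1.entity.1" class="per">
      <wref id="example.p.1.s.1.w.2" t="Dalai" />
      <wref id="example.p.1.s.1.w.3" t="Lama" />
    </entity>
  </entities>
</s>
"""

# ElementTree exposes the xml:id attribute under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_entities(sentence):
    """Return a list of (class, [word texts]) for each entity in the layer."""
    # Index the words defined in the sentence by their xml:id.
    words = {w.get(XML_ID): w.findtext("t") for w in sentence.iter("w")}
    resolved = []
    for entity in sentence.iterfind("entities/entity"):
        # Each wref points back to a word through its id attribute.
        texts = [words[ref.get("id")] for ref in entity.iterfind("wref")]
        resolved.append((entity.get("class"), texts))
    return resolved


entities = resolve_entities(ET.fromstring(SENTENCE))
print(entities)  # [('per', ['Dalai', 'Lama'])]
```

Because the layer sits inside the same sentence as the words it refers to, a single pass over one <s> element is enough to resolve every reference in this sketch.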

Note

As you might have seen, the wref element may carry a t attribute with the text of the word/structure it refers to. This redundancy merely provides extra clarity to the person inspecting the XML and is not mandatory.

Note

The wref element refers to words/tokens or to sub-token annotations such as morphemes and phonemes. It is not used to refer to higher-level structural elements!

Note

The order of the references should always correspond to the order of the tokens in the text. However, the references need not be strictly continuous; there may be gaps.
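The ordering rule can be checked mechanically. The following is a minimal sketch, not part of any FoLiA library: the function name references_in_text_order and the shortened identifiers are hypothetical, and the token order is assumed to be given as a plain list of IDs.

```python
def references_in_text_order(token_ids, ref_ids):
    """Return True if ref_ids follow the order of token_ids.

    Gaps are allowed: the references need not be contiguous.
    Raises KeyError if a reference points to an unknown token.
    """
    position = {token_id: index for index, token_id in enumerate(token_ids)}
    indices = [position[ref_id] for ref_id in ref_ids]
    # Strictly increasing positions mean the text order is respected.
    return all(a < b for a, b in zip(indices, indices[1:]))


tokens = ["w.1", "w.2", "w.3", "w.4", "w.5"]
contiguous = references_in_text_order(tokens, ["w.2", "w.3"])
with_gap = references_in_text_order(tokens, ["w.1", "w.4"])
out_of_order = references_in_text_order(tokens, ["w.3", "w.2"])
print(contiguous, with_gap, out_of_order)  # True True False
```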

Depending on the type of span annotation, the element may be nested. This is, for example, the case for Syntactic Annotation, where the nesting of syntactic units allows syntax trees to be built. Span annotation elements of a more complex nature may require or allow so-called span role elements. Span roles encapsulate references to the words and ascribe a more defined meaning to the span, for instance to mark the head or the dependent in a dependency relation. Span role elements themselves never carry classes and can only be used within the scope of a certain span annotation element, not standalone. They can still carry Features, though.
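To illustrate span roles, the sketch below reads a dependency relation in which the <hd> and <dep> span roles wrap the word references. It uses only Python's standard library. The class value "obj", the helper name read_dependency, and the choice of words (taken from the earlier example sentence) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A dependency layer as it might appear inside a sentence, after the words.
# The class "obj" is a hypothetical value from an unspecified set.
LAYER = """
<dependencies>
  <dependency class="obj">
    <hd><wref id="example.p.1.s.1.w.4" t="greeted" /></hd>
    <dep><wref id="example.p.1.s.1.w.5" t="him" /></dep>
  </dependency>
</dependencies>
"""


def read_dependency(dependency):
    """Return (class, {role tag: [referenced ids]}) for one dependency."""
    roles = {}
    for child in dependency:
        # Span roles carry no class of their own; they only group wrefs.
        if child.tag in ("hd", "dep"):
            roles[child.tag] = [ref.get("id") for ref in child.iterfind("wref")]
    return dependency.get("class"), roles


layer = ET.fromstring(LAYER)
relation = read_dependency(layer.find("dependency"))
print(relation)
```

Note that the class sits on the <dependency> element, while the roles only say which words play which part in the relation.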

FoLiA defines the following types of span annotation:

    • Syntactic Annotation <su> – Assigns grammatical categories to spans of words. Syntactic units are nestable and allow the representation of complete syntax trees, which are usually the result of constituency parsing.
    • Chunking <chunk> – Assigns shallow grammatical categories to spans of words. Unlike syntactic units, chunks are not nestable. They are often produced by a process called shallow parsing, or chunking.
    • Entity Annotation <entity> – Entity annotation is a broad and common category in FoLiA. It is used for specifying all kinds of multi-word expressions, including but not limited to named entities. The set definition used determines the vocabulary and therefore the precise nature of the entity annotation.
    • Dependency Annotation <dependency> – Dependency relations are syntactic relations between spans of tokens. A dependency relation takes a particular class and consists of a single head component and a single dependent component.
    • Time Segmentation <timesegment> – FoLiA supports time segmentation to allow for more fine-grained control of timing information by associating spans of words/tokens with exact timestamps. It can provide a more linguistic alternative to Event Annotation.
    • Coreference Annotation <coreferencechain> – Relations between words that refer to the same referent (anaphora) are expressed in FoLiA using Coreference Annotation. The coreference relations are expressed by specifying the entire chain in which all links are coreferent.
    • Semantic Role Annotation <semrole> – This span annotation type allows for the expression of semantic roles, or thematic roles. It is often used together with Predicate Annotation.
    • Predicate Annotation <predicate> – Allows annotation of predicates. This annotation type is usually used together with Semantic Role Annotation. The types of predicates are defined by a user-defined set definition.
    • Observation Annotation <observation> – Observation annotation is used to make an observation pertaining to one or more word tokens. Observations offer an external qualification on part of a text. The qualification is expressed by the class, which is in turn defined by a set. The precise semantics of the observation depend on the user-defined set.
    • Sentiment Annotation <sentiment> – Sentiment analysis marks subjective information such as sentiments or attitudes expressed in text. The sentiments/attitudes are defined by a user-defined set definition.
    • Statement Annotation <statement> – Statement annotation, sometimes also referred to as attribution, allows statements to be decomposed into the source of the statement, the content of the statement, and the way these relate, provided these are made explicit in the text.
    • Modality Annotation <modality> – Modality annotation is used to describe the relationship between cue word(s) and the scope they cover. It is primarily used for the annotation of negation, but also for the annotation of factuality, certainty and truthfulness.

Group Annotations: Inline Annotations on Span Annotations

It is possible to apply inline annotations (see Inline Annotation) directly to span annotations. This allows you, for example, to assign a part-of-speech tag or lemma directly to an entity rather than to a word (<w>), as is customary. This functionality must be explicitly enabled by adding the groupannotations="yes" attribute to the declaration of the span annotation type: it adds extra complexity to a FoLiA document, and the attribute informs parsers that they need to handle it.

<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <metadata>
      <annotations>
          <token-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/tokconfig-eng.foliaset.ttl">
              <annotator processor="p1" />
          </token-annotation>
          <text-annotation>
              <annotator processor="p1" />
          </text-annotation>
          <sentence-annotation>
              <annotator processor="p1" />
          </sentence-annotation>
          <paragraph-annotation>
              <annotator processor="p1" />
          </paragraph-annotation>
          <entity-annotation groupannotations="yes">
              <annotator processor="p1" />
          </entity-annotation>
          <pos-annotation set="brown"> <!-- This is an ad-hoc set declaration as it is no URL and therefore not really defined -->
              <annotator processor="p1" />
          </pos-annotation>
          <lemma-annotation set="english-adhoc"> <!-- This is an ad-hoc set declaration as it is no URL and therefore not really defined -->
              <annotator processor="p1" />
          </lemma-annotation>
      </annotations>
      <provenance>
         <processor xml:id="p1" name="proycon" type="manual" />
      </provenance>
  </metadata>
  <text xml:id="example.text">
    <p xml:id="example.p.1">
      <s xml:id="example.p.1.s.1">
         <t>The container-ship lost its cargo of bottle openers.</t>
         <w xml:id="example.p.1.s.1.w.1" class="WORD">
            <t>The</t>
            <pos class="AT" />
         </w>
         <w xml:id="example.p.1.s.1.w.2" class="WORD" space="no">
            <t>container</t>
         </w>
         <w xml:id="example.p.1.s.1.w.3" class="WORD" space="no">
            <t>-</t>
         </w>
         <w xml:id="example.p.1.s.1.w.4" class="WORD">
            <t>ship</t>
         </w>
         <w xml:id="example.p.1.s.1.w.5" class="WORD">
            <t>lost</t>
            <pos class="VBD" />
         </w>
         <w xml:id="example.p.1.s.1.w.6" class="WORD">
            <t>its</t>
            <pos class="PP$" />
         </w>
         <w xml:id="example.p.1.s.1.w.7" class="WORD">
            <t>cargo</t>
            <pos class="NN" />
         </w>
         <w xml:id="example.p.1.s.1.w.8" class="WORD">
             <t>of</t>
            <pos class="IN" />
         </w>
         <w xml:id="example.p.1.s.1.w.9" class="WORD">
            <t>bottle</t>
         </w>
         <w xml:id="example.p.1.s.1.w.10" class="WORD" space="no">
            <t>openers</t>
         </w>
         <w xml:id="example.p.1.s.1.w.11" class="PUNCTUATION">
            <t>.</t>
         </w>
         <entities>
             <entity xml:id="example.p.1.s.1.entity.1">
                 <wref id="example.p.1.s.1.w.2" t="container" />
                 <wref id="example.p.1.s.1.w.3" t="-" />
                 <wref id="example.p.1.s.1.w.4" t="ship" />
                 <pos class="NN" />
                 <lemma class="container-ship" />
             </entity>
             <entity xml:id="example.p.1.s.1.entity.2">
                 <wref id="example.p.1.s.1.w.9" t="bottle" />
                 <wref id="example.p.1.s.1.w.10" t="openers" />
                 <pos class="NNS" />
                 <lemma class="bottle opener" />
             </entity>
         </entities>
      </s>
    </p>
  </text>
</FoLiA>
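As a rough illustration of how a consumer might handle group annotations, the sketch below first checks the declaration for groupannotations="yes" and then reads the part-of-speech tag and lemma attached to each entity. It uses only Python's standard library. The document string is a heavily trimmed variant of the example above (shortened IDs, most declarations omitted) and is not meant to be a complete, valid FoLiA document. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"f": "http://ilk.uvt.nl/folia"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed for illustration only; a real document needs full declarations.
DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <metadata>
    <annotations>
      <entity-annotation groupannotations="yes" />
    </annotations>
  </metadata>
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1"><t>bottle</t></w>
      <w xml:id="example.s.1.w.2"><t>openers</t></w>
      <entities>
        <entity xml:id="example.s.1.entity.1">
          <wref id="example.s.1.w.1" t="bottle" />
          <wref id="example.s.1.w.2" t="openers" />
          <pos class="NNS" />
          <lemma class="bottle opener" />
        </entity>
      </entities>
    </s>
  </text>
</FoLiA>"""


def group_annotations_enabled(root):
    """Check whether the entity declaration enables group annotations."""
    decl = root.find("f:metadata/f:annotations/f:entity-annotation", NS)
    return decl is not None and decl.get("groupannotations") == "yes"


def entity_group_annotations(root):
    """Map each entity's xml:id to the pos and lemma set on the entity itself."""
    result = {}
    for entity in root.iterfind(".//f:entities/f:entity", NS):
        pos = entity.find("f:pos", NS)
        lemma = entity.find("f:lemma", NS)
        result[entity.get(XML_ID)] = {
            "words": [ref.get("t") for ref in entity.iterfind("f:wref", NS)],
            "pos": pos.get("class") if pos is not None else None,
            "lemma": lemma.get("class") if lemma is not None else None,
        }
    return result


root = ET.fromstring(DOC)
enabled = group_annotations_enabled(root)
annotations = entity_group_annotations(root) if enabled else {}
print(enabled, annotations)
```

Checking the declaration first mirrors the purpose of the attribute: a parser that does not see it can assume that inline annotations only occur on structural elements such as <w>.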