Coreference Annotation

Relations between words that refer to the same referent (anaphora) are expressed in FoLiA using Coreference Annotation. Coreference relations are expressed by specifying the entire chain, in which all links are coreferent.

Specification

Annotation Category:
 

Span Annotation

Declaration:

<coreference-annotation set="..."> (note: set is optional for this annotation type; if you declare this annotation type to be setless, you cannot assign classes)

Version History:
 

since v0.9

Element:

<coreferencechain>

API Class:

CoreferenceChain (FoLiApy API Reference)

Layer Element:

<coreferences>

Span Role Elements:
 

<coreferencelink> (CoreferenceLink)

Required Attributes:
 
Optional Attributes:
 
  • xml:id – The ID of the element; this has to be unique in the entire document or collection of documents (corpus). All identifiers in FoLiA are of the XML NCName datatype, which roughly means it is a unique string that has to start with a letter (not a number or symbol), may contain numbers, but may never contain colons or spaces. FoLiA does not define any naming convention for IDs.
  • set – The set of the element, ideally a URI linking to a set definition (see Set Definitions (Vocabulary)) or otherwise a uniquely identifying string. The set must be referred to also in the Annotation Declarations for this annotation type.
  • class – The class of the annotation, i.e. the annotation tag in the vocabulary defined by set.
  • processor – This refers to the ID of a processor in the Provenance Data. The processor in turn defines exactly who or what was the annotator of the annotation.
  • annotator – This is an older alternative to the processor attribute, without support for full provenance. The annotator attribute simply refers to the name or ID of the system or human annotator that made the annotation.
  • annotatortype – This is an older alternative to the processor attribute, without support for full provenance. It is used together with annotator and specifies the type of the annotator, either manual for human annotators or auto for automated systems.
  • confidence – A floating point value between zero and one; expresses the confidence the annotator places in the annotation.
  • datetime – The date and time when this annotation was recorded, the format is YYYY-MM-DDThh:mm:ss (note the literal T in the middle to separate date from time), as per the XSD Datetime data type.
  • n – A number in a sequence, corresponding to a number in the original document, for example chapter numbers, section numbers, list item numbers. This does not have to be an actual number; other sequence identifiers are also possible (think alphanumeric characters or roman numerals).
  • textclass – Refers to the text class this annotation is based on. This is an advanced attribute; if not specified, it defaults to current. See Text class attribute (advanced).
  • src – Points to a file or full URL of a sound or video file. This attribute is inheritable.
  • begintime – A timestamp in HH:MM:SS.MMM format, indicating the begin time of the speech. If a sound clip is specified (src), the timestamp refers to a location in the sound clip.
  • endtime – A timestamp in HH:MM:SS.MMM format, indicating the end time of the speech. If a sound clip is specified (src), the timestamp refers to a location in the sound clip.
  • speaker – A string identifying the speaker. This attribute is inheritable. Multiple speakers are not allowed, simply do not specify a speaker on a certain level if you are unable to link the speech to a specific (single) speaker.
Accepted Data:

<comment> (Comment Annotation), <coreferencelink> (Coreference Annotation), <desc> (Description Annotation), <metric> (Metric Annotation), <relation> (Relation Annotation)

Valid Context:

<coreferences> (Coreference Annotation)
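
Several of the optional attributes listed above have a fixed lexical form (xml:id, confidence, datetime, begintime, endtime). The following is a minimal sketch of how such values could be sanity-checked in Python using only the standard library. It is not part of FoLiA or of FoLiApy: the function name check_common_attributes is hypothetical, and the ID pattern is a simplified ASCII approximation of NCName (the real datatype also admits many non-ASCII letters).

```python
import re
from datetime import datetime

# Simplified ASCII approximation of the XML NCName datatype (assumption:
# non-ASCII letters, which real NCNames allow, are not covered here).
# Starts with a letter or underscore; never contains a colon or a space.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")
# HH:MM:SS.MMM, the form described for begintime and endtime.
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{3}")


def check_common_attributes(attrs):
    """Return a list of problems found in a dict of attribute values."""
    problems = []
    if "xml:id" in attrs and not NCNAME_ASCII.fullmatch(attrs["xml:id"]):
        problems.append("xml:id is not a valid (ASCII) NCName")
    if "confidence" in attrs:
        try:
            value = float(attrs["confidence"])
        except ValueError:
            problems.append("confidence is not a number")
        else:
            if not 0.0 <= value <= 1.0:
                problems.append("confidence must lie between 0 and 1")
    if "datetime" in attrs:
        try:
            # Note the literal T between date and time.
            datetime.strptime(attrs["datetime"], "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            problems.append("datetime is not YYYY-MM-DDThh:mm:ss")
    for name in ("begintime", "endtime"):
        if name in attrs and not TIMESTAMP.fullmatch(attrs[name]):
            problems.append(name + " is not HH:MM:SS.MMM")
    return problems
```

An empty list means none of these checks failed; it does not establish that a document is valid FoLiA, which requires a proper validator.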

Explanation

Note

Please first ensure you are familiar with the general principles of Span Annotation to make sense of this annotation type.

Relations between words that refer to the same referent are expressed in FoLiA using the <coreferencechain> span annotation element and the <coreferencelink> span role within it for each instance.

The co-reference relations are expressed by specifying the entire chain in which all links are coreferent. The head of a coreferent may optionally be marked with the <hd> element, another span role.
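
To make the nesting of chain, link and head concrete, here is a minimal sketch that reads a coreference layer with Python's standard xml.etree.ElementTree module. It deliberately does not use the FoLiApy API; the function name read_chains and the embedded fragment are illustrative assumptions, and the sketch relies on the optional t attribute of <wref> being present.

```python
import xml.etree.ElementTree as ET

NS = "{http://ilk.uvt.nl/folia}"

# Hypothetical fragment: one chain with two links, the first of which
# marks its head with <hd>.
FRAGMENT = """
<coreferences xmlns="http://ilk.uvt.nl/folia">
  <coreferencechain class="dalailama">
    <coreferencelink>
      <wref id="example.p.1.s.1.w.1" t="The" />
      <hd>
        <wref id="example.p.1.s.1.w.2" t="Dalai" />
        <wref id="example.p.1.s.1.w.3" t="Lama" />
      </hd>
    </coreferencelink>
    <coreferencelink>
      <wref id="example.p.1.s.2.w.1" t="He" />
    </coreferencelink>
  </coreferencechain>
</coreferences>
"""


def read_chains(layer):
    """Return one (class, links) pair per chain. Each link is a dict with
    the text of all its words and the text of its head (empty if none)."""
    chains = []
    for chain in layer.iter(NS + "coreferencechain"):
        links = []
        for link in chain.findall(NS + "coreferencelink"):
            # iter() also descends into <hd>, so head words are included
            # in document order.
            words = [w.get("t", "") for w in link.iter(NS + "wref")]
            head = link.find(NS + "hd")
            head_words = []
            if head is not None:
                head_words = [w.get("t", "") for w in head.iter(NS + "wref")]
            links.append({"text": " ".join(words), "head": " ".join(head_words)})
        chains.append((chain.get("class"), links))
    return chains


chains = read_chains(ET.fromstring(FRAGMENT))
for cls, links in chains:
    print(cls, [link["text"] for link in links])
# prints: dalailama ['The Dalai Lama', 'He']
```

Each <coreferencelink> is one mention, and the chain groups all mentions of the same referent. The head is a subset of the words in its link.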

As always, this annotation layer itself may be embedded on whatever level is preferred. The example at the end of this section uses paragraph level, but you can for instance also embed it at sentence level or a global text level.

The <coreferencelink> element may take three attributes, which are actually predefined feature subsets (see Features). Their values depend on the set used and are thus user-definable, never predefined:

  • mod - A subset that can be used to indicate that there is modality or negation in this coreference link.
  • time - A subset used to indicate a time dependency. An example of a time dependency is seen in the sentence: “Bert De Graeve, until recently CEO, will now take up a position as CFO”. Here “Bert De Graeve”, “CEO” and “CFO” would all be part of the same coreference chain, and the second coreferencelink (“CEO”) can be marked as being in the past using the “time” attribute.
  • level - A subset that can be used to indicate the level on which the coreference holds. A possible value is sense, indicating that there is a coreference relation only on sense level, as opposed to an actual reference.
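
The “Bert De Graeve” case above could be encoded roughly as follows. This is a sketch under stated assumptions: the word IDs, the class name bertdegraeve and the value past are all hypothetical (values are defined by whatever set you use), and the Python code around the fragment uses only the standard library to show how the three attributes can be read back.

```python
import xml.etree.ElementTree as ET

NS = "{http://ilk.uvt.nl/folia}"
FEATURE_ATTRIBUTES = ("mod", "time", "level")

# Hypothetical chain for "Bert De Graeve, until recently CEO, will now
# take up a position as CFO"; IDs, class and the value "past" are made up.
FRAGMENT = """
<coreferencechain xmlns="http://ilk.uvt.nl/folia" class="bertdegraeve">
  <coreferencelink>
    <wref id="ex.s.1.w.1" t="Bert" />
    <wref id="ex.s.1.w.2" t="De" />
    <wref id="ex.s.1.w.3" t="Graeve" />
  </coreferencelink>
  <coreferencelink time="past">
    <wref id="ex.s.1.w.7" t="CEO" />
  </coreferencelink>
  <coreferencelink>
    <wref id="ex.s.1.w.16" t="CFO" />
  </coreferencelink>
</coreferencechain>
"""


def link_features(chain):
    """Return (text, features) per link, where features holds only the
    mod/time/level attributes that are actually set on that link."""
    rows = []
    for link in chain.findall(NS + "coreferencelink"):
        text = " ".join(w.get("t", "") for w in link.iter(NS + "wref"))
        features = {
            name: link.get(name)
            for name in FEATURE_ATTRIBUTES
            if link.get(name) is not None
        }
        rows.append((text, features))
    return rows


rows = link_features(ET.fromstring(FRAGMENT))
for text, features in rows:
    print(text, features)
```

Only the second link carries a feature here; the other two links are plain mentions in the same chain.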

Example

<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <metadata>
      <annotations>
          <token-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/tokconfig-eng.foliaset.ttl">
              <annotator processor="p1" />
          </token-annotation>
          <text-annotation>
              <annotator processor="p1" />
          </text-annotation>
          <sentence-annotation>
              <annotator processor="p1" />
          </sentence-annotation>
          <paragraph-annotation>
              <annotator processor="p1" />
          </paragraph-annotation>
          <coreference-annotation set="adhoc"> <!-- an ad-hoc set -->
              <annotator processor="p1" />
          </coreference-annotation>
      </annotations>
      <provenance>
         <processor xml:id="p1" name="proycon" type="manual" />
      </provenance>
  </metadata>
  <text xml:id="example.text">
    <p xml:id="example.p.1">
        <s xml:id="example.p.1.s.1">
          <t>The Dalai Lama greeted him.</t>
          <w xml:id="example.p.1.s.1.w.1"><t>The</t></w>
          <w xml:id="example.p.1.s.1.w.2"><t>Dalai</t></w>
          <w xml:id="example.p.1.s.1.w.3"><t>Lama</t></w>
          <w xml:id="example.p.1.s.1.w.4"><t>greeted</t></w>
          <w xml:id="example.p.1.s.1.w.5" space="no"><t>him</t></w>
          <w xml:id="example.p.1.s.1.w.6"><t>.</t></w>
        </s>
        <s xml:id="example.p.1.s.2">
          <t>He was happy to see him.</t>
          <w xml:id="example.p.1.s.2.w.1"><t>He</t></w>
          <w xml:id="example.p.1.s.2.w.2"><t>was</t></w>
          <w xml:id="example.p.1.s.2.w.3"><t>happy</t></w>
          <w xml:id="example.p.1.s.2.w.4"><t>to</t></w>
          <w xml:id="example.p.1.s.2.w.5"><t>see</t></w>
          <w xml:id="example.p.1.s.2.w.6" space="no"><t>him</t></w>
          <w xml:id="example.p.1.s.2.w.7"><t>.</t></w>
        </s>
        <s xml:id="example.p.1.s.3">
          <t>He smiled.</t>
          <w xml:id="example.p.1.s.3.w.1"><t>He</t></w>
          <w xml:id="example.p.1.s.3.w.2" space="no"><t>smiled</t></w>
          <w xml:id="example.p.1.s.3.w.3"><t>.</t></w>
        </s>
        <coreferences>
            <coreferencechain class="dalailama">
              <coreferencelink>
                  <wref id="example.p.1.s.1.w.1" t="The" />
                  <hd> <!-- extra span role to mark the head -->
                    <wref id="example.p.1.s.1.w.2" t="Dalai" />
                    <wref id="example.p.1.s.1.w.3" t="Lama" />
                  </hd>
              </coreferencelink>
              <coreferencelink>
                <wref id="example.p.1.s.2.w.1" t="He" />
              </coreferencelink>
            </coreferencechain>
            <coreferencechain class="dalailama">
              <coreferencelink>
                <wref id="example.p.1.s.1.w.5" t="him" />
              </coreferencelink>
              <coreferencelink>
                <wref id="example.p.1.s.2.w.6" t="him" />
              </coreferencelink>
              <coreferencelink>
                <wref id="example.p.1.s.3.w.1" t="He" />
              </coreferencelink>
            </coreferencechain>
        </coreferences>
    </p>
  </text>
</FoLiA>
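
Because every <wref> points at a token by ID, a simple consistency check is to confirm that each referenced ID exists and, where the optional t attribute is given, that it matches the token text. The sketch below does this with Python's standard library only. The function name check_wrefs is hypothetical, the fragment contains two deliberate mistakes for illustration, and treating a differing t value as a problem is an assumption of this sketch rather than a rule stated in this section.

```python
import xml.etree.ElementTree as ET

NS = "{http://ilk.uvt.nl/folia}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment with two deliberate mistakes: a t value that
# differs in case from the token, and a reference to a missing token.
FRAGMENT = """
<p xmlns="http://ilk.uvt.nl/folia" xml:id="example.p.1">
  <s xml:id="example.p.1.s.3">
    <w xml:id="example.p.1.s.3.w.1"><t>He</t></w>
    <w xml:id="example.p.1.s.3.w.2"><t>smiled</t></w>
  </s>
  <coreferences>
    <coreferencechain class="demo">
      <coreferencelink>
        <wref id="example.p.1.s.3.w.1" t="he" />
      </coreferencelink>
      <coreferencelink>
        <wref id="example.p.1.s.3.w.9" t="him" />
      </coreferencelink>
    </coreferencechain>
  </coreferences>
</p>
"""


def check_wrefs(root):
    """Return (problem, id) pairs for wrefs that point nowhere or whose
    t attribute differs from the text of the referenced token."""
    words = {}
    for w in root.iter(NS + "w"):
        t = w.find(NS + "t")
        words[w.get(XML_ID)] = t.text if t is not None else None
    problems = []
    for wref in root.iter(NS + "wref"):
        target = wref.get("id")
        if target not in words:
            problems.append(("missing", target))
        elif wref.get("t") is not None and wref.get("t") != words[target]:
            problems.append(("text differs", target))
    return problems


problems = check_wrefs(ET.fromstring(FRAGMENT))
for problem in problems:
    print(problem)
```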