Metric Annotation

Metric Annotation is a form of higher-order annotation that allows annotation of some kind of measurement. The type of measurement is defined by the class, which in turn is defined by the set, as always. The metric element has a value attribute that stores the actual measurement; the value is often numeric, but this need not be the case.

Specification

Annotation Category:
 

Higher-order Annotation

Declaration:

<metric-annotation set="..."> (note: set is optional for this annotation type; if you declare this annotation type to be setless, you cannot assign classes)

Version History:
 

since v0.9

Element:

<metric>

API Class:

Metric (FoLiApy API Reference)

Required Attributes:
 
Optional Attributes:
 
  • xml:id – The ID of the element; this has to be unique in the entire document or collection of documents (corpus). All identifiers in FoLiA are of the XML NCName datatype, which roughly means it is a unique string that has to start with a letter (not a number or symbol), may contain numbers, but may never contain colons or spaces. FoLiA does not define any naming convention for IDs.
  • set – The set of the element, ideally a URI linking to a set definition (see Set Definitions (Vocabulary)) or otherwise a uniquely identifying string. The set must also be referred to in the Annotation Declarations for this annotation type.
  • class – The class of the annotation, i.e. the annotation tag in the vocabulary defined by set.
  • processor – This refers to the ID of a processor in the Provenance Data. The processor in turn defines exactly who or what was the annotator of the annotation.
  • annotator – This is an older alternative to the processor attribute, without support for full provenance. The annotator attribute simply refers to the name or ID of the system or human annotator that made the annotation.
  • annotatortype – This is an older alternative to the processor attribute, without support for full provenance. It is used together with annotator and specifies the type of the annotator, either manual for human annotators or auto for automated systems.
  • confidence – A floating point value between zero and one; expresses the confidence the annotator places in the annotation.
  • datetime – The date and time when this annotation was recorded. The format is YYYY-MM-DDThh:mm:ss (note the literal T in the middle to separate date from time), as per the XSD Datetime data type.
  • n – A number in a sequence, corresponding to a number in the original document, for example chapter numbers, section numbers, list item numbers. This does not have to be an actual number; other sequence identifiers are also possible (think alphanumeric characters or roman numerals).
  • src – Points to a file or full URL of a sound or video file. This attribute is inheritable.
  • begintime – A timestamp in HH:MM:SS.MMM format, indicating the begin time of the speech. If a sound clip is specified (src), the timestamp refers to a location in the sound clip.
  • endtime – A timestamp in HH:MM:SS.MMM format, indicating the end time of the speech. If a sound clip is specified (src), the timestamp refers to a location in the sound clip.
  • speaker – A string identifying the speaker. This attribute is inheritable. Multiple speakers are not allowed; simply do not specify a speaker on a certain level if you are unable to link the speech to a specific (single) speaker.
Accepted Data:

<comment> (Comment Annotation), <desc> (Description Annotation)

Valid Context:

<chunk> (Chunking), <coreferencechain> (Coreference Annotation), <coreferencelink> (Coreference Annotation), <correction> (Correction Annotation), <current> (Correction Annotation), <def> (Definition Annotation), <dependency> (Dependency Annotation), <div> (Division Annotation), <domain> (Domain/topic Annotation), <entity> (Entity Annotation), <entry> (Entry Annotation), <errordetection> (Error Detection Annotation (DEPRECATED)), <event> (Event Annotation), <ex> (Example Annotation), <figure> (Figure Annotation), <gap> (Gap Annotation), <head> (Head Annotation), <hiddenw> (Hidden Token Annotation), <lang> (Language Annotation), <lemma> (Lemmatisation), <br> (Linebreak), <list> (List Annotation), <modality> (Modality Annotation), <morpheme> (Morphological Annotation), <new> (Correction Annotation), <note> (Note Annotation), <observation> (Observation Annotation), <original> (Correction Annotation), <p> (Paragraph Annotation), <part> (Part Annotation), <phoneme> (Phonological Annotation), <pos> (Part-of-Speech Annotation), <predicate> (Predicate Annotation), <quote> (Quote Annotation), <ref> (Reference Annotation), <relation> (Relation Annotation), <semrole> (Semantic Role Annotation), <sense> (Sense Annotation), <s> (Sentence Annotation), <sentiment> (Sentiment Annotation), <spanrelation> (Span Relation Annotation), <statement> (Statement Annotation), <str> (String Annotation), <subjectivity> (Subjectivity Annotation (DEPRECATED)), <suggestion> (Correction Annotation), <su> (Syntactic Annotation), <table> (Table Annotation), <term> (Term Annotation), <timesegment> (Time Segmentation), <utt> (Utterance Annotation), <whitespace> (Whitespace), <w> (Token Annotation)

Feature subsets (extra attributes):
 
  • value
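
The formats given above for confidence, datetime, begintime and endtime can be checked mechanically. The following is a minimal sketch in Python using only the standard library. The function names are hypothetical and are not part of FoLiA or FoLiApy, and the checks follow a strict reading of the formats as written here: XSD dateTime also permits fractional seconds and time zones, which this sketch rejects.

```python
import re
from datetime import datetime

# Assumption: a strict reading of HH:MM:SS.MMM with two-digit hours.
TIMESTAMP = re.compile(r"^\d{2}:[0-5]\d:[0-5]\d\.\d{3}$")


def valid_confidence(value):
    """True if value parses as a float between zero and one (inclusive)."""
    try:
        return 0.0 <= float(value) <= 1.0
    except ValueError:
        return False


def valid_datetime(value):
    """True if value matches YYYY-MM-DDThh:mm:ss (no fraction, no time zone)."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
        return True
    except ValueError:
        return False


def valid_timestamp(value):
    """True if value matches the HH:MM:SS.MMM format of begintime/endtime."""
    return TIMESTAMP.match(value) is not None


print(valid_confidence("0.87"), valid_datetime("2019-03-21T14:05:00"),
      valid_timestamp("00:01:23.500"))
```

A real validator would rely on the FoLiA schema or a FoLiA library rather than on hand-written checks like these; the sketch only illustrates what the attribute descriptions require.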

Explanation

The <metric> element allows annotation of some kind of measurement. The type of measurement is defined by the class, which in turn is user-defined by the set, as always. The metric element has a value attribute that stores the actual measurement; the value is often numeric, but this need not be the case. It is a higher-order annotation element that may be used with any kind of annotation.

An example of measurements associated with a word/token:

<w xml:id="example.p.1.s.1.w.2">
    <t>boot</t>
    <metric class="charlength" value="4" />
    <metric class="frequency" value="0.00232" />
</w>
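
To illustrate how such measurements might be read back, here is a minimal sketch in Python using only the standard library's ElementTree. It does not use the FoLiApy API. The snippet is parsed without the FoLiA namespace for brevity, and read_metrics is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The token from the example above, without a namespace declaration
# (an assumption made to keep the sketch short).
SNIPPET = """
<w xml:id="example.p.1.s.1.w.2">
    <t>boot</t>
    <metric class="charlength" value="4" />
    <metric class="frequency" value="0.00232" />
</w>
"""


def read_metrics(element):
    """Map each metric class to its raw string value (direct children only)."""
    return {m.get("class"): m.get("value") for m in element.findall("metric")}


word = ET.fromstring(SNIPPET)
metrics = read_metrics(word)
print(metrics)
```

The values are kept as strings on purpose, since the value of a metric is not required to be numeric.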

The next example shows measurements associated with a span annotation element, in this case to add geolocation information:

<entity class="location">
    <wref id="w3" t="New" />
    <wref id="w4" t="York" />
    <metric class="latitude" value="40.71274" />
    <metric class="longitude" value="-74.005974" />
</entity>
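
When a metric is known to be numeric, as with these coordinates, the value can be converted on reading. The sketch below is again plain Python with ElementTree rather than the FoLiApy API. It parses the snippet without the FoLiA namespace, and numeric_metric is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The entity from the example above, without a namespace declaration
# (an assumption made to keep the sketch short).
SNIPPET = """
<entity class="location">
    <wref id="w3" t="New" />
    <wref id="w4" t="York" />
    <metric class="latitude" value="40.71274" />
    <metric class="longitude" value="-74.005974" />
</entity>
"""


def numeric_metric(element, metric_class):
    """Return the first matching metric's value as a float, or None."""
    for metric in element.findall("metric"):
        if metric.get("class") == metric_class:
            try:
                return float(metric.get("value"))
            except (TypeError, ValueError):
                return None  # value attribute absent or not numeric
    return None


entity = ET.fromstring(SNIPPET)
coordinates = (numeric_metric(entity, "latitude"),
               numeric_metric(entity, "longitude"))
# Rebuild the covered text from the t attributes of the word references.
text = " ".join(wref.get("t") for wref in entity.findall("wref"))
print(text, coordinates)
```

Returning None for a missing or non-numeric value is one possible design choice; raising an error would be equally defensible.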

The next example demonstrates a full FoLiA document with metric annotation on a Figure, but it may be more appropriate to use Submetadata for this instead:

<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <metadata>
      <annotations>
          <text-annotation>
              <annotator processor="p1" />
          </text-annotation>
          <division-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/divisions.foliaset.xml">
              <annotator processor="p1" />
          </division-annotation>
          <head-annotation>
              <annotator processor="p1" />
          </head-annotation>
          <figure-annotation>
              <annotator processor="p1" />
          </figure-annotation>
          <metric-annotation set="adhoc-figure">
              <annotator processor="p1" />
          </metric-annotation>
      </annotations>
      <provenance>
         <processor xml:id="p1" name="proycon" type="manual" />
      </provenance>
  </metadata>
  <text xml:id="example.text">
     <div xml:id="example.div.1" class="chapter" n="1">
         <head>
             <t>Frits Philips</t>
         </head>
         <figure xml:id="example.figure.1" n="1" src="https://upload.wikimedia.org/wikipedia/commons/f/f8/Standbeeld_Frits_Philips.jpg">
           <metric class="photographer" value="Robert de Greef" />
           <metric class="city" value="Eindhoven" />
           <metric class="depicted" value="Frits Philips" />
           <metric class="license" value="CC-BY-SA 3.0" />
           <caption><t>Standbeeld van Frits Philips in Eindhoven</t></caption>
         </figure>
     </div>
  </text>
</FoLiA>
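
In a full document the elements live in the FoLiA namespace, which has to be taken into account when querying. The following sketch, in Python with the standard library only, looks up the set declared for metric annotation and collects the metrics per figure. It embeds a trimmed copy of the document above (the division, head and several declarations are left out, so it is not meant to validate as FoLiA), and all variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
NS = {"folia": FOLIA_NS}
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed copy of the example document: only the parts inspected below.
DOCUMENT = """<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <metadata>
    <annotations>
      <metric-annotation set="adhoc-figure">
        <annotator processor="p1" />
      </metric-annotation>
    </annotations>
    <provenance>
      <processor xml:id="p1" name="proycon" type="manual" />
    </provenance>
  </metadata>
  <text xml:id="example.text">
    <figure xml:id="example.figure.1" n="1">
      <metric class="photographer" value="Robert de Greef" />
      <metric class="city" value="Eindhoven" />
      <metric class="depicted" value="Frits Philips" />
      <metric class="license" value="CC-BY-SA 3.0" />
      <caption><t>Standbeeld van Frits Philips in Eindhoven</t></caption>
    </figure>
  </text>
</FoLiA>"""

root = ET.fromstring(DOCUMENT)

# The set is optional for this annotation type, so it may be None.
declaration = root.find(
    "folia:metadata/folia:annotations/folia:metric-annotation", NS
)
declared_set = declaration.get("set") if declaration is not None else None

# Collect the metrics that are direct children of each figure.
figure_metrics = {}
for figure in root.iter("{%s}figure" % FOLIA_NS):
    figure_metrics[figure.get(XML_ID)] = {
        m.get("class"): m.get("value")
        for m in figure.findall("folia:metric", NS)
    }

print(declared_set)
print(figure_metrics)
```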