Event Annotation

Structural annotation type representing events, often used in new media contexts for things such as tweets, chat messages and forum posts; the available event classes are determined by a user-defined set definition. Note that a more linguistic kind of event annotation can be accomplished with Entity Annotation or even Time Segmentation rather than this annotation type.

Specification

Annotation Category:
 

Structure Annotation

Declaration:

<event-annotation set="..."> (note: set is optional for this annotation type; if you declare this annotation type to be setless, you cannot assign classes)

Version History:
 

since v0.7

Element:

<event>

API Class:

Event (FoLiApy API Reference)

Required Attributes:
 
Optional Attributes:
 
  • xml:id – The ID of the element; this has to be unique in the entire document or collection of documents (corpus). All identifiers in FoLiA are of the XML NCName datatype, which roughly means it is a unique string that has to start with a letter (not a number or symbol), may contain numbers, but may never contain colons or spaces. FoLiA does not define any naming convention for IDs.
  • set – The set of the element, ideally a URI linking to a set definition (see Set Definitions (Vocabulary)) or otherwise a uniquely identifying string. The set must also be referred to in the Annotation Declarations for this annotation type.
  • class – The class of the annotation, i.e. the annotation tag in the vocabulary defined by set.
  • processor – This refers to the ID of a processor in the Provenance Data. The processor in turn defines exactly who or what was the annotator of the annotation.
  • annotator – This is an older alternative to the processor attribute, without support for full provenance. The annotator attribute simply refers to the name or ID of the system or human annotator that made the annotation.
  • annotatortype – This is an older alternative to the processor attribute, without support for full provenance. It is used together with annotator and specifies the type of the annotator, either manual for human annotators or auto for automated systems.
  • confidence – A floating point value between zero and one; expresses the confidence the annotator places in the annotation.
  • datetime – The date and time when this annotation was recorded, the format is YYYY-MM-DDThh:mm:ss (note the literal T in the middle to separate date from time), as per the XSD Datetime data type.
  • n – A number in a sequence, corresponding to a number in the original document, for example chapter numbers, section numbers, list item numbers. This does not have to be an actual number; other sequence identifiers are also possible (think alphanumeric characters or roman numerals).
  • space – This attribute indicates whether spacing should be inserted after this element. Its default value is always yes, so it does not need to be specified in that case; if tokens or other structural elements are glued together, the value should be set to no. This allows for reconstruction of the detokenised original text.
  • src – Points to a file or full URL of a sound or video file. This attribute is inheritable.
  • begintime – A timestamp in HH:MM:SS.MMM format, indicating the begin time of the speech. If a sound clip is specified (src), the timestamp refers to a location in the sound clip.
  • endtime – A timestamp in HH:MM:SS.MMM format, indicating the end time of the speech. If a sound clip is specified (src), the timestamp refers to a location in the sound clip.
  • speaker – A string identifying the speaker. This attribute is inheritable. Multiple speakers are not allowed; simply do not specify a speaker on a certain level if you are unable to link the speech to a specific (single) speaker.
  • tag – Contains a space-separated list of processing tags associated with the element. A processing tag carries arbitrary user-defined information that may aid in processing a document. It may carry cues on how a specific tool should treat a specific element. The tag vocabulary is specific to the tool that processes the document. Tags carry no intrinsic meaning for the data representation and should not be used except to inform/aid processors in their task. Processors are encouraged to clean up the tags they use. Ideally, published FoLiA documents at the end of a processing pipeline carry no further tags. For encoding actual data, use class and optionally features instead.
Accepted Data:

<alt> (Alternative Annotation), <altlayers> (Alternative Annotation), <comment> (Comment Annotation), <correction> (Correction Annotation), <desc> (Description Annotation), <div> (Division Annotation), <entry> (Entry Annotation), <event> (Event Annotation), <ex> (Example Annotation), <external> (External Annotation), <figure> (Figure Annotation), <gap> (Gap Annotation), <head> (Head Annotation), <hiddenw> (Hidden Token Annotation), <br> (Linebreak), <list> (List Annotation), <metric> (Metric Annotation), <note> (Note Annotation), <p> (Paragraph Annotation), <part> (Part Annotation), <ph> (Phonetic Annotation/Content), <quote> (Quote Annotation), <ref> (Reference Annotation), <relation> (Relation Annotation), <s> (Sentence Annotation), <str> (String Annotation), <table> (Table Annotation), <t> (Text Annotation), <utt> (Utterance Annotation), <whitespace> (Vertical Whitespace), <w> (Token Annotation)

Valid Context:

<div> (Division Annotation), <event> (Event Annotation), <head> (Head Annotation), <list> (List Annotation), <p> (Paragraph Annotation), <s> (Sentence Annotation), <term> (Term Annotation)

Feature subsets (extra attributes):
 
  • actor
  • begindatetime
  • enddatetime

Explanation

Event structure, though uncommon in regular written text, can be useful in certain documents. Divisions, paragraphs, sentences, or even words can be encapsulated in an event element to indicate that together they form an event entity of a particular class. This kind of structure annotation is especially useful when dealing with computer-mediated communication such as chat logs, tweets, and internet fora, in which chat turns, forum posts, and tweets can be demarcated as particular events.
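
As a rough illustration of this encapsulation, the fragment below wraps two forum posts in an enclosing thread event, relying on the fact that <event> is listed as accepted data of <event>. The class names thread and post and all IDs are invented for this sketch and would have to exist in your own set definition; the declarations are assumed to be the same as in the example further down this page.

```xml
<!-- Sketch only: the classes "thread" and "post" are hypothetical
     and must be defined in your own set definition. -->
<text xml:id="forum.text">
  <!-- Outer event: the thread as a whole -->
  <event xml:id="forum.thread.1" class="thread">
    <!-- Inner events: the individual posts in the thread -->
    <event xml:id="forum.thread.1.post.1" class="post">
      <s>
        <t>Does anyone know when the meeting starts?</t>
      </s>
    </event>
    <event xml:id="forum.thread.1.post.2" class="post">
      <s>
        <t>It starts at nine.</t>
      </s>
    </event>
  </event>
</text>
```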

The event element predefines some feature subsets, which you can use as XML attributes (see Features for more information on features). The subsets begindatetime and enddatetime can be used to express the exact moment at which an event started or ended. Note that this differs from the common datetime attribute, which describes the time at which the annotation was recorded rather than when the event took place. The actor subset is used to associate the person responsible for the event, i.e. the speaker or poster.
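
To make the attribute shorthand concrete, the sketch below writes the same event twice: once with the predefined subsets as XML attributes, and once with explicit <feat> elements, the generic feature notation covered in the Features section. That the two forms are interchangeable is an assumption of this sketch; consult the Features documentation for your FoLiA version before relying on it.

```xml
<!-- Form 1: predefined subsets written as XML attributes,
     as in the example on this page -->
<event class="message" begindatetime="2011-12-15T19:06" actor="John Doe">
  <s>
    <t>I am fine Jane, thanks.</t>
  </s>
</event>

<!-- Form 2: the same information as feature elements
     (assumed equivalent to form 1) -->
<event class="message">
  <feat subset="begindatetime" class="2011-12-15T19:06" />
  <feat subset="actor" class="John Doe" />
  <s>
    <t>I am fine Jane, thanks.</t>
  </s>
</event>
```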

For more fine-grained control over timed events, for example within sentences, it is recommended to use Time Segmentation.

Example

The following example shows a chat log composed of message events:

<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <metadata>
      <annotations>
          <text-annotation>
              <annotator processor="p1" />
          </text-annotation>
          <event-annotation set="adhoc">
              <annotator processor="p1" />
          </event-annotation>
          <sentence-annotation>
              <annotator processor="p1" />
          </sentence-annotation>
      </annotations>
```
      <provenance>
         <processor xml:id="p1" name="proycon" type="manual" />
      </provenance>
  </metadata>
  <text xml:id="example.text">
    <event class="message" begindatetime="2011-12-15T19:01"
     enddatetime="2011-12-15T19:05" actor="Jane Doe">
        <s>
            <t>Hello John.</t>
        </s>
        <s>
            <t>How are you doing?</t>
        </s>
    </event>
    <event class="message" begindatetime="2011-12-15T19:06"
     actor="John Doe">
        <s>
            <t>I am fine Jane, thanks.</t>
        </s>
    </event>
  </text>
</FoLiA>
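
The declaration note in the Specification section states that set is optional and that a setless declaration rules out classes. A minimal sketch of what that might look like follows; it shows only the parts that differ from the example above, and it assumes that features such as actor are left out along with the class.

```xml
<!-- Setless declaration: no set attribute (goes inside <annotations>) -->
<event-annotation>
  <annotator processor="p1" />
</event-annotation>

<!-- Events under a setless declaration carry no class -->
<event>
  <s>
    <t>Hello John.</t>
  </s>
</event>
```

Such events only mark a boundary in the document; if you need to distinguish messages from, say, system notices, a declared set with classes remains necessary.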