Metadata

FoLiA supports associating metadata with your document. You will find this in the <metadata> block at the very beginning of the document. An extensive and mandatory part of this metadata is the Annotation Declarations block (<annotations>), optionally followed by the block for Provenance Data (<provenance>). The remainder of the <metadata> block may be filled with Document Metadata, as described in Document Metadata later on.

Annotation Declarations

All annotation types that are used in a FoLiA document have to be declared. In the metadata block you will find the <annotations> block in which each annotation type that occurs in the document is mentioned, i.e. declared. So does your document include Part of Speech tagging? Then there will be an entry declaring it does so, and linking to the set definition used.

These declarations allow software to identify exactly what a FoLiA document consists of without needing to go through the entire document. On this basis, software can determine whether it can handle the document in the first place. You can for instance imagine an NLP tool that does Named Entity Recognition but requires Part-of-Speech tags and Lemmas to do so. Feeding it a FoLiA document without such annotation layers would be pointless, and thanks to the declarations this is easy to detect.
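As a rough illustration of such a check, the following Python sketch reads only the declarations and reports which required annotation types are missing. It relies on the standard library's xml.etree.ElementTree. The helper names declared_types and missing_types are hypothetical and not part of any FoLiA library, and the namespace is the one that appears in the examples in this document.

```python
import io
import xml.etree.ElementTree as ET

# Namespace as it appears in the FoLiA examples in this document.
FOLIA_NS = "http://ilk.uvt.nl/folia"


def declared_types(source):
    """Return the annotation types declared in a FoLiA document.

    Iteration stops once the <annotations> block is complete, so the
    text body of the document does not have to be walked through.
    """
    types = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == f"{{{FOLIA_NS}}}annotations":
            for declaration in elem:
                # "{namespace}pos-annotation" -> "pos"
                local = declaration.tag.split("}", 1)[-1]
                if local.endswith("-annotation"):
                    types.add(local[: -len("-annotation")])
            break
    return types


def missing_types(source, required):
    """Return the required annotation types that the document does not declare."""
    return set(required) - declared_types(source)


SAMPLE = b"""<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example" version="2.0.0">
  <metadata type="native">
    <annotations>
      <token-annotation set="tokconfig-nl"/>
      <pos-annotation set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn"/>
    </annotations>
  </metadata>
  <text xml:id="example.text"/>
</FoLiA>"""

# A hypothetical NER tool that needs both PoS tags and lemmas.
print(missing_types(io.BytesIO(SAMPLE), {"pos", "lemma"}))  # prints {'lemma'}
```

A tool could run a check like this before doing any real work and refuse documents for which the returned set is not empty.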

Each annotation type has a specific XML element to use as declaration in the <annotations> block. These all end with the suffix -annotation and take the following attributes:

  • set - The set should be a URL to a publicly hosted set definition that defines the vocabulary used with this particular annotation type (see Set Definitions (Vocabulary)). Sets are intentionally kept separate from FoLiA itself and can be created by anyone. FoLiA also allows for ad-hoc sets: sets that are not actually defined and are therefore identified by an arbitrary string rather than a URL. They allow for a more flexible use of FoLiA without full formal closure, but limit it to only shallow validation. Some annotation types work without an associated vocabulary, and for some a vocabulary is optional; on such declarations the set attribute is not used or is optional.
  • format - Set definitions can be stored in several formats. The format may be indicated (not mandatory) by the format attribute on each declaration; its value should be a MIME type.
  • alias - This is an optional attribute that specifies an alias for the set. This is useful when an annotation type occurs multiple times with distinct sets: individual annotations then need to mention the set explicitly, but referring to sets by long URLs gets cumbersome. In such cases annotations can use the alias instead of the full set URL. An alias has to be unique for the annotation type.

Within the scope of each annotation’s declaration, you can declare one or more annotators. Each annotator refers (by ID) to what we call a processor in the provenance data. These processors represent software tools or human annotators and carry various attributes, e.g. the name of the annotator/tool. So this part of the declaration identifies who or what performed the annotation. Consider the following example:

<annotations>
    <token-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/tokconfig-eng.foliaset.ttl">
        <annotator processor="p1.ucto"/>
    </token-annotation>
    <pos-annotation set="https://github.com/proycon/folia/blob/master/setdefinitions/cgn.foliaset.ttl">
        <annotator processor="p2.frog"/>
        <annotator processor="p3.proycon"/>
    </pos-annotation>
</annotations>

The section Provenance Data will explain in depth how the processors that the annotator elements refer to are defined. If there is only one annotator declared, then this is the default for all annotations of this type and set, in which case individual annotation instances need not refer to any processor. If there are multiple annotators, the individual annotation instances should refer to a processor to disambiguate.
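The defaulting rule described here can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: resolve_processor is a hypothetical helper, and the fragments are parsed without the FoLiA namespace for brevity.

```python
import xml.etree.ElementTree as ET


def resolve_processor(declaration, annotation):
    """Return the processor ID for an annotation, or None if it is ambiguous.

    An explicit processor attribute on the annotation wins; otherwise a
    single declared annotator acts as the default.
    """
    explicit = annotation.get("processor")
    if explicit is not None:
        return explicit
    declared = [a.get("processor") for a in declaration.findall("annotator")]
    return declared[0] if len(declared) == 1 else None


# Fragments without a namespace, for brevity only.
lemma_decl = ET.fromstring(
    '<lemma-annotation set="..."><annotator processor="p1.2"/></lemma-annotation>'
)
pos_decl = ET.fromstring(
    '<pos-annotation set="...">'
    '<annotator processor="p2.1"/><annotator processor="p2.2"/>'
    "</pos-annotation>"
)

# One declared annotator: it is the default.
print(resolve_processor(lemma_decl, ET.fromstring('<lemma class="de"/>')))  # p1.2
# Two declared annotators and no explicit reference: cannot be resolved.
print(resolve_processor(pos_decl, ET.fromstring('<pos class="LID"/>')))  # None
# Two declared annotators, disambiguated on the annotation itself.
print(resolve_processor(pos_decl, ET.fromstring('<pos class="LID" processor="p2.2"/>')))  # p2.2
```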

Provenance data is recommended but not required in FoLiA. A simpler mechanism from prior to FoLiA v2.0 is also still available: if you do not refer to processors for a certain annotation type and set (i.e. there are no <annotator> elements), then you can specify the following optional attributes on your declaration to set a default annotator. They act as a default value that can be overridden on individual annotations:

  • annotator - The name of the default annotator (either human or software)
  • annotatortype - Set to auto if the default annotator is automatic annotation by software or manual for human annotators
  • datetime – The date and time when all annotations of this type were recorded, the format is YYYY-MM-DDThh:mm:ss (note the literal T in the middle to separate date from time), as per the XSD Datetime data type.
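The following Python sketch illustrates how such defaults might be applied and how the datetime format can be parsed. It assumes that an individual annotation overrides a default by carrying an attribute of the same name; effective_attribute is a hypothetical helper, and the fragments omit the FoLiA namespace for brevity.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S"  # YYYY-MM-DDThh:mm:ss


def effective_attribute(declaration, annotation, name):
    """Return the annotation's own value for an attribute, else the declared default."""
    value = annotation.get(name)
    return value if value is not None else declaration.get(name)


# Declaration with defaults (assumed values, for illustration only).
declaration = ET.fromstring(
    '<pos-annotation set="..." annotator="frog" annotatortype="auto" '
    'datetime="2018-09-12T00:01:00"/>'
)
untouched = ET.fromstring('<pos class="N"/>')
corrected = ET.fromstring('<pos class="N" annotator="proycon" annotatortype="manual"/>')

print(effective_attribute(declaration, untouched, "annotator"))  # frog
print(effective_attribute(declaration, corrected, "annotator"))  # proycon

# The datetime default still applies to the corrected annotation.
recorded = datetime.strptime(
    effective_attribute(declaration, corrected, "datetime"), DATETIME_FORMAT
)
print(recorded.isoformat())  # 2018-09-12T00:01:00
```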

Set definitions

A set definition (see Set Definitions (Vocabulary)) specifies exactly what classes are allowed in a particular vocabulary. It for example specifies exactly what part-of-speech tags exist. This information is necessary to validate the document completely at its deepest level. If the sets point to URLs that do not exist or are not URLs at all, warnings will be issued. Validation can still proceed but with the notable exception that there is no deep validation of these sets, so no full formal closure.

Though we recommend using and creating actual sets, FoLiA itself is rather agnostic about their existence for most purposes. For deep validation, proper formalisation, and for certain applications they may be required; but as long as they serve as proper unique identifiers you can get away with non-existing sets. In this case, simply do not use a URL but another arbitrary identification string.

If multiple sets are used for the same annotation type, which is perfectly valid, they each need a separate declaration.

Document Metadata

To associate other arbitrary metadata with a FoLiA document, there is FoLiA’s native metadata system, in which simple metadata fields can be defined and used at will through the <meta> element. The following example shows document-wide metadata:

<metadata type="native">
    <annotations>
    ..
    </annotations>
    <meta id="title">Title of my document</meta>
    <meta id="language">eng</meta>
</metadata>

The native metadata format just offers a simple key-value store. You can define fields with custom IDs. FoLiA itself does not predefine any, strictly speaking, although certain fields like language, title and author are conventional and can be interpreted by some FoLiA-capable tools and libraries.
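Because the native format is a plain key-value store, reading it takes very little code. The Python sketch below is one possible way to do this with the standard library; native_metadata is a hypothetical helper and the fragment omits the FoLiA namespace for brevity.

```python
import xml.etree.ElementTree as ET


def native_metadata(metadata):
    """Collect the <meta> fields directly under <metadata> as a dict of ID to value.

    Only direct children are considered, so <meta> elements nested in
    other blocks are not mixed in with the document-wide fields.
    """
    return {meta.get("id"): (meta.text or "") for meta in metadata.findall("meta")}


metadata = ET.fromstring(
    """<metadata type="native">
    <annotations/>
    <meta id="title">Title of my document</meta>
    <meta id="language">eng</meta>
</metadata>"""
)

fields = native_metadata(metadata)
print(fields["title"])     # Title of my document
print(fields["language"])  # eng
```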

The native metadata format is deliberately limited, as various other formats already tackle the metadata issue. FoLiA is able to operate with any other metadata format, such as Dublin Core or CMDI. The type attribute specifies what metadata format is used. In the example above it was set to native for FoLiA’s native metadata format; for foreign formats it can be set to any other string.

Foreign metadata can be stored in two ways:

  • Externally in a different file
  • Internally in the metadata block of the FoLiA document itself

When the metadata is stored externally in a different file, a reference is made using the src attribute, as shown in the following example:

<metadata type="cmdi" src="/path/or/url/to/metadata.cmdi">
  <annotations>
  ..
  </annotations>
</metadata>

If you want to store the metadata in the FoLiA document itself, then the metadata must be placed inside a <foreign-data> element. All elements under <foreign-data> must be in another XML namespace, that is, not the default FoLiA namespace. Consider the following example for Dublin Core:

<metadata type="dc">
  <annotations>
  ..
  </annotations>
  <foreign-data xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier>mydoc</dc:identifier>
    <dc:format>text/xml</dc:format>
    <dc:type>Example</dc:type>
    <dc:contributor>proycon</dc:contributor>
    <dc:creator>proycon</dc:creator>
    <dc:language>en</dc:language>
    <dc:publisher>Radboud University</dc:publisher>
    <dc:rights>public Domain</dc:rights>
  </foreign-data>
</metadata>

The namespace prefix and the type specified in the <metadata> element should match.

Submetadata

Whereas the metadata discussed in the previous section concerns document-wide metadata, i.e. metadata that is applicable to the document as a whole, FoLiA also supports metadata on arbitrary parts of the document. This we call submetadata. Within the <metadata> block, one can include one or more <submetadata> blocks. Like <metadata>, a <submetadata> block carries a type attribute, a src attribute in case the metadata is an external reference, and it may hold <meta> elements or <foreign-data> elements. It differs from <metadata> in that it carries a mandatory xml:id attribute and never has an <annotations> or <provenance> block. The ID is in turn used to refer back to the metadata from particular elements in the text. Such a reference is made using the metadata attribute, which is a common FoLiA attribute allowed on many elements. Consider the following example (certain details are omitted for brevity):

<FoLiA>
<metadata>
 <annotations>...</annotations>
 <submetadata xml:id="metadata.1" type="native">
   <meta id="author">proycon</meta>
   <meta id="language">nld</meta>
 </submetadata>
 <submetadata xml:id="metadata.2" type="native">
   <meta id="author">Shakespeare</meta>
   <meta id="language">eng</meta>
 </submetadata>
</metadata>
<text>
 <p metadata="metadata.1">
   <t>Het volgende vers komt uit Hamlet:</t>
 </p>
 <p metadata="metadata.2">
  <s><t>To be, or not to be, that is the question:</t></s>
  <s><t>Whether 'tis nobler in the mind to suffer<br/>The slings and arrows of outrageous fortune,<br/>Or to take Arms against a Sea of troubles,<br/> And by opposing end them:</t></s>
 </p>
</text>
</FoLiA>

Since metadata can be associated with any element, even an arbitrary sub-part of untokenised text can be selected with the existing facilities <str> or <t-str> (see String Annotation) and associated with metadata. Some redundancy occurs only at places where structural boundaries are crossed (the metadata attribute might have to be repeated on multiple structural elements if there is no catch-all structure).

Submetadata is inherited (recursively), i.e. it is not necessary to explicitly assign the metadata attribute to the children of an element that already has such an assignment.
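One way to picture this inheritance is as a walk from an element up through its ancestors until a metadata attribute is found. The Python sketch below does exactly that with the standard library. Note that effective_metadata is a hypothetical helper, the fragment omits the FoLiA namespace, and ElementTree has no parent pointers, so a parent map is built first.

```python
import xml.etree.ElementTree as ET


def effective_metadata(root, element):
    """Return the submetadata ID that applies to an element, or None.

    The element's own metadata attribute is used if present; otherwise the
    nearest ancestor carrying the attribute determines the value.
    """
    parents = {child: parent for parent in root.iter() for child in parent}
    node = element
    while node is not None:
        reference = node.get("metadata")
        if reference is not None:
            return reference
        node = parents.get(node)
    return None


text = ET.fromstring(
    """<text>
 <p metadata="metadata.2">
  <s><t>To be, or not to be, that is the question:</t></s>
 </p>
 <p>
  <s><t>A sentence without any metadata reference.</t></s>
 </p>
</text>"""
)

first, second = text.iter("s")
print(effective_metadata(text, first))   # metadata.2 (inherited from the paragraph)
print(effective_metadata(text, second))  # None
```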

Provenance Data

It is often desirable to know exactly what tools (and what versions thereof, and even with what parameters) were invoked in which order to produce a FoLiA document. This is called provenance data. In the metadata section, right after the Annotation Declarations, FoLiA allows for a <provenance> block containing this information. It is not mandatory but it is strongly recommended.

The <provenance> block defines one or more processors. Processors are processes or entities that have processed the document and often performed some kind of manipulation of it, such as adding annotations. The processors are listed in the order they were invoked. The Annotation Declarations in turn link to these processors to tie a particular annotation type and set to one or more processors.

A <processor> carries the following attributes:
  • xml:id (mandatory) – The ID of the processor, this is how it is referred to from the <annotator processor=".." /> element in the Annotation Declarations and from the processor attribute (part of the common FoLiA attributes) on individual annotations.
  • name (mandatory) – The name identifies the actual tool or human annotator
  • type – The type of the processor, one of:
    • auto - (default) - The processor is an automated tool that provided annotations
    • manual - The processor refers to a manual annotator
    • generator - The processor indicates the FoLiA library used by the parent and sibling processors (unless sibling processors specify another generator in their scope)
    • datasource - The processor is a reference to a particular data source that was used by the parent processor. If there is no parent processor but it is instead directly part of the provenance chain, often as the very first element, then you can interpret this to be the original data source from which the document sprung.
  • version – (optional but strongly recommended) The version of the processor, i.e. of the tool
  • document_version (optional) – The version of the document, refers to any label the user desires to indicate a version of the document, so the format is not predetermined and needs not be numeric.
  • command (optional) – The exact command that was run
  • host (optional) – The host on which the processor ran, this identifies individual systems on a network/cluster.
  • user (optional) – The user/executor which ran the processor, this identifies who ran an automated process rather than who the annotator was!
  • src (optional) – The source of the processor, a URL to the tool itself in case the software is an online tool, or to its website or source code repository if not. If the processor is of the datasource type, then this attribute should point to that data set or a website describing it. The format attribute can be used to further specify the type of source.
  • format (optional) – MIME type describing the kind of resource pointed to by src. Use text/html for websites. Especially useful for processors of type datasource.
  • folia_version (optional) - The FoLiA version that was written
  • begindatetime (optional) – Specifies when the process started, format is YYYY-MM-DDThh:mm:ss (note the literal T in the middle to separate date from time), as per the XSD Datetime data type.
  • enddatetime (optional) – Specifies when the process finished, format is YYYY-MM-DDThh:mm:ss (note the literal T in the middle to separate date from time), as per the XSD Datetime data type.
  • resourcelink (optional) - The URI of any RDF resource describing this processor. This allows linking to the external world of linked open data from the provenance chain in FoLiA.
  • Additional custom metadata is allowed in the form of <meta> elements (just like with FoLiA native metadata) inside the scope of a processor. FoLiA does not define the semantics of any such metadata, i.e. they are tool/application-specific and could for instance be used to specify tool parameters used.

First consider a fairly minimal example. Note that we include the Annotation Declarations as well, with a link to the processor:

<annotations>
  <token-annotation set="tokconfig-nl">
      <annotator processor="p0" />
  </token-annotation>
</annotations>
<provenance>
    <processor xml:id="p0" name="ucto" version="0.15" folia_version="2.0" command="ucto -Lnld" host="mhysa" user="proycon" begindatetime="2018-09-12T00:00:00" enddatetime="2018-09-12T00:00:10" document_version="1" />
</provenance>

Individual annotations in the document can refer to this processor using the processor attribute:

<w class="PUNCTUATION" processor="p0">
 <t>.</t>
</w>

If there is only one <annotator> defined for a certain annotation type and set in the Annotation Declarations, then it is the default and no processor attribute is necessary.

One of the powerful features of processors is that they can be nested. This creates subprocessors and captures situations where one processor invokes others as part of its operation. Subprocessors can also provide some extra information on their parent processor: they can for example state what FoLiA library was used (type="generator") or what data sources were used by the processor (type="datasource"). Moreover, arbitrary metadata can be added to any processor in the form of <meta> elements (just like with FoLiA’s native Metadata). FoLiA does not define the semantics of any such metadata, i.e. they are tool/application-specific and could for instance be used to specify tool parameters used. Note that whereas the order of the processors in the <provenance> block is strictly significant, the order of subprocessors is not.

With all this in mind, we can expand our previous example:

<provenance>
    <processor xml:id="p0" name="ucto" version="0.15" folia_version="2.0" command="ucto -Lnld" host="mhysa" user="proycon" begindatetime="2018-09-12T00:00:00" enddatetime="2018-09-12T00:00:10" document_version="1">
        <meta id="config">tokconfig-nld</meta>
        <meta id="language">nld</meta>
        <processor xml:id="p0.1" name="libfolia" version="2.0" folia_version="2.0" type="generator" />
        <processor xml:id="p0.2" name="tokconfig-nld" version="2.0" folia_version="2.0" type="datasource" />
    </processor>
</provenance>

Or consider the following example in which we have a tool that is an annotation environment in which human annotators edit a FoLiA document and add/edit annotations:

<provenance>
    <processor xml:id="p2" name="flat" version="0.8" folia_version="2.0" host="flat.science.ru.nl" begindatetime="2018-09-12T00:10:00" enddatetime="2018-09-12T00:20:00" document_version="3">
        <processor xml:id="p2.0" name="foliapy" version="2.0" folia_version="2.0" type="generator" />
        <processor xml:id="p2.1" name="proycon" type="manual" />
        <processor xml:id="p2.2" name="ko" type="manual" />
    </processor>
</provenance>

From the Annotation Declarations, we can then also refer directly to subprocessors. Moreover, a processor can be referred to from multiple annotation types/sets:

<annotations>
  ...
  <pos-annotation set="...">
      <annotator processor="p2.1" />
      <annotator processor="p2.2" />
  </pos-annotation>
  <lemma-annotation set="...">
      <annotator processor="p2.1" />
  </lemma-annotation>
  ...
</annotations>

Of course, providing all this is not mandatory and requires the specific tool to actually supply this provenance data. It is still possible to have FoLiA documents without provenance data at all.
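To give an impression of how a consumer might work with nested processors, the Python sketch below flattens a <provenance> block into a lookup table keyed by processor ID. It is a minimal illustration only: index_processors is a hypothetical helper, the fragment omits the FoLiA namespace, and the fallback to auto follows the default listed for the type attribute above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under the XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def index_processors(parent, parent_id=None, index=None):
    """Map each processor ID to its name, type and parent ID, including subprocessors."""
    if index is None:
        index = {}
    for processor in parent.findall("processor"):
        pid = processor.get(XML_ID)
        index[pid] = {
            "name": processor.get("name"),
            "type": processor.get("type", "auto"),  # auto is the documented default
            "parent": parent_id,
        }
        # Recurse into subprocessors, remembering which processor they belong to.
        index_processors(processor, pid, index)
    return index


provenance = ET.fromstring(
    """<provenance>
    <processor xml:id="p2" name="flat" version="0.8" folia_version="2.0">
        <processor xml:id="p2.0" name="foliapy" version="2.0" folia_version="2.0" type="generator" />
        <processor xml:id="p2.1" name="proycon" type="manual" />
        <processor xml:id="p2.2" name="ko" type="manual" />
    </processor>
</provenance>"""
)

processors = index_processors(provenance)
manual = [p["name"] for p in processors.values() if p["type"] == "manual"]
print(manual)                       # ['proycon', 'ko']
print(processors["p2.1"]["parent"])  # p2
```

With such a table, a processor reference found in an <annotator> element or on an individual annotation can be resolved to a name and type in a single lookup.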

The following example provides a small but complete FoLiA document with provenance data:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="folia.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="untitled" generator="manual" version="2.0.0">
  <metadata type="native">
    <annotations>
      <text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
      <paragraph-annotation />
      <sentence-annotation />
      <token-annotation set="tokconfig-nl">
          <annotator processor="p0" />
      </token-annotation>
      <pos-annotation set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn">
          <!-- There are multiple annotators, this means that each pos annotation should explicitly refer to one of them using the @processor attribute -->
          <annotator processor="p1.1" />
          <annotator processor="p2.1" />
          <annotator processor="p2.2" />
      </pos-annotation>
      <lemma-annotation set="http://ilk.uvt.nl/folia/sets/frog-mblem-nl">
          <!-- There is only one annotator so this will be the default, no need to explicitly refer to it from lemma annotations using the @processor attribute -->
          <annotator processor="p1.2" />
      </lemma-annotation>
    </annotations>
    <provenance>
        <processor xml:id="p0" name="ucto" version="0.15" folia_version="2.0" command="ucto -Lnld" host="mhysa" user="proycon" src="https://github.com/LanguageMachines/ucto" begindatetime="2018-09-12T00:00:00" enddatetime="2018-09-12T00:00:00" document_version="1">
            <!-- We can add arbitrary meta fields to any processor, they are not defined by FoLiA but application-specific  -->
            <meta id="config">tokconfig-nld</meta>
            <meta id="language">nld</meta>
            <processor xml:id="p0.1" name="libfolia" version="2.0" folia_version="2.0" type="generator" />
        </processor>
        <processor xml:id="p1" name="frog" version="0.16" folia_version="2.0" command="frog --skip=pn" host="mhysa" user="proycon" src="https://github.com/LanguageMachines/frog" begindatetime="2018-09-12T00:01:00" enddatetime="2018-09-12T00:02:00" document_version="2">
            <processor xml:id="p1.0" name="libfolia" version="2.0" folia_version="2.0" type="generator" />
            <processor xml:id="p1.1" name="mbpos" version="0.16">
                  <processor xml:id="p1.1.1" type="datasource" name="CGN Corpus" version="unknown" />
                  <processor xml:id="p1.1.2" type="datasource" name="WOTAN Corpus" version="unknown" />
                  <processor xml:id="p1.1.3" type="datasource" name="DCOI Corpus" version="unknown" />
                  <processor xml:id="p1.1.4" type="datasource" name="Lassy Klein Corpus" version="unknown" />
            </processor>
            <processor xml:id="p1.2" name="mblem" />
        </processor>
        <processor xml:id="p2" name="flat" version="0.8" folia_version="2.0" host="flat.science.ru.nl" src="https://flat.science.ru.nl" begindatetime="2018-09-12T00:10:00" enddatetime="2018-09-12T00:20:00" document_version="3">
            <processor xml:id="p2.0" name="foliapy" version="2.0" folia_version="2.0" type="generator" src="https://github.com/proycon/foliapy" />
            <processor xml:id="p2.1" name="proycon" type="manual" />
            <processor xml:id="p2.2" name="ko" type="manual" />
        </processor>
    </provenance>
  </metadata>
  <text xml:id="untitled.text">
    <p xml:id="untitled.p.1">
      <s xml:id="untitled.p.1.s.1">
        <t>De belastingdienst doet aangifte tegen frauderende mensen.</t>
        <w xml:id="untitled.p.1.s.1.w.1" class="WORD">
          <t>De</t>
          <pos class="LID(bep,stan,rest)" confidence="0.999701" head="LID" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" processor="p1.1">
            <feat class="bep" subset="lwtype"/>
            <feat class="stan" subset="naamval"/>
            <feat class="rest" subset="npagr"/>
          </pos>
          <lemma class="de"/>
        </w>
        <w xml:id="untitled.p.1.s.1.w.2" class="WORD">
          <t>belastingdienst</t>
          <pos class="N(soort,ev,basis,zijd,stan)" confidence="0.998836" head="N" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" processor="p2.1">
            <feat class="soort" subset="ntype"/>
            <feat class="ev" subset="getal"/>
            <feat class="basis" subset="graad"/>
            <feat class="zijd" subset="genus"/>
            <feat class="stan" subset="naamval"/>
          </pos>
          <lemma class="belastingdienst"/>
        </w>
        <w xml:id="untitled.p.1.s.1.w.3" class="WORD">
          <t>doet</t>
          <pos class="WW(pv,tgw,met-t)" confidence="0.999262" head="WW" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" processor="p1.1">
            <feat class="pv" subset="wvorm"/>
            <feat class="tgw" subset="pvtijd"/>
            <feat class="met-t" subset="pvagr"/>
          </pos>
          <lemma class="doen"/>
        </w>
        <w xml:id="untitled.p.1.s.1.w.4" class="WORD">
          <t>aangifte</t>
          <pos class="N(soort,ev,basis,zijd,stan)" confidence="0.998701" head="N" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" processor="p2.2">
            <feat class="soort" subset="ntype"/>
            <feat class="ev" subset="getal"/>
            <feat class="basis" subset="graad"/>
            <feat class="zijd" subset="genus"/>
            <feat class="stan" subset="naamval"/>
          </pos>
          <lemma class="aangifte"/>
        </w>
        <w xml:id="untitled.p.1.s.1.w.5" class="WORD">
          <t>tegen</t>
          <pos class="VZ(init)" confidence="0.854093" head="VZ" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" processor="p1.1">
            <feat class="init" subset="vztype"/>
          </pos>
          <lemma class="tegen"/>
        </w>
        <w xml:id="untitled.p.1.s.1.w.6" class="WORD">
          <t>frauderende</t>
          <pos class="WW(od,prenom,met-e)" confidence="0.96" head="WW" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" processor="p1.1">
            <feat class="od" subset="wvorm"/>
            <feat class="prenom" subset="positie"/>
            <feat class="met-e" subset="buiging"/>
          </pos>
          <lemma class="frauderen"/>
        </w>
        <w xml:id="untitled.p.1.s.1.w.7" class="WORD" space="no">
          <t>mensen</t>
          <pos class="N(soort,mv,basis)" confidence="0.999865" head="N" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" processor="p1.1">
            <feat class="soort" subset="ntype"/>
            <feat class="mv" subset="getal"/>
            <feat class="basis" subset="graad"/>
          </pos>
          <lemma class="mens"/>
        </w>
        <w xml:id="untitled.p.1.s.1.w.8" class="PUNCTUATION">
          <t>.</t>
          <pos class="LET()" confidence="1" head="LET" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-cgn" processor="p1.1"/>
          <lemma class="."/>
        </w>
      </s>
    </p>
  </text>
</FoLiA>

And another more real-life example:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="folia.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="example.deep" generator="libfolia-v1.5" version="2.0.0">
  <metadata type="native">
    <annotations>
      <text-annotation>
          <annotator processor="p1" />
      </text-annotation>
      <sentence-annotation>
          <annotator processor="p1" />
      </sentence-annotation>
      <token-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/folia1.4/setdefinitions/tokconfig-nld.foliaset.ttl">
          <annotator processor="p2" />
      </token-annotation>
      <pos-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/frog-mbpos-cgn">
          <annotator processor="p3.1" />
      </pos-annotation>
      <lemma-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/frog-mblem-nl">
          <annotator processor="p3.2" />
      </lemma-annotation>
    </annotations>
    <provenance>
       <processor xml:id="p1" name="proycon" type="manual" />
       <processor xml:id="p2" name="ucto" version="0.14" />
       <processor xml:id="p3" name="frog" version="0.16" begindatetime="2016-11-15T15:12:00">
           <processor xml:id="p3.0" name="libfolia" version="1.14" type="generator" />
           <processor xml:id="p3.1" name="mbpos" version="1.0" />
           <processor xml:id="p3.2" name="mblem" version="1.1" />
       </processor>
    </provenance>
    <meta id="language">nld</meta>
  </metadata>
  <text xml:id="example.deep.text">
      <s xml:id="example.deep.p.1.s.1">
        <t>De Russen kennen Nova Zembla sinds de 11e of 12e eeuw, toen handelaars van Novgorod het eiland al aandeden.</t>
        <w xml:id="example.deep.p.1.s.1.w.1" class="WORD">
          <t>De</t>
          <pos class="LID(bep,stan,rest)" confidence="0.779762" head="LID">
            <feat class="bep" subset="lwtype"/>
            <feat class="stan" subset="naamval"/>
            <feat class="rest" subset="npagr"/>
          </pos>
          <lemma class="de"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.2" class="WORD">
          <t>Russen</t>
          <pos class="SPEC(deeleigen)" confidence="1" head="SPEC">
            <feat class="deeleigen" subset="spectype"/>
          </pos>
          <lemma class="Russen"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.3" class="WORD">
          <t>kennen</t>
          <pos class="WW(pv,tgw,mv)" confidence="0.833333" head="WW">
            <feat class="pv" subset="wvorm"/>
            <feat class="tgw" subset="pvtijd"/>
            <feat class="mv" subset="pvagr"/>
          </pos>
          <lemma class="kennen"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.4" class="WORD">
          <t>Nova</t>
          <pos class="SPEC(deeleigen)" confidence="1" head="SPEC">
            <feat class="deeleigen" subset="spectype"/>
          </pos>
          <lemma class="Nova"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.5" class="WORD">
          <t>Zembla</t>
          <pos class="SPEC(deeleigen)" confidence="1" head="SPEC">
            <feat class="deeleigen" subset="spectype"/>
          </pos>
          <lemma class="Zembla"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.6" class="WORD">
          <t>sinds</t>
          <pos class="VZ(init)" confidence="0.999078" head="VZ">
            <feat class="init" subset="vztype"/>
          </pos>
          <lemma class="sinds"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.7" class="WORD">
          <t>de</t>
          <pos class="LID(bep,stan,rest)" confidence="0.981886" head="LID">
            <feat class="bep" subset="lwtype"/>
            <feat class="stan" subset="naamval"/>
            <feat class="rest" subset="npagr"/>
          </pos>
          <lemma class="de"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.8" class="NUMBER-ORDINAL">
          <t>11e</t>
          <pos class="TW(rang,prenom,stan)" confidence="0.990632" head="TW">
            <feat class="rang" subset="numtype"/>
            <feat class="prenom" subset="positie"/>
            <feat class="stan" subset="naamval"/>
          </pos>
          <lemma class="11"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.9" class="WORD">
          <t>of</t>
          <pos class="VG(neven)" confidence="0.855677" head="VG">
            <feat class="neven" subset="conjtype"/>
          </pos>
          <lemma class="of"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.10" class="NUMBER-ORDINAL">
          <t>12e</t>
          <pos class="TW(rang,prenom,stan)" confidence="0.990632" head="TW">
            <feat class="rang" subset="numtype"/>
            <feat class="prenom" subset="positie"/>
            <feat class="stan" subset="naamval"/>
          </pos>
          <lemma class="12"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.11" class="WORD" space="no">
          <t>eeuw</t>
          <pos class="N(soort,ev,basis,zijd,stan)" confidence="0.999633" head="N">
            <feat class="soort" subset="ntype"/>
            <feat class="ev" subset="getal"/>
            <feat class="basis" subset="graad"/>
            <feat class="zijd" subset="genus"/>
            <feat class="stan" subset="naamval"/>
          </pos>
          <lemma class="eeuw"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.12" class="PUNCTUATION">
          <t>,</t>
          <pos class="LET()" confidence="1" head="LET"/>
          <lemma class=","/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.13" class="WORD">
          <t>toen</t>
          <pos class="VG(onder)" confidence="0.571429" head="VG">
            <feat class="onder" subset="conjtype"/>
          </pos>
          <lemma class="toen"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.14" class="WORD">
          <t>handelaars</t>
          <pos class="N(soort,mv,basis)" confidence="0.99944" head="N">
            <feat class="soort" subset="ntype"/>
            <feat class="mv" subset="getal"/>
            <feat class="basis" subset="graad"/>
          </pos>
          <lemma class="handelaar"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.15" class="WORD">
          <t>van</t>
          <pos class="VZ(init)" confidence="0.999469" head="VZ">
            <feat class="init" subset="vztype"/>
          </pos>
          <lemma class="van"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.16" class="WORD">
          <t>Novgorod</t>
          <pos class="SPEC(deeleigen)" confidence="1" head="SPEC">
            <feat class="deeleigen" subset="spectype"/>
          </pos>
          <lemma class="Novgorod"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.17" class="WORD">
          <t>het</t>
          <pos class="LID(bep,stan,evon)" confidence="0.996855" head="LID">
            <feat class="bep" subset="lwtype"/>
            <feat class="stan" subset="naamval"/>
            <feat class="evon" subset="npagr"/>
          </pos>
          <lemma class="het"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.18" class="WORD">
          <t>eiland</t>
          <pos class="N(soort,ev,basis,onz,stan)" confidence="0.996804" head="N">
            <feat class="soort" subset="ntype"/>
            <feat class="ev" subset="getal"/>
            <feat class="basis" subset="graad"/>
            <feat class="onz" subset="genus"/>
            <feat class="stan" subset="naamval"/>
          </pos>
          <lemma class="eiland"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.19" class="WORD">
          <t>al</t>
          <pos class="BW()" confidence="0.90383" head="BW"/>
          <lemma class="al"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.20" class="WORD" space="no">
          <t>aandeden</t>
          <pos class="WW(pv,verl,mv)" confidence="0.999559" head="WW">
            <feat class="pv" subset="wvorm"/>
            <feat class="verl" subset="pvtijd"/>
            <feat class="mv" subset="pvagr"/>
          </pos>
          <lemma class="aandoen"/>
        </w>
        <w xml:id="example.deep.p.1.s.1.w.21" class="PUNCTUATION">
          <t>.</t>
          <pos class="LET()" confidence="1" head="LET"/>
          <lemma class="."/>
        </w>
      </s>
  </text>
</FoLiA>

Another example with many annotation types and extensive provenance data: