Hidden Token Annotation
This annotation type introduces a tokenisation layer for the document. The terms token and word are used interchangeably in FoLiA as FoLiA itself does not commit to a specific tokenisation paradigm. Tokenisation is a prerequisite for the majority of linguistic annotation types offered by FoLiA and it is one of the most fundamental types of Structure Annotation. The words/tokens are typically embedded in other types of structure elements, such as sentences or paragraphs.
Specification
Annotation Category: | Structure Annotation |
---|---|
Declaration: | <hiddentoken-annotation> |
Version History: | Since the beginning |
Element: | <hiddenw> |
API Class: | |
Required Attributes: | |
Optional Attributes: | |
Accepted Data: | |
Valid Context: | |
Explanation
Hidden tokens are tokens that are explicitly not part of the original text. They are either implied tokens or tokens that act
as a dummy for further linguistic annotation. Hidden tokens are a valid target for any form of span annotation through
the <wref> element (see the span annotation category). Because they are structure elements, they may appear interleaved
with the normal tokenisation layer, where their order is significant.
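As a rough illustration of what "not part of the original text" means for a consumer of the format, the following Python sketch rebuilds the visible text of a sentence while skipping hidden tokens. It uses only the standard library `xml.etree.ElementTree`, not a FoLiA library. The sentence fragment is a simplified one, and the helper name `visible_text` is a hypothetical name chosen for this sketch.

```python
import xml.etree.ElementTree as ET

NS = "{http://ilk.uvt.nl/folia}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified sentence fragment (assumption: annotations other than <t> are omitted).
FRAGMENT = """
<s xmlns="http://ilk.uvt.nl/folia" xml:id="example.s.1">
  <hiddenw xml:id="example.s.1.w.0"><t>*exp*</t></hiddenw>
  <w xml:id="example.s.1.w.1" space="no"><t>Is</t></w>
  <w xml:id="example.s.1.w.2"><t>n't</t></w>
  <w xml:id="example.s.1.w.3"><t>a</t></w>
  <w xml:id="example.s.1.w.4"><t>whole</t></w>
  <w xml:id="example.s.1.w.5"><t>lot</t></w>
  <w xml:id="example.s.1.w.6" space="no"><t>left</t></w>
  <w xml:id="example.s.1.w.7"><t>.</t></w>
</s>
"""


def visible_text(sentence):
    """Join the text of normal tokens only; hidden tokens are skipped."""
    parts = []
    for child in sentence:
        if child.tag != NS + "w":
            continue  # skips <hiddenw> and any non-token children
        parts.append(child.find(NS + "t").text)
        if child.get("space") != "no":
            parts.append(" ")  # a token is followed by a space unless space="no"
    return "".join(parts).strip()


sentence = ET.fromstring(FRAGMENT.strip())
hidden_ids = [h.get(XML_ID) for h in sentence.iter(NS + "hiddenw")]

print(visible_text(sentence))
print(hidden_ids)
```

The hidden token keeps its own ID, so it can still be referenced by span annotation even though it contributes nothing to the reconstructed text.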
Example
The following example shows syntactic movement annotation which makes use of a hidden token for an implicit subject.
<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <metadata>
    <annotations>
      <token-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/tokconfig-eng.foliaset.ttl">
        <annotator processor="p1" />
      </token-annotation>
      <hiddentoken-annotation>
        <annotator processor="p1" />
      </hiddentoken-annotation>
      <text-annotation>
        <annotator processor="p1" />
      </text-annotation>
      <sentence-annotation>
        <annotator processor="p1" />
      </sentence-annotation>
      <pos-annotation set="adhoc">
        <annotator processor="p1" />
      </pos-annotation>
      <syntax-annotation set="adhoc">
        <annotator processor="p1" />
      </syntax-annotation>
      <description-annotation>
        <annotator processor="p1" />
      </description-annotation>
    </annotations>
    <provenance>
      <processor xml:id="p1" name="proycon" type="manual" />
    </provenance>
  </metadata>
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <hiddenw xml:id="example.s.1.w.0">
        <t>*exp*</t>
        <desc>empty expletive subject</desc>
        <pos class="EX" />
      </hiddenw>
      <w xml:id="example.s.1.w.1" space="no">
        <t>Is</t>
        <pos class="BEP" />
      </w>
      <w xml:id="example.s.1.w.2">
        <t>n't</t>
        <pos class="NEG" />
      </w>
      <w xml:id="example.s.1.w.3">
        <t>a</t>
        <pos class="D" />
      </w>
      <w xml:id="example.s.1.w.4">
        <t>whole</t>
        <pos class="ADJ" />
      </w>
      <w xml:id="example.s.1.w.5">
        <t>lot</t>
        <pos class="N" />
      </w>
      <w xml:id="example.s.1.w.6" space="no">
        <t>left</t>
        <pos class="VAN" />
      </w>
      <w xml:id="example.s.1.w.7">
        <t>.</t>
        <pos class="PUNC" />
      </w>
      <syntax>
        <su xml:id="example.s.1.su.1" class="IP-MAT">
          <su xml:id="example.s.1.su.2" class="NP-SBJ">
            <wref id="example.s.1.w.0" />
          </su>
          <su xml:id="example.s.1.su.3" class="VP">
            <su xml:id="example.s.1.su.4" class="BEP">
              <wref id="example.s.1.w.1" />
            </su>
            <su xml:id="example.s.1.su.5" class="NEG">
              <wref id="example.s.1.w.2" />
            </su>
            <su xml:id="example.s.1.su.6" class="VP">
              <su xml:id="example.s.1.su.7" class="NP-LGS">
                <wref id="example.s.1.w.3" />
                <su xml:id="example.s.1.su.8" class="ADJP">
                  <wref id="example.s.1.w.4" />
                </su>
                <wref id="example.s.1.w.5" />
              </su>
              <wref id="example.s.1.w.6" />
            </su>
          </su>
          <su class="PUNC">
            <wref id="example.s.1.w.7" />
          </su>
        </su>
      </syntax>
    </s>
  </text>
</FoLiA>
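The way the `<wref>` references in the example resolve to both normal and hidden tokens can be sketched in Python with the standard library alone. This is a minimal sketch, not the FoLiA library's API: the fragment is a trimmed-down version of the sentence above (only the first three tokens are kept), and the helper names `index_tokens` and `bracket` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://ilk.uvt.nl/folia}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed-down sentence (assumption: only the hidden token and two words are kept).
FRAGMENT = """
<s xmlns="http://ilk.uvt.nl/folia" xml:id="example.s.1">
  <hiddenw xml:id="example.s.1.w.0"><t>*exp*</t></hiddenw>
  <w xml:id="example.s.1.w.1" space="no"><t>Is</t></w>
  <w xml:id="example.s.1.w.2"><t>n't</t></w>
  <syntax>
    <su xml:id="example.s.1.su.1" class="IP-MAT">
      <su xml:id="example.s.1.su.2" class="NP-SBJ">
        <wref id="example.s.1.w.0" />
      </su>
      <su xml:id="example.s.1.su.3" class="VP">
        <su xml:id="example.s.1.su.4" class="BEP">
          <wref id="example.s.1.w.1" />
        </su>
        <su xml:id="example.s.1.su.5" class="NEG">
          <wref id="example.s.1.w.2" />
        </su>
      </su>
    </su>
  </syntax>
</s>
"""


def index_tokens(sentence):
    """Map xml:id to token text for both <w> and <hiddenw> elements."""
    tokens = {}
    for child in sentence:
        if child.tag in (NS + "w", NS + "hiddenw"):
            tokens[child.get(XML_ID)] = child.find(NS + "t").text
    return tokens


def bracket(su, tokens):
    """Render a syntactic unit as a bracketed string, resolving each <wref>."""
    parts = [su.get("class")]
    for child in su:
        if child.tag == NS + "su":
            parts.append(bracket(child, tokens))
        elif child.tag == NS + "wref":
            parts.append(tokens[child.get("id")])
    return "(" + " ".join(parts) + ")"


sentence = ET.fromstring(FRAGMENT.strip())
tokens = index_tokens(sentence)
root_su = sentence.find(NS + "syntax").find(NS + "su")
tree = bracket(root_su, tokens)
print(tree)
```

The lookup treats `<w>` and `<hiddenw>` identically, which is why the implicit subject appears in the bracketed tree like any other token.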