Entity Annotation
Entity annotation is a broad and common category in FoLiA. It is used for specifying all kinds of multi-word expressions, including but not limited to named entities. The set definition used determines the vocabulary and therefore the precise nature of the entity annotation.
Specification
Annotation Category: | Span Annotation |
---|---|
Declaration: | <entity-annotation set="..."> |
Version History: | Since the beginning |
Element: | <entity> |
API Class: | |
Layer Element: | <entities> |
Span Role Elements: | |
Required Attributes: | |
Optional Attributes: | |
Accepted Data: | |
Valid Context: | |
Explanation
Note
Please first ensure you are familiar with the general principles of Span Annotation to make sense of this annotation type.
The entities layer offers a generic solution for encoding various types of entities or multi-word expressions, including but not limited to named entities. The set used determines the precise semantics of the entities. Being the simplest of all span annotations, this annotation type is widely used in FoLiA.
It is recommended, but not required, for each entity to have a unique identifier.
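The recommendation on identifiers can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module rather than a dedicated FoLiA library; the function name check_entity_ids and the identifiers in the fragment are hypothetical, and the fragment is a bare entities layer, not a complete FoLiA document.

```python
import xml.etree.ElementTree as ET
from collections import Counter

FOLIA_NS = "http://ilk.uvt.nl/folia"
# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: one entity lacks xml:id, two share the same one.
SAMPLE = """<entities xmlns="http://ilk.uvt.nl/folia">
  <entity xml:id="example.entity.1" class="per"><wref id="example.w.2" /></entity>
  <entity class="loc"><wref id="example.w.7" /></entity>
  <entity xml:id="example.entity.1" class="loc"><wref id="example.w.9" /></entity>
</entities>"""


def check_entity_ids(xml_text):
    """Return (number of entities without xml:id, sorted list of duplicated ids)."""
    root = ET.fromstring(xml_text)
    ids = [e.get(XML_ID) for e in root.iter(f"{{{FOLIA_NS}}}entity")]
    missing = sum(1 for i in ids if i is None)
    counts = Counter(i for i in ids if i is not None)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return missing, duplicates


missing, duplicates = check_entity_ids(SAMPLE)
print("entities without xml:id:", missing)
print("duplicated xml:id values:", duplicates)
```

A missing identifier is only reported here, not treated as an error, since identifiers are recommended rather than required.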
Examples
<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
<metadata>
<annotations>
<token-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/tokconfig-eng.foliaset.ttl" format="text/turtle">
<annotator processor="p1" />
</token-annotation>
<text-annotation>
<annotator processor="p1" />
</text-annotation>
<sentence-annotation>
<annotator processor="p1" />
</sentence-annotation>
<paragraph-annotation>
<annotator processor="p1" />
</paragraph-annotation>
<entity-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/namedentities.foliaset.ttl" format="text/turtle">
<annotator processor="p1" />
</entity-annotation>
</annotations>
<provenance>
<processor xml:id="p1" name="proycon" type="manual" />
</provenance>
</metadata>
<text xml:id="example.text">
<p xml:id="example.p.1">
<s xml:id="example.p.1.s.1">
<t>The Dalai Lama currently lives in Dharamsala in India.</t>
<w xml:id="example.p.1.s.1.w.1" class="WORD">
<t>The</t>
</w>
<w xml:id="example.p.1.s.1.w.2" class="WORD">
<t>Dalai</t>
</w>
<w xml:id="example.p.1.s.1.w.3" class="WORD">
<t>Lama</t>
</w>
<w xml:id="example.p.1.s.1.w.4" class="WORD">
<t>currently</t>
</w>
<w xml:id="example.p.1.s.1.w.5" class="WORD">
<t>lives</t>
</w>
<w xml:id="example.p.1.s.1.w.6" class="WORD">
<t>in</t>
</w>
<w xml:id="example.p.1.s.1.w.7" class="WORD">
<t>Dharamsala</t>
</w>
<w xml:id="example.p.1.s.1.w.8" class="WORD">
<t>in</t>
</w>
<w xml:id="example.p.1.s.1.w.9" class="WORD" space="no">
<t>India</t>
</w>
<w xml:id="example.p.1.s.1.w.10" class="PUNCTUATION">
<t>.</t>
</w>
<entities>
<entity xml:id="example.p.1.s.1.entity.1" class="per">
<wref id="example.p.1.s.1.w.2" t="Dalai" />
<wref id="example.p.1.s.1.w.3" t="Lama" />
</entity>
<entity xml:id="example.p.1.s.1.entity.2" class="loc.city">
<wref id="example.p.1.s.1.w.7" t="Dharamsala" />
</entity>
<entity xml:id="example.p.1.s.1.entity.3" class="loc.country">
<wref id="example.p.1.s.1.w.9" t="India" />
</entity>
</entities>
</s>
</p>
</text>
</FoLiA>
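To illustrate how the wref references in the example resolve to words, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is not the official FoLiA tooling; the function name list_entities is hypothetical, and the embedded fragment is a trimmed version of the example (metadata and most words omitted), so it is not a complete, valid FoLiA document.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NS = {"f": FOLIA_NS}

# Trimmed fragment modelled on the example; identifiers shortened for brevity.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1"><t>The</t></w>
      <w xml:id="example.s.1.w.2"><t>Dalai</t></w>
      <w xml:id="example.s.1.w.3"><t>Lama</t></w>
      <entities>
        <entity xml:id="example.s.1.entity.1" class="per">
          <wref id="example.s.1.w.2" t="Dalai" />
          <wref id="example.s.1.w.3" t="Lama" />
        </entity>
      </entities>
    </s>
  </text>
</FoLiA>"""


def list_entities(xml_text):
    """Return a list of (class, [word texts]) pairs, one per entity."""
    root = ET.fromstring(xml_text)
    # Index every word by its xml:id so wref ids can be looked up.
    words = {}
    for w in root.iter(f"{{{FOLIA_NS}}}w"):
        t = w.find("f:t", NS)
        words[w.get(XML_ID)] = t.text if t is not None else ""
    result = []
    for entity in root.iter(f"{{{FOLIA_NS}}}entity"):
        # wref carries a plain "id" attribute, not xml:id.
        tokens = [words[ref.get("id")] for ref in entity.findall("f:wref", NS)]
        result.append((entity.get("class"), tokens))
    return result


for entity_class, tokens in list_entities(SAMPLE):
    print(entity_class, tokens)
```

The sketch resolves each reference through the word's own text element instead of relying on the t attribute of wref, which merely repeats the word text. It does not take the space attribute into account when tokens are later joined into a string.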
It is possible to associate inline annotations with span annotations, provided you declare the annotation type with groupannotations="yes". For entities, this is useful when you have a fine-grained tokenisation layer but want to associate information such as part-of-speech tags or lemmas with entities larger than a single token:
<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
<metadata>
<annotations>
<token-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/tokconfig-eng.foliaset.ttl">
<annotator processor="p1" />
</token-annotation>
<text-annotation>
<annotator processor="p1" />
</text-annotation>
<sentence-annotation>
<annotator processor="p1" />
</sentence-annotation>
<paragraph-annotation>
<annotator processor="p1" />
</paragraph-annotation>
<entity-annotation groupannotations="yes">
<annotator processor="p1" />
</entity-annotation>
<pos-annotation set="brown"> <!-- An ad-hoc set declaration: the value is not a URL, so no set definition backs it -->
<annotator processor="p1" />
</pos-annotation>
<lemma-annotation set="english-adhoc"> <!-- An ad-hoc set declaration: the value is not a URL, so no set definition backs it -->
<annotator processor="p1" />
</lemma-annotation>
</annotations>
<provenance>
<processor xml:id="p1" name="proycon" type="manual" />
</provenance>
</metadata>
<text xml:id="example.text">
<p xml:id="example.p.1">
<s xml:id="example.p.1.s.1">
<t>The container-ship lost its cargo of bottle openers.</t>
<w xml:id="example.p.1.s.1.w.1" class="WORD">
<t>The</t>
<pos class="AT" />
</w>
<w xml:id="example.p.1.s.1.w.2" class="WORD" space="no">
<t>container</t>
</w>
<w xml:id="example.p.1.s.1.w.3" class="WORD" space="no">
<t>-</t>
</w>
<w xml:id="example.p.1.s.1.w.4" class="WORD">
<t>ship</t>
</w>
<w xml:id="example.p.1.s.1.w.5" class="WORD">
<t>lost</t>
<pos class="VBD" />
</w>
<w xml:id="example.p.1.s.1.w.6" class="WORD">
<t>its</t>
<pos class="PP$" />
</w>
<w xml:id="example.p.1.s.1.w.7" class="WORD">
<t>cargo</t>
<pos class="NN" />
</w>
<w xml:id="example.p.1.s.1.w.8" class="WORD">
<t>of</t>
<pos class="IN" />
</w>
<w xml:id="example.p.1.s.1.w.9" class="WORD">
<t>bottle</t>
</w>
<w xml:id="example.p.1.s.1.w.10" class="WORD" space="no">
<t>openers</t>
</w>
<w xml:id="example.p.1.s.1.w.11" class="PUNCTUATION">
<t>.</t>
</w>
<entities>
<entity xml:id="example.p.1.s.1.entity.1">
<wref id="example.p.1.s.1.w.2" t="container" />
<wref id="example.p.1.s.1.w.3" t="-" />
<wref id="example.p.1.s.1.w.4" t="ship" />
<pos class="NN" />
<lemma class="container-ship" />
</entity>
<entity xml:id="example.p.1.s.1.entity.2">
<wref id="example.p.1.s.1.w.9" t="bottle" />
<wref id="example.p.1.s.1.w.10" t="openers" />
<pos class="NNS" />
<lemma class="bottle opener" />
</entity>
</entities>
</s>
</p>
</text>
</FoLiA>
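Reading such group annotations back could look roughly as follows. This is a sketch under stated assumptions: it uses Python's standard xml.etree.ElementTree module rather than a FoLiA library, the function name group_annotations is hypothetical, and the fragment contains only the entities layer from the example, not the full document.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
NS = {"f": FOLIA_NS}

# Only the entities layer of the example, wrapped with the FoLiA namespace.
GROUP_SAMPLE = """<entities xmlns="http://ilk.uvt.nl/folia">
  <entity xml:id="example.p.1.s.1.entity.1">
    <wref id="example.p.1.s.1.w.2" t="container" />
    <wref id="example.p.1.s.1.w.3" t="-" />
    <wref id="example.p.1.s.1.w.4" t="ship" />
    <pos class="NN" />
    <lemma class="container-ship" />
  </entity>
  <entity xml:id="example.p.1.s.1.entity.2">
    <wref id="example.p.1.s.1.w.9" t="bottle" />
    <wref id="example.p.1.s.1.w.10" t="openers" />
    <pos class="NNS" />
    <lemma class="bottle opener" />
  </entity>
</entities>"""


def group_annotations(xml_text):
    """Collect the tokens, PoS tag and lemma attached to each entity."""
    root = ET.fromstring(xml_text)
    rows = []
    for entity in root.iter(f"{{{FOLIA_NS}}}entity"):
        pos = entity.find("f:pos", NS)
        lemma = entity.find("f:lemma", NS)
        rows.append({
            # Uses the t attribute of wref, since no words are in this fragment.
            "tokens": [ref.get("t") for ref in entity.findall("f:wref", NS)],
            "pos": pos.get("class") if pos is not None else None,
            "lemma": lemma.get("class") if lemma is not None else None,
        })
    return rows


for row in group_annotations(GROUP_SAMPLE):
    print(row)
```

Because the t attribute on wref may be absent in other documents, a more robust reader would resolve each reference through the word elements instead.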