Entry Annotation
FoLiA has a set of structure elements that can be used to represent collections such as glossaries, dictionaries, thesauri, and wordnets. Entry Annotation defines the entries in such collections, Term Annotation defines the terms, Definition Annotation provides the definitions, and Example Annotation provides usage examples.
This documentation covers all four annotation types, as they are intimately connected.
Specification
Annotation Category: | |
---|---|
Declaration: | |
Version History: | since v0.12 |
Element: | <entry> |
API Class: | |
Required Attributes: | |
Optional Attributes: | |
Accepted Data: | |
Valid Context: | |

Annotation Category: | |
---|---|
Declaration: | |
Version History: | since v0.12 |
Element: | <term> |
API Class: | |
Required Attributes: | |
Optional Attributes: | |
Accepted Data: | |
Valid Context: | |

Annotation Category: | |
---|---|
Declaration: | |
Version History: | since v0.12 |
Element: | <def> |
API Class: | |
Required Attributes: | |
Optional Attributes: | |
Accepted Data: | |
Valid Context: | |

Annotation Category: | |
---|---|
Declaration: | |
Version History: | since v0.12 |
Element: | <ex> |
API Class: | |
Required Attributes: | |
Optional Attributes: | |
Accepted Data: | |
Valid Context: | |
Explanation
Collections such as glossaries, dictionaries, thesauri and wordnets all consist of a set of entries, each represented in FoLiA by the <entry> element. Each entry is identified by one or more terms, represented by the <term> element within the entry. Terms need not be words: a wide variety of structural elements can serve as the term. Within the entry, these terms can subsequently be associated with one or more definitions, using the <def> element, or with examples, using the <ex> element.
The <term>, <def> and <ex> elements can all take sets and classes, and thus need to be declared. The <entry> elements themselves are simple containers and can contain multiple terms if these are deemed dependent or related, as is the case for morphological variants such as verb conjugations and declensions. The elements <term> and <def> can only be used within an <entry>. The <ex> element, however, can also be used standalone in different contexts.
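The containment rule above can be checked mechanically. The following is a minimal sketch using only Python's standard library, not the FoLiA API: it reads "within an <entry>" as "direct child of <entry>", which is an assumption, and the `<text>` wrapper, the sample content and the helper name `misplaced_elements` are all hypothetical. The fragment is left un-namespaced for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; <text> only serves as a root for the sample.
SAMPLE = """
<text>
  <entry>
    <term><w><t>house</t></w></term>
    <def class="sensedescription"><p><t>A dwelling</t></p></def>
    <ex><s><t>My house is old.</t></s></ex>
  </entry>
  <p>
    <ex><s><t>A standalone example.</t></s></ex>
  </p>
  <p>
    <term><w><t>stray</t></w></term>
  </p>
</text>
"""

# Elements that may only occur inside <entry> (assumed: as direct children).
ENTRY_ONLY = {"term", "def"}


def misplaced_elements(root):
    """Return the tags of <term>/<def> elements whose parent is not <entry>."""
    problems = []
    for parent in root.iter():      # every element, root included
        for child in parent:        # direct children only
            if child.tag in ENTRY_ONLY and parent.tag != "entry":
                problems.append(child.tag)
    return problems


print(misplaced_elements(ET.fromstring(SAMPLE)))  # → ['term']
```

The standalone `<ex>` inside a paragraph is accepted, while the `<term>` outside any entry is reported.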
In FoLiA, linguistic annotations are associated with the structure element within the term itself. This is where a glossary can for instance obtain part-of-speech or lexical semantic sense information, to name just a few examples.
Below is an example of a glossary entry. The sense set used comes from WordNet; the other sets are fictitious.
<entry xml:id="entry.1">
    <term xml:id="entry.1.term.1">
        <w xml:id="entry.1.term.1.w.1">
            <t>house</t>
            <pos class="n">
                <feat subset="number" class="sing" />
            </pos>
            <lemma class="house" />
            <sense class="house%1:06:00::" />
        </w>
    </term>
    <term xml:id="entry.1.term.2">
        <w xml:id="entry.1.term.2.w.1">
            <t>houses</t>
            <pos class="n">
                <feat subset="number" class="plural" />
            </pos>
            <lemma class="house" />
            <sense class="house%1:06:00::" />
        </w>
    </term>
    <def xml:id="entry.1.def.1" class="sensedescription">
        <p xml:id="entry.1.def.1.p.1">
            <t>A dwelling, place of residence</t>
        </p>
    </def>
    <ex>
        <s xml:id="entry.1.ex.1.s.1">
            <t>My house was constructed ten years ago.</t>
        </s>
    </ex>
</entry>
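To illustrate how such an entry might be consumed, here is a rough sketch that reads the terms and definitions back out with Python's standard `xml.etree.ElementTree` module. It is not the FoLiA library API; the helper name `summarize_entry` is hypothetical, and the fragment is parsed without an XML namespace. Complete FoLiA documents declare a namespace, in which case the tag names in the lookups would need to be qualified accordingly.

```python
import xml.etree.ElementTree as ET

# Expanded name under which ElementTree exposes the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged copy of the glossary entry (sense and example omitted).
ENTRY = """
<entry xml:id="entry.1">
  <term xml:id="entry.1.term.1">
    <w xml:id="entry.1.term.1.w.1">
      <t>house</t>
      <pos class="n"><feat subset="number" class="sing" /></pos>
      <lemma class="house" />
    </w>
  </term>
  <term xml:id="entry.1.term.2">
    <w xml:id="entry.1.term.2.w.1">
      <t>houses</t>
      <pos class="n"><feat subset="number" class="plural" /></pos>
      <lemma class="house" />
    </w>
  </term>
  <def xml:id="entry.1.def.1" class="sensedescription">
    <p xml:id="entry.1.def.1.p.1"><t>A dwelling, place of residence</t></p>
  </def>
</entry>
"""


def summarize_entry(entry):
    """Collect the term texts and (class, text) pairs of one <entry>."""
    terms = [t.text for term in entry.findall("term") for t in term.iter("t")]
    definitions = [
        (d.get("class"), " ".join(t.text.strip() for t in d.iter("t")))
        for d in entry.findall("def")
    ]
    return {"id": entry.get(XML_ID), "terms": terms, "definitions": definitions}


summary = summarize_entry(ET.fromstring(ENTRY))
print(summary["terms"])        # → ['house', 'houses']
print(summary["definitions"])  # → [('sensedescription', 'A dwelling, place of residence')]
```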
Other semantic senses would be represented as separate entries.
The definition element (<def>) is generic and can be used for multiple types of definition. As always, the set is not predefined and is purely fictitious in our examples, which gives the user flexibility. Definitions are, for instance, suited for dictionaries:
<entry xml:id="entry.1">
    <term xml:id="entry.1.term.1">
        <w xml:id="entry.1.term.1.w.1">
            <t>house</t>
            <pos set="englishpos" class="n">
                <feat subset="number" class="sing" />
            </pos>
            <lemma set="englishlemma" class="house" />
            <sense set="englishsense" class="house%1:06:00::" />
        </w>
    </term>
    <def xml:id="entry.1.def.1" class="translation-es">
        <w xml:id="entry.1.def.1.w.1">
            <t>casa</t>
            <pos set="spanishpos" class="n">
                <feat subset="number" class="sing" />
            </pos>
            <lemma set="spanishlemma" class="casa" />
        </w>
    </def>
</entry>
Or for etymological definitions:
<def xml:id="entry.1.def.2" class="etymology">
    <p xml:id="entry.1.def.2.p.1">
        <t>Old English hus "dwelling, shelter, house," from Proto-Germanic *husan
        (cognates: Old Norse, Old Frisian hus, Dutch huis, German Haus), of unknown
        origin, perhaps connected to the root of hide (v.) [OED]. In Gothic only in
        gudhus "temple," literally "god-house;" the usual word for "house" in Gothic
        being razn.</t>
    </p>
</def>
The following two samples illustrate a dictionary distributed over multiple FoLiA files, using Relation Annotation to link the two:
English part, doc-english.xml (excerpt):
<entry xml:id="en-entry.1">
    <term xml:id="en-entry.1.term.1">
        <w xml:id="en-entry.1.term.1.w.1">
            <t>house</t>
            <pos set="englishpos" class="n">
                <feat subset="number" class="sing" />
            </pos>
            <lemma set="englishlemma" class="house" />
            <sense set="englishsense" class="house%1:06:00::" />
        </w>
    </term>
    <relation class="translation-es" xlink:href="doc-spanish.xml"
              xlink:type="simple">
        <xref id="es-entry.1" type="entry" />
    </relation>
</entry>
Spanish part, doc-spanish.xml (excerpt):
<entry xml:id="es-entry.1">
    <term xml:id="es-entry.1.term.1" class="translation-es">
        <w xml:id="es-entry.1.term.1.w.1">
            <t>casa</t>
            <pos set="spanishpos" class="n">
                <feat subset="number" class="sing" />
            </pos>
            <lemma set="spanishlemma" class="casa" />
        </w>
    </term>
    <relation class="translation-en" xlink:href="doc-english.xml"
              xlink:type="simple">
        <xref id="en-entry.1" type="entry" />
    </relation>
</entry>
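As a rough illustration of how such cross-document relations could be followed, the sketch below resolves each `<xref>` to the entry it points at. It uses only Python's standard library rather than the FoLiA API. The two files are stood in for by in-memory strings, the `<dictionary>` wrapper element and the helper names are hypothetical, and the entries are abridged and un-namespaced.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# In-memory stand-ins for the two files, keyed by file name.
DOCS = {
    "doc-english.xml": """
<dictionary xmlns:xlink="http://www.w3.org/1999/xlink">
  <entry xml:id="en-entry.1">
    <term xml:id="en-entry.1.term.1"><w><t>house</t></w></term>
    <relation class="translation-es" xlink:href="doc-spanish.xml" xlink:type="simple">
      <xref id="es-entry.1" type="entry" />
    </relation>
  </entry>
</dictionary>
""",
    "doc-spanish.xml": """
<dictionary xmlns:xlink="http://www.w3.org/1999/xlink">
  <entry xml:id="es-entry.1">
    <term xml:id="es-entry.1.term.1"><w><t>casa</t></w></term>
    <relation class="translation-en" xlink:href="doc-english.xml" xlink:type="simple">
      <xref id="en-entry.1" type="entry" />
    </relation>
  </entry>
</dictionary>
""",
}


def find_by_id(root, wanted):
    """Return the first element whose xml:id equals `wanted`, or None."""
    for elem in root.iter():
        if elem.get(XML_ID) == wanted:
            return elem
    return None


def resolve_relations(docname):
    """Yield (source term, relation class, target term) for each <xref>."""
    roots = {name: ET.fromstring(text) for name, text in DOCS.items()}
    for entry in roots[docname].iter("entry"):
        source = entry.find("term/w/t").text
        for relation in entry.findall("relation"):
            # xlink:href names the document that holds the xref targets.
            target_root = roots[relation.get(XLINK_HREF)]
            for xref in relation.findall("xref"):
                target = find_by_id(target_root, xref.get("id"))
                yield source, relation.get("class"), target.find("term/w/t").text


print(list(resolve_relations("doc-english.xml")))
# → [('house', 'translation-es', 'casa')]
```

A real implementation would load the referenced file from disk and handle dangling references; both are omitted here to keep the sketch short.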
For simple multilingual documents, explicit relations may be overkill. For such situations, a simple multi-document mechanism is available. This mechanism is based purely on convention: it assumes that structural elements that are translations of one another simply share the same ID. This approach is quite feasible when used on higher-level structural elements, such as divisions, paragraphs, events or entries.
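The shared-ID convention can be sketched as a simple join on `xml:id`. The example below uses only Python's standard library; the two documents, the `<text>` wrapper and the helper names `index_by_id` and `align` are hypothetical, and the fragments are un-namespaced for brevity.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Two hypothetical documents whose paragraphs share IDs by convention.
ENGLISH = """
<text>
  <p xml:id="p.1"><t>The house is old.</t></p>
  <p xml:id="p.2"><t>It has ten rooms.</t></p>
</text>
"""
SPANISH = """
<text>
  <p xml:id="p.1"><t>La casa es vieja.</t></p>
  <p xml:id="p.2"><t>Tiene diez habitaciones.</t></p>
</text>
"""


def index_by_id(xml_text, tag):
    """Map xml:id to text content for every `tag` element that has an ID."""
    root = ET.fromstring(xml_text)
    return {
        el.get(XML_ID): el.find("t").text
        for el in root.iter(tag)
        if el.get(XML_ID) is not None
    }


def align(doc_a, doc_b, tag="p"):
    """Pair up elements from both documents that share the same ID."""
    left, right = index_by_id(doc_a, tag), index_by_id(doc_b, tag)
    return [(key, left[key], right[key]) for key in left if key in right]


for key, english, spanish in align(ENGLISH, SPANISH):
    print(key, "|", english, "|", spanish)
# → p.1 | The house is old. | La casa es vieja.
# → p.2 | It has ten rooms. | Tiene diez habitaciones.
```

Because the pairing relies on nothing but matching IDs, elements present in only one document are silently skipped in this sketch.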