Language Annotation
Language annotation identifies the language in which a part of the text is written. Although this information is often recorded in the metadata, in this form it is treated as an actual annotation.
Specification
Annotation Category: | |
---|---|
Declaration: | <lang-annotation set="..."> |
Version History: | since v0.8.1 |
Element: | <lang> |
API Class: | |
Required Attributes: | |
Optional Attributes: | |
Accepted Data: | |
Valid Context: | |
Text markup Element
Element: | <t-lang> |
---|---|
API Class: | |
Required Attributes: | |
Optional Attributes: | |
Accepted Data: | |
Valid Context: | |
Explanation
Language annotation marks a structural element as being in a particular language. It can therefore be applied to the text as a whole or to smaller elements within it. The vocabulary of language classes is determined by the set definition.
The text markup variant (<t-lang>) can be used in non-tokenised contexts.
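As a rough illustration of the text markup variant, the Python sketch below parses a small untokenised fragment and lists the spans marked with <t-lang>. The fragment itself is an assumption: it supposes that <t-lang> is nested inline inside <t>, like other text markup elements, and the sentence, the class `fra`, and the helper name `marked_languages` are invented for this example.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"

# Hypothetical untokenised sentence (not taken from the FoLiA documentation):
# the <t-lang> span is assumed to sit inline within the text content <t>.
FRAGMENT = f"""
<s xmlns="{FOLIA_NS}" xml:id="example.s.1">
  <t>She greeted us with <t-lang class="fra">bonjour</t-lang> and left.</t>
</s>
"""


def marked_languages(xml_text):
    """Return (language class, marked text) pairs for every <t-lang> span."""
    root = ET.fromstring(xml_text)
    spans = root.iter(f"{{{FOLIA_NS}}}t-lang")
    return [(span.get("class"), "".join(span.itertext())) for span in spans]


if __name__ == "__main__":
    for lang_class, text in marked_languages(FRAGMENT):
        print(lang_class, text)
```

Because the language is attached to a stretch of characters rather than to a word or sentence element, this form needs no prior tokenisation.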
Example
```xml
<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <metadata>
    <annotations>
      <text-annotation>
        <annotator processor="p1" />
      </text-annotation>
      <sentence-annotation>
        <annotator processor="p1" />
      </sentence-annotation>
      <paragraph-annotation>
        <annotator processor="p1" />
      </paragraph-annotation>
      <domain-annotation set="topics"> <!-- an ad-hoc set -->
        <annotator processor="p1" />
      </domain-annotation>
      <lang-annotation set="iso638-3"> <!-- an ad-hoc set -->
        <annotator processor="p1" />
      </lang-annotation>
    </annotations>
    <provenance>
      <processor xml:id="p1" name="proycon" type="manual" />
    </provenance>
  </metadata>
  <text xml:id="example.text">
    <p xml:id="example.p.1">
      <s xml:id="example.p.1.s.1">
        <t>I show an example:</t>
        <lang class="eng" />
      </s>
      <s xml:id="example.p.1.s.2">
        <t>У меня собака, её зовут Джайко.</t>
        <domain class="animals" />
        <lang class="rus" />
      </s>
    </p>
  </text>
</FoLiA>
```
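To show how such annotations might be read back, here is a minimal Python sketch that uses only the standard library's `xml.etree.ElementTree`. It embeds a trimmed copy of the example (the metadata block is left out for brevity, although a real FoLiA document would need it) and maps each sentence ID to the class of its <lang> child. The helper name `sentence_languages` is hypothetical, and a dedicated FoLiA library would normally be preferable to raw XML parsing.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed copy of the example document; the <metadata> block is omitted here.
DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <text xml:id="example.text">
    <p xml:id="example.p.1">
      <s xml:id="example.p.1.s.1">
        <t>I show an example:</t>
        <lang class="eng" />
      </s>
      <s xml:id="example.p.1.s.2">
        <t>У меня собака, её зовут Джайко.</t>
        <domain class="animals" />
        <lang class="rus" />
      </s>
    </p>
  </text>
</FoLiA>
"""


def sentence_languages(xml_text):
    """Map each sentence's xml:id to the class of its direct <lang> child."""
    root = ET.fromstring(xml_text)
    result = {}
    for sentence in root.iter(f"{{{FOLIA_NS}}}s"):
        lang = sentence.find(f"{{{FOLIA_NS}}}lang")  # direct children only
        if lang is not None:
            result[sentence.get(XML_ID)] = lang.get("class")
    return result


if __name__ == "__main__":
    for sentence_id, lang_class in sentence_languages(DOC).items():
        print(sentence_id, lang_class)
```

Looking only at direct children keeps a sentence-level <lang> distinct from any language annotation attached to elements nested deeper inside the sentence.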