Structure Annotation

This category encompasses annotation types that define the structure of a document, e.g. paragraphs, sentences, words, sections such as chapters, lists, tables, etc. These types are not strictly considered linguistic annotation, and equivalents are commonly found in other document formats such as HTML, TEI, Markdown, and LaTeX. For FoLiA, they provide the necessary structural basis that linguistic annotation can build on.

FoLiA defines the following types of structure annotation:

- Token Annotation – <w> – This annotation type introduces a tokenisation layer for the document. The terms token and word are used interchangeably in FoLiA as FoLiA itself does not commit to a specific tokenisation paradigm. Tokenisation is a prerequisite for the majority of linguistic annotation types offered by FoLiA and it is one of the most fundamental types of Structure Annotation. The words/tokens are typically embedded in other types of structure elements, such as sentences or paragraphs.
- Division Annotation – <div> – Structure annotation representing some kind of division, typically used for chapters, sections, and subsections (the exact types are determined by the set definition). Divisions may be nested at will, and may include almost all kinds of other structure elements.
- Paragraph Annotation – <p> – Represents a paragraph and holds further structure annotation such as sentences.
- Head Annotation – <head> – The head element is used to provide a header or title for the structure element in which it is embedded, usually a division (<div>).
- List Annotation – <list> – Structure annotation for enumeration/itemisation, e.g. bulleted lists.
- Figure Annotation – <figure> – Structure annotation for including pictures, optionally captioned, in documents.
- Vertical Whitespace – <whitespace> – Structure annotation introducing vertical whitespace.
- Linebreak – <br> – Structure annotation representing a single linebreak, with special facilities to denote pagebreaks.
- Sentence Annotation – <s> – Structure annotation representing a sentence. Sentence detection is a common stage in NLP alongside tokenisation.
- Event Annotation – <event> – Structural annotation type representing events, often used in new media contexts for things such as tweets, chat messages and forum posts (the event types are specified by a user-defined set definition). Note that a more linguistic kind of event annotation can be accomplished with Entity Annotation or even Time Segmentation rather than this one.
- Quote Annotation – <quote> – Structural annotation used to explicitly mark quoted speech, i.e. what is reported to have been said and appears in the text within some form of quotation marks.
- Note Annotation – <note> – Structural annotation used for notes, such as footnotes or warnings or notice blocks.
- Reference Annotation – <ref> – Structural annotation for referring to other annotation types. Used e.g. for referring to bibliography entries (citations) and footnotes.
- Table Annotation – <table> – Structural annotation type for creating a simple tabular environment, i.e. a table with rows, columns and cells and an optional header.
- Part Annotation – <part> – The structure element part is a fairly abstract structure element that should only be used when a more specific structure element is not available. Most notably, the part element should never be used for representation of morphemes or phonemes! Part can be used to divide a larger structure element, such as a division, or a paragraph into arbitrary subparts.
- Utterance Annotation – <utt> – An utterance is a structure element that may consist of words or sentences, which in turn may contain words. The opposite is also true: a sentence may consist of multiple utterances. Utterances are often used in the absence of sentences in a speech context, where neat grammatical sentences cannot always be distinguished.
- Entry Annotation – <entry> – FoLiA has a set of structure elements that can be used to represent collections such as glossaries, dictionaries, thesauri, and wordnets. Entry annotation defines the entries in such collections, Term annotation defines the terms, and Definition Annotation provides the definitions.
- Term Annotation – <term> – FoLiA has a set of structure elements that can be used to represent collections such as glossaries, dictionaries, thesauri, and wordnets. Entry annotation defines the entries in such collections, Term annotation defines the terms, and Definition Annotation provides the definitions.
- Definition Annotation – <def> – FoLiA has a set of structure elements that can be used to represent collections such as glossaries, dictionaries, thesauri, and wordnets. Entry annotation defines the entries in such collections, Term annotation defines the terms, and Definition Annotation provides the definitions.
- Example Annotation – <ex> – FoLiA has a set of structure elements that can be used to represent collections such as glossaries, dictionaries, thesauri, and wordnets. Example annotation defines examples in such collections.
- Hidden Token Annotation – <hiddenw> – This annotation type allows for a hidden token layer in the document. Hidden tokens are ignored for most intents and purposes but may serve a purpose when annotation on implicit tokens is required, for example as targets for syntactic movement annotation.
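
To illustrate how several of the structure elements listed above nest, the following is a minimal sketch that parses a simplified FoLiA-style fragment with Python's standard library and collects the tokens of each sentence. The fragment is an assumption made for illustration: it omits the enclosing document element, metadata, and annotation declarations, so it is not a complete, valid FoLiA document, and the identifiers (`example.p.1.s.1`, etc.) and the helper name `tokens_per_sentence` are hypothetical.

```python
import xml.etree.ElementTree as ET

# FoLiA XML namespace and the reserved XML namespace used for xml:id.
FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_NS = "http://www.w3.org/XML/1998/namespace"
NS = {"f": FOLIA_NS}

# Simplified illustrative fragment (NOT a complete FoLiA document):
# a division with a head and one paragraph, each holding a sentence of tokens.
FRAGMENT = """
<text xmlns="http://ilk.uvt.nl/folia" xml:id="example.text">
  <div xml:id="example.div.1">
    <head xml:id="example.head.1">
      <s xml:id="example.head.1.s.1">
        <w xml:id="example.head.1.s.1.w.1"><t>Introduction</t></w>
      </s>
    </head>
    <p xml:id="example.p.1">
      <s xml:id="example.p.1.s.1">
        <w xml:id="example.p.1.s.1.w.1"><t>Hello</t></w>
        <w xml:id="example.p.1.s.1.w.2"><t>world</t></w>
        <w xml:id="example.p.1.s.1.w.3"><t>.</t></w>
      </s>
    </p>
  </div>
</text>
"""


def tokens_per_sentence(xml_text):
    """Map each sentence ID to the list of token texts it directly contains."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    # iter() walks the tree in document order, at any nesting depth.
    for sentence in root.iter(f"{{{FOLIA_NS}}}s"):
        sentence_id = sentence.get(f"{{{XML_NS}}}id")
        result[sentence_id] = [
            word.findtext("f:t", default="", namespaces=NS)
            for word in sentence.findall("f:w", NS)
        ]
    return result


if __name__ == "__main__":
    for sentence_id, tokens in tokens_per_sentence(FRAGMENT).items():
        print(sentence_id, tokens)
```

The sketch relies only on the nesting described in the list (tokens embedded in sentences, sentences embedded in paragraphs or heads, and those in turn inside a division); for real documents, a dedicated FoLiA library is likely a better choice than hand-written XML traversal.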