Gap Annotation
Sometimes there are parts of a document you want to skip and not annotate at all, but still include as is. This is where gap annotation comes in; a user-defined set may indicate the kind of gap. Common omissions in books are, for example, front matter and back matter, such as the cover.
Specification
Structure Element
Annotation Category: | |
---|---|
Declaration: | <gap-annotation set="..."> |
Version History: | Since the beginning |
Element: | <gap> |
API Class: | |
Required Attributes: | |
Optional Attributes: | |
Accepted Data: | |
Valid Context: | |
Text Markup Element
Element: | <t-gap> |
---|---|
API Class: | |
Required Attributes: | |
Optional Attributes: | |
Accepted Data: | |
Valid Context: | |
Explanation
Sometimes there are parts of a document you want to skip and not annotate, but still include as is. The <gap> element
should be used for this purpose. A gap may carry a class indicating what kind of gap it is, as defined by a
user-defined set. Common omissions are, for example, front matter and back matter, text that is illegible or inaudible,
or text in a foreign language. The semantics depend on your set.
Although a gap skips over content, you may still want to include the raw content explicitly. This is done with the
<content> element (see Raw Content). Because this is raw content, it cannot be annotated any further, and the XML
CDATA type is used here to include it verbatim.
The following example shows the use of <gap>:
<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
<metadata>
<annotations>
<text-annotation>
<annotator processor="p1" />
</text-annotation>
<division-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/divisions.foliaset.xml">
<annotator processor="p1" />
</division-annotation>
<gap-annotation set="adhoc">
<annotator processor="p1" />
</gap-annotation>
<rawcontent-annotation>
<annotator processor="p1" />
</rawcontent-annotation>
<description-annotation>
<annotator processor="p1" />
</description-annotation>
<paragraph-annotation>
<annotator processor="p1" />
</paragraph-annotation>
</annotations>
<provenance>
<processor xml:id="p1" name="proycon" type="manual" />
</provenance>
</metadata>
<text xml:id="example.text">
<gap class="frontmatter">
<desc>This is the cover of the book</desc>
<content>
<![CDATA[
SNOW WHITE AND THE SEVEN DWARFS
by the Brothers Grimm
first edition
Copyright(c) blah blah
]]>
</content>
</gap>
<div xml:id="example.div.1" class="chapter" n="1">
<t>In the <t-gap class="illegible" /> there was a princess...</t>
</div>
</text>
</FoLiA>
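To illustrate how the gaps in a document like the one above can be read back, here is a minimal sketch using only Python's standard `xml.etree.ElementTree` module. It does not use the FoLiA library's own API; the helper name `list_gaps` and the trimmed sample document (metadata omitted, cover text shortened) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
NS = {"f": FOLIA_NS}

# Trimmed, hypothetical variant of the example document (metadata left out).
DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <text xml:id="example.text">
    <gap class="frontmatter">
      <desc>This is the cover of the book</desc>
      <content><![CDATA[
by the Brothers Grimm
first edition
]]></content>
    </gap>
    <div xml:id="example.div.1" class="chapter" n="1">
      <t>In the <t-gap class="illegible" /> there was a princess...</t>
    </div>
  </text>
</FoLiA>"""


def list_gaps(root):
    """Collect (element name, class, description, raw content) for each gap.

    Hypothetical helper: walks the tree in document order and picks up both
    the structural <gap> and the text markup <t-gap>.
    """
    wanted = {f"{{{FOLIA_NS}}}gap", f"{{{FOLIA_NS}}}t-gap"}
    found = []
    for elem in root.iter():
        if elem.tag not in wanted:
            continue
        desc = elem.findtext("f:desc", default=None, namespaces=NS)
        content = elem.findtext("f:content", default=None, namespaces=NS)
        found.append((
            elem.tag.split("}")[1],          # strip the namespace part
            elem.get("class"),
            desc,
            content.strip() if content is not None else None,
        ))
    return found


gaps = list_gaps(ET.fromstring(DOC))
for entry in gaps:
    print(entry)
```

The CDATA section is delivered by the parser as ordinary text, so the raw content comes back verbatim apart from the surrounding whitespace that the sketch strips.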
The gap element comes in two flavours: besides the aforementioned structural element, there is also a text markup
element (see Text Markup Annotation), <t-gap>, which offers a more fine-grained variant for use in untokenised text.
It indicates a gap in the textual content and is also shown in the example above. Either the text is not available, or
there is a deliberate blank, for instance in fill-in exercises. It is recommended to provide a textual value when
possible, but this is not required.
If you find that you want to mark your whole text content as being a <t-gap>, then this is a sure sign you should use
the structural element <gap> instead.
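The rule of thumb above can be checked mechanically. The following is a minimal sketch, again using only Python's standard library rather than any FoLiA API; the function name `is_whole_text_gap` and the sample fragment are hypothetical. It flags a `<t>` element whose entire content is a single `<t-gap>`.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
T = f"{{{FOLIA_NS}}}t"
T_GAP = f"{{{FOLIA_NS}}}t-gap"

# Hypothetical fragment: the first <t> has a gap inside running text,
# the second consists of nothing but a gap.
SAMPLE = """<div xmlns="http://ilk.uvt.nl/folia">
  <p><t>In the <t-gap class="illegible" /> there was a princess...</t></p>
  <p><t><t-gap class="illegible" /></t></p>
</div>"""


def is_whole_text_gap(t_elem):
    """Return True if a <t> element holds nothing but a single <t-gap>."""
    children = list(t_elem)
    if len(children) != 1 or children[0].tag != T_GAP:
        return False
    before = (t_elem.text or "").strip()       # text preceding the gap
    after = (children[0].tail or "").strip()   # text following the gap
    return not before and not after


root = ET.fromstring(SAMPLE)
flags = [is_whole_text_gap(t) for t in root.iter(T)]
print(flags)
```

Any `<t>` flagged this way is a candidate for being replaced by a structural `<gap>`.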
Note
Both elements belong to the same annotation type and therefore share the same declaration.
Text Redundancy
In cases of text redundancy (see Text Annotation), the <t-gap> element may take an ID reference attribute that refers
to a <gap> element, as shown in the following example:
<s>
<t>to <t-gap id="gap.1" class="fillin">be</t-gap> or not to be</t>
<w><t>to</t></w>
<gap xml:id="gap.1" class="fillin"><content>be</content></gap>
<w><t>or</t></w>
<w><t>not</t></w>
<w><t>to</t></w>
<w><t>be</t></w>
</s>
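The link between the two levels can be followed programmatically. Below is a minimal sketch with Python's standard `xml.etree.ElementTree`, not the FoLiA library's API; the helper name `resolve_t_gaps` is hypothetical, and the sentence is the one from the example above with the FoLiA namespace added so that it parses on its own.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

SENTENCE = """<s xmlns="http://ilk.uvt.nl/folia">
  <t>to <t-gap id="gap.1" class="fillin">be</t-gap> or not to be</t>
  <w><t>to</t></w>
  <gap xml:id="gap.1" class="fillin"><content>be</content></gap>
  <w><t>or</t></w>
  <w><t>not</t></w>
  <w><t>to</t></w>
  <w><t>be</t></w>
</s>"""


def resolve_t_gaps(root):
    """Pair each <t-gap> carrying an id reference with the <gap> it points at.

    Hypothetical helper; the second item of a pair is None when no <gap>
    with that xml:id exists.
    """
    gaps_by_id = {g.get(XML_ID): g for g in root.iter(f"{{{FOLIA_NS}}}gap")}
    pairs = []
    for t_gap in root.iter(f"{{{FOLIA_NS}}}t-gap"):
        ref = t_gap.get("id")
        if ref is not None:
            pairs.append((t_gap, gaps_by_id.get(ref)))
    return pairs


resolved = [
    (
        t_gap.get("id"),
        t_gap.text,
        gap.findtext(f"{{{FOLIA_NS}}}content") if gap is not None else None,
    )
    for t_gap, gap in resolve_t_gaps(ET.fromstring(SENTENCE))
]
print(resolved)
```

Comparing the text of the `<t-gap>` with the `<content>` of the referenced `<gap>` is one simple way to check that the redundant levels agree.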