Linebreak

Structure annotation representing a single linebreak, with special facilities to denote pagebreaks.

Specification

Annotation Category:
 

Structure Annotation

Declaration:

<linebreak-annotation set="..."> (note: ``set`` is optional for this annotation type)

Version History:
 

Since the beginning

Element:

<br>

API Class:

Linebreak

Required Attributes:
 
Optional Attributes:
 
  • xml:id – The ID of the element; this has to be unique in the entire document or collection of documents (corpus). All identifiers in FoLiA are of the XML NCName datatype, which roughly means it is a unique string that has to start with a letter (not a number or symbol), may contain numbers, but may never contain colons or spaces. FoLiA does not define any naming convention for IDs.
  • set – The set of the element, ideally a URI linking to a set definition (see Set Definitions (Vocabulary)) or otherwise a uniquely identifying string. The set must be referred to also in the Annotation Declarations for this annotation type.
  • class – The class of the annotation, i.e. the annotation tag in the vocabulary defined by set.
  • processor – This refers to the ID of a processor in the Provenance Data. The processor in turn defines exactly who or what was the annotator of the annotation.
  • annotator – This is an older alternative to the processor attribute, without support for full provenance. The annotator attribute simply refers to the name or ID of the system or human annotator that made the annotation.
  • annotatortype – This is an older alternative to the processor attribute, without support for full provenance. It is used together with annotator and specifies the type of the annotator, either manual for human annotators or auto for automated systems.
  • confidence – A floating point value between zero and one; expresses the confidence the annotator places in the annotation.
  • datetime – The date and time when this annotation was recorded. The format is YYYY-MM-DDThh:mm:ss (note the literal T in the middle to separate date from time), as per the XSD Datetime data type.
  • n – A number in a sequence, corresponding to a number in the original document, for example chapter numbers, section numbers, list item numbers. This does not have to be an actual number; other sequence identifiers are also possible (think alphanumeric characters or roman numerals).
  • space – This attribute indicates whether spacing should be inserted after this element. Its default value is always yes, so it does not need to be specified in that case. If tokens or other structural elements are glued together, the value should be set to no. This allows for reconstruction of the detokenised original text.
  • src – Points to a file or full URL of a sound or video file. This attribute is inheritable.
  • begintime – A timestamp in HH:MM:SS.MMM format, indicating the begin time of the speech. If a sound clip is specified (src), the timestamp refers to a location in the sound clip.
  • endtime – A timestamp in HH:MM:SS.MMM format, indicating the end time of the speech. If a sound clip is specified (src), the timestamp refers to a location in the sound clip.
  • speaker – A string identifying the speaker. This attribute is inheritable. Multiple speakers are not allowed; simply do not specify a speaker on a certain level if you are unable to link the speech to a specific (single) speaker.
  • xlink:href – Turns this element into a hyperlink to the specified URL.
  • xlink:type – The type of link (you’ll want to use simple in almost all cases).
Accepted Data:

<alt> (Alternative Annotation), <altlayers> (Alternative Annotation), <comment> (Comment Annotation), <correction> (Correction Annotation), <desc> (Description Annotation), <metric> (Metric Annotation), <part> (Part Annotation), <relation> (Relation Annotation)

Valid Context:

<def> (Definition Annotation), <div> (Division Annotation), <event> (Event Annotation), <ex> (Example Annotation), <head> (Head Annotation), <note> (Note Annotation), <p> (Paragraph Annotation), <ref> (Reference Annotation), <s> (Sentence Annotation), <term> (Term Annotation), <t> (Text Annotation), <t-correction> (Correction Annotation), <t-error> (Error Detection Annotation (DEPRECATED)), <t-gap> (Gap Annotation), <t-str> (String Annotation), <t-style> (Style Annotation)

Extra Attributes:
 
  • newpage – Can be set to yes to indicate that the break is not just a linebreak, but also a pagebreak (defaults to no)
  • pagenr – The number of the page after the break
  • linenr – The number of the line after the break
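
As an illustration of how a consumer might read these extra attributes, the following is a minimal sketch using only Python's standard library. The helper name `describe_break` and the inline snippet are hypothetical and not part of any FoLiA library; the sketch only assumes the namespace used in the examples on this page and the documented default of `newpage` being `no`.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"  # namespace used by the examples on this page


def describe_break(br):
    """Return the page/line information carried by a single <br> element.

    Hypothetical helper for illustration; not part of any FoLiA library.
    """
    return {
        # newpage defaults to "no" when the attribute is absent
        "newpage": br.get("newpage", "no") == "yes",
        # pagenr and linenr are optional; this sketch keeps them as strings
        "pagenr": br.get("pagenr"),
        "linenr": br.get("linenr"),
    }


# Hypothetical fragment: a plain linebreak followed by a pagebreak.
snippet = f"""
<div xmlns="{FOLIA_NS}">
  <br/>
  <br newpage="yes" pagenr="2" linenr="1"/>
</div>
"""

root = ET.fromstring(snippet)
breaks = [describe_break(br) for br in root.iter(f"{{{FOLIA_NS}}}br")]
for info in breaks:
    print(info)
```

The first break reports no page information, while the second is flagged as a pagebreak that starts page 2 at line 1.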

Description & Examples

Linebreaks play a double role: they are a structure element as well as a text markup element. The latter implies that you may also use <br/> within the scope of text content, that is, within a <t> element.

The difference between br and whitespace is that the former specifies that only a linebreak was present, without forcing any vertical whitespace between the lines, whilst the latter actually generates an empty space, which would be comparable to two successive br statements. Both elements can be used inside various structural elements, such as divisions, paragraphs, headers, and sentences. Note that the example below also contains an example of Hyphenation, which is a special, softer kind of linebreak.

<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <metadata>
      <annotations>
          <text-annotation>
              <annotator processor="p1" />
          </text-annotation>
          <division-annotation set="https://raw.githubusercontent.com/LanguageMachines/uctodata/master/setdefinitions/divisions.foliaset.xml">
              <annotator processor="p1" />
          </division-annotation>
          <whitespace-annotation>
              <annotator processor="p1" />
          </whitespace-annotation>
          <linebreak-annotation>
              <annotator processor="p1" />
          </linebreak-annotation>
          <hyphenation-annotation>
              <annotator processor="p1" />
          </hyphenation-annotation>
      </annotations>
      <provenance>
         <processor xml:id="p1" name="proycon" type="manual" />
      </provenance>
  </metadata>
  <text xml:id="example.text">
     <div xml:id="example.div.1" class="section" n="1">
         <t>Blah...</t>
     </div>
     <whitespace />
     <br newpage="yes" pagenr="2" />
     <div xml:id="example.div.2" class="section" n="2">
         <!-- BR has a double role: it can be used as a text markup element as well, as seen on the next line -->
         <t>To be, <br />or not to be!</t>
     </div>
     <div xml:id="example.div.3" class="section" n="3">
         <t>Don't leave me bro<t-hbr/>ken and alone!</t>
     </div>
  </text>
</FoLiA>
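
To make the double role concrete, the sketch below labels each `<br>` as structural or as text markup, depending on whether it occurs somewhere inside a `<t>` element. It uses only Python's standard library; the function name `classify_breaks`, the role labels, and the shortened inline fragment are assumptions made for illustration and are not part of any FoLiA library.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"  # namespace used by the examples on this page
BR = f"{{{FOLIA_NS}}}br"
T = f"{{{FOLIA_NS}}}t"


def classify_breaks(root):
    """Label each <br>, in document order, as 'markup' or 'structure'.

    A <br> counts as 'markup' when any ancestor is a <t> element,
    and as 'structure' otherwise. Hypothetical helper for illustration.
    """
    roles = []

    def walk(elem, inside_text):
        for child in elem:
            if child.tag == BR:
                roles.append("markup" if inside_text else "structure")
            # Once inside a <t>, everything below stays text content.
            walk(child, inside_text or child.tag == T)

    walk(root, root.tag == T)
    return roles


# Shortened, hypothetical fragment modelled on the example above.
snippet = f"""
<text xmlns="{FOLIA_NS}">
  <br newpage="yes" pagenr="2"/>
  <div><t>To be, <br/>or not to be!</t></div>
</text>
"""

roles = classify_breaks(ET.fromstring(snippet))
print(roles)
```

The pagebreak between the divisions is reported as a structure element and the break inside the text content as markup.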

You can also use <br/> in the context of Text Markup Annotation, as in the following example:

<?xml version="1.0" encoding="utf-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.0" xml:id="example">
  <metadata>
      <annotations>
          <text-annotation>
              <annotator processor="p1" />
          </text-annotation>
          <sentence-annotation>
              <annotator processor="p1" />
          </sentence-annotation>
          <linebreak-annotation>
              <annotator processor="p1" />
          </linebreak-annotation>
          <part-annotation>
              <annotator processor="p1" />
          </part-annotation>
          <style-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/styles.foliaset.xml">
              <annotator processor="p1" />
          </style-annotation>
      </annotations>
      <provenance>
         <processor xml:id="p1" name="proycon" type="manual" />
      </provenance>
  </metadata>
  <text xml:id="example.text">
        <s>
            <t>To <t-style class="italic">be</t-style> or not to be,<br/>that is the <t-style class="bold"><t-style class="red">question</t-style></t-style>.</t>
        </s>
  </text>
</FoLiA>
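
A rough sketch of how text content containing `<br/>` and nested markup could be flattened to plain text is given below, again using only Python's standard library. The function name `flatten` is hypothetical, and the sketch makes simplifying assumptions: every `<br>` becomes a single newline, all other markup elements contribute only their text, and attributes such as `space` or any annotations nested inside `<br>` are not given special treatment.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"  # namespace used by the examples on this page
BR = f"{{{FOLIA_NS}}}br"


def flatten(elem):
    """Concatenate the text under elem, emitting a newline for every <br>.

    Hypothetical helper for illustration; not part of any FoLiA library.
    """
    parts = ["\n"] if elem.tag == BR else [elem.text or ""]
    for child in elem:
        parts.append(flatten(child))
        # The tail is the text that follows the child inside its parent.
        parts.append(child.tail or "")
    return "".join(parts)


# The <t> element from the example above, with the namespace declared inline.
snippet = (
    f'<t xmlns="{FOLIA_NS}">To <t-style class="italic">be</t-style> or not to be,'
    '<br/>that is the <t-style class="bold"><t-style class="red">question'
    "</t-style></t-style>.</t>"
)

plain = flatten(ET.fromstring(snippet))
print(plain)
```

The style markup is dropped and the linebreak ends up as the only newline in the resulting two-line string.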