Set Definitions (Vocabulary)


The sets and classes used by the various linguistic annotation types are never defined in the FoLiA documents themselves, but externally in set definitions.

By using set definitions, a FoLiA document can be validated on a deep level, i.e. the validity of the used classes can be tested. Set definitions provide semantics to the FoLiA documents that use them and are an integral part of FoLiA. When set definitions are absent, validation can only be conducted on a shallow level that is agnostic about all sets and the classes therein.

Recall that all sets that are used need to be declared in the Annotation Declarations section in the document header, and that these declarations point to URLs holding FoLiA set definitions. If no set definition files are associated, a full in-depth validation cannot take place.

The role of FoLiA Set Definitions is:

  • to define which classes are valid in a set
  • to define which subsets and classes are valid in Features in a set
  • to constrain which subsets and classes may co-occur in an annotation of the set
  • to allow enumeration over classes and subsets
  • to assign human-readable labels to symbolic classes
  • to relate classes to external resources defining them (data category registries, linked data)
  • to define a hierarchy/taxonomy of classes

Prior to FoLiA v1.4, set definitions were stored in a simple custom XML format, distinct from FoLiA itself, which we call the legacy format and which is still supported for backward compatibility. Since FoLiA v1.4, however, we strongly prefer and recommend storing set definitions as RDF [RDF], i.e. the technology that powers the semantic web. In this way, set definitions provide a formal semantic layer for FoLiA.

Set definitions may be stored in various common RDF serialisation formats. The format can be indicated on the declarations in the document metadata using the format attribute. Recognised values are:

  • application/rdf+xml – XML for RDF (assumed for rdf.xml or rdf extensions)
  • text/turtle – Turtle (for RDF) (assumed for ttl extensions)
  • text/n3 – Notation 3 (for RDF) (assumed for n3 extensions)
  • application/foliaset+xml – Legacy FoLiA Set Definition format (XML) (assumed for xml extensions and in most other cases)

FoLiA applications should attempt to autodetect the format based on the extension. Not all applications may be able to deal with all formats/serialisations, however.
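
The extension-based autodetection described above can be sketched in Python as follows. The function name guess_format and the rule that an explicitly declared format takes precedence over the extension are assumptions made for illustration; they are not taken from any FoLiA library.

```python
from urllib.parse import urlsplit

# Extension-to-format table, following the list of recognised values above.
# Longer extensions come first so that "rdf.xml" wins over plain "xml".
EXTENSION_FORMATS = [
    (".rdf.xml", "application/rdf+xml"),
    (".rdf", "application/rdf+xml"),
    (".ttl", "text/turtle"),
    (".n3", "text/n3"),
]

# Assumed for xml extensions and in most other cases.
LEGACY_FORMAT = "application/foliaset+xml"


def guess_format(url, declared=None):
    """Return the media type to assume for the set definition at url.

    Assumption of this sketch: a format given on the declaration
    (the declared argument) overrides any guess based on the extension.
    """
    if declared:
        return declared
    # Only look at the path, so query strings do not hide the extension.
    path = urlsplit(url).path.lower()
    for extension, media_type in EXTENSION_FORMATS:
        if path.endswith(extension):
            return media_type
    return LEGACY_FORMAT
```

An application that cannot handle the detected serialisation could then fall back to shallow validation rather than fail outright.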

In this documentation, we will use the Turtle format for RDF, alongside our older legacy format. In all cases, FoLiA requires that only one set is defined per file; any other defined sets must be subsets of the primary set. In our legacy XML format, an otherwise empty set definition would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<set xml:id="your-set-id" type="closed" label="Human readable label for your set">
</set>

Note that the legacy XML format takes an XML namespace that is always the same (the FoLiA namespace).

In RDF, FoLiA Set Definitions follow a particular model: a small superset of the SKOS model. SKOS is a W3C standard for the representation of Simple Knowledge Organization Systems [SKOS]. Not everything can be expressed in SKOS, so we add some extensions, which are formally defined in our set definition schema. For the RDF namespace of our extension we generally use the prefix fsd:, though this is mere convention.

Some familiarity with RDF and Turtle is recommended for this chapter. It remains possible, however, to work with the legacy XML format, which is somewhat simpler and more concise, and to convert it automatically to Turtle using our superset of the SKOS model.

Your own set definition typically has its own RDF namespace, which in Turtle syntax is defined by the @base directive at the top of your set definition.


Never reuse the SKOS or FoLiA Set Definition namespaces!

@base <http://your/namespace/> .
@prefix skos: <> .
@prefix fsd: <> .

SKOS uses a different terminology than we do, which may be the source of some confusion. We attempt to map the terms in the following table:

Our term      SKOS term     SKOS class/property
Set/Subset    Collection    skos:Collection
ID            Notation      skos:notation

After this preamble, we can define a set as follows:

<#your-set-id>
    a skos:Collection ;
    skos:notation   "your-set-id" ;
    skos:prefLabel  "Human readable label for your set" ;
    fsd:open        false .

The first two lines state that http://your/namespace/#your-set-id is a [1] SKOS Collection, which is what we use for FoLiA Sets. The skos:notation property corresponds to the ID of the Set; only one is allowed [2].

A set can be either open or closed (the default). An open set allows any classes, even if they are not defined, which is useful for open vocabularies. The fsd:open property indicates this; it is not part of SKOS but an extension of ours, hence the different namespace prefix.


[RDF] Richard Cyganiak, David Wood and Markus Lanthaler (2014). RDF 1.1 Concepts and Abstract Syntax (website)
[SKOS] Alistair Miles and Sean Bechhofer (2009). SKOS: Simple Knowledge Organization System Reference (website)


[1] The a in Turtle syntax is shorthand for rdf:type.
[2] Technically, SKOS allows multiple notations, but we restrict this to one for Set Definitions.


A set (collection in SKOS terms) consists of classes (concepts in SKOS terms). Consider a simple part-of-speech set with three classes. First we define the set and refer to all the classes it contains:

<#simplepos>
    a skos:Collection ;
    skos:notation   "simplepos" ;
    skos:prefLabel "A simple part of speech set" ;
    skos:member <#N> , <#V> , <#A> .

Then we define the classes:

<#N>
    a skos:Concept ;
    skos:notation   "N" ;
    skos:prefLabel "Noun" .

<#V>
    a skos:Concept ;
    skos:notation   "V" ;
    skos:prefLabel "Verb" .

<#A>
    a skos:Concept ;
    skos:notation   "A" ;
    skos:prefLabel "Adjective" .

The ID (skos:notation) of the class is mandatory for FoLiA Set Definitions and determines a value that the class attribute may take in the FoLiA document for elements of this set. The skos:prefLabel property, both on the set itself and on the classes, carries a human-readable description for presentational purposes; it is optional but highly recommended.

In our legacy set definition format this is fairly straightforward and more concise:

<set xml:id="simplepos" type="closed"
  label="Simple Part-of-Speech">
  <class xml:id="N" label="Noun" />
  <class xml:id="V" label="Verb" />
  <class xml:id="A" label="Adjective" />
</set>
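
To illustrate what deep validation against such a definition amounts to, the following Python sketch reads a legacy definition with the standard library and checks whether a class is allowed. The function names are hypothetical, the sample omits the FoLiA namespace declaration for brevity, and treating every type other than open as closed is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined in XML; ElementTree exposes xml:id under this key.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

LEGACY = """<set xml:id="simplepos" type="closed" label="Simple Part-of-Speech">
  <class xml:id="N" label="Noun" />
  <class xml:id="V" label="Verb" />
  <class xml:id="A" label="Adjective" />
</set>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so the sketch works with or without one."""
    return tag.rsplit("}", 1)[-1]


def load_classes(xml_text):
    """Return (is_open, {class ID: label}) for the top-level classes of a set."""
    root = ET.fromstring(xml_text)
    is_open = root.get("type") == "open"
    labels = {
        element.get(XML_ID): element.get("label")
        for element in root
        if local_name(element.tag) == "class"
    }
    return is_open, labels


def class_is_valid(xml_text, class_id):
    """An open set accepts any class; a closed set only the defined ones."""
    is_open, labels = load_classes(xml_text)
    return is_open or class_id in labels
```

With the sample above, class_is_valid(LEGACY, "N") holds, whereas an undefined class such as "X" is rejected because the set is closed.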

Class Hierarchy

In FoLiA Set Definitions, classes can be nested to create more complex hierarchies or taxonomy trees, in which both nodes and leaves act as valid classes. This is best illustrated in our legacy XML format first. Consider the following set definition for named entities, in which the location class has been extended into more fine-grained subclasses.

<set xml:id="namedentities" type="closed">
  <class xml:id="per" label="Person" />
  <class xml:id="org" label="Organisation" />
  <class xml:id="loc" label="Location">
    <class xml:id="" label="Country" />
    <class xml:id="loc.street" label="Street" />
    <class xml:id="loc.building" label="Building">
      <class xml:id="" label="Hospital" />
      <class xml:id="" label="Church" />
      <class xml:id="loc.building.station" label="Station" />
    </class>
  </class>
</set>

In the SKOS model, this is more verbose as the hierarchy has to be modelled explicitly using the skos:broader property, as shown in the following excerpt:

<#namedentities>
    a skos:Collection ;
    skos:member <#loc> , <> .

<#loc>
    a skos:Concept ;
    skos:notation   "loc" ;
    skos:prefLabel "Location" .

    a skos:Concept ;
    skos:notation   "" ;
    skos:prefLabel "Country" ;
    skos:broader <#loc> .

It is recommended, but not mandatory, to set the class ID (skos:notation) of any nested classes to represent a full path, as a full path makes substring queries possible. FoLiA, however, does not dictate this, nor does it prescribe a delimiter for such paths, so the period used in the above example is merely a convention. Each ID, however, does have to be unique in the entire set.
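
The relation between the nested legacy format and skos:broader, and the benefit of full-path IDs, can be sketched as follows. The sample is a reduced version of the named entity set, the function name broader_map is hypothetical, and the period is used as path delimiter purely by the convention mentioned above.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Reduced sample; the FoLiA namespace declaration is omitted for brevity.
NESTED = """<set xml:id="namedentities" type="closed">
  <class xml:id="per" label="Person" />
  <class xml:id="org" label="Organisation" />
  <class xml:id="loc" label="Location">
    <class xml:id="loc.street" label="Street" />
    <class xml:id="loc.building" label="Building">
      <class xml:id="loc.building.station" label="Station" />
    </class>
  </class>
</set>"""


def broader_map(xml_text):
    """Map every class ID to the ID of its parent class (None at top level).

    Each entry with a parent corresponds to one skos:broader statement.
    Raises ValueError if an ID occurs twice, since IDs must be unique
    in the entire set.
    """
    result = {}

    def walk(node, parent_id):
        for child in node:
            if child.tag.rsplit("}", 1)[-1] != "class":
                continue  # skip anything that is not a class element
            class_id = child.get(XML_ID)
            if class_id in result:
                raise ValueError(f"duplicate class ID: {class_id}")
            result[class_id] = parent_id
            walk(child, class_id)

    walk(ET.fromstring(xml_text), None)
    return result


hierarchy = broader_map(NESTED)
# Full-path IDs allow a simple prefix query for everything below "loc".
below_loc = sorted(c for c in hierarchy if c.startswith("loc."))
```

Here hierarchy maps loc.building.station to loc.building, and below_loc lists the three classes nested under loc without having to traverse the tree.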


The section on Features introduced subsets. Please ensure you are familiar with this notion before continuing with the current section.

Subsets can be defined in a similar fashion to sets. Consider the legacy XML format first:

<set xml:id="simplepos" type="closed">
  <class xml:id="N" label="Noun" />
  <class xml:id="V" label="Verb" />
  <class xml:id="A" label="Adjective" />
  <subset xml:id="gender" type="closed">
      <class xml:id="m" label="Masculine" />
      <class xml:id="f" label="Feminine" />
      <class xml:id="n" label="Neuter" />
  </subset>
</set>

In RDF, subsets are defined as SKOS Collections, just like the primary set. The primary set refers to the subsets using the same skos:member relation as is used for classes/concepts.

<#simplepos>
    a skos:Collection ;
    skos:member <#N> , <#V> , <#A> , <#gender> .

<#gender>
    a skos:Collection ;
    skos:notation   "gender" ;
    skos:member <#gender.m> .

<#gender.m>
    a skos:Concept ;
    skos:notation   "m" ;
    skos:prefLabel "Masculine" .

Note that in this example, we prefixed the resource name for the class (#gender.m instead of #m). This is just a recommended convention, as URIs have to be unique and we may want to re-use the m ID in other subsets as well. The ID in the skos:notation property does not need to carry this prefix, as it needs only be unique within the subset. This property always determines how it is referenced from the FoLiA document, so we would still get <feat subset="gender" class="m" />.
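
The lookup implied here, by subset ID and then by the notation of the class within that subset, can be sketched as follows. The dictionary layout and the function name feature_is_valid are assumptions for illustration, and the sketch treats the subset as closed.

```python
import xml.etree.ElementTree as ET

# Hand-written in-memory model of the simplepos set and its gender subset.
SIMPLEPOS = {
    "classes": {"N", "V", "A"},
    "subsets": {"gender": {"m", "f", "n"}},
}


def feature_is_valid(set_definition, feat_xml):
    """Check a <feat> element against the subsets of a set definition."""
    feat = ET.fromstring(feat_xml)
    allowed = set_definition["subsets"].get(feat.get("subset"))
    # Unknown subsets are rejected; known ones must contain the class.
    return allowed is not None and feat.get("class") in allowed
```

Note that the class is matched by its notation (m), not by the resource name (#gender.m), and that m is valid only inside the gender subset, not as a class of the primary set.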


SKOS allows for more expressions to be made, and of course the full power of open linked data is available for use with FoLiA Set Definitions. The previous subsections laid out the minimal requirements for FoLiA Set Definitions using the SKOS model.

The use of skos:OrderedCollection is not supported yet; skos:Collection is mandatory. Ordering of classes (SKOS Concepts) can currently be indicated through a separate fsd:sequenceNumber property.

FoLiA Set Definitions must be complete, that is to say that all sets (SKOS collections) and classes (SKOS concepts) must be fully defined in one and the same set definition file.


The file need not be static but can be dynamically generated server-side; in either case it must be publicly available from a URL. A set definition must contain one and only one primary set (SKOS collection); all other sets must be subsets (SKOS collections that are members of the primary set; no deeper nesting is supported).
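
The structural rule on primary sets and subsets can be checked mechanically. The following sketch assumes the collections of a definition have already been read into a plain dictionary that maps each collection ID to the IDs of its members; both this representation and the function name check_structure are hypothetical.

```python
def check_structure(collections):
    """Return the ID of the primary set, or raise ValueError.

    collections maps each collection ID to the set of its member IDs.
    Members that are not keys of the mapping are taken to be classes.
    """
    # Collections that occur as a member of some other collection.
    nested = {
        member
        for members in collections.values()
        for member in members
        if member in collections
    }
    primaries = [cid for cid in collections if cid not in nested]
    if len(primaries) != 1:
        raise ValueError(f"expected exactly one primary set, found {len(primaries)}")
    primary = primaries[0]
    # Subsets may hold classes only; a collection inside a subset
    # would be nesting deeper than one level.
    for cid, members in collections.items():
        if cid != primary and any(member in collections for member in members):
            raise ValueError(f"subset {cid} contains another collection")
    return primary
```

For example, check_structure({"simplepos": {"N", "V", "A", "gender"}, "gender": {"gender.m"}}) returns "simplepos", while two unrelated collections, or a subset within a subset, raise ValueError.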