Example Corpus of Language Registers

The Example Corpus of Language Registers is a small corpus of Welsh texts collected manually from the Cysill Ar-lein corpus. The texts were manually anonymized and divided into sentence-length segments. The researcher then classified each segment according to its language register.


The categories used when classifying the text were:

  • Hynafol (Archaic)
  • Clasurol (Classical)
  • Ffurfiol (Formal)
  • Technegol (Technical)
  • Niwtral (Neutral)
  • Iaith Symledig / Cymraeg Clir (Simplified Language / ‘Plain Welsh’)
  • Anffurfiol (Informal)
  • Anffurfiol iawn / llafar (Very informal / colloquial speech)
  • Tafodieithol (Dialectal)
  • Sathredig (Vulgar)
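
For anyone processing the labels programmatically, the ten categories above could be represented as a fixed label set. The following is only a minimal sketch: the names `Register` and `parse_register` are hypothetical, and it assumes that each segment is stored with its Welsh label as plain text, which this description does not confirm.

```python
from enum import Enum


class Register(Enum):
    """The ten register categories; each value is the Welsh label."""
    ARCHAIC = "Hynafol"
    CLASSICAL = "Clasurol"
    FORMAL = "Ffurfiol"
    TECHNICAL = "Technegol"
    NEUTRAL = "Niwtral"
    PLAIN = "Iaith Symledig / Cymraeg Clir"
    INFORMAL = "Anffurfiol"
    VERY_INFORMAL = "Anffurfiol iawn / llafar"
    DIALECTAL = "Tafodieithol"
    VULGAR = "Sathredig"


def parse_register(label: str) -> Register:
    """Map a Welsh label to a Register, ignoring case and outer whitespace.

    Hypothetical helper: assumes labels are stored as plain text.
    """
    wanted = label.strip().casefold()
    for register in Register:
        if register.value.casefold() == wanted:
            return register
    raise ValueError(f"unknown register label: {label!r}")
```

Using a closed set like this means a misspelled or unexpected label raises an error instead of silently creating an eleventh category.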

A typology matrix of language registers was also created to facilitate the task of classifying the texts. The matrix can be found here.