The Example Corpus of Language Registers is a small corpus of Welsh texts collected manually from the Cysill Ar-lein corpus. The corpus has been manually anonymized and divided into sentence-length segments. The segments were then classified by the researcher according to their language register.
The categories used when classifying the text were:
- Hynafol (Archaic)
- Clasurol (Classical)
- Ffurfiol (Formal)
- Technegol (Technical)
- Niwtral (Neutral)
- Iaith Symledig / Cymraeg Clir (Simplified Language / ‘Plain Welsh’)
- Anffurfiol (Informal)
- Anffurfiol iawn / llafar (Very informal / colloquial speech)
- Tafodieithol (Dialectal)
- Sathredig (Vulgar)
A typology matrix of language registers was also created to facilitate the task of classifying the texts. The matrix can be found here.
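For anyone handling the labels programmatically, the ten register categories listed above could be held in a simple lookup table. The following is only a minimal sketch: the names `REGISTERS` and `gloss` are hypothetical, and it assumes each segment carries its Welsh label as a plain string, which the corpus description does not confirm.

```python
# Hypothetical lookup table: Welsh register label -> English gloss.
# The labels are taken from the category list; the storage format
# of the labels in the corpus itself is an assumption.
REGISTERS = {
    "Hynafol": "Archaic",
    "Clasurol": "Classical",
    "Ffurfiol": "Formal",
    "Technegol": "Technical",
    "Niwtral": "Neutral",
    "Iaith Symledig / Cymraeg Clir": "Simplified Language / Plain Welsh",
    "Anffurfiol": "Informal",
    "Anffurfiol iawn / llafar": "Very informal / colloquial speech",
    "Tafodieithol": "Dialectal",
    "Sathredig": "Vulgar",
}

# Case-insensitive index so that "hynafol" and "Hynafol" both match.
_INDEX = {label.casefold(): english for label, english in REGISTERS.items()}


def gloss(label: str) -> str:
    """Return the English gloss for a Welsh register label.

    Raises KeyError if the label is not one of the ten categories.
    """
    return _INDEX[label.strip().casefold()]
```

A call such as `gloss("Tafodieithol")` would return `"Dialectal"`; an unknown label raises `KeyError` rather than being silently mapped to a default category.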