
4th Celtic Language Technology Workshop

Co-located with LREC 2022 in Marseille, France

Programme

Monday 20 June, 2022

Palais du Pharo, Joliette (map)

09:00–09:10	Welcome	CLTW organisers
09:10–09:50	Keynote	Prof Kevin Scannell, Saint Louis University
	09:50–10:15 Oral Session 1
09:50–10:15	Multilingual Abstract Meaning Representation for Celtic Languages	Johannes Heinecke and Anastasia Shimorina
	10:15–11:00 Coffee Break / Poster Session (Phar’Club area, 1st floor, map)
	11:05–11:55 Oral Session 2
11:05–11:30	BU-TTS: An Open-Source, Bilingual Welsh-English, Text-to-Speech Corpus	Stephen Russell, Dewi Jones and Delyth Prys
11:30–11:55	Developing Automatic Speech Recognition for Scottish Gaelic	Lucy Evans, William Lamb, Mark Sinclair and Beatrice Alex
	11:55–13:00 Oral Session 3
11:55–12:20	Handwritten Text Recognition (HTR) for Irish-Language Folklore	Brian Ó Raghallaigh, Andrea Palandri and Críostóir Mac Cárthaigh
12:20–12:45	AAC don Ghaeilge: the Prototype Development of Speech-Generating Assistive Technology for Irish (*BEST PAPER AWARD*)	Emily Barnes, Oisín Morrin, Ailbhe Ní Chasaide, Julia Cummins, Harald Berthelsen, Andy Murphy, Muireann Nic Corcráin, Claire O’Neill, Christer Gobl and Neasa Ní Chiaráin
12:45–13:00	Valedictory Session	CLTW organisers

Accepted submissions

The workshop proceedings are out now!

Some authors have provided us with PDFs of their posters or slides; where available, links are given after the paper title.

• AAC DON GHAEILGE: THE PROTOTYPE DEVELOPMENT OF SPEECH-GENERATING ASSISTIVE TECHNOLOGY FOR IRISH (BEST PAPER AWARD)
- Emily Barnes, Oisín Morrin, Ailbhe Ní Chasaide, Julia Cummins, Harald Berthelsen, Andy Murphy, Muireann Nic Corcráin, Claire O’Neill, Christer Gobl and Neasa Ní Chiaráin
• AUTOMATIC SPEECH RECOGNITION FOR IRISH: THE ABAIR-ÉIST SYSTEM
- Liam Lonergan, Mengjie Qian, Harald Berthelsen, Andy Murphy, Christoph Wendler, Neasa Ní Chiaráin, Christer Gobl and Ailbhe Ní Chasaide
• BU-TTS: AN OPEN-SOURCE, BILINGUAL WELSH-ENGLISH, TEXT-TO-SPEECH CORPUS
- Stephen J. Russell, Dewi Jones and Delyth Prys
• CELTIC CALL: STRENGTHENING THE VITAL ROLE OF EDUCATION FOR LANGUAGE TRANSMISSION
- Neasa Ní Chiaráin, Madeleine Comtois, Oisín Nolan, Neimhin Robinson-Gunning, John Sloan, Harald Berthelsen and Ailbhe Ní Chasaide
• CIPHER – FAOI GHEASA: A GAME-WITH-A-PURPOSE FOR IRISH [poster]
- Elaine Uí Dhonnchadha, Monica Ward and Liang Xu
• CLILSTORE.EU – A MULTILINGUAL ONLINE CLIL PLATFORM
- Caoimhín Ó Dónaill
• CREATION OF AN EVALUATION CORPUS AND BASELINE EVALUATION SCORES FOR WELSH TEXT SUMMARISATION
- Mahmoud El-Haj, Ignatius Ezeani, Jonathan Morris and Dawn Knight
• DEVELOPING AUTOMATIC SPEECH RECOGNITION FOR SCOTTISH GAELIC
- Lucy V. Evans, William Lamb, Mark Sinclair and Beatrice Alex
• DEVELOPMENT AND EVALUATION OF SPEECH RECOGNITION FOR THE WELSH LANGUAGE [poster]
- Dewi Jones
• DIACHRONIC PARSING OF PRE-STANDARD IRISH
- Kevin Scannell
• EVALUATION OF THREE WELSH LANGUAGE POS TAGGERS [poster]
- Gruffudd Prys and Gareth Llewellyn Watkins
• HANDWRITING RECOGNITION FOR SCOTTISH GAELIC
- William Lamb, Beatrice Alex and Mark Sinclair
• HANDWRITTEN TEXT RECOGNITION (HTR) FOR IRISH-LANGUAGE FOLKLORE
- Brian Ó Raghallaigh, Andrea Palandri and Críostóir Mac Cárthaigh
• INTRODUCING THE NATIONAL CORPUS OF IRISH PROJECT
- Mícheál J. Ó Meachair, Úna Bhreathnach and Gearóid Ó Cleircín
• ITERATED DEPENDENCIES IN A BRETON TREEBANK AND IMPLICATIONS FOR A CATEGORIAL DEPENDENCY GRAMMAR [poster]
- Annie Foret, Denis Béchet and Valérie Bellynck
• MULTILINGUAL ABSTRACT MEANING REPRESENTATION FOR CELTIC LANGUAGES [slides]
- Johannes Heinecke and Anastasia Shimorina
• TOWARDS COREFERENCE RESOLUTION FOR EARLY IRISH
- Mark Darling, Marieke Meelen and David Willis
• USE OF TRANSFORMER-BASED MODELS FOR WORD-LEVEL TRANSLITERATION OF THE BOOK OF THE DEAN OF LISMORE [poster]
- Edward Gow-Smith, Mark McConville, William Gillies, Jade Scott and Roibeard Ó Maolalaigh

Outline

Research innovations in Language Technology and Computational Linguistics in recent years have given us a great many modern language processing tools and resources for many languages. Basic language tools like spelling and grammar checkers, interactive systems like Siri, and resources like the Trillion Word Corpus all fit together to produce products and services which enhance our daily lives.

Until relatively recently, languages with smaller numbers of speakers, such as the Celtic languages, have largely not benefited from attention in this field. However, modern techniques are making it possible to build language tools and resources faster and from fewer existing resources. In this light, many lesser-spoken languages are making their way into the digital age through the provision of language technologies and resources.

The Celtic Language Technology Workshop (CLTW) series provides a forum for researchers interested in developing natural language processing (NLP) resources and language technologies (LTs) for Celtic languages. As Celtic languages are under-resourced, our goal is to encourage collaboration and communication between researchers working on language technologies and resources for Celtic languages.

Areas of Interest

Our workshop welcomes theoretical and practical submissions on any Celtic language (Irish, Welsh, Scottish Gaelic, Manx, Cornish or Breton) that contribute to research in machine translation, automated language processing, language/speech technologies or resources for the same. We’ve seen from previous CLTWs that this forum offers much scope for sharing best practices and for drawing on lessons learned from working with limited resources. We particularly encourage studies that address either practical applications with a human in the loop or the lack of resources available for a given language in this field.

Topics of interest for CLTW include but are not limited to:

• Celtic Language Resources
• Syntax
• Semantics
• Lexicons
• Supervised and semi-supervised annotation of Celtic-language texts (e.g. POS and morphological tagging)
• Computer Assisted Language Learning (CALL)
• Machine Translation
• Parsing/Chunking
• Terminology and Knowledge Representation
• Speech Processing/Generation
• Celtic Digital Humanities
• Celtic Corpus Development/Analysis
• Treebanking
• Evaluation Methods
• Ontology-lexica
• Linked Data Resources
• Information Extraction
• Transfer Learning
• Cross-lingual Methods
• NLP/LT for historical Celtic languages

Invited Speaker

• Professor Kevin Scannell (Saint Louis University, USA)

Important Dates

• Submission deadline: ~~8 April~~ 15 April 2022 (AoE)
• Notification of acceptance: 3 May 2022
• Camera-ready deadline: 23 May 2022

Papers can be 4 pages (short) or 8 pages (long). Make your submission at: https://www.softconf.com/lrec2022/CLTW2022/. Papers will be selected for oral delivery or as posters according to the most suitable method for the subject matter. There will be no difference in the way they are treated in the published proceedings. Please note that LREC 2022 solicits FULL paper submissions; for paper templates consult the author’s kit provided. Further details are available at the submission portal.

Identify, describe and share your Language Resources!

As can be read at https://lrec2022.lrec-conf.org/en/submission2022/share-your-lrs/:

Describing your Language Resources (LRs) in the LRE Map is now a normal practice in the submission procedure of LREC (introduced in 2010 and adopted by other conferences). To continue the efforts initiated at LREC 2014 about “Sharing LRs” (data, tools, web-services, etc.), authors will have the possibility, when submitting a paper, to upload LRs in a special LREC repository. This effort of sharing LRs, linked to the LRE Map for their description, may become a new “regular” feature for conferences in our field, thus contributing to creating a common repository where everyone can deposit and share data.

As scientific work requires accurate citations of referenced work so as to allow the community to understand the whole context and also replicate the experiments conducted by other researchers, LREC 2022 endorses the need to uniquely Identify LRs through the use of the International Standard Language Resource Number (ISLRN, www.islrn.org), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.

Provisions for Virtual Delivery

If it proves necessary to hold the workshop online, the organisers will use Zoom and ensure that all available channels (e.g. social media and institutional websites / mailing lists) are used to publicise the event. We will use the interactive environment www.gather.town for running the poster sessions.

Programme Committee

• Beatrice Alex (University of Edinburgh)
• Colin Batchelor (Royal Society of Chemistry)
• Annie Foret (Université Rennes 1)
• John Judge (ADAPT / Dublin City University)
• Teresa Lynn (Dublin City University)
• Mark McConville (University of Glasgow)
• John P. McCrae (National University of Ireland, Galway)
• Marieke Meelen (University of Cambridge)
• Ailbhe Ní Chasaide (Trinity College Dublin)
• Neasa Ní Chiaráin (Trinity College Dublin)
• Brian Ó Raghallaigh (Fiontar / Dublin City University)
• Thierry Poibeau (Laboratoire Lattice, CNRS – École Normale Supérieure, Sorbonne Nouvelle)
• Kevin Scannell (Saint Louis University)
• Elaine Uí Dhonnchadha (Trinity College Dublin)
• Monica Ward (Dublin City University)
• Pauline Welby (Laboratoire Parole et Langage (LPL), CNRS – Aix Marseille Université)
• David Willis (University of Oxford)

Organisers

• Prof Delyth Prys (Bangor University, d.prys@bangor.ac.uk)
• Dr William Lamb (University of Edinburgh, w.lamb@ed.ac.uk)
• Dr Theodorus Fransen (National University of Ireland, Galway, theodorus.fransen@nuigalway.ie)
