Gaelic/Using Corpas na Gàidhlig

From Celtic Languages
Revision as of 16:58, 20 November 2022 by Silmeth.

Corpas na Gàidhlig (the DASG corpus) is a publicly available corpus of Scottish Gaelic literary texts, including books, newspapers, poems and advertisements. The texts date from 1200 to the 21st century, though most are modern, i.e. from the 19th century onwards. The corpus may later be expanded with non-literary texts as well (e.g. transcriptions of recorded folk tales).

The corpus is built on the open-source Corpus Workbench (CWB) software and is accessed through a customised version of the CQPweb web interface.

The texts in the corpus are annotated with metadata such as the time period they come from, the literary type of the work and its author. Unfortunately, the words are not annotated with part-of-speech tags, and there is no information about sentence structure, which somewhat limits the queries that are possible. Still, the interface lets users put wildcards in their queries and write complex queries in the CQP query syntax.
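Since the words carry no part-of-speech tags, most searches come down to matching word forms. In CQP syntax a single token is matched with a regular expression over one of its attributes, e.g. `[word="bha.*"]`, and the flags `%c` (ignore case) and `%d` (ignore diacritics) can follow the pattern. The `%d` flag is useful for Gaelic, since a search for `bata` should then also find `bàta`. The Python sketch below turns a simple wildcard pattern into such a CQP token query. It assumes that the corpus exposes the standard CWB `word` attribute and that `*` means "any sequence of characters" and `?` means "exactly one character". The helpers `wildcard_to_cqp` and `phrase_to_cqp` are hypothetical illustrations, not part of CQPweb or the DASG interface.

```python
# Characters that are special in CQP regular expressions and must be escaped
# to be matched literally ('*' and '?' are handled separately as wildcards).
_REGEX_SPECIAL = set(".^$|()[]{}\\+")


def wildcard_to_cqp(pattern: str, ignore_case: bool = True,
                    ignore_diacritics: bool = True) -> str:
    """Hypothetical helper: build a one-token CQP query from a wildcard pattern.

    Assumes '*' = zero or more characters and '?' = exactly one character;
    everything else is matched literally against the 'word' attribute.
    """
    if '"' in pattern:
        # Quoting rules for '"' inside CQP strings are not handled in this sketch.
        raise ValueError("double quotes are not supported in this sketch")
    regex = []
    for ch in pattern:
        if ch == "*":
            regex.append(".*")        # any sequence of characters
        elif ch == "?":
            regex.append(".")         # exactly one character
        elif ch in _REGEX_SPECIAL:
            regex.append("\\" + ch)   # escape so it is matched literally
        else:
            regex.append(ch)          # letters (incl. à, è, ...) and apostrophes
    flags = ("c" if ignore_case else "") + ("d" if ignore_diacritics else "")
    suffix = "%" + flags if flags else ""
    body = "".join(regex)
    return f'[word="{body}"{suffix}]'


def phrase_to_cqp(*patterns: str, **options: bool) -> str:
    """Hypothetical helper: a sequence of tokens is written as adjacent token queries."""
    return " ".join(wildcard_to_cqp(p, **options) for p in patterns)


if __name__ == "__main__":
    print(wildcard_to_cqp("bha*"))       # [word="bha.*"%cd]
    print(phrase_to_cqp("tha", "mi"))    # [word="tha"%cd] [word="mi"%cd]
```

The resulting strings can be pasted into the CQP-syntax query box; whether the DASG interface applies the same defaults for case and diacritics is worth checking against its own help pages.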

TODO: fill the rest