Irish/Using Nua-Chorpas na hÉireann

From Celtic Languages
Revision as of 17:43, 5 November 2022

The New Corpus for Ireland (Nua-Chorpas na hÉireann) is a very useful tool for checking how things are phrased in Irish and which expressions native speakers actually use. Unfortunately, the corpus’s help page is not accessible and the UI isn’t very user-friendly. Some documentation exists for the underlying software, but it isn’t specific to this corpus and so isn’t very helpful when working with it.

This page isn’t meant to be comprehensive documentation of the corpus, just a list of hints that should make your work with it a bit more efficient. For more comprehensive documentation, see External documentation below.

First steps

To use the corpus, you first have to create an account using the registration form. Registration is free, but you will have to wait until your account is accepted before you’ll be able to log in and use the corpus.

Old and new interface

When you log in, you’ll see the old Sketch Engine web interface. You can use it, but you can also access the new interface by logging in at focloir.sketchengine.eu instead. The new interface is generally much more user-friendly (and compatible with the official Sketch Engine documentation), but beware: some features don’t work in it (for example, word sketches work in the old interface but not in the new one).

You can follow this guide in the old interface, unless it refers explicitly to the new one.

Simple querying

TODO

Filtering the results

TODO

CQL

The Corpus Query Language (CQL) lets you write complex, regex-like queries, such as searching for phrases that contain specific parts of speech or inflectional forms. This is possible because every word in the corpus is tagged with its part of speech and inflectional form. CQL is more complex than a simple word search, but it makes your searches much more flexible.
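To give a rough idea of what such a query looks like, here is a minimal Python sketch that only assembles a CQL string; it does not talk to the corpus. It assumes the usual Sketch Engine conventions (one pair of square brackets per token, attribute="regex" conditions joined with &) and the attribute names lemma and tag. The helper names token and sequence are made up for this illustration, and the tag pattern N.* is a hypothetical placeholder: take the real tag names from the list of tags linked under External documentation.

```python
def token(**attrs):
    """Build one CQL token, e.g. [lemma="teach" & tag="N.*"].

    Attribute names (lemma, tag, word) are assumed to follow the
    usual Sketch Engine conventions; values are regular expressions.
    """
    if not attrs:
        return "[]"  # empty brackets stand for any single token
    parts = [f'{name}="{value}"' for name, value in attrs.items()]
    return "[" + " & ".join(parts) + "]"


def sequence(*tokens):
    """Join tokens with spaces: each token must directly follow the previous one."""
    return " ".join(tokens)


# Any form of the lemma "teach", then any one word, then a word whose
# tag starts with N. The tag pattern is a hypothetical placeholder.
query = sequence(token(lemma="teach"), token(), token(tag="N.*"))
print(query)  # [lemma="teach"] [] [tag="N.*"]
```

The printed string is what you would paste into the CQL query box. The sketch does not escape quotation marks inside values, so it is only suitable for simple patterns.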

TODO

External documentation

  • list of tags available in the corpus
  • Sketch Engine User Guide – a guide to a newer version of the software the Corpas uses. The graphical interface presented in the guide is completely different from what you’ll find on corpas.focloir.ie, but the principles described there will generally be valid for the Corpas too. Among the things you’ll find there are: