Hellenic National Corpus (HNC)™


Hellenic National Corpus of Greek Language (HNC)™

ANNOUNCEMENTS

(30/10/2020) The Golden Corpus of HNC, a subset of HNC of about 100,000 words, is available for download from the Infrastructure for Language Resources CLARIN:EL. All words have been automatically annotated for part of speech and morphological features. The annotation has been corrected manually by a linguist, so the result is error-free.

(1/6/2020) Since June 2020, HNC has been available free of charge to all users for research purposes. Registration is required for full use of all HNC functions.

The Hellenic National Corpus (HNC) is a platform developed by the Institute for Language and Speech Processing / R.C. Athena. It offers language researchers language material (a corpus) and computational tools for processing it.

What does HNC offer?

  • HNC consists of written texts totalling more than 70,000,000 words. More details about the texts can be found in the Information section.
  • The user can:
    • use all the language material, or select texts and define a subset using filters.
    • search the language material by specific words, lemmas or parts of speech. The system retrieves the sentences that match the search criteria.
    • analyse a word and obtain information about its frequency, its syntax etc.
    • see information about the co-occurrence of two words.

The above functions are available to registered users. Further details about the analysis tools can be found in the Help section.
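For readers who want a concrete picture of what searching by word, lemma or part of speech, counting frequencies and checking co-occurrence involve, the following is a minimal Python sketch. It is not HNC's implementation or interface: the `Token` class, the function names, the part-of-speech labels and the two toy sentences are all hypothetical and serve only as an illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated token (hypothetical structure, not HNC's format)."""
    word: str
    lemma: str
    pos: str


# Hypothetical toy data: two annotated sentences, not taken from HNC.
SENTENCES = [
    [Token("Τα", "ο", "DET"), Token("παιδιά", "παιδί", "NOUN"),
     Token("παίζουν", "παίζω", "VERB")],
    [Token("Το", "ο", "DET"), Token("παιδί", "παιδί", "NOUN"),
     Token("διαβάζει", "διαβάζω", "VERB")],
]


def search(sentences, word=None, lemma=None, pos=None):
    """Return the sentences containing a token that meets every given criterion."""
    def matches(tok):
        return ((word is None or tok.word == word)
                and (lemma is None or tok.lemma == lemma)
                and (pos is None or tok.pos == pos))
    return [s for s in sentences if any(matches(t) for t in s)]


def lemma_frequency(sentences):
    """Count how often each lemma occurs across all sentences."""
    return Counter(t.lemma for s in sentences for t in s)


def cooccurrence(sentences, lemma_a, lemma_b):
    """Count the sentences in which both lemmas appear."""
    return sum(1 for s in sentences
               if {lemma_a, lemma_b} <= {t.lemma for t in s})
```

In this sketch, searching by the lemma "παιδί" retrieves both toy sentences, while searching by the word form "παιδιά" retrieves only the first. Co-occurrence is counted per sentence here; a real tool may use a different window.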

The Golden Tagged Corpus

The Golden Corpus of HNC is a subset of HNC of about 100,000 words. All words have been automatically annotated for part of speech and morphological features (as are all HNC texts). In addition, the annotation and lemmatization of the Golden Corpus have been corrected manually by a linguist, so the result is error-free.

The HNC Golden Corpus is available for free through the Infrastructure for Language Resources and Technologies CLARIN:EL, in two forms:

"Εθνικός Θησαυρός Ελληνικής Γλώσσας"™ and "Hellenic National Corpus"™ are trademarks (™) of the Institute for Language and Speech Processing.

