CONTROLLED VOCABULARIES, THESAURI and LEXICAL ONTOLOGIES
Stella Markantonatou, Institute for Language and Speech Processing (ILSP), Athena Research Center
Talk given at UoA, kindly hosted by Prof. K. Nikiforidou
CONTROLLED VOCABULARIES & THESAURI
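Κότα; Κόττα; Όρνιθα; Όρνις; Κοτόπουλο; Κόκορας; Κόκκορας; Αλέκτωρ; Κλώσα; Κλωσόπουλο; Αυγό; Κοτέτσι; Ορνιθώνας; Κλωσάω; Κακαρίζω; Κακάρισμα;
(Glosses: κότα, κόττα, όρνιθα, όρνις 'hen' in different spellings and registers; κοτόπουλο 'chicken'; κόκορας, κόκκορας, αλέκτωρ 'rooster'; κλώσα 'brooding hen'; κλωσόπουλο 'chick'; αυγό 'egg'; κοτέτσι, ορνιθώνας 'hen coop'; κλωσάω 'to brood'; κακαρίζω 'to cluck'; κακάρισμα 'clucking')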
A controlled vocabulary
- is an organized arrangement of words and phrases
- is used to index content, to retrieve content through navigation or search, and to develop texts in a controlled language
- typically includes preferred terms
- has a limited scope or describes a specific domain
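Terms in a controlled vocabulary
- denote unique concepts
- have distinct senses that do not overlap
- are at the same conceptual level
- belong to the same part of speech
- are organised in some way that makes sense, e.g. alphabetically
(1) is a controlled vocabulary: αετός, αηδόνι, κότα, παγώνι, σουσουράδα, σπίνος, σπουργίτι (eagle, nightingale, hen, peacock, wagtail, finch, sparrow)
(2) is not a controlled vocabulary: αετός, ταώς, αηδόνι, αρπακτικά, όρνιθα, κότα, παγώνι, πετώ, σουσουράδα, σπουργίτι, σπίνος (it adds ταώς, a second word for 'peacock'; όρνιθα, a second word for 'hen'; αρπακτικά 'birds of prey', a group rather than a species; and πετώ 'to fly', a verb)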
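The criteria above can be checked mechanically once each term carries some annotation. The following is a minimal sketch, not a standard tool: the `Term` fields, the concept identifiers and the level labels are assumptions added here for illustration, and the check for a sensible ordering is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str    # surface form as listed on the slide
    concept: str  # hypothetical concept identifier (annotation added here)
    pos: str      # part of speech (annotation added here)
    level: str    # conceptual level (annotation added here)


def violations(terms):
    """Return the problems found; an empty list means all three checks pass."""
    problems = []
    first_label = {}
    for t in terms:
        if t.concept in first_label:
            problems.append(f"{t.label} shares a concept with {first_label[t.concept]}")
        else:
            first_label[t.concept] = t.label
    if len({t.pos for t in terms}) > 1:
        problems.append("mixed parts of speech")
    # Conceptual level is only compared among nouns.
    if len({t.level for t in terms if t.pos == "noun"}) > 1:
        problems.append("mixed conceptual levels")
    return problems


def species(label, concept):
    return Term(label, concept, "noun", "species")


LIST_1 = [species("αετός", "eagle"), species("αηδόνι", "nightingale"),
          species("κότα", "hen"), species("παγώνι", "peacock"),
          species("σουσουράδα", "wagtail"), species("σπίνος", "finch"),
          species("σπουργίτι", "sparrow")]

# List (2) contains everything in list (1) plus four problematic entries.
LIST_2 = LIST_1 + [species("ταώς", "peacock"), species("όρνιθα", "hen"),
                   Term("αρπακτικά", "birds_of_prey", "noun", "group"),
                   Term("πετώ", "fly", "verb", "action")]

print(violations(LIST_1))
print(violations(LIST_2))
```

List (1) passes all three checks, while list (2) fails on overlapping senses, part of speech and conceptual level.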
Preferred term, Descriptor
- Preferred term: the term designated among all synonyms or lexical variants for a concept to be used as the default term to represent the concept
- Descriptor: the term recommended to represent the concept
(1) controlled vocabulary: αετός, αηδόνι, όρνιθα, παγώνι, σουσουράδα, σπίνος, σπουργίτι
Do we like controlled vocabularies?
- Yes: they bring order to chaos, and they support applications that work in well-defined (controlled) environments
- No: in real-world applications control may be costly
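Όρνιθα, κότα, κοτόπουλο; (hen, hen or chicken?)
- Poem: «Η φωνακλού όρνιθα» ("The loud-mouthed hen"), Ζαχαρίας Παπαντωνίου (Zacharias Papantoniou): «Μια κότα κακαρίζει απ' την αυγή κι ανήσυχα γυρίζει μες την αυλή» ("A hen has been clucking since dawn and restlessly roams the yard")
- Recipe: Γαστρονόμος, Συνταγές (Gastronomos, Recipes), http://www.kathimerini.com.cy/index.php?page action=kat&modid=1&artid=117131: «ΥΛΙΚΑ (για 4-6 άτομα) 1 όρνιθα, περίπου 2 κιλά (σε καλά κρεοπωλεία) ή 1 μεγάλο κοτόπουλο, κατά προτίμηση βιολογικό» ("Ingredients (serves 4-6): 1 hen (όρνιθα), about 2 kg (from a good butcher's), or 1 large chicken (κοτόπουλο), preferably organic")
Some flexibility would be welcome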
A thesaurus
- is a semantic network of unique concepts
- is monolingual or multilingual
- may encode three types of relationship: equivalence relationships, hierarchical relationships, associative relationships
Preferred term, Descriptor
- Preferred term: the term designated among all synonyms or lexical variants for a concept to be used as the default term to represent the concept
- Descriptor: the term recommended to represent the concept
- Monolingual thesaurus: the preferred term is the descriptor
- Multilingual thesaurus: there may be multiple descriptors (one in each language represented), but possibly only one preferred term for use as the default in displays
Equivalence relationships
Κότα; Κόττα; Όρνιθα; Όρνις; Κοτόπουλο; Κόκορας; Κόκκορας; Αλέκτωρ; Κλώσα; Κλωσόπουλο; Αυγό; Κοτέτσι; Ορνιθώνας; Κλωσσάω; Κακαρίζω; Κακάρισμα;
Equivalents: either true synonyms or lexical variants of the preferred term or of another term in the record.
- Κότα: Κόττα; Όρνιθα; Όρνις
- Κλώσα
- Κόκορας: Κόκκορας; Αλέκτωρ
- Κοτέτσι; Ορνιθώνας
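In an application, equivalence relationships usually boil down to a lookup from any variant to its preferred term. The sketch below assumes that κότα, κόκορας and κοτέτσι are the preferred terms of their groups; the names `EQUIVALENTS` and `preferred` are illustrative, not part of any thesaurus standard.

```python
# Preferred term -> its equivalents (synonyms and spelling variants).
# Which member of each group is "preferred" is an assumption made here.
EQUIVALENTS = {
    "κότα": ["κόττα", "όρνιθα", "όρνις"],
    "κόκορας": ["κόκκορας", "αλέκτωρ"],
    "κοτέτσι": ["ορνιθώνας"],
}

# Invert the table so that every variant points to its preferred term,
# and every preferred term points to itself.
TO_PREFERRED = {
    variant: pref
    for pref, variants in EQUIVALENTS.items()
    for variant in variants
}
TO_PREFERRED.update({pref: pref for pref in EQUIVALENTS})


def preferred(term):
    """Return the preferred term for `term`, or None if it is not in the vocabulary."""
    return TO_PREFERRED.get(term.lower())


print(preferred("Όρνιθα"))
print(preferred("Αλέκτωρ"))
print(preferred("Αυγό"))
```

With this table, a recipe that says όρνιθα and a poem that says κότα would be indexed under the same entry, while a term without a record, such as αυγό here, is reported as unknown.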
Hierarchical relationships: IS A
- The broader and narrower (parent/child, genus/species or IS A) relationship between terms
- IS A is the primary feature that distinguishes a thesaurus or taxonomy from controlled vocabularies
- Hierarchical relationships are referred to by genealogical terms: child, children, siblings, parent, grandparent, ancestors, descendants, etc.
Hierarchical relationships: IS A
Κότα; Κόττα; Όρνιθα; Όρνις; Κοτόπουλο; Κόκορας; Κόκκορας; Αλέκτωρ; Κλώσα; Κλωσόπουλο; Αυγό; Κοτέτσι; Ορνιθώνας; Κλωσσάω; Κακαρίζω; Κακάρισμα;
[Diagram: IS A hierarchy over the nodes «Κότα, Κόττα, Όρνιθα, Όρνις» (appearing twice), «Κλώσα», «Κόκορας, Κόκκορας, Αλέκτωρ», «Κοτόπουλο», «Κλωσόπουλο»]
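Multiple Hierarchies
When more than one parent is allowed
- Θερμόαιμα (warm-blooded): Θηλαστικά (mammals), Πουλιά (birds)
- Τροφή (diet): Παμφάγα (omnivores), Εξειδικευμένα (specialised feeders)
- Σχέση με άνθρωπο (relation to humans): Άγρια (wild), Εξημερωμένα (domesticated)
Κότα (hen) sits at the bottom of the diagram, below all three hierarchies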
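With multiple parents the hierarchy is no longer a tree but a directed acyclic graph, so "all broader terms" has to be collected by a graph walk. The sketch below assumes that κότα is attached to πουλιά, παμφάγα and εξημερωμένα, which is one reading of the diagram; `PARENTS` and `ancestors` are illustrative names.

```python
# Child -> list of parents. The three parents of "κότα" are an assumption
# about how the diagram links the hen to each hierarchy.
PARENTS = {
    "κότα": ["πουλιά", "παμφάγα", "εξημερωμένα"],
    "πουλιά": ["θερμόαιμα"],
    "θηλαστικά": ["θερμόαιμα"],
    "παμφάγα": ["τροφή"],
    "εξειδικευμένα": ["τροφή"],
    "άγρια": ["σχέση με άνθρωπο"],
    "εξημερωμένα": ["σχέση με άνθρωπο"],
}


def ancestors(term):
    """Collect every broader term reachable from `term` through any parent."""
    found = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(PARENTS.get(node, []))
    return found


print(sorted(ancestors("κότα")))
```

The `found` set also guards against visiting a node twice when two paths lead to the same ancestor.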
Hierarchical relationships: MERONYMY
Κότα; Κόττα; Όρνιθα; Όρνις; Κοτόπουλο; Κόκορας; Κόκκορας; Αλέκτωρ; Κλώσα; Κλωσόπουλο; Αυγό; Κοτέτσι; Ορνιθώνας; Κλωσσάω; Κακαρίζω; Κακάρισμα; Τσόφλι; Φτερό; Λειρί; Κρόκος; Ασπράδι; (the last five: eggshell, feather, comb, yolk, egg white)
Whole/part relationship (partitive relationship): typically body parts, geographic locations
[Diagram nodes: Κότα, Αυγό, Λειρί, Τσόφλι (hen, egg, comb, eggshell)]
Associative relationships
Κότα; Κόττα; Όρνιθα; Όρνις; Κοτόπουλο; Κόκορας; Κόκκορας; Αλέκτωρ; Κλώσα; Κλωσόπουλο; Ωόν; Αυγό; Κοτέτσι; Ορνιθώνας; Κλωσσάω; Κακαρίζω; Κακάρισμα;
Relations that cross the hierarchical ones
- Κότα, Κόττα, Όρνιθα, Όρνις "lives in" Κοτέτσι, Ορνιθώνας
- Κότα, Κόττα, Όρνιθα, Όρνις "gives birth to" Αυγό, Ωόν
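Associative relationships are commonly stored as labelled subject/relation/object triples that can be queried in either direction. This is a minimal sketch using the two relations from the slide, with each concept represented by one term (assumed here to be the preferred one); `TRIPLES`, `related` and `subjects_of` are illustrative names.

```python
# (subject, relation, object) triples; relation labels are taken from the slide.
TRIPLES = [
    ("κότα", "lives in", "κοτέτσι"),
    ("κότα", "gives birth to", "αυγό"),
]


def related(term):
    """Relations that start at `term`, as (relation, object) pairs."""
    return [(rel, obj) for subj, rel, obj in TRIPLES if subj == term]


def subjects_of(relation, obj):
    """Terms that stand in `relation` to `obj` (the inverse lookup)."""
    return [subj for subj, rel, o in TRIPLES if rel == relation and o == obj]


print(related("κότα"))
print(subjects_of("lives in", "κοτέτσι"))
```

Keeping the relation label explicit is what lets a query start from κοτέτσι and still reach κότα, even though neither is broader or narrower than the other.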
ENOUGH WITH HENS! LET'S MOVE TO ART (beware of hens, they can always creep in)
Getty Vocabularies (Thesauri)
- The Art & Architecture Thesaurus (AAT). Catherine wheel or rose window? Terms and descriptions for generic concepts related to art, architecture, conservation, archaeology, and other cultural heritage.
- The Getty Thesaurus of Geographic Names (TGN). Thebes or Diospolis? Names and descriptions for extant and historical cities, empires, archaeological sites, and physical features, linked to GIS, maps, and other geographic resources.
- The Cultural Objects Name Authority (CONA). Mona Lisa or La Gioconda? Titles, attributions, and depicted subjects for works of art, architecture, and other cultural heritage, both extant and historical, linked to museum collections, special collections, archives, libraries, scholarly research, and other resources. CONA is linked to the AAT, TGN, and ULAN.
- The Union List of Artist Names (ULAN). Titian or Tiziano Vecellio? Names, biographies, and related people for artists, architects, firms, studios, museums, patrons, sitters, and other people and groups involved in the creation and study of art and architecture.
NATIONAL GALLERY (ΕΘΝΙΚΗ ΠΙΝΑΚΟΘΗΚΗ)
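ΕΘΝΙΚΗ ΠΙΝΑΚΟΘΗΚΗ (National Gallery) terms and AAT terms
- Λάδι σε ύφασμα (oil on cloth): AAT Oil paint on Cloth
- Λάδι σε καμβά (oil on canvas): AAT Oil paint on Canvas
- Μολύβι σε λαδόχαρτο (pencil on waxed paper): AAT *Pencil on Waxed Paper; Charcoal pencil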
EXERCISE
Of the two terms, silverware and «ασημικό», which is the broader and which the narrower? How would you modify AAT's structure in order to accommodate Greek?
I WARNED YOU, HENS DO CREEP IN
Κότα; Κόττα; Όρνιθα; Όρνις; Κοτόπουλο; Κόκορας; Κόκκορας; Αλέκτωρ; Κλώσα; Κλωσόπουλο; Ωόν; Αυγό; Κοτέτσι; Ορνιθώνας; Κλωσάω; Κακαρίζω; Κακάρισμα;
LEXICAL ONTOLOGIES
WordNet
- WordNet is a large lexical database of English.
- Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
- Synsets are interlinked by means of conceptual-semantic and lexical relations.
- The result is a network of meaningfully related words and concepts.
- WordNet superficially resembles a thesaurus, but it interlinks specific senses of words, so words in close proximity in the network are semantically disambiguated.
- WordNet has a large set of labels for the semantic relations among words.
https://wordnet.princeton.edu/
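The point that links hold between senses rather than between word forms can be illustrated with a toy sense inventory. The sketch below is not WordNet data and does not use any WordNet API: the identifiers, glosses and links are invented for illustration.

```python
# Toy sense inventory. Identifiers, glosses and hypernym links are
# illustrative only; they are not copied from WordNet.
SYNSETS = {
    "bark#1": {"lemmas": ["bark"], "gloss": "sound made by a dog", "hypernym": "sound#1"},
    "bark#2": {"lemmas": ["bark"], "gloss": "outer covering of a tree", "hypernym": "covering#1"},
    "sound#1": {"lemmas": ["sound"], "gloss": "something that is heard", "hypernym": None},
    "covering#1": {"lemmas": ["covering"], "gloss": "a natural outer layer", "hypernym": None},
}


def senses(word):
    """All sense identifiers whose synset contains `word`."""
    return sorted(sid for sid, s in SYNSETS.items() if word in s["lemmas"])


def hypernym_chain(sense_id):
    """Follow hypernym links upwards from one specific sense."""
    chain = []
    parent = SYNSETS[sense_id]["hypernym"]
    while parent is not None:
        chain.append(parent)
        parent = SYNSETS[parent]["hypernym"]
    return chain


for sid in senses("bark"):
    print(sid, SYNSETS[sid]["gloss"], hypernym_chain(sid))
```

The single word form "bark" maps to two senses, and each sense has its own place in the network, which is what makes neighbouring entries semantically disambiguated.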
Κότα; Κόττα; Όρνιθα; Όρνις; Κοτόπουλο; Κόκορας; Κόκκορας; Αλέκτωρ; Κλώσα; Κλωσόπουλο; Αυγό; Κοτέτσι; Ορνιθώνας; Κλωσάω; Κακαρίζω; Κακάρισμα;
FrameNet
- If we do not know that a dog barks, we cannot find "bark" starting from "dog"
- but we can find "dog" starting from "bark", although no automatic access is provided
- the two lemmas are not systematically related in the lexicon
dog.n. Frame: Animals. Definition: FN: a usually domesticated mammal descended from wolves
Θεολόγος Βοσταντζόγλου (Theologos Vostantzoglou), 1962, «ΟΝΟΜΑΣΤΙΚΟΝ»: the Greek counterpart of Roget's
- The network of relations is denser and semantic fields are better outlined, but κότα 'hen' and κακαρίζω 'to cluck' are still not related
- A matter of the constraints of printed material?
- People today talk about the CONCEPTUAL ORGANISATION of lexica, but synonymy and antonymy are still the best-presented relations
Find the terms, organise them
When electronic lexicography meets Natural Language Generation:
New ways of exploiting ontologies
The text on marriage cauldrons was not produced by humans! It uses templates of phrases and thesauri!
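The idea of combining phrase templates with a thesaurus can be sketched in a few lines. This is only an illustration of the general technique, not the system that produced the text mentioned above: the concept identifiers, the labels table and both templates are assumptions made here.

```python
# Concept identifier -> language -> preferred label (illustrative entries).
LABELS = {
    "oil_on_canvas": {"en": "oil paint on canvas", "el": "λάδι σε καμβά"},
    "oil_on_cloth": {"en": "oil paint on cloth", "el": "λάδι σε ύφασμα"},
}

# One phrase template per language (hypothetical wording).
TEMPLATES = {
    "en": "This work was made with {technique}.",
    "el": "Τεχνική του έργου: {technique}.",
}


def describe(concept_id, lang):
    """Fill the template for `lang` with the preferred label of the concept."""
    label = LABELS[concept_id][lang]
    return TEMPLATES[lang].format(technique=label)


print(describe("oil_on_canvas", "en"))
print(describe("oil_on_canvas", "el"))
```

Because the template refers to a concept rather than to a word, the same record can be rendered in either language, and the wording stays within the controlled vocabulary.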
ELEON http://www.iit.demokritos.gr/~eleon/eleondownloads.html
Bibliography
http://babelnet.org/
https://framenet.icsi.berkeley.edu/fndrupal/
http://www.roget.org/
https://wordnet.princeton.edu/
Βοσταντζόγλου, Θεολόγος. 1962. Αντιλεξικόν ή Ονομαστικόν της Νέας Ελληνικής Γλώσσης. Αθήνα.
Fellbaum, Christiane (ed.). 1998. WordNet: An Electronic Lexical Database. Language, Speech and Communication. MIT Press.
Hüllen, Werner. 2004. A History of Roget's Thesaurus. Oxford University Press.
https://repository.kallipos.gr/handle/11419/2205
For an online version of the 2010 edition of Introduction to Controlled Vocabularies, see www.getty.edu/research/publications/electronic_publications/intro_controlled_vocab/