Data Fusion and Separation Meeting, June 24-26, 2001, Carnuntum, Austria

A multi-scale, multi-context ontology for fusion and fission
Margarita Kokla & Marinos Kavouras
National Technical University of Athens

Central notion: multi-scale, multi-context ontology
Extend the notion of multi-scale data to include, besides different levels of detail, different conceptualizations of geographic entities. Such an ontology serves:
- fusion of heterogeneous ontologies
- fission: production of ontologies for specific uses
- generalization
Integration and ontology research
- Ontologies play an important role in information integration. A top-level ontology may provide the framework for integration (Guarino, 1998; Sowa, 2000).
- Fusion of different geographic domain ontologies (SDTS) with top-level ontologies (CYC, WordNet) for information exchange and reuse (Kokla & Kavouras, 2001).
- Problem: the diversity of existing top-level ontologies (CYC, WordNet, Mikrokosmos).
- Solution: embody theories of geographic information cognition and human categorization (Smith & Mark, 2000).

Principles of categorization and dimensions of categorical systems (Rosch, 1976)
Principles of categorization:
- cognitive economy
- perceived world structure
Dimensions of categorical systems:
- horizontal dimension: internal structure of categories; categories are conceived in terms of their clear cases rather than their boundaries
- vertical dimension: level of abstraction; basic-level categorization = the most inclusive level at which attributes are common to all or most members of the category
[Figure: categorical systems, with a vertical dimension (level of abstraction) and a horizontal dimension (internal structure)]
Empirical evidence for the basic level
- 9 taxonomies, e.g., tree, bird, furniture; 3 levels of abstraction (attributes in common, motor movements in common, similarity in shape) (Rosch, 1976)
- 3 taxonomies (artificial and natural categories); 3 levels of abstraction (superordinate > basic level (subordinate categories)):
  - Transportation > road network (national road, provincial road, street); railway; airport; port
  - Cultivated area > trees (olive groves, vineyards, citrus fruits); arable land (irrigated, non-irrigated)
  - Natural area > forest (broad-leaved forest, coniferous forest)

Basic-level categories:
- minimize ambiguity and maximize comprehension;
- increase similarity, simplicity and commonality in user interaction (accessibility to a wider range of users);
- help to resolve conflicts during the integration of complex categories.
Integration process
- analysis of entity types (classes) and attributes: identification of heterogeneities in definitions and of relationships between classes (equivalence, overlap, etc.)
- semantic factoring
- correspondences between attributes
- creation of the integrated ontology

Two projects
1. Integration of:
   - the CORINE Land Cover nomenclature, for scales 1:100,000 and 1:1,000,000
   - the cadastral classification of land use characteristics developed by the Hellenic Mapping & Cadastral Organization, for scales 1:1,000 and 1:5,000
2. Definition of new land use/cover categories for the 2001 agricultural census conducted by the Hellenic Statistical Service, associated with:
   - the former classification used for the 1991 agricultural census
   - CLUSTERS (Classification of Land Use Statistics, Eurostat Remote Sensing Programme)
   - the CORINE Land Cover nomenclature
Semantic factoring
Decomposition of overlapping classes into fundamental, disjoint classes which:
- constitute the clearest, most unambiguous and coherent classes (elementary classes, or building blocks of the categorization)
- reflect the consensus across different conceptualizations of geographic entities
Semantic factoring thus reveals the basic-level categories during integration.
[Figure: example of factoring the overlapping classes "Industrial or commercial units" and "Tertiary sector" (commerce)]

Semantic factoring
- The levels above and below the basic level result from synthesis and analysis, respectively.
- Subordinate level: specialization of the basic level; includes expert knowledge.
- Superordinate level: abstract, usually artificial classes, e.g., «forests and semi-natural areas» (CLC).
- Heterogeneity may occur as a result of different conceptualizations of space, e.g., a land cover perspective (artificial surfaces, agricultural areas, water bodies) vs. an economic perspective (primary, secondary, tertiary sector).
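Semantic factoring can be sketched as computing the common refinement of overlapping classes: elements that belong to exactly the same original classes fall into the same disjoint block, and these blocks play the role of basic categories. This is a minimal illustration under stated assumptions: each class is modeled as a set of elementary land parcels, and the parcel names (`steel_mill`, `brewery`, `mall`, `bank`, `city_park`) and the function `semantic_factoring` are hypothetical; only the class names are taken from CORINE Land Cover and the Hellenic Cadastre.

```python
from collections import defaultdict


def semantic_factoring(classes):
    """Split overlapping classes into disjoint blocks (hypothetical helper).

    classes: dict mapping a class name to the set of elements it contains.
    Returns a dict mapping each membership signature (the frozenset of class
    names an element belongs to) to the block of elements sharing it.
    """
    memberships = defaultdict(set)  # element -> names of the classes containing it
    for name, members in classes.items():
        for element in members:
            memberships[element].add(name)

    blocks = defaultdict(set)  # membership signature -> disjoint block of elements
    for element, names in memberships.items():
        blocks[frozenset(names)].add(element)
    return dict(blocks)


# Hypothetical parcels; the class names follow CORINE Land Cover and the Hellenic Cadastre.
CLASSES = {
    "1.2.1 Industrial or commercial units": {"steel_mill", "brewery", "mall", "bank"},
    "1.4 Artificial, non-agricultural vegetated areas": {"city_park"},
    "2 Secondary sector": {"steel_mill", "brewery"},
    "3 Tertiary sector": {"mall", "bank", "city_park"},
}

if __name__ == "__main__":
    for signature, block in semantic_factoring(CLASSES).items():
        print(sorted(block), "<-", sorted(signature))
```

With these hypothetical parcels the factoring yields three blocks, corresponding to industrial units, commercial units and artificial, non-agricultural vegetated areas; every original class is a union of blocks and no two blocks overlap. In the integration process the decomposition is derived from the analysis of class definitions, so this sketch only captures the set-theoretic idea.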
Semantic factoring (extraction of basic-level categories)
[Figure: overlapping classes from CORINE Land Cover and CLUSTERS (Industrial, commercial and transport units; Industrial or commercial units; Transport; Technical and transport infrastructures; Technical infrastructures) decomposed into the basic categories listed below]
Basic categories: g1 industrial units; g2 commercial units; g3 artificial, non-agricultural vegetated areas; g4 road and rail networks; g5 port areas; g6 airports.

Original categories and the basic categories they contain:
CORINE Land Cover
- 1.2 Industrial, commercial and transport units: g1, g2, g4, g5, g6
- 1.2.1 Industrial or commercial units: g1, g2
- 1.2.2 Road and rail networks and associated land: g4
- 1.2.3 Port areas: g5
- 1.2.4 Airports: g6
- 1.4 Artificial, non-agricultural vegetated areas: g3
Hellenic Cadastre
- 2 Secondary sector: g1
- 3 Tertiary sector: g2, g3
- 5 Transportation: g4, g5, g6

Correspondence of attributes
Attributes (from CORINE Land Cover and the Hellenic Cadastre): m1 land value; m2 level of environmental impact; m3 branch of economic activity; m4 max velocity; m5 sailing into harbor; m6 airport category; m7 property status; m8 number of establishments; m9 mean annual employment; m10 slope.
[Table: original categories against attributes m1 to m10. Number of attributes marked per category: 1.2 Industrial, commercial and transport units: 2; 1.2.1 Industrial or commercial units: 3; 1.2.2 Road and rail networks and associated land: 3; 1.2.3 Port areas: 3; 1.2.4 Airports: 3; 1.4 Artificial, non-agricultural vegetated areas: 2; 2 Secondary sector: 1; 3 Tertiary sector: 1; 5 Transportation: 1]
Cross-table of the integrated context (attributes ascribed to the basic-level categories)
- g1: m1, m2, m3, m8
- g2: m1, m2, m3, m9
- g3: m1, m7, m9
- g4: m1, m2, m4, m10
- g5: m1, m2, m5, m10
- g6: m1, m2, m6, m10

Creation of the integrated categorization
- INPUT: the cross-table of the integrated context
- OUTPUT: the set of final concepts and their order relationships
- Basic categories, attributes, concepts and relationships are modeled using Formal Concept Analysis.
Posets and trees
An ordered set (or partially ordered set) $(P, \le)$ is a set $P$ with an order relation $\le$ defined on it. A binary relation $\le$ on a set $P$ is called an order relation if for all elements $x, y, z \in P$ the following conditions are satisfied:
- $x \le x$ (reflexivity)
- $x \le y$ and $y \le x$ imply $x = y$ (antisymmetry)
- $x \le y$ and $y \le z$ imply $x \le z$ (transitivity)
In a poset an element may have multiple parents, rather than being limited to one as is the case for trees. A poset is therefore a generalization of a tree.

Lattices
Informally: a collection of sets such that for any two overlapping sets in the collection, the intersection of the sets is also in the collection. Formally, let $P$ be a partially ordered set. Then:
- If for any two elements $x, y \in P$ the least upper bound $x \vee y$ and the greatest lower bound $x \wedge y$ always exist, then $P$ is called a lattice.
- If the greatest lower bound $\bigwedge S$ and the least upper bound $\bigvee S$ exist for all $S \subseteq P$, then $P$ is called a complete lattice.
[Figure: tree, poset and lattice compared]
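The order axioms and the lattice condition can be checked mechanically on small finite posets. The sketch below is illustrative only: the helper names and both example posets (the divisors of 12 under divisibility, and a four-element "bowtie") are hypothetical choices, not taken from the slides. For a finite poset it suffices to check pairwise joins and meets, since every non-empty finite lattice is also complete.

```python
from itertools import product


def is_partial_order(P, leq):
    """True if leq is reflexive, antisymmetric and transitive on the finite set P."""
    reflexive = all(leq(x, x) for x in P)
    antisymmetric = all(x == y for x, y in product(P, repeat=2)
                        if leq(x, y) and leq(y, x))
    transitive = all(leq(x, z) for x, y, z in product(P, repeat=3)
                     if leq(x, y) and leq(y, z))
    return reflexive and antisymmetric and transitive


def join(P, leq, x, y):
    """Least upper bound of x and y, or None if it does not exist."""
    upper = [u for u in P if leq(x, u) and leq(y, u)]
    least = [u for u in upper if all(leq(u, v) for v in upper)]
    return least[0] if least else None


def meet(P, leq, x, y):
    """Greatest lower bound of x and y, or None if it does not exist."""
    lower = [w for w in P if leq(w, x) and leq(w, y)]
    greatest = [w for w in lower if all(leq(v, w) for v in lower)]
    return greatest[0] if greatest else None


def is_lattice(P, leq):
    """A finite poset is a lattice if every pair of elements has a join and a meet."""
    return all(join(P, leq, x, y) is not None and meet(P, leq, x, y) is not None
               for x, y in product(P, repeat=2))


# Divisors of 12 ordered by divisibility: a lattice (join = lcm, meet = gcd).
DIVISORS = [1, 2, 3, 4, 6, 12]


def divides(a, b):
    return b % a == 0


# "Bowtie": a and b each have two parents, c and d, so the order is not a tree.
# a and b have upper bounds c and d but no least one, so it is not a lattice.
BOWTIE = ["a", "b", "c", "d"]
BELOW = {("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")}


def bowtie_leq(x, y):
    return x == y or (x, y) in BELOW


if __name__ == "__main__":
    print(is_partial_order(DIVISORS, divides), is_lattice(DIVISORS, divides))    # True True
    print(join(DIVISORS, divides, 4, 6), meet(DIVISORS, divides, 4, 6))          # 12 2
    print(is_partial_order(BOWTIE, bowtie_leq), is_lattice(BOWTIE, bowtie_leq))  # True False
```

The bowtie is a valid poset in which elements have multiple parents, yet it is not a lattice; adding a top and a bottom element to it would not help either, because c and d would still lack a greatest lower bound among a and b.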
Concept lattices: Formal Concept Analysis (Wille, 1992)
- Formal context: a triple $(G, M, I)$ where $G$ and $M$ are sets of objects and attributes, and $I \subseteq G \times M$ is a binary relation between $G$ and $M$. Incidence relation $gIm$: the object $g$ has the attribute $m$.
- Definition: for a set $A \subseteq G$ of objects and a set $B \subseteq M$ of attributes we define
  $A' = \{m \in M \mid gIm \text{ for all } g \in A\}$
  $B' = \{g \in G \mid gIm \text{ for all } m \in B\}$
- Formal concept (conceptual class, category): a collection of entities or objects exhibiting one or more common characteristics or attributes. A pair $(A, B)$ is a formal concept of the context $(G, M, I)$ if $A \subseteq G$, $B \subseteq M$, $A' = B$ and $B' = A$; $A$ is called the extent and $B$ the intent of the formal concept.

Concept lattices: Formal Concept Analysis (Wille, 1992)
- Superconcept/subconcept relation: the concept $(A_1, B_1)$ is a subconcept of the concept $(A_2, B_2)$, written $(A_1, B_1) \le (A_2, B_2)$, if $A_1 \subseteq A_2$ (which is equivalent to $B_2 \subseteq B_1$). $(A_2, B_2)$ is then a superconcept of $(A_1, B_1)$.
- Concept lattice: the set of all concepts of $(G, M, I)$ ordered by the subconcept-superconcept relation is called the concept lattice of the context $(G, M, I)$ and is denoted by $\mathfrak{B}(G, M, I)$.
- Basic Theorem on Concept Lattices: let $(G, M, I)$ be a context. Then $\mathfrak{B}(G, M, I)$ is a complete lattice in which the greatest lower bound (meet) and the least upper bound (join) are given by:
$$\bigwedge_{t \in T} (A_t, B_t) = \Big( \bigcap_{t \in T} A_t, \big( \bigcup_{t \in T} B_t \big)'' \Big)$$
$$\bigvee_{t \in T} (A_t, B_t) = \Big( \big( \bigcup_{t \in T} A_t \big)'', \bigcap_{t \in T} B_t \Big)$$
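The two derivation operators translate directly into set operations. The sketch below is a minimal illustration, assuming the cross-table of the integrated context (project 1) as data; the function names `objects_prime`, `attributes_prime` and `is_formal_concept` are hypothetical.

```python
# Cross-table of the integrated context (project 1): basic category -> attributes.
CONTEXT = {
    "g1": {"m1", "m2", "m3", "m8"},
    "g2": {"m1", "m2", "m3", "m9"},
    "g3": {"m1", "m7", "m9"},
    "g4": {"m1", "m2", "m4", "m10"},
    "g5": {"m1", "m2", "m5", "m10"},
    "g6": {"m1", "m2", "m6", "m10"},
}
G = set(CONTEXT)                      # objects
M = {f"m{i}" for i in range(1, 11)}   # attributes m1..m10


def objects_prime(A):
    """A' = {m in M | gIm for all g in A}; equals M when A is empty."""
    shared = set(M)
    for g in A:
        shared &= CONTEXT[g]
    return shared


def attributes_prime(B):
    """B' = {g in G | gIm for all m in B}; equals G when B is empty."""
    return {g for g in G if B <= CONTEXT[g]}


def is_formal_concept(A, B):
    """(A, B) is a formal concept iff A is a subset of G, B of M, A' = B and B' = A."""
    return A <= G and B <= M and objects_prime(A) == B and attributes_prime(B) == A


if __name__ == "__main__":
    A = {"g1", "g2"}
    print(sorted(objects_prime(A)))                 # ['m1', 'm2', 'm3']
    print(is_formal_concept(A, objects_prime(A)))   # True
    print(is_formal_concept({"g1"}, {"m1"}))        # False: {g1}' = {m1, m2, m3, m8}
```

Applying both operators in turn ($A \mapsto A''$) always yields the extent of a concept, which is why the joins and meets in the Basic Theorem are closed with a double prime.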
Creation of the integrated categorization
The concept lattice:
- incorporates multiple relationships;
- creates extra categories based on the fusion or division of the original ones, since the least upper bound (join) and the greatest lower bound (meet) are given by definition;
- allows overlap and overcomes the rigidity of tree structures;
- can be handled with matrices in the case of many classes and relationships.

Algorithm for creating concept lattices
Step 1. Draw up the list of object intents or attribute extents:
$\{g\}' = \{m \in M \mid gIm\}$, $\{m\}' = \{g \in G \mid gIm\}$
Step 2. Use either of the formulas: every intent is an intersection of object intents, $A' = \bigcap_{g \in A} \{g\}'$, and every extent is an intersection of attribute extents, $B' = \bigcap_{m \in B} \{m\}'$.
- Substep 2.1: the intent $M$ is entered into the list.
- Each subsequent substep takes the next object $g$: for each set $A'$ entered into the list in an earlier substep, form $A' \cap \{g\}'$ and include it in the list, provided that it is not already contained in it.
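Step 2 can be sketched in a few lines, assuming the cross-table of the integrated context as input; `concept_intents` and `concepts` are hypothetical helper names. Each substep intersects the next object intent with every set already in the list (a snapshot of the earlier substeps), which enumerates all intersections of object intents, i.e., all concept intents.

```python
# Cross-table of the integrated context (project 1): basic category -> attributes.
CONTEXT = {
    "g1": {"m1", "m2", "m3", "m8"},
    "g2": {"m1", "m2", "m3", "m9"},
    "g3": {"m1", "m7", "m9"},
    "g4": {"m1", "m2", "m4", "m10"},
    "g5": {"m1", "m2", "m5", "m10"},
    "g6": {"m1", "m2", "m6", "m10"},
}
M = frozenset(f"m{i}" for i in range(1, 11))


def concept_intents(context, attributes):
    """List all concept intents in the order the algorithm enters them.

    Substep 2.1 enters the full attribute set; each later substep takes the
    next object g and adds A' & {g}' for every intent A' listed before it.
    """
    intents = [frozenset(attributes)]
    for g in sorted(context, key=lambda name: int(name[1:])):
        g_intent = frozenset(context[g])
        for earlier in list(intents):   # snapshot: only sets from earlier substeps
            candidate = earlier & g_intent
            if candidate not in intents:
                intents.append(candidate)
    return intents


def concepts(context, attributes):
    """Pair every intent B with its extent B' to obtain the formal concepts (A, B)."""
    return [({g for g, attrs in context.items() if B <= attrs}, set(B))
            for B in concept_intents(context, attributes)]


if __name__ == "__main__":
    for i, (extent, intent) in enumerate(concepts(CONTEXT, M), start=1):
        print(f"C{i}", sorted(extent), sorted(intent, key=lambda m: int(m[1:])))
```

Processing g1 to g6 in order enters the twelve intents in the same sequence as the concepts C1 to C12 of the integrated context.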
Substep | Object intent {g}' | Concept intents A' added to the list
1 | - | {m1, ..., m10}
2 | g1' = {m1, m2, m3, m8} | {m1, m2, m3, m8}
3 | g2' = {m1, m2, m3, m9} | {m1, m2, m3, m9}, {m1, m2, m3}
4 | g3' = {m1, m7, m9} | {m1, m7, m9}, {m1}, {m1, m9}
5 | g4' = {m1, m2, m4, m10} | {m1, m2, m4, m10}, {m1, m2}
6 | g5' = {m1, m2, m5, m10} | {m1, m2, m5, m10}, {m1, m2, m10}
7 | g6' = {m1, m2, m6, m10} | {m1, m2, m6, m10}

Formal concepts of the integrated context
- C1 = (∅, {m1, ..., m10}) (least concept)
- C2 = ({g1}, {m1, m2, m3, m8}): secondary sector
- C3 = ({g2}, {m1, m2, m3, m9}): commerce
- C4 = ({g1, g2}, {m1, m2, m3}): industrial or commercial units
- C5 = ({g3}, {m1, m7, m9}): artificial, non-agricultural vegetated areas
- C6 = ({g1, g2, g3, g4, g5, g6}, {m1}): artificial surfaces (largest concept)
- C7 = ({g2, g3}, {m1, m9}): tertiary sector
- C8 = ({g4}, {m1, m2, m4, m10}): road and rail networks and associated land
- C9 = ({g1, g2, g4, g5, g6}, {m1, m2}): industrial, commercial and transport units
- C10 = ({g5}, {m1, m2, m5, m10}): port areas
- C11 = ({g4, g5, g6}, {m1, m2, m10}): transportation
- C12 = ({g6}, {m1, m2, m6, m10}): airports
Matrix manipulations (for large contexts)
Matrix M: $m_{ij} = 1$ if concept $C_i$ is a subconcept of $C_j$; $m_{ij} = 0$ otherwise.
       C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
C1:     0  1  1  1  1  1  1  1  1   1   1   1
C2:     0  0  0  1  0  1  0  0  1   0   0   0
C3:     0  0  0  1  0  1  1  0  1   0   0   0
C4:     0  0  0  0  0  1  0  0  1   0   0   0
C5:     0  0  0  0  0  1  1  0  0   0   0   0
C6:     0  0  0  0  0  0  0  0  0   0   0   0
C7:     0  0  0  0  0  1  0  0  0   0   0   0
C8:     0  0  0  0  0  1  0  0  1   0   1   0
C9:     0  0  0  0  0  1  0  0  0   0   0   0
C10:    0  0  0  0  0  1  0  0  1   0   1   0
C11:    0  0  0  0  0  1  0  0  1   0   0   0
C12:    0  0  0  0  0  1  0  0  1   0   1   0

Matrix L = M - M·M: $l_{ij} = 1$ if concept $C_i$ is directly below $C_j$. Since $(M \cdot M)_{ij}$ counts the concepts lying strictly between $C_i$ and $C_j$, $l_{ij} = 1$ exactly when $C_i < C_j$ with no concept in between; otherwise $l_{ij} \le 0$.
       C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
C1:     0  1  1 -1  1 -9 -1  1 -6   1  -2   1
C2:     0  0  0  1  0 -1  0  0  0   0   0   0
C3:     0  0  0  1  0 -2  1  0  0   0   0   0
C4:     0  0  0  0  0  0  0  0  1   0   0   0
C5:     0  0  0  0  0  0  1  0  0   0   0   0
C6:     0  0  0  0  0  0  0  0  0   0   0   0
C7:     0  0  0  0  0  1  0  0  0   0   0   0
C8:     0  0  0  0  0 -1  0  0  0   0   1   0
C9:     0  0  0  0  0  1  0  0  0   0   0   0
C10:    0  0  0  0  0 -1  0  0  0   0   1   0
C11:    0  0  0  0  0  0  0  0  1   0   0   0
C12:    0  0  0  0  0 -1  0  0  0   0   1   0

Excerpt of the Integrated Concept Lattice
[Figure: C6 ARTIFICIAL SURFACES at the top; below it C9 Industrial, commercial and transport units and C7 Tertiary sector; then C11 Transportation and C4 Industrial or commercial units; then C8 Road and rail networks, C10 Port areas, C12 Airports, C2 Secondary sector, C3 Commerce and C5 Artificial, non-agricultural vegetated areas; C1 at the bottom]
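The matrix manipulation is easy to verify in code. This sketch assumes the extents of C1 to C12 from the list of formal concepts; the helper names are hypothetical, and plain Python lists are used instead of a matrix library to keep it self-contained.

```python
# Extents of the concepts C1..C12 of the integrated context.
EXTENTS = {
    "C1": set(),
    "C2": {"g1"},
    "C3": {"g2"},
    "C4": {"g1", "g2"},
    "C5": {"g3"},
    "C6": {"g1", "g2", "g3", "g4", "g5", "g6"},
    "C7": {"g2", "g3"},
    "C8": {"g4"},
    "C9": {"g1", "g2", "g4", "g5", "g6"},
    "C10": {"g5"},
    "C11": {"g4", "g5", "g6"},
    "C12": {"g6"},
}
NAMES = list(EXTENTS)  # dicts preserve insertion order: C1..C12


def subconcept_matrix(extents, names):
    """m_ij = 1 if C_i is a proper subconcept of C_j (extent strictly contained)."""
    return [[1 if extents[a] < extents[b] else 0 for b in names] for a in names]


def matmul(X, Y):
    """Plain integer matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def cover_matrix(M):
    """L = M - M*M, where (M*M)_ij counts the concepts strictly between C_i and C_j."""
    MM = matmul(M, M)
    return [[M[i][j] - MM[i][j] for j in range(len(M))] for i in range(len(M))]


M = subconcept_matrix(EXTENTS, NAMES)
L = cover_matrix(M)
# Covering pairs (C_i directly below C_j): the edges of the Hasse diagram.
COVERS = [(NAMES[i], NAMES[j])
          for i in range(len(NAMES)) for j in range(len(NAMES)) if L[i][j] == 1]

if __name__ == "__main__":
    print(L[0])    # row C1 of L
    print(COVERS)
```

Row C1 of the computed L reproduces the values 0, 1, 1, -1, 1, -9, -1, 1, -6, 1, -2, 1, and the pairs with $l_{ij} = 1$ are the covering pairs of the lattice.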
Excerpt of the Integrated Concept Lattice (project 1)
[Figure: legend distinguishing CORINE Land Cover classes, Hellenic Cadastre classes and common classes]

Excerpt of the Integrated Concept Lattice (project 1)
[Figure: legend distinguishing classes shared by DIGEST and the Cadastre, by CORINE and DIGEST, Cadastre-only classes and classes common to all three. Concepts: C1 ARTIFICIAL SURFACES; C2 Industrial, commercial and transport units; C3 Industrial or commercial units; C4 Road and rail networks; C5 Artificial, non-agricultural vegetated areas; C6 Secondary sector; C7 Transportation; C8 Manufacturing; C9 Energy; C10 Commerce; C11 Green urban areas; C12 Sport and leisure facilities; C13 Tertiary sector; C14 Road network; C15 Railway; C16 Intersection; C17 Terminal; C18 Parking; C19 Port areas; C20 Airports; further subclasses: Processing industry, Fabrication industry, Associated industrial structures]
[Figure: the integrated land use/cover categories of project 2 (in Greek), grouping the CORINE Land Cover classes 1.1.1 to 5.2.2 under ARTIFICIAL SURFACES, AGRICULTURAL AREAS, FORESTS AND SEMI-NATURAL AREAS and AREAS COVERED BY WATER, and linking them to the categories of the former ESYE classification: α: cultivated land and fallow land of 1-5 years; β: grazing land (municipal or communal, and other); γ: forests; δ: areas covered by water; ε: area occupied by the settlement(s); ζ: other areas (rocky areas, mines, etc.)]

Excerpt of the Integrated Concept Lattice (project 2)
[Figure: categories of the former ESYE (Hellenic Statistical Service) classification scheme (α: cultivated land and fallow land of 1-5 years; β: grazing land (municipal or communal, and other); γ: forests) linked to the final categories of the land use/cover classification scheme and to the CORINE Land Cover classes:
AGRICULTURAL AREAS
- Arable land: 2.1.1 Non-irrigated arable land; 2.1.2 Permanently irrigated land; 2.1.3 Rice fields
- Permanent crops: 2.2.1 Vineyards; 2.2.2 Fruit trees and berry plantations; 2.2.3 Olive groves
- Pastures and meadows: 2.3.1 Pastures; 3.2.1 Natural grasslands
- Heterogeneous agricultural areas: 2.4.1 Annual crops associated with permanent crops; 2.4.2 Complex cultivation patterns; 2.4.3 Land principally occupied by agriculture, with significant areas of natural vegetation; 2.4.4 Agro-forestry areas
FORESTS AND SEMI-NATURAL AREAS
- Shrub and/or herbaceous vegetation associations: 3.2.2 Moors and heathland; 3.2.3 Sclerophyllous vegetation
- Transitional woodland-shrub: 3.2.4 Transitional woodland-shrub
- Forests: 3.1.1 Broad-leaved forest; 3.1.2 Coniferous forest; 3.1.3 Mixed forest]

Fission
«Vertical» and «horizontal» integration:
- vertical: level of detail
- horizontal: context (conceptualization, domain, application, etc.)
[Figure: axes "Level of detail" and "Context"]
Classes are defined only by level of detail and context. Other parameters, such as spatial characteristics (e.g., building vs. building block), are not dealt with.
Fission
Given a scale and a context, the CL (concept lattice) makes it possible to determine the appropriate «band» and derive the classes to be used.
- Different levels of detail correspond to «horizontal lines» (or «bands») in the CL.
- Different contexts correspond to «vertical lines» in the CL.

Schema fission: different levels of detail
[Figure: schema fission for different levels of detail]
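Fission can be sketched as a filter over an annotated lattice: the scale selects a band (level of detail) and the context selects the classes used in that conceptualization. Everything specific here is an assumption: the level assigned to each concept, the scale thresholds in `level_for_scale` and the function names are hypothetical; only the class names and the two source schemes come from project 1.

```python
# Hypothetical annotation: concept -> (level of detail, contexts in which it is used).
# Levels are illustrative only (1 = most general).
CONCEPTS = {
    "Artificial surfaces": (1, {"CORINE Land Cover", "Hellenic Cadastre"}),
    "Industrial, commercial and transport units": (2, {"CORINE Land Cover"}),
    "Artificial, non-agricultural vegetated areas": (2, {"CORINE Land Cover"}),
    "Industrial or commercial units": (3, {"CORINE Land Cover"}),
    "Road and rail networks and associated land": (3, {"CORINE Land Cover"}),
    "Port areas": (3, {"CORINE Land Cover"}),
    "Airports": (3, {"CORINE Land Cover"}),
    "Secondary sector": (3, {"Hellenic Cadastre"}),
    "Tertiary sector": (3, {"Hellenic Cadastre"}),
    "Transportation": (3, {"Hellenic Cadastre"}),
}


def level_for_scale(denominator):
    """Hypothetical bands: the larger the scale denominator, the more general the level."""
    if denominator >= 1_000_000:
        return 1
    if denominator >= 100_000:
        return 2
    return 3


def fission(concepts, denominator, context):
    """Return the classes lying in the band for this scale and used in this context."""
    band = level_for_scale(denominator)
    return sorted(name for name, (level, contexts) in concepts.items()
                  if level == band and context in contexts)


if __name__ == "__main__":
    print(fission(CONCEPTS, 100_000, "CORINE Land Cover"))
    print(fission(CONCEPTS, 5_000, "Hellenic Cadastre"))
```

For example, a 1:5,000 request in the cadastral context returns the secondary sector, tertiary sector and transportation classes, while a 1:100,000 request in the CORINE context returns the two level-2 CORINE classes.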
Schema fission: different contexts
[Figure: excerpt of the lattice for one context, spanning urban fabric (continuous, discontinuous); industrial, commercial, transport and socio-economic units (secondary and tertiary sector); road and rail networks and associated land; port areas; airports; mine, dump and construction sites; and artificial, non-agricultural vegetated areas, down to detailed classes such as single-family house, apartment building, blast furnace, national road, heliport, quarry, park and golf course]

Generalization
- The structure of the CL enables links between similar classes at different levels of detail.
- Dynamic generalization of geographic entities: transfer from one level of detail to another, i.e., continuous, on-the-fly generalization on the screen depending on the zoom factor.
- Generalization through time: links correspond to the evolution of classes through time.
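Dynamic generalization can be sketched as climbing the superconcept links of the lattice until a class in the target band is reached. The links below are the covering pairs ($l_{ij} = 1$) of the project 1 concept lattice; the band and the helper `generalize` are hypothetical, and the choice of band from the zoom factor is left out. Because a concept may have several parents, a detailed class can generalize to more than one class of the same band.

```python
from collections import deque

# Direct superconcepts in the project 1 lattice excerpt (C_i directly below C_j).
PARENTS = {
    "Secondary sector": ["Industrial or commercial units"],                             # C2 -> C4
    "Commerce": ["Industrial or commercial units", "Tertiary sector"],                  # C3 -> C4, C7
    "Artificial, non-agricultural vegetated areas": ["Tertiary sector"],               # C5 -> C7
    "Industrial or commercial units": ["Industrial, commercial and transport units"],  # C4 -> C9
    "Tertiary sector": ["Artificial surfaces"],                                         # C7 -> C6
    "Road and rail networks": ["Transportation"],                                       # C8 -> C11
    "Port areas": ["Transportation"],                                                   # C10 -> C11
    "Airports": ["Transportation"],                                                     # C12 -> C11
    "Transportation": ["Industrial, commercial and transport units"],                  # C11 -> C9
    "Industrial, commercial and transport units": ["Artificial surfaces"],             # C9 -> C6
    "Artificial surfaces": [],                                                          # C6 (top)
}


def generalize(cls, band):
    """Climb from cls along superconcept links; on each upward path, keep the first
    class that belongs to the target band (cls itself if it is already in the band)."""
    found, seen = set(), {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        if current in band:
            found.add(current)   # stop climbing along this path
            continue
        for parent in PARENTS[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


# Hypothetical band for a coarser level of detail.
EXAMPLE_BAND = {"Industrial, commercial and transport units", "Tertiary sector"}

if __name__ == "__main__":
    print(sorted(generalize("Commerce", EXAMPLE_BAND)))
    print(sorted(generalize("Airports", EXAMPLE_BAND)))
```

Commerce, which has two parents, generalizes to both classes of the example band, whereas Airports generalizes only through Transportation to industrial, commercial and transport units.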
Dynamic model generalization process
Transition to different levels of detail and different classification schemata by changing the level of detail and the context.
[Figure: the model at scales 1:5,000, 1:10,000, 1:20,000, 1:50,000 and 1:100,000]

Conclusion
Development of a multi-scale, multi-context ontology for:
- fusion
- fission
- generalization
Benefits:
- revelation of implicit relationships between concepts
- derivation of new classes from the fusion or division of originally overlapping ones (increased semantic completeness)
- preservation of the original ontologies
Conclusion
- The CL incorporates different, complementary conceptualizations, each suitable for some context and level of detail.
- Fission: selection of the appropriate categories according to the context and level of detail of specific applications; this facilitates information reuse.
- Cognition should not be ignored in the integration of different ontologies: embody theories of human categorization.