Principles of Database Systems

Σχετικά έγγραφα
Principles of Database Systems

ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΥΠΡΟΥ ΤΜΗΜΑ ΠΛΗΡΟΦΟΡΙΚΗΣ

Overview. Transition Semantics. Configurations and the transition relation. Executions and computation

ΠΑΝΕΠΙΣΤΗΜΙΟ ΠΑΤΡΩΝ - ΤΜΗΥΠ ΒΑΣΕΙΣ ΔΕΔΟΜΕΝΩΝ Ι

Lecture 2. Soundness and completeness of propositional logic

Εργαστήριο Ανάπτυξης Εφαρμογών Βάσεων Δεδομένων. Εξάμηνο 7 ο

The Simply Typed Lambda Calculus

C.S. 430 Assignment 6, Sample Solutions

EE512: Error Control Coding

Instruction Execution Times

2 Composition. Invertible Mappings

Other Test Constructions: Likelihood Ratio & Bayes Tests

Example Sheet 3 Solutions

Every set of first-order formulas is equivalent to an independent set

Στο εστιατόριο «ToDokimasesPrinToBgaleisStonKosmo?» έξω από τους δακτυλίους του Κρόνου, οι παραγγελίες γίνονται ηλεκτρονικά.

The challenges of non-stable predicates

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 19/5/2007

ΣΧΕ ΙΑΣΜΟΣ ΒΑΣΕΩΝ Ε ΟΜΕΝΩΝ ΚΑΙ ΚΑΝΟΝΙΚΟΠΟΙΗΣΗ

Math 6 SL Probability Distributions Practice Test Mark Scheme

Ordinal Arithmetic: Addition, Multiplication, Exponentiation and Limit

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

Lecture 24: Functional Dependencies and Normalization IV

Statistical Inference I Locally most powerful tests

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

Inverse trigonometric functions & General Solution of Trigonometric Equations

Matrices and Determinants

Approximation of distance between locations on earth given by latitude and longitude

Διδάσκων: Παναγιώτης Ανδρέου

Econ 2110: Fall 2008 Suggested Solutions to Problem Set 8 questions or comments to Dan Fetter 1

Μηχανική Μάθηση Hypothesis Testing

Finite Field Problems: Solutions

Physical DB Design. B-Trees Index files can become quite large for large main files Indices on index files are possible.

Section 8.3 Trigonometric Equations

Nowhere-zero flows Let be a digraph, Abelian group. A Γ-circulation in is a mapping : such that, where, and : tail in X, head in

Numerical Analysis FMN011

Lecture 23: Functional Dependencies and Normalization

HOMEWORK 4 = G. In order to plot the stress versus the stretch we define a normalized stretch:

TMA4115 Matematikk 3

derivation of the Laplacian from rectangular to spherical coordinates

Lecture 26: Circular domains

ST5224: Advanced Statistical Theory II

Fractional Colorings and Zykov Products of graphs

Partial Differential Equations in Biology The boundary element method. March 26, 2013

Chapter 6: Systems of Linear Differential. be continuous functions on the interval

Block Ciphers Modes. Ramki Thurimella

Ακεραιότητα και Ασφάλεια Μέρος 1 Σχεδιασμός Βάσεων Δεδομένων

b. Use the parametrization from (a) to compute the area of S a as S a ds. Be sure to substitute for ds!

Θεωρία Κανονικοποίησης

SCHOOL OF MATHEMATICAL SCIENCES G11LMA Linear Mathematics Examination Solutions

ANSWERSHEET (TOPIC = DIFFERENTIAL CALCULUS) COLLECTION #2. h 0 h h 0 h h 0 ( ) g k = g 0 + g 1 + g g 2009 =?

Διδάσκων: Παναγιώτης Ανδρέου

Εργαστήριο Ανάπτυξης Εφαρμογών Βάσεων Δεδομένων. Εξάμηνο 7 ο

ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΥΠΡΟΥ ΤΜΗΜΑ ΠΛΗΡΟΦΟΡΙΚΗΣ. ΕΠΛ342: Βάσεις Δεδομένων. Χειμερινό Εξάμηνο Φροντιστήριο 10 ΛΥΣΕΙΣ. Επερωτήσεις SQL

ΣΧΕΔΙΑΣΜΟΣ ΒΑΣΕΩΝ ΔΕΔΟΜΕΝΩΝ ΚΑΙ ΚΑΝΟΝΙΚΟΠΟΙΗΣΗ. Σχεδιασμός Σχεσιακών ΒΔ και Κανονικοποίηση 1

Exercises 10. Find a fundamental matrix of the given system of equations. Also find the fundamental matrix Φ(t) satisfying Φ(0) = I. 1.

Αλγόριθμοι και πολυπλοκότητα NP-Completeness (2)

4.6 Autoregressive Moving Average Model ARMA(1,1)

PARTIAL NOTES for 6.1 Trigonometric Identities

Problem Set 3: Solutions

Lecture 2: Dirac notation and a review of linear algebra Read Sakurai chapter 1, Baym chatper 3

A Note on Intuitionistic Fuzzy. Equivalence Relation

Homework 8 Model Solution Section

Homework 3 Solutions

The ε-pseudospectrum of a Matrix

CE 530 Molecular Simulation

Chapter 29. Adjectival Participle

( ) 2 and compare to M.

Chap. 6 Pushdown Automata

Concrete Mathematics Exercises from 30 September 2016

ORDINAL ARITHMETIC JULIAN J. SCHLÖDER

Code Breaker. TEACHER s NOTES

Parametrized Surfaces

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 6/5/2006

ΚΥΠΡΙΑΚΟΣ ΣΥΝΔΕΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY 21 ος ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ Δεύτερος Γύρος - 30 Μαρτίου 2011

2. Let H 1 and H 2 be Hilbert spaces and let T : H 1 H 2 be a bounded linear operator. Prove that [T (H 1 )] = N (T ). (6p)

Διδάσκων: Παναγιώτης Ανδρέου

Section 9.2 Polar Equations and Graphs

department listing department name αχχουντσ ϕανε βαλικτ δδσϕηασδδη σδηφγ ασκϕηλκ τεχηνιχαλ αλαν ϕουν διξ τεχηνιχαλ ϕοην µαριανι

Εισαγωγή στις Βάσεις Δεδομζνων II

DESIGN OF MACHINERY SOLUTION MANUAL h in h 4 0.

Congruence Classes of Invertible Matrices of Order 3 over F 2

Figure A.2: MPC and MPCP Age Profiles (estimating ρ, ρ = 2, φ = 0.03)..

Fourier Series. MATH 211, Calculus II. J. Robert Buchanan. Spring Department of Mathematics

Dynamic types, Lambda calculus machines Section and Practice Problems Apr 21 22, 2016

Modbus basic setup notes for IO-Link AL1xxx Master Block

Συστήματα Διαχείρισης Βάσεων Δεδομένων

Srednicki Chapter 55

Section 7.6 Double and Half Angle Formulas

Assalamu `alaikum wr. wb.

Models for Probabilistic Programs with an Adversary

Bounding Nonsplitting Enumeration Degrees

Reminders: linear functions

Πρόβλημα 1: Αναζήτηση Ελάχιστης/Μέγιστης Τιμής

Lecture 13 - Root Space Decomposition II

Cyclic or elementary abelian Covers of K 4

LTL to Buchi. Overview. Buchi Model Checking LTL Translating LTL into Buchi. Ralf Huuck. Buchi Automata. Example

6.1. Dirac Equation. Hamiltonian. Dirac Eq.

Κανονικοποίηση(Normalization) ΒΑΣΕΙΣ ΔΕΔΟΜΕΝΩΝ 9 ο Εξάμηνο2013. Κανονικές Μορφές. Πρώτη Κανονική Μορφή (1NF) Βάσεις Δεδομένων. Περικλής Α.

ΕΠΙΧΕΙΡΗΣΙΑΚΗ ΑΛΛΗΛΟΓΡΑΦΙΑ ΚΑΙ ΕΠΙΚΟΙΝΩΝΙΑ ΣΤΗΝ ΑΓΓΛΙΚΗ ΓΛΩΣΣΑ

Transcript:

Principles of Database Systems V. Megalooikonomou Database Design and Normalization (Σχεδιασμός Βάσεων Δεδομένων και Κανονικοποίηση) (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos)

Overview Relational model formal query languages commercial query languages (SQL) Integrity constraints domain I.C., foreign keys functional dependencies Functional Dependencies DB design and normalization

Overview - detailed DB design and normalization pitfalls of bad design decomposition normal forms (κανονικές μορφές)

Goal Design good tables sub-goal#1: define what good means sub-goal#2: fix bad tables in short: we want tables where the attributes depend on the primary key, on the whole key, and nothing but the key Let s see why, and how:

Pitfalls takes1 (ssn, c-id, grade, name, address) Ssn c-id Grade Name Address 123 cs331 A smith Main

Pitfalls Bad - why? because: ssn->address, name Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 123 cs211 A smith Main

Pitfalls Redundancy space (inconsistencies) insertion/deletion anomalies:

Pitfalls insertion anomaly: jones registers, but takes no class - no place to store his address! Ssn c-id Grade Name Address 123 cs331a smith Main 234 null null jones Forbes

Pitfalls deletion anomaly: delete the last record of smith (we lose his address!) Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 123 cs211 A smith Main

Solution: decomposition split offending table in two (or more), e.g.: Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 123 cs211 A smith Main??

Overview - detailed DB design and normalization pitfalls of bad design decomposition lossless join dependency preserving normal forms

Decompositions there are bad decompositions we want: lossless and dependency preserving

Decompositions - lossy: R1(ssn, grade, name, address) R2(c-id,grade) Ssn Grade Name Address 123 A smith Main 123 B smith Main 234 A jones Forbes c-id cs331 A cs351 B cs211 A Grade Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 234 cs211 A jones Forbes ssn->name, address ssn, c-id -> grade

Decompositions - lossy: can not recover original table with a join! Ssn Grade Name Address 123 A smith Main 123 B smith Main 234 A jones Forbes c-id cs331 A cs351 B cs211 A Grade Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 234 cs211 A jones Forbes ssn->name, address ssn, c-id -> grade

Decompositions lossy: Another example Decomposition of R = (A, B) into R 1 = (A), R 2 = (B) A B α α β 1 2 1 A α β B 1 2 r A (r) B (r) A (r) B (r) A B α α β β 1 2 1 2

Decompositions example of non-dependency preserving S# address status 123 London E 125 Paris E 234 Pitts. A S# address 123 London 125 Paris 234 Pitts. S# status 123 E 125 E 234 A S# -> address, status S# -> address S# -> status address -> status

Decompositions is it lossless? S# address status 123 London E 125 Paris E 234 Pitts. A S# address 123 London 125 Paris 234 Pitts. S# status 123 E 125 E 234 A S# -> address, status address -> status S# -> address S# -> status

Decompositions - lossless Definition: Consider schema R, with FD F. R1, R2 is a lossless join decomposition of R if we always have: r 1 r2 = r An easier criterion?

Decomposition - lossless Theorem: lossless join decomposition if the joining attribute is a superkey in at least one of the new tables Formally: R1 R2 R1 or R1 R2 R2

Decomposition - lossless example: R1 Ssn c-id Grade 123 cs331 A 123 cs351 B R2 Ssn Name Address 123 smith Main 234 jones Forbes 234 cs211 A ssn, c-id -> grade ssn->name, address Ssn c-id Grade Name Address 123 cs331a smith Main 123 cs351b smith Main 234 cs211a jones Forbes ssn->name, address ssn, c-id -> grade

Overview - detailed DB design and normalization pitfalls of bad design decomposition lossless join decomp. dependency preserving normal forms

Decomposition - depend. pres. informally: we don t want the original FDs to span two tables - counter-example: S# address status 123 London E 125 Paris E 234 Pitts. A S# address 123 London 125 Paris 234 Pitts. S# status 123 E 125 E 234 A S# -> address, status S# -> address S# -> status address -> status

Decomposition - depend. pres. dependency preserving decomposition: S# address status 123 London E 125 Paris E 234 Pitts. A S# address 123 London 125 Paris 234 Pitts. address status London E Paris E Pitts. A S# -> address, status address -> status S# -> address address -> status (but: S#->status?)

Decomposition - depend. pres. informally: we don t want the original FDs to span two tables more specifically: the FDs of the canonical cover Let F i be the set of dependencies F + that include only attributes in R i. Preferably the decomposition should be dependency preserving, that is, (F 1 F 2 F n ) + = F + Otherwise, checking updates for violation of functional dependencies may require computing joins expensive

Decomposition - depend. pres. why is dependency preservation good? S# address 123 London 125 Paris 234 Pitts. S# status 123 E 125 E 234 A S# address 123 London 125 Paris 234 Pitts. address status London E Paris E Pitts. A S# -> address S# -> status S# -> address address -> status (address->status: lost )

Decomposition - depend. pres. A: eg., record that Philly has status A S# address 123 London 125 Paris 234 Pitts. S# status 123 E 125 E 234 A S# address 123 London 125 Paris 234 Pitts. address status London E Paris E Pitts. A S# -> address S# -> status S# -> address address -> status (address->status: lost )

Decomposition - depend. pres. To check if a dependency α β is preserved in a decomposition of R into R 1, R 2,, R n we apply the following test (with attribute closure done w.r.t. F) result = α while (changes to result) do for each R i in the decomposition t = (result R i ) + R i result = result t If result contains all attributes in β, then functional dependency α β is preserved We apply the test on all dependencies in F to check if a decomposition is dependency preserving The test takes polynomial time Computing F + and (F 1 F 2 F n ) + needs exponential time

Decomposition - conclusions decompositions should always be lossless joining attribute -> superkey whenever possible, we want them to be dependency preserving (occasionally, impossible - see STJ example later )

Normalization using FD When decomposing a relation schema R with a set of functional dependencies F into R 1, R 2,, R n we want: Lossless-join decomposition: otherwise information loss No redundancy: relations R i preferably should be in either Boyce-Codd Normal Form or Third Normal Form Dependency preservation: Let F i be the set of dependencies in F + that include only attributes in R i. Preferably the decomposition should be dependency preserving, i.e., (F 1 F 2 F n ) + = F + Otherwise, checking updates for violation of functional dependencies may require computing joins expensive

Normalization using FD - Example R = (A, B, C) F = {A B, B C) R 1 = (A, B), R 2 = (B, C) Lossless-join decomposition: R 1 R 2 = {B} and B BC Dependency preserving R 1 = (A, B), R 2 = (A, C) Lossless-join decomposition: R 1 R 2 = {A} and A AB Not dependency preserving (cannot check B C without computing R 1 R 2 )

Overview - detailed DB design and normalization pitfalls of bad design decomposition ( how to fix the problem) normal forms ( how to detect the problem) BCNF, 3NF, (1NF, 2NF)

Normal forms - BCNF We saw how to fix bad schemas - but what is a good schema? Answer: good, if it obeys a normal form, i.e., a set of rules Typically: Boyce-Codd Normal Form (BCNF)

Normal forms - BCNF Defn.: Rel. R is in BCNF w.r.t. F, if informally: everything depends on the full key, and nothing but the key semi-formally: every determinant (of the cover) is a candidate key

Normal forms - BCNF Example and counter-example: Ssn Name Address 123 smith Main 123 smith Main 234 jones Forbes ssn->name, address Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 234 cs211 A jones Forbes ssn->name, address ssn, c-id -> grade

Normal forms - BCNF Formally: for every FD a->b in F+ a->b is trivial (a is a superset of b) or a is a superkey (or both)

Normal forms - BCNF Theorem: given a schema R and a set of FD F, we can always decompose it to schemas R1, Rn, so that R1, Rn are in BCNF and the decomposition is lossless ( but, some decomp. might lose dependencies)

BCNF Decomposition How?.essentially, break off FDs of the cover eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade

Normal forms - BCNF eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade grade ssn c-id name address

Normal forms - BCNF Ssn c-id Grade 123 cs331 A 123 cs351 B 234 cs211 A ssn, c-id -> grade Ssn Name Address 123 smith Main 123 smith Main 234 jones Forbes ssn->name, address Ssn c-id Grade Name Address 123 cs331a smith Main 123 cs351b smith Main 234 cs211a jones Forbes ssn->name, address ssn, c-id -> grade

Normal forms - BCNF pictorially: we want a star shape grade ssn c-id name address :not in BCNF

Normal forms - BCNF pictorially: we want a star shape F A B or D G C E H

Normal forms - BCNF or a star-like: (e.g., 2 cand. keys): STUDENT(ssn, st#, name, address) ssn name address = ssn name address st# st#

Normal forms - BCNF but not: B F A or D G D E C H

BCNF Decomposition result := {R}; done := false; compute F + ; while (not done) do if (there is a schema R i in result that is not in BCNF) then begin let α β be a nontrivial functional dependency that holds on R i such that α R i is not in F +, and α β = ; result := (result R i ) (R i β) (α, β ); end else done := true; Note: each R i is in BCNF, and decomposition is lossless-join

Normal forms - 3NF consider the classic case: STJ( Student, Teacher, subject) T-> J S,J -> T is it BCNF? S J T

Normal forms - 3NF STJ( Student, Teacher, subject) T-> J S,J -> T How to decompose it to BCNF? S J T

Normal forms - 3NF STJ( Student, Teacher, subject) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? - lossless? - dep. pres.? ) 2) R1(T,J) R2(S,T) (BCNF? - lossless? - dep. pres.? )

Normal forms - 3NF STJ( Student, Teacher, subject) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? Y+Y - lossless? N -dep. pres.? N ) 2) R1(T,J) R2(S,T) (BCNF? Y+Y - lossless? Y - dep. pres.? N )

Normal forms - 3NF STJ( Student, Teacher, subject) T-> J S,J -> T in this case: impossible to have both BCNF and dependency preservation Welcome 3NF ( a weaker normal form)!

Normal forms - 3NF STJ( Student, Teacher, subject) T-> J S,J -> T S J T informally, 3NF forgives the red arrow in the can. cover

Normal forms - 3NF STJ( Student, Teacher, subject) T-> J S,J -> T Formally, a rel. R with FDs F is in 3NF if: for every a->b in F+: S J T it is trivial or a is a superkey or each b-a attr.: part of a cand. key

Normal forms - 3NF R = (J, K, L) F = {JK L, L K} Two candidate keys = JK and JL R is not in BCNF Any decomposition of R will fail to preserve JK L BCNF decomposition has (JL) and (LK) Testing for JK L requires a join R is in 3NF JK L JK is a superkey L K K is contained in a candidate key There is some redundancy in this schema

Normal forms - 3NF TESTING FOR 3NF Optimization: Need to check only FDs in F, need not check all FDs in F + Use attribute closure to check, for each dependency α β, if α is a superkey If α is not a superkey, we have to verify if each attribute in β is contained in a candidate key of R this test is more expensive; it involves finding candidate keys testing for 3NF is NP-hard Interestingly, decomposition into 3NF (described shortly) can be done in polynomial time

Decomposition into 3NF The dependencies are preserved by building explicitly a schema for each given dependency Guarantees a lossless-join decomposition by having at least one schema containing a candidate key for the schema being decomposed Let F c be a canonical cover for F; i := 0; for each functional dependency α β in F c do if none of the schemas R j, 1 j i contains α β then begin i := i + 1; R i := α β end if none of the schemas R j, 1 j i contains a candidate key for R then begin i := i + 1; R i := any candidate key for R; end return (R, R,..., R )

Normal forms - 3NF how to bring a schema to 3NF? In short.for each FD in the cover, put it in a table

Normal forms - 3NF vs BCNF If R is in BCNF, it is always in 3NF (but not the reverse) In practice, aim for BCNF; lossless join; and dep. preservation if impossible, we accept 3NF; but insist on lossless join and dep. preservation 3NF has problems with transitive dependencies

3NF vs BCNF (cont.) Example of problems due to redundancy in 3NF R = (J, K, L) F = {JK L, L K} J L K j 1 j 2 j 3 null l 1 l 1 l 1 l 2 k 1 k 1 k 1 k 2 A schema that is in 3NF but not in BCNF has the problems of: repetition of information (e.g., the relationship l 1, k 1 ) need to use null values (e.g., to represent the relationship l 2, k 2 where there is no corresponding value for J).

Normal forms - more details why 3 NF? what is 2NF? 1NF? 1NF: attributes are atomic (i.e., no set-valued attr., a.k.a. repeating groups ) 1ΝF: όχι σχέσεις μέσα σε σχέσεις ή σχέσεις ως γνωρίσματα πλειάδων Ssn Name Dependents 123 Smith Peter Mary John 234 Jones Ann Michael not 1NF

Normal forms - more details 2NF: 1NF and non-key attr. fully depend on the key : αν κάθε μη πρωτεύον γνώρισμα είναι πλήρως συναρτησιακά εξαρτώμενο από το πρωτεύον κλειδί counter-example: TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade grade ssn c-id name address

Normal forms - more details 3NF: 2NF and no transitive dependencies : 2NF και κανένα μη πρωτεύον γνώρισμα δεν εξαρτάται μεταβατικά από το πρωτεύον κλειδί counter-example: A D B in 2NF, but not in 3NF C

Normal forms - more details 4NF, multivalued dependencies etc: later in practice, E-R diagrams usually lead to tables in BCNF

Overview - conclusions DB design and normalization pitfalls of bad design decompositions (lossless, dep. preserving) normal forms (BCNF or 3NF) everything should depend on the key, the whole key, and nothing but the key