Department of Computer Science University of Cyprus EPL446 – Advanced Database Systems
Lecture 19 Introduction to Distributed Databases Chapter 25.1-25.4: Elmasri & Navathe, 5ED Chapter 22.6-22-10: Ramakrishnan & Gehrke, 3ED
Demetris Zeinalipour http://www.cs.ucy.ac.cy/~dzeina/courses/epl446 EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-1
Lecture Outline (Introduction to Distributed Databases) • • • •
• •
Introduction to Distributed Databases Types of Distributed Databases – Homogeneous, Heterogeneous (Federated, MultiDBs) Distributed Databases Architectures – Client Server, Collaboration Server, Middleware Data Fragmentation & Replication – Horizontal, Vertical and Mixed Fragmentation. – Synchronous vs. Asynchronous Distributed Catalog Management (next lecture) Distributed Query Processing (next lecture) –
Centralized, Ship-to-one-site, Semi-join, Bloom-join & Bloom Filters EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-2
Introduction to Distributed Databases (Διζαγωγή ζε Καηανεμημένερ Βάζειρ) • Distributed Database (DDB) – a collection of multiple logically related (λογικά ζςζσεηιζόμενερ) databases distributed over a computer network.
• Distributed Database Management System (DDBMS) – a generic software system that manages a distributed database while making the distribution transparent (διαθανήρ) to the user.
• Distributed Databases – Reality (e.g., WWW, Grids, Cloud, Sensors, Mobiles, …)
• Distributed Database Management Systems – A myth? … see next slide for details EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-3
Introduction to DDBMS (Διζαγωγή ζε Καηανεμημένα Σςζ. Γιασ. Γεδομένων)
• The Problem – There is no real consensus (πλειοτηθία γνώμηρ) on what the design objectives of DDBMS should be. – The field is evolving (εξελίζζεηαι) mostly in response to user needs rather than generic principles.
• Some Facts – DDBMS come at a significant cost in terms of performance (επίδοζη), software complexity (Πολςπλοκόηηηα λογιζμικού) and administration difficulty (δςζκολία διασείπιζηρ). – A full scale comprehensive DDBMS that implements an open standard for distributed databases never emerged as a commercially viable product; •
All DBMS vendors (e.g. SQL Server, Oracle, DB2) provide means to 19-4 distribute but there is a- Demetris lack of Zeinalipour an open architecture EPL446: Advanceddata Database Systems (University of Cyprus)
Introduction to Distributed Databases (Διζαγωγή ζε Καηανεμημένερ Βάζειρ) • A Centralized database with distributed clients
• A Truly Distributed Database
EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-5
Introduction to DDBMS (Διζαγωγή ζε Καηανεμημένα Σςζ. Γιασ. Γεδομένων)
• The following properties are desirable: – Distributed Data Independence (Καηανεμημένη Ανεξαπηηζία Δεδομένων): Users should not have to know where data is located. • Extends Physical and Logical Data Independence principles. • In particular, the user is shielded from the details of how data is “stored” (e.g., sorted, not sorted) and “logically organized” (e.g., in one or more relations)
– Distributed Transaction Atomicity (Καηανεμημένη Αηομικόηηηα Δοζοληψιών): Users should be able to write Transactions accessing multiple sites just like local Transactions. EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-6
Types of Distributed Databases (Τύποι Καηανεμημένων Βάζεων) Homogeneous (Ομογενήρ): Every site runs same type of DBMS – All sites of the database system have identical setup, i.e., same database system software. – The underlying operating system might be different.
Windows Site 5
Oracle Windows Site 4
• For example, ALL sites run Oracle or DB2, or Sybase or some other database Oracle system.
– The underlying operating systems CAN be a mixture of Linux, Window, Unix, etc.
Unix Oracle Site 1
Communications network
Site 2 Site 3 Linux Oracle Linux
Oracle
EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-7
Types of Distributed Databases (Τύποι Καηανεμημένων Βάζεων) Heterogeneous (Εηεπογενήρ): Different sites run different DBMSs (even non-relational DBMSs). • Types of Heterogeneous Databases – Federated (Single Schema): Each site may run different database system but the data access is managed through a single conceptual schema. • This implies that the degree of local autonomy is minimum. • Each site must adhere to a centralized access policy.
– Multidatabase (No Schema): There is no one conceptual global schema. For data access a schema is constructed dynamically as needed by the application software. Unix Relational Site 5 Unix Site 1 Hierarchical Window Site 4
Object Oriented
Communications network Network DBMS Site 3 Linux
Site 2 Linux
Relational
In Heterogeneous Databases, gateway protocols (e.g., ODBC, JDBC) are necessary. These protocols add to the cost of accessing the DB.
EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus) Multidatabase Example
19-8
Distributed Databases Architectures (Απσιηεκηονικέρ Καηανεμημένων Βάζεων) A. Client-Server (Πελάηη-Εξςπηπεηηηή): – Query can span one or more sites. – All query processing at server. • Clients are “thin” (i.e., application logic implemented at the server) • Set-oriented communication (tuple-at-a-time communication is expensive). • Clients perform caching of results to minimize communication
– Drawback: Not Scalable (to combine data from multiple sources requires that the client implements all the application logic locally) QUERY
CLIENT
SERVER
CLIENT
SERVER
SERVER
EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-9
Distributed Databases Architectures (Απσιηεκηονικέρ Καηανεμημένων Βάζεων) B. Collaborating-Server (Σςνεπγαζόμενος Εξςπηπεηηηή): – Client ships query to a collaborating server which takes care of: • Optimizing the query and sending it to N sites • Collecting/Caching the results. • Returning the results to the user – If the collaborating server is implemented as part of the client software stack then it is also called “Middleware” QUERY
CLIENT
CLIENT Coll. Serv
SERVER
CLIENT
SERVER
SERVER
EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-10
Data Fragmentation (Καηάημηζη Γεδομένων) • Horizontal Fragmentation (Οπιζόνηια Καηάημηζη) – Create tuple subsets of relations and assign each subset to a distributed site (or relation might physically be fragmented). – Fragments are usually required to be disjoint (S1∩S2=∅) – Union of fragments must be equal to the initial relation.
• Vertical Fragmentation (Κάθεηη Καηάημηζη) – create subset of columns of a relation and assign each subset to a distributed site (or relation might physically be fragmented). – Collection of fragments should be a lossless-join decomposition of the original relation (in other words we can recover the initial relation)
• Mixed Fragmentation (Μεικηή Καηάημηζη) – A combination of Vertical and Horizontal fragmentation. – This is achieved by SELECT-PROJECT operations which is represented by Li(sCi (R)). EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-11
Data Fragmentation (Καηάημηζη Γεδομένων) • Horizontal fragmentation (Οπιζόνηια Καηάημηζη) – It is a horizontal subset of a relation which contain those of tuples which satisfy selection conditions. – Consider the Employee relation with selection condition (DNO = 5). • All tuples satisfy this condition will create a subset which will be a horizontal fragment of Employee relation.
– A selection condition may be composed of several conditions connected by AND or OR.
EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-12
Data Fragmentation (Καηάημηζη Γεδομένων) • Vertical fragmentation (Κάθεηη Καηάημηζη) – It is a subset of a relation which is created by a subset of columns. • Α vertical fragment of a relation will contain values of selected columns. • There is no selection condition used in vertical fragmentation.
– Consider the Employee relation. • A vertical fragment of can be created by keeping the values of Name, Bdate, Sex, and Address.
– Because there is no condition for creating a vertical fragment, each fragment must include the primary key attribute of the parent relation Employee. • In this way all vertical fragments of a relation are connected. 19-13
EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Data Replication (Ανηίγπαθα Γεδομένων) • Data Replication (Ανηιγπαθή Δεδομένυν) – Store copies of a data at multiple sites to minimize access time (σπόνορ πποζπέλαζηρ) and increase availability of data (διαθεζιμόηηηα)
– Full Replication: the entire database is replicated – Partial Replication: some selected part is replicated to some of the sites. – Data replication is achieved through a replication schema, a definition of: • all attributes and tuples in the DB; and • a sequence of UNION operations from which the initial database can be reconstructed.
– Replication Strategies: Synchronous (online) vs. Asynchronous (offline)
EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-14
Data Replication (Ανηίγπαθα Γεδομένων) • Asynchronous Replication: Copies of a modified relation are only periodically updated; – different copies may get out of synch in the meantime. – Current products follow this approach, e.g., Oracle Streams built-in feature which enables the propagation of data (DML and DDL updates), transactions and events in a data stream either within a database, or from one database to another.
• Synchronous Replication: All copies of a modified relation (fragment) must be updated before the modifying Xact commits. – Oracle Streams supports this feature but it is efficient 19-15 only when changes affect a small number of tables. EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Fragmentation and Replication (Καηάημηζη και Ανηίγπαθα) • The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally and stored with possible replication as shown below.
Fragmentation / Replication
fragmentation
EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
19-16