Data Center Disaster Recovery - Cisco

KwaiSeng Consulting Systems Engineer

© 2006 Cisco Systems, Inc. All rights reserved.


Agenda
• Data Center: The Evolution
• Data Center Disaster Recovery
  • Objectives
  • Failure Scenarios
  • Design Options
• Components of Disaster Recovery
  • Site Selection: Front End GSLB
  • Server High Availability: Clustering
  • Data Replication and Synchronization: SAN Extension
• Data Center Technology Trends
• Summary

The Evolution of Data Centers


Data Center Evolution

[Figure: timeline from 1960 to 2010 plotting compute evolution against network evolution. Compute labels: mainframes (terminal), client/server (TCP/IP), Internet computing (thin client: HTTP). Data center labels: data center networking, content networking, data center consolidation, network optimization, data center continuous availability, data center virtualization, business agility.]

Networked data center phase:
1. Consolidation
2. Integration
3. Virtualization
4. High Availability

Today's Data Center: Integration of Many Systems and Services

[Figure: a data center running N-tier applications (web servers, app servers, DB servers, mainframe, IP communications) behind a front-end network (resilient IP, firewall, IDS, cache, content switch) with WAN/Internet and MAN/Internet access, attached to a storage network (FC switches, VSANs, FC SAN, NAS, RAID, tape). A metro network (DWDM/SONET/Ethernet) links it to a secondary (DR) data center. Callouts: scalable infrastructure, application and server optimization, data center security, DC storage networks, distributed data centers, operations.]

What Is a Distributed Data Center?

[Figure: a primary and a secondary data center hosting App A, App B and App C (App A is shown twice), each with FC-attached storage, with data replication between the two sites.]

Distributed Data Centers
• Required by disaster recovery and business continuance
• Avoid a single, concentrated data repository
• High availability of applications and data access
• Load balancing together with performance scalability
• Better response and optimal content routing: proximity to clients

Front-End IP Access Layer

[Figure: primary and secondary data centers hosting App A, App B and App C with FC-attached storage. The front-end layer is highlighted as "content routing" for site selection.]

Application and Database Layer

[Figure: the same two-site diagram. The application and database layer is highlighted as "content switching" for load balancing and "server clustering" for high availability.]

Backend SAN Extension

[Figure: the same two-site diagram. The back end is highlighted as "storage" and "optical" for data replication and transport.]

Data Center Disaster Recovery


Disaster Recovery
• Recovery of data and resumption of service: ensuring the business can recover and continue after a failure or disaster
• Ability of a business to adapt, change and continue when confronted with various outside impacts
• Mitigating the impact of a disaster

Disaster Recovery: What It Means for Business
• Business Resilience: continued operation of business during a failure
• Business Continuance: restoration of business after a failure
• Disaster Recovery: protecting data through offsite data replication and backup

Zero down time is the ultimate goal.

Disaster Recovery Planning
• Business Impact Analysis (BIA)
  • Determines the impacts of various disasters on specific business functions and company assets
• Risk analysis
  • Identifies important functions and assets that are critical to the company's operations
• Disaster Recovery Plan (DRP)
  • Restores operability of the target systems, applications, or computing facility at the secondary data center after the disaster

Disaster Recovery Objectives
• Recovery Point Objective (RPO)
  • The point in time (prior to the outage) to which systems and data must be restored
  • Tolerable loss of data in the event of a disaster or failure
  • The impact of data loss and the cost associated with the loss
• Recovery Time Objective (RTO)
  • The period of time after an outage within which the systems and data must be restored to the predetermined RPO
  • The maximum tolerable outage time
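The two objectives can be checked against a single outage with simple date arithmetic. The following is a minimal sketch, not part of the original deck; the function name `meets_objectives`, its parameters and the sample timestamps are all illustrative assumptions.

```python
from datetime import datetime, timedelta


def meets_objectives(last_recovery_point, outage_start, service_restored, rpo, rto):
    """Return (rpo_met, rto_met) for one outage (hypothetical helper)."""
    # Data written after the last recovery point is lost, so this gap is
    # the amount of data loss measured in time.
    data_loss_window = outage_start - last_recovery_point
    # Downtime runs from the start of the outage until service is restored.
    downtime = service_restored - outage_start
    return data_loss_window <= rpo, downtime <= rto


outage = datetime(2006, 1, 10, 12, 0)  # assumed example outage time
result = meets_objectives(
    last_recovery_point=outage - timedelta(minutes=30),
    outage_start=outage,
    service_restored=outage + timedelta(hours=5),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
)
print(result)  # (True, False)
```

In this example the last replica is 30 minutes old, which is inside a one-hour RPO, but the five-hour restore misses a four-hour RTO.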

Recovery Point/Time vs. Cost

[Figure: timeline with the recovery point at time t0 (critical data is recovered), the disaster striking at time t1, and the recovery time ending at time t2 (systems recovered and operational). The recovery point axis runs from days down to secs before the disaster; the recovery time axis runs from secs up to weeks after it. Technologies shown along the axes: tape backup, periodic replication, asynchronous replication, synchronous replication, extended cluster, manual migration, tape restore. Cost ($$$) increases toward the disaster on both axes.]

• Smaller RPO/RTO: higher $$$, replication, hot standby
• Larger RPO/RTO: lower $$$, tape backup/restore, cold standby


Failure Scenarios: Disaster Could Mean Many Types of Failure
• Network failure
• Device failure
• Storage failure
• Site failure

Network Failures
• ISP failure
  • Dual ISP connections
  • Multiple ISP
• Connection failure within the network
  • EtherChannel®
  • Multiple route paths

[Figure: data center connected to the Internet through Service Provider A and Service Provider B.]

Device Failures
• Routers, switches, FWs
  • HSRP
  • VRRP
• Hosts
  • HA cluster
  • LB server farm
  • NIC teaming

Storage Failures
• Disk arrays
  • RAID
• Disk controllers
• Storage replication
  • Site-to-site mirroring
  • Optimization

Site Failures
• Partial site failure
  • Application maintenance
  • Application migration
  • Application scheduled DR exercise
• Complete site failure
  • Disaster


Warm Standby
• A data center that is equipped with hardware and communications interfaces capable of providing backup operating support
• Latest backups from the production data center must be delivered
• Network access needs to be activated
• Application needs to be manually started

Disaster Recovery: Active/Standby

[Figure: primary data center and secondary data center (warm standby) hosting App A, App B and App C with FC-attached storage, connected by an IP/optical network.]

Hot Standby
• A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little down time
• Hot backup offers disaster recovery with little or no human intervention
• Application data is replicated from the primary site
• A hot backup site provides better RTO/RPO than warm standby but costs more to implement
• Business continuance

Disaster Recovery: Active/Standby

[Figure: primary and secondary data centers hosting App A, App B and App C with FC-attached storage, connected by an IP/optical network.]

Active/Active DR Design: Multiple Tiers of Application

[Figure: two data centers reached from the Internet through Service Provider A and Service Provider B, each with a presentation tier, an application tier and a storage tier.]

Active/Active Data Centers

[Figure: two data centers reached from the Internet through Service Provider A and Service Provider B, each attached to the internal network.]

• Active/active web hosting
• Active/active application processing
• Active/standby database processing, or active/active for different applications

Components of Disaster Recovery


Site Selection Mechanisms
• Site selection mechanisms depend on the technology or mix of technologies adopted for request routing:
  1. HTTP redirect
  2. DNS-based
  3. L3 routing with Route Health Injection (RHI)
• Health of servers and/or applications needs to be taken into account
• Optionally, other metrics (like load) can be measured and utilized for a better selection
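Whatever mechanism carries the decision, the selection logic itself is small: discard unhealthy sites, then rank the rest by an optional metric such as load. The sketch below illustrates that idea only; the `Site` class, its fields and the sample values are assumptions, not part of any Cisco product API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str       # hypothetical site label
    healthy: bool   # result of server/application health checks
    load: float     # assumed load metric between 0.0 and 1.0


def select_site(sites: List[Site]) -> Optional[Site]:
    """Pick the least-loaded healthy site, or None if no site is healthy."""
    candidates = [s for s in sites if s.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.load)


sites = [
    Site("primary", healthy=False, load=0.10),
    Site("secondary-east", healthy=True, load=0.70),
    Site("secondary-west", healthy=True, load=0.40),
]
chosen = select_site(sites)
print(chosen.name)  # secondary-west
```

Health is treated as a hard filter and load only as a tie-breaker among healthy sites, which matches the slide's ordering of "must" and "optionally".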

HTTP Redirection: Traffic Flow

[Figure: a client requests http://www.cisco.com/; sites http://www1.cisco.com/ and http://www2.cisco.com/ are monitored with keepalives.]

1. Client request: GET / HTTP/1.1, Host: www.cisco.com
2. Response: HTTP/1.1 302 Moved, Location: www2.cisco.com
3. Client request: GET / HTTP/1.1, Host: www2.cisco.com
4. Response from http://www2.cisco.com/: HTTP/1.1 200 OK
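The redirect step can be sketched as a function that builds the raw response for the first healthy site. This is an illustration under assumptions: the function name and the `(hostname, healthy)` input format are made up for the example, and the reason phrase used is "Found", the standard HTTP/1.1 phrase for status 302.

```python
def redirect_response(sites, path="/"):
    """Build a raw HTTP response redirecting the client to a healthy site.

    sites: list of (hostname, healthy) tuples in preference order
    (a hypothetical input format; health would come from keepalives).
    """
    for host, healthy in sites:
        if healthy:
            return f"HTTP/1.1 302 Found\r\nLocation: http://{host}{path}\r\n\r\n"
    # No site passed its keepalive, so there is nowhere to redirect to.
    return "HTTP/1.1 503 Service Unavailable\r\n\r\n"


response = redirect_response([("www1.cisco.com", False), ("www2.cisco.com", True)])
print(response.splitlines()[0])  # HTTP/1.1 302 Found
print(response.splitlines()[1])  # Location: http://www2.cisco.com/
```

Because the client ends up addressing the site-specific hostname directly, later requests in the session stay on the same site.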

DNS-Based Site Selection: Traffic Flow

[Figure: numbered steps 1 to 10. The client sends a DNS query for www.cisco.com (UDP:53) to its DNS proxy. The diagram also shows the root name server, the authoritative name server for .com, the authoritative name server for cisco.com and the authoritative name server for www.cisco.com. Keepalives run to Data Center 1 and Data Center 2. The client then opens http://www.cisco.com/ (TCP:80) at the selected data center.]
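The decision made by the authoritative name server for www.cisco.com can be sketched as follows: answer only with the virtual IPs of data centers that passed their last keepalive, and keep the TTL short so cached answers expire quickly. The dictionary keys, the 20-second TTL and the addresses (taken from the documentation ranges) are assumptions for illustration.

```python
def resolve(name, data_centers, ttl_seconds=20):
    """Return DNS A-record answers as (name, ttl, address) tuples.

    data_centers: list of dicts with hypothetical keys "vip" and "alive",
    where "alive" is the result of the most recent keepalive probe.
    """
    return [(name, ttl_seconds, dc["vip"]) for dc in data_centers if dc["alive"]]


data_centers = [
    {"vip": "192.0.2.10", "alive": False},    # Data Center 1 failed its keepalive
    {"vip": "198.51.100.10", "alive": True},  # Data Center 2 is healthy
]
answers = resolve("www.cisco.com", data_centers)
print(answers)  # [('www.cisco.com', 20, '198.51.100.10')]
```

Resolvers and clients may hold an answer until its TTL expires, so failover is only as fast as the DNS caches allow.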

Route Health Injection: Implementation

[Figure: Client A and Client B reach the VIP x.y.w.z through Routers 10, 11, 12 and 13. Location B is the preferred location for the VIP and is advertised at low cost. Location A is the backup location for the VIP and is advertised at very high cost.]
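The effect of RHI on routing can be modelled in a few lines: each location injects a route for the VIP only while the VIP is healthy, and the network follows the lowest-cost route still present. This is a simplified model, not router configuration; the tuple layout and the cost values are assumptions.

```python
def best_location(advertisements):
    """Return the location the network routes to, or None if no route exists.

    advertisements: list of (location, cost, vip_healthy) tuples. A location
    whose VIP is unhealthy withdraws its route, so it is not considered.
    """
    live = [(cost, location) for location, cost, healthy in advertisements if healthy]
    return min(live)[1] if live else None


normal = [("Location A", 1000, True), ("Location B", 10, True)]
b_failed = [("Location A", 1000, True), ("Location B", 10, False)]

print(best_location(normal))    # Location B
print(best_location(b_failed))  # Location A
```

Traffic moves to the backup location as soon as the preferred route is withdrawn and routing reconverges.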

Site Selection Summary

Mode           | Redundancy     | Convergence | App Health Visibility | Site Persistence
HTTP Re-Direct | Active/Active  | No          | No                    | Yes
DNS            | Active/Active  | DNS Cache   | Yes                   | No
RHI            | Active/Standby | Within Secs | Yes                   | No


Cluster Overview
• Load balancing cluster: multiple copies of the same application running against the same data set, usually read only
• High availability cluster: multiple copies of an application that requires access to a common data repository, usually read and write
• Clustering provides benefits for availability, reliability, scalability, and manageability

[Figure: web servers, application servers, database servers.]

High Availability Cluster Design
• Public network: client/application requests
• Private network: interconnection between nodes
• Storage disk: shared storage array, NAS or SAN

[Figure: node software stack: APP, cluster software, cluster enabler, OS.]

HA Cluster Application View
• Active/standby
  • Standby takes over when active fails
  • Two-node or multi-node
• Active/active
  • Database requests are load balanced across all nodes
  • Lock mechanism ensures data integrity
• Shared everything
  • Each node mounts all storage resources
  • Provides a single layout reference system for all nodes
• Shared nothing
  • Each node mounts only its "semi-private" storage
  • Data stored on the peer system's storage is accessed via peer-to-peer communication

[Figure: Node1 and Node2.]

Geo-Cluster Considerations
Geo-cluster: a cluster that spans multiple data centers

[Figure: Node1 in the local data center and Node2 in the remote data center, connected across the WAN, with disk replication (synchronous or asynchronous) between the sites.]

• Challenges:
  • Disk replication: synchronous or asynchronous; 2 x RTT
  • Split brain
  • L2 heartbeats
  • Storage

HA Cluster Challenges: Split-Brain
• Split-brain: active nodes concurrently access the same disk, which leads to data corruption
• Resolution: use a quorum, a tie breaker for gaining access to the disk

[Figure: Node1 and Node2 both writing to the same disk, resulting in data corruption.]
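A quorum works as a tie breaker because only one side of a partition can hold a strict majority of the votes. The sketch below assumes a common arrangement that the slide does not spell out: two nodes with one vote each plus a quorum device worth one vote, for three votes in total.

```python
def may_access_disk(votes_held, total_votes):
    """A partition may write only if it holds a strict majority of votes."""
    return votes_held > total_votes // 2


TOTAL_VOTES = 3  # assumed: Node1 + Node2 + quorum device

# After the interconnect fails, suppose Node1 wins the race for the quorum device.
node1_votes = 2  # its own vote plus the quorum device
node2_votes = 1  # its own vote only

print(may_access_disk(node1_votes, TOTAL_VOTES))  # True
print(may_access_disk(node2_votes, TOTAL_VOTES))  # False
```

Two disjoint partitions can never both exceed half of the total, so at most one node keeps write access to the shared disk.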

Layer 2 Heartbeats
• Extended L2 network: L2 adjacency is required for the nodes' heartbeat. Extending a VLAN across sites is hazardous.
• Resolution: L3 capability for the cluster heartbeat. EoMPLS to carry L2 heartbeats across DR sites.

[Figure: Node1 in the local data center and Node2 in the remote data center, joined across the WAN by a public Layer 2 network and a private Layer 2 network, with disk replication (synchronous or asynchronous) between the sites.]

Storage Disk Zoning
• Storage zoning: taking over the storage disk array when the active node fails
• Resolution: the cluster software communicates with the cluster enabler, which instructs the disk array to perform a failover when a failure is detected

[Figure: Node1 (active) and Node2 (standby) attached to an extended SAN. Array sym1320 is labelled RW; array sym1291 is labelled WD.]


Storage for Applications
• Presentation tier
  • Unrelated small data files commonly stored on internal disks
  • Manual distribution
• Application processing tier
  • Transitional, unrelated data
  • Small files residing on file systems
  • May use RAID to spread data over multiple disks
• Storage tier
  • Large, permanent data files or raw data
  • Large batch updates, most likely real time
  • Log and data on separate volumes

Replication: Modes of Operation
• Synchronous
  • All data is written to the local and remote arrays before the I/O is complete and acknowledged to the host
  • Speed of light c = 3 × 10^8 m/s in vacuum, a propagation delay of about 3.3 µs/km
  • Speed through fiber ≈ ⅔ c, a propagation delay of about 5 µs/km
  • 2 RTT per write I/O = 20 µs/km
• Asynchronous
  • The write is acknowledged and the I/O is complete after the write to the local array; changes (writes) are replicated to the remote array asynchronously
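The 20 µs/km figure follows from 5 µs/km one way, doubled for a round trip and doubled again for the two round trips per write. A short worked example, with constant and function names chosen here for illustration:

```python
FIBER_DELAY_US_PER_KM = 5.0   # one-way propagation delay in fiber (from the slide)
ROUND_TRIPS_PER_WRITE = 2     # round trips per synchronous write I/O (from the slide)


def sync_write_penalty_us(distance_km):
    """Added latency per synchronous write, in microseconds, due to distance."""
    round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    return ROUND_TRIPS_PER_WRITE * round_trip_us


print(sync_write_penalty_us(1))    # 20.0
print(sync_write_penalty_us(100))  # 2000.0
```

At 100 km each synchronous write therefore waits roughly 2 ms on propagation alone, before any array or switch latency is counted.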

Synchronous vs. Asynchronous Trade-Off
Enterprises must evaluate the trade-offs

Synchronous                                                       | Asynchronous
Impact to application performance                                 | No application performance impact
Distance limited (are both sites within the same threat radius?)  | Unlimited distance (second site outside threat radius)
No data loss                                                      | Exposure to possible data loss

• Maximum tolerable distance ascertained by assessing each application
• Cost of data loss
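Assessing the maximum tolerable distance for an application amounts to dividing its write-latency budget by the per-kilometer penalty of synchronous replication. The sketch assumes roughly 5 µs/km of one-way fiber delay and two round trips per write; the budget values are invented examples, and real designs also need to allow for equipment latency.

```python
PENALTY_US_PER_KM = 2 * 2 * 5.0  # two round trips, each two ways, at 5 us/km


def max_sync_distance_km(latency_budget_us):
    """Longest fiber distance that keeps the added write latency within budget."""
    return latency_budget_us / PENALTY_US_PER_KM


# Hypothetical budgets: 1 ms for an OLTP database, 4 ms for a file server.
print(max_sync_distance_km(1000))  # 50.0
print(max_sync_distance_km(4000))  # 200.0
```

If the distance allowed by the budget still falls inside the threat radius, asynchronous replication to a more distant site is the usual alternative, at the price of possible data loss.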

Data Replication with DB Example
• Control files identify the other files making up the database and record the content and state of the database:
  • DB name
  • Creation date
  • Backup performed
  • Redo log time period
  • Datafile state
• Datafiles are only updated periodically; they hold:
  • Table spaces
  • Indexes
  • Data dictionary
• Redo log files record database changes resulting from transactions
  • Used to play back changes that may not have been written to the datafiles when the failure occurred
  • Typically archived as they fill to local and DR site destinations

[Figure: control files identify the datafiles and redo log files; redo log files record changes to the datafiles.]

Data Replication with DB Example (Cont.)

[Figure: timeline. A hot backup of datafiles and control files is taken at time t0. Archived redo logs and then online redo logs cover the period up to time t1, when the failure or disaster occurs.]

• Failure or disaster occurs at time t1, for example:
  • Media failure (e.g., disk)
  • Human error (datafile deletion)
  • Database corruption
• The database is restored to its state at the time of failure (time t1) by:
  1. Restoring control files and datafiles from the last hot backup (time t0)
  2. Sequentially replaying changes from subsequent redo logs (archived and online), that is, the changes made between time t0 and t1
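The two restore steps can be illustrated with a toy model in which the database is a dictionary and each redo record is a timestamped key/value change. This is a conceptual sketch only; the data layout and function name are assumptions and do not reflect any database vendor's file formats or recovery tools.

```python
def recover(backup, redo_log, failure_time):
    """Rebuild the database state as of failure_time.

    backup: (t0, state) where state is a dict copied at hot-backup time t0.
    redo_log: list of (timestamp, key, value) records in time order,
    archived and online logs concatenated.
    """
    t0, state = backup
    restored = dict(state)  # step 1: restore from the last hot backup
    for timestamp, key, value in redo_log:
        # step 2: replay only the changes made after t0 and up to the failure
        if t0 < timestamp <= failure_time:
            restored[key] = value
    return restored


backup = (100, {"a": 1, "b": 2})
redo_log = [(90, "a", 0), (110, "a", 5), (120, "c", 7), (130, "b", 9)]
state = recover(backup, redo_log, failure_time=125)
print(state)  # {'a': 5, 'b': 2, 'c': 7}
```

Records older than the backup are skipped because the backup already reflects them, and the record stamped 130 is ignored because it lies beyond the failure time.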

Data Replication with DB Example (Cont.)

[Figure: primary and secondary sites linked by a SAN extension transport. The cyclic redo logs, which hold a copy of every committed transaction, are synchronously replicated for zero loss. The archive logs and the database copy at time t0 (a point-in-time copy taken when the DB is quiescent) are replicated/copied to the secondary site. Earlier DB backups are also shown.]

A mixture of sync and async replication technologies is commonly used:
• Usually only the redo logs are synchronously replicated to the remote site
• Archive logs are created from the redo log and copied when the redo log switches
• Point-in-Time (PiT) copies of datafiles and control files are copied periodically (e.g., nightly)

Data Center Interconnection Options

[Figure: two data centers, each with Internet access, stateful firewalls, intrusion detection, content caching, server load balancing, a high density multilayer LAN switch, front-end and back-end application servers, a high density multilayer SAN director and enterprise-class storage arrays. The sites are interconnected over SONET/SDH, DWDM/CWDM or IP/Metro Ethernet.]

Data Center Transport Options

[Figure: transport options plotted against increasing distance: data center, campus, metro, regional, national.]

• Optical
  • Dark fiber: sync; limited by optics (power budget)
  • CWDM: sync (2 Gbps); limited by optics (power budget)
  • DWDM: sync (2 Gbps lambda); limited by BB_Credits
  • SONET/SDH: sync (1 Gbps+ subrate); async
• IP
  • MDS9000 FCIP: sync (Metro Ethernet); async (1 Gbps+)

DATA CENTER ARCHITECTURE TRENDS


Cisco Data Center Vision

[Figure: the Intelligent Information Network supporting enterprise applications across compute, network and storage. It spans the data network (LAN, WAN, MAN), the storage network (SAN) and the server fabric network (HPC, cluster, grid).]

• Consolidation: centralization and standardization to lower costs and improve efficiency and uptime
• Virtualization: management of resources independent of the underlying physical infrastructure to increase utilization, efficiency and flexibility
• Automation: dynamic provisioning and autonomic Information Lifecycle Management (ILM) to enable business agility (business policies, on-demand, service oriented)

Summary


What Have We Covered So Far?
• DR and its business objectives
  • Define budget and technical solution
  • Management buy-in
  • DR is a process
• Components of a data center
  • Multi-tier architecture
  • Front end, application, back-end database
• Techniques in data center disaster recovery
  • HTTP redirection/GSS/RHI
  • Clustering
  • SAN extension
• Trends in data center technology

Today's Data Centers Require an Architectural Approach to…
• Protect with business resilience
  • Tighten security
  • Improve business continuance
• Optimize with consolidation
  • Improve operational efficiency and resource utilization
  • Lower complexity and cost of ownership
• Grow towards a services-oriented infrastructure
  • Align virtualized resources with business demands
  • Automate infrastructure to respond dynamically

The Big Picture: The Cisco Data Center
The Emerging Data Center Architecture

[Figure: three switching domains and what attaches to them.
Enterprise SAN switching (MDS 9000 Family): virtual fabrics (VSANs); embedded intelligent storage services (storage virtualization, data replication services, fabric routing services, multiprotocol gateway services); attached mainframe connectivity, enterprise tape storage and enterprise disk storage.
Server farm switching (Catalyst 6500 Family): embedded intelligent network services (server balancing, VPN termination, SSL termination, firewall services, intrusion detection); attached enterprise NAS storage and UNIX/Windows servers.
Server fabric switching (Topspin Family): embedded intelligent virtualization services (server virtualization with VFrame, virtual I/O, grid/utility computing, low latency RDMA services, clustering); attached enterprise grid and blade servers, partitioned into virtual private server fabrics #1, #2 and #3.]

What's Next?
• A security strategy to protect the data center
  • Understand the vulnerabilities and apply the relevant mitigations
• Leverage Cisco's technology to optimize the server resources
  • Reducing TCO for DRs
  • Virtualization to maximize the resources invested
  • Grow DC infrastructure, enabling business agility
  • Automating the provisioning of computing resources
  • Speed of deploying new services

Q and A
