Data Center Disaster Recovery - Cisco

Data Center Disaster Recovery

KwaiSeng Consulting Systems Engineer


© 2006 Cisco Systems, Inc. All rights reserved.

Cisco Confidential


Agenda
Data Center: The Evolution
Data Center Disaster Recovery
  Objectives
  Failure Scenarios
  Design Options
Components of Disaster Recovery
  Site Selection: Front End GSLB
  Server High Availability: Clustering
  Data Replication and Synchronization: SAN Extension
Data Center Technology Trends
Summary

The Evolution of Data Centers



Data Center Evolution
[Figure: timeline from 1960 through 1980 and 2000 to 2010, rising toward Business Agility. Compute evolution: Mainframes, Client/Server, Internet Computing. Network evolution: Terminal, TCP/IP, Thin Client: HTTP, Content Networking, Data Center Networking. Networked Data Center phase: 1. Consolidation, 2. Integration, 3. Virtualization, 4. High Availability. Other labels: Data Center Consolidation, Network Optimization, Data Center Virtualization, Data Center Continuous Availability.]

Today's Data Center: Integration of Many Systems and Services
[Figure: N-tier applications (Web Servers, App Servers, DB Servers, Mainframe, IP Comm., Operations) tied together by three networks. Front End Network: WAN/Internet, Resilient IP, Firewall, IDS, Cache, Content Switch (Application and Server Optimization, Data Center Security). Storage Network: FC Switches, VSANs, FC SAN, NAS, RAID, Tape (DC Storage Networks). Distributed Data Centers: MAN/Internet and Metro Network (DWDM/SONET/Ethernet) links to the DR Data Center and the Secondary Data Center. Caption: Scalable Infrastructure.]

What Is a Distributed Data Center?
[Diagram: applications A, B, and C spread across the Primary Data Center and the Secondary Data Center, each with FC-attached storage, with Data Replication between the sites.]

Distributed Data Centers
Required by disaster recovery and business continuance
Avoid a single, concentrated data repository
High availability of applications and data access
Load balancing together with performance scalability
Better response and optimal content routing: proximity to clients

Front-End IP Access Layer
"Content Routing": Site Selection
[Diagram: applications A, B, and C with FC-attached storage; Primary Data Center labeled.]



Application and Database Layer
"Content Switching": Load Balancing
"Server Clustering": High Availability
[Diagram: applications A, B, and C with FC-attached storage; Primary Data Center labeled.]



Backend SAN Extension
"Storage" and "Optical": Data Replication and Transport
[Diagram: applications A, B, and C with FC-attached storage; Primary Data Center labeled.]



Data Center Disaster Recovery



Agenda
Introduction to Data Center: The Evolution
Data Center Disaster Recovery
  Objectives
  Failure Scenarios
  Design Options
Components of Disaster Recovery
  Site Selection: Front End GSLB
  Server High Availability: Clustering
  Data Replication and Synchronization: SAN Extension

Disaster Recovery
Recovery of data and resumption of service: ensuring the business can recover and continue after a failure or disaster
Ability of a business to adapt, change, and continue when confronted with various outside impacts
Mitigating the impact of a disaster

Disaster Recovery: What It Means for Business
Business Resilience: continued operation of business during a failure
Business Continuance: restoration of business after a failure
Disaster Recovery: protecting data through offsite data replication and backup
Zero downtime is the ultimate goal

Disaster Recovery Planning
Business Impact Analysis (BIA)
  Determines the impacts of various disasters on specific business functions and company assets
Risk analysis
  Identifies important functions and assets that are critical to the company's operations
Disaster Recovery Plan (DRP)
  Restores operability of the target systems, applications, or computing facility at the secondary data center after the disaster

Disaster Recovery Objectives
Recovery Point Objective (RPO)
  The point in time (prior to the outage) to which systems and data must be restored
  Tolerable loss of data in the event of a disaster or failure
  Reflects the impact of data loss and the cost associated with the loss
Recovery Time Objective (RTO)
  The period of time after an outage within which the systems and data must be restored to the predetermined RPO
  The maximum tolerable outage time
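As a rough illustration of the two objectives, the sketch below measures the recovery point and recovery time actually achieved in an incident and compares them with the stated RPO and RTO. The function names and the incident timestamps are hypothetical and are not taken from the presentation.

```python
from datetime import datetime, timedelta

def achieved_rpo(last_recoverable_copy: datetime, outage: datetime) -> timedelta:
    """Data written after the last recoverable copy is lost."""
    return outage - last_recoverable_copy

def achieved_rto(outage: datetime, service_restored: datetime) -> timedelta:
    """Elapsed time between the outage and resumption of service."""
    return service_restored - outage

def meets_objectives(last_copy: datetime, outage: datetime, restored: datetime,
                     rpo: timedelta, rto: timedelta) -> bool:
    """True only if both the data-loss window and the downtime are within target."""
    return (achieved_rpo(last_copy, outage) <= rpo
            and achieved_rto(outage, restored) <= rto)

# Hypothetical incident: nightly copy at 02:00, outage at 14:30, service back at 18:30.
last_copy = datetime(2006, 1, 10, 2, 0)
outage = datetime(2006, 1, 10, 14, 30)
restored = datetime(2006, 1, 10, 18, 30)
```

With these example values, a nightly copy satisfies a 24-hour RPO but would miss a 1-hour RPO, which is the kind of gap that pushes a design from tape backup toward replication.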

Recovery Point/Time vs. Cost
[Figure: timeline with marks t0, t1, and t2 and the labels Critical Data Is Recovered, Disaster Strikes, and Systems Recovered and Operational. The recovery point scale runs from days to seconds: tape backup, periodic replication, asynchronous replication, synchronous replication. The recovery time scale runs from seconds to weeks: extended cluster, manual migration, tape restore. Cost increases as either interval shrinks.]
Smaller RPO/RTO: higher cost, replication, hot standby
Larger RPO/RTO: lower cost, tape backup/restore, cold standby





Failure Scenarios
Disaster could mean many types of failure:
  Network failure
  Device failure
  Storage failure
  Site failure

Network Failures
ISP failure
  Dual ISP connections
  Multiple ISPs
Connection failure within the network
  EtherChannel
  Multiple route paths
[Diagram labels: Service Provider A, Internet, Service Provider B.]

Device Failures
Routers, switches, firewalls
  HSRP
  VRRP
Hosts
  HA cluster
  LB server farm
  NIC teaming
[Diagram labels: Service Provider A, Internet, Service Provider B.]

Storage Failures
Disk arrays
  RAID
Disk controllers
Storage replication
  Site-to-site mirroring
  Optimization
[Diagram labels: Service Provider A, Internet, Service Provider B.]

Site Failures
Partial site failure
  Application maintenance
  Application migration
  Application scheduled DR exercise
Complete site failure
  Disaster
[Diagram labels: Service Provider A, Internet, Service Provider B.]





Warm Standby
A data center that is equipped with hardware and communications interfaces capable of providing backup operating support
Latest backups from the production data center must be delivered
Network access needs to be activated
Application needs to be manually started

Disaster Recovery: Active/Standby
[Diagram: applications A, B, and C across the Primary Data Center and the Secondary Data Center (Warm Standby), each with FC-attached storage, connected by an IP/Optical Network.]

Hot Standby
A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little downtime
Hot backup offers disaster recovery with little or no human intervention
Application data is replicated from the primary site
A hot backup site provides better RTO/RPO than warm standby but costs more to implement
Business continuance

Disaster Recovery: Active/Standby
[Diagram: applications A, B, and C with FC-attached storage at each site, connected by an IP/Optical Network; Primary Data Center labeled.]


Active/Active DR Design: Multiple Tiers of Application
Presentation Tier
Application Tier
Storage Tier
[Diagram labels: Service Provider A, Internet, Service Provider B.]

Active/Active Data Centers
Active/Active web hosting
Active/Active application processing
Active/Standby database processing, or Active/Active for different applications
[Diagram labels: Service Provider A, Internet, Service Provider B, Internal Network (at each site).]

Components of Disaster Recovery




Components of Disaster Recovery
Site Selection: Front End GSLB
Server High Availability: Clustering
Data Replication and Synchronization: SAN Extension

Site Selection Mechanisms
Site selection mechanisms depend on the technology or mix of technologies adopted for request routing:
  1. HTTP redirect
  2. DNS-based
  3. L3 routing with Route Health Injection (RHI)
Health of servers and/or applications needs to be taken into account
Optionally, other metrics (such as load) can be measured and used for better selection
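Whatever request-routing mechanism is used, the selection logic described above can be sketched as "filter by health, then rank by an optional metric". The `Site` type, the `load` scale, and the function name below are assumptions made for illustration; they do not describe a Cisco product interface.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Site:
    name: str
    healthy: bool   # result of the most recent keepalive/probe (assumed input)
    load: float     # hypothetical metric: 0.0 (idle) to 1.0 (saturated)

def select_site(sites: List[Site]) -> Optional[Site]:
    """Return the least-loaded healthy site, or None if every site is down."""
    candidates = [s for s in sites if s.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.load)
```

Health acts as a hard filter and load only as a tie-breaker among healthy sites, so a lightly loaded but failed site is never chosen.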

HTTP Redirection: Traffic Flow
1. GET / HTTP/1.1
   Host: www.cisco.com
2. HTTP/1.1 302 Moved
   Location: www2.cisco.com
3. GET / HTTP/1.1
   Host: www2.cisco.com
HTTP/1.1 200 OK
[Diagram labels: http://www.cisco.com/, http://www1.cisco.com/, http://www2.cisco.com/, Keepalives.]
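The redirect exchange above can be reproduced with a few lines of standard-library Python. This is only a minimal sketch: the `SITE_HEALTH` table stands in for real keepalive results, and the port and helper names are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical keepalive results; a real deployment would update these from probes.
SITE_HEALTH = {"www1.cisco.com": False, "www2.cisco.com": True}

def pick_redirect_target(site_health):
    """Return the first healthy site host name, or None if all sites are down."""
    for host, healthy in site_health.items():
        if healthy:
            return host
    return None

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = pick_redirect_target(SITE_HEALTH)
        if target is None:
            self.send_error(503, "No data center available")
            return
        # Step 2 of the flow: answer with 302 and a Location header.
        self.send_response(302)
        self.send_header("Location", f"http://{target}{self.path}")
        self.end_headers()

def serve(port=8080):
    """Run the redirector; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling `serve()` and requesting any path returns a 302 pointing at the first healthy site; the client then repeats the GET against that host, as in step 3.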

DNS-Based Site Selection: Traffic Flow
[Diagram: steps 1 to 10 between the Client, the DNS Proxy, the Root Name Server, the Authoritative Name Server for .com, the Authoritative Name Server for cisco.com, and the Authoritative Name Server for www.cisco.com. Keepalives run to Data Center 1 and Data Center 2. Name resolution uses UDP:53; the client then reaches http://www.cisco.com/ over TCP:80.]

Route Health Injection: Implementation
[Diagram: Client A and Client B reach VIP x.y.w.z through Routers 10, 11, 12, and 13. Location B is the preferred location for VIP x.y.w.z (low cost); Location A is the backup location for VIP x.y.w.z (very high cost).]
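The behaviour in the diagram can be approximated as follows: each location advertises the VIP route only while its local health check passes, and the network prefers the lowest-cost advertisement. The cost values and function names are illustrative assumptions, and real routing protocols apply their own metric rules.

```python
from typing import Dict, Optional

def advertised_routes(site_cost: Dict[str, int],
                      vip_healthy: Dict[str, bool]) -> Dict[str, int]:
    """A site injects the VIP route only while its VIP is healthy."""
    return {site: cost for site, cost in site_cost.items()
            if vip_healthy.get(site, False)}

def best_path(routes: Dict[str, int]) -> Optional[str]:
    """Pick the location advertising the VIP at the lowest cost."""
    return min(routes, key=routes.get) if routes else None

# Hypothetical metrics: Location B is preferred (low cost), Location A is backup (very high cost).
costs = {"Location A": 1000, "Location B": 10}
```

When Location B fails its health check, its route is withdrawn and traffic converges on Location A without any change on the clients.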

Site Selection Summary

Mode            Redundancy      Convergence  App Health Visibility  Site Persistence
HTTP Re-Direct  Active/Active   No           No                     Yes
DNS             Active/Active   DNS Cache    Yes                    No
RHI             Active/Standby  Within Secs  Yes                    No





Cluster Overview
Load balancing cluster: multiple copies of the same application running against the same data set, usually read-only
High availability cluster: multiple copies of an application that requires access to a common data repository, usually read and write
Clustering provides benefits for availability, reliability, scalability, and manageability
[Diagram labels: Web Servers, Application Servers, Database Servers.]

High Availability Cluster Design
Public network: client/application requests
Private network: interconnection between nodes
Storage disk: shared storage array, NAS or SAN
[Diagram labels: APP, Cluster Software, Cluster Enabler, OS.]

HA Cluster Application View
Active/standby
  Standby takes over when active fails
  Two-node or multi-node
Active/active
  Database requests are load balanced across all nodes
  Lock mechanism ensures data integrity
Shared everything
  Each node mounts all storage resources
  Provides a single layout reference system for all nodes
Shared nothing
  Each node mounts only its "semi-private" storage
  Data stored on the peer system's storage is accessed via peer-to-peer communication
[Diagram labels: Node1, Node2.]

Geo-Cluster Considerations
Geo-cluster: a cluster that spans multiple data centers
Challenges:
  Split brain
  L2 heartbeats
  Storage
[Diagram labels: Local Datacenter, WAN, Remote Datacenter, Node1, Node2, Disk Replication (Synchronous or Asynchronous, 2 x RTT).]

HA Cluster Challenges: Split-Brain
Split-brain: active nodes concurrently access the same disk, which leads to data corruption
Resolution: use a quorum, a tie-breaker for gaining access to the disk
[Diagram labels: Node1, Node2, Data Corruption.]
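One common way to realise the quorum tie-breaker is a vote count in which a quorum device contributes one extra vote to whichever partition reserves it first. The sketch below assumes that scheme; the class and function names are hypothetical and do not correspond to a specific cluster product.

```python
from typing import Optional

class QuorumDevice:
    """Tie-breaker resource: only the first node to reserve it holds the vote."""
    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def reserve(self, node: str) -> bool:
        if self.owner is None:
            self.owner = node
        return self.owner == node

def may_own_disk(partition_nodes: int, total_nodes: int,
                 holds_quorum_device: bool) -> bool:
    """A partition may access the shared disk only with a strict majority of votes."""
    total_votes = total_nodes + 1   # one extra vote for the quorum device
    votes = partition_nodes + (1 if holds_quorum_device else 0)
    return votes > total_votes / 2
```

In a two-node cluster whose heartbeat link fails, each node forms a one-node partition; only the node that wins the reservation reaches 2 of 3 votes, so the other node must not mount the disk.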

Layer 2 Heartbeats
Extended L2 network: L2 adjacency is required for the nodes' heartbeat. Extending a VLAN across sites is hazardous.
Resolution: L3 capability for the cluster heartbeat. EoMPLS to carry L2 heartbeats across DR sites.
[Diagram labels: Local Datacenter, WAN, Remote Datacenter, Node1, Node2, Public Layer 2 Network, Private Layer 2 Network, Disk Replication (Synchronous or Asynchronous).]

Storage Disk Zoning
Storage zoning: taking over the storage disk array when the active node fails.
Resolution: cluster software communicates with the Cluster Enabler. The disk array is instructed to perform a failover when a failure is detected.
[Diagram labels: Node1, Node2, Active, Standby, Extended SAN, sym1320 (RW), sym1291 (WD).]





Storage for Applications
Presentation tier
  Unrelated small data files, commonly stored on internal disks
  Manual distribution
Application processing tier
  Transitional, unrelated data
  Small files residing on file systems
  May use RAID to spread data over multiple disks
Storage tier
  Large, permanent data files or raw data
  Large batch updates, most likely real time
  Log and data on separate volumes

Replication: Modes of Operation
Synchronous
  All data is written to the local and remote arrays before the I/O is complete and acknowledged to the host
  Speed of light = 3 × 10^8 m/s (vacuum) ≈ 3.3 µs/km
  Speed through fiber ≈ 2/3 c ≈ 5 µs/km
  2 RTT per write I/O = 20 µs/km
Asynchronous
  The write is acknowledged and the I/O is complete after the write to the local array; changes (writes) are replicated to the remote array asynchronously
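The per-kilometre figures above translate directly into a latency penalty for each synchronous write. The sketch below uses only the slide's numbers (5 µs/km one way in fiber, 2 round trips per write); the function names and the latency budget in the example are assumptions, and equipment and protocol overheads are ignored.

```python
FIBER_US_PER_KM = 5.0        # one-way propagation delay in fiber, from the slide
ROUND_TRIPS_PER_WRITE = 2    # from the slide: 2 RTT per synchronous write I/O

def sync_write_penalty_us(distance_km: float) -> float:
    """Added latency per synchronous write, in microseconds (propagation only)."""
    rtt_us = 2 * FIBER_US_PER_KM * distance_km
    return ROUND_TRIPS_PER_WRITE * rtt_us

def max_distance_km(latency_budget_us: float) -> float:
    """Longest fiber distance whose propagation penalty fits the given budget."""
    return latency_budget_us / sync_write_penalty_us(1.0)
```

For example, sites 100 km apart add 2000 µs (2 ms) to every write, and an application that can tolerate 1 ms of added write latency is limited to about 50 km of fiber.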

Synchronous vs. Asynchronous Trade-Off
Enterprises must evaluate the trade-offs
Synchronous
  Impact to application performance
  Distance limited (are both sites within the same threat radius?)
  No data loss
Asynchronous
  No application performance impact
  Unlimited distance (second site outside threat radius)
  Exposure to possible data loss
Maximum tolerable distance ascertained by assessing each application
Cost of data loss

Data Replication with DB Example
Control files
  Identify the other files making up the database and record the content and state of the database
  DB name, creation date, backup performed, redo log time period, datafile state
Datafiles
  Table spaces, indexes, data dictionary
  Datafiles are only updated periodically
Redo log files
  Record database changes resulting from transactions
  Used to play back changes that may not have been written to the datafiles when a failure occurred
  Typically archived as they fill, to local and DR site destinations

Data Replication with DB Example (Cont.)
[Timeline labels: hot backup of datafiles and control files taken at time t0; archived redo logs; online redo logs; failure at time t1.]
Failure or disaster occurs at time t1
  Media failure (e.g., disk)
  Human error (datafile deletion)
  Database corruption
Database restored to its state at the time of failure (time t1) by:
  1. Restoring control files and datafiles from the last hot backup (time t0)
  2. Sequentially replaying changes from subsequent redo logs (archived and online), that is, the changes made between time t0 and t1
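The two-step restore can be sketched with a toy key-value "database". The redo record layout `(sequence number, key, new value)` is a simplification invented for this example and is not the format of any real database's redo log.

```python
from typing import Dict, List, Tuple

RedoRecord = Tuple[int, str, str]   # hypothetical: (sequence number, key, new value)

def recover(backup: Dict[str, str], backup_seq: int,
            archived: List[RedoRecord],
            online: List[RedoRecord]) -> Dict[str, str]:
    """Restore the backup taken at t0, then replay later redo records in order."""
    db = dict(backup)                       # step 1: restore from the last hot backup
    for seq, key, value in sorted(archived + online):
        if seq > backup_seq:                # step 2: replay only changes made after t0
            db[key] = value
    return db

# Hypothetical data: backup taken at sequence 10, failure after sequence 13.
backup = {"a": "1"}
archived = [(9, "a", "0"), (11, "a", "2"), (12, "b", "x")]
online = [(13, "a", "3")]
```

Records at or before the backup's sequence number are skipped because their effects are already in the restored datafiles; replaying in sequence order is what brings the copy forward from t0 to t1.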

Data Replication with DB Example (Cont.)
[Diagram: Primary Site and Secondary Site joined by a SAN Extension Transport. Redo logs (cyclic) hold a copy of every committed transaction and are synchronously replicated for zero loss. Archive logs are replicated/copied. The database copy at time t0 is a point-in-time copy taken when the DB is quiescent and is replicated/copied; the secondary site also holds earlier DB backups.]
A mixture of sync and async replication technologies is commonly used
  Usually only redo logs are sync replicated to the remote site
  Archive logs are created from the redo log and copied when the redo log switches
  Point in Time (PiT) copies of datafiles and control files are copied periodically (e.g., nightly)

Data Center Interconnection Options
[Diagram: two data centers, each with Internet access, Stateful Firewalls, Intrusion Detection, Content Caching, Server Load Balancing, a High Density Multilayer LAN Switch, Front-End and Back-End Application Servers, a High Density Multilayer SAN Director, and Enterprise-Class Storage Arrays. Interconnection options between the sites: IP/Metro E, SONET/SDH, DWDM/CWDM.]

Data Center Transport Options
[Figure: transport options plotted against increasing distance (Data Center, Campus, Metro, Regional, National), grouped as Optical and IP.]
Dark fiber: sync; limited by optics (power budget)
CWDM: sync (2 Gbps); limited by optics (power budget)
DWDM: sync (2 Gbps lambda); limited by BB_Credits
SONET/SDH: sync (1 Gbps+ subrate); async
MDS9000 FCIP: sync (Metro Eth); async (1 Gbps+)

DATA CENTER ARCHITECTURE TRENDS


Cisco Data Center Vision
Intelligent Information Network
  Server Fabric Network: HPC, Cluster, GRID
  Data Network: LAN, WAN, MAN
  Storage Network: SAN
  Enterprise Applications
CONSOLIDATION
  Centralization and standardization to lower costs, improve efficiency and uptime
VIRTUALIZATION
  Management of resources independent of underlying physical infrastructure to increase utilization, efficiency, and flexibility
AUTOMATION
  Dynamic provisioning and autonomic Information Lifecycle Management (ILM) to enable business agility
  Business policies, on-demand, service oriented
[Diagram labels: Compute, Network, Storage.]

Summary



What Have We Covered So Far?
DR and its business objectives
  Define budget and technical solution
  Management buy-in
  DR is a process
Components of a data center
  Multi-tier architecture
  Front-end, application, backend database
Techniques in data center disaster recovery
  HTTP redirection/GSS/RHI
  Clustering
  SAN extension
Trends in data center technology

Today's Data Centers Require an Architectural Approach to...
Protect with business resilience
  Tighten security
  Improve business continuance
Optimize with consolidation
  Improve operational efficiency and resource utilization
  Lower complexity and cost of ownership
Grow towards a services-oriented infrastructure
  Align virtualized resources with business demands
  Automate infrastructure to respond dynamically

The Big Picture: The Cisco Data Center
The Emerging Data Center Architecture
[Diagram: three product families and the services embedded in each. MDS 9000 Family (enterprise SAN switching): Virtual Fabrics (VSANs), Storage Virtualization, Data Replication Services, Fabric Routing Services, Multiprotocol Gateway Services; attached to mainframe connectivity, enterprise tape storage, and enterprise disk storage. Catalyst 6500 Family (server farm switching): Server Balancing, VPN Termination, SSL Termination, Firewall Services, Intrusion Detection; attached to enterprise NAS storage and UNIX/Windows servers. Topspin Family (server fabric switching): Server Virtualization, VFrame, Virtual I/O, Grid/Utility Computing, Low Latency RDMA Services, Clustering; attached to the enterprise grid and blade servers organized as Virtual Private Server Fabrics #1, #2, and #3. Service groupings: Embedded Intelligent Storage Services, Embedded Intelligent Network Services, Embedded Intelligent Virtualization Services.]

What's Next?
A security strategy to protect the data center
  Understand the vulnerabilities and apply the relevant mitigations
Leverage Cisco's technology to optimize server resources
  Reduce TCO for DR sites
  Use virtualization to maximize the resources invested
Grow the DC infrastructure, enabling business agility
  Automate provisioning of computing resources
  Speed up deployment of new services

Q and A

