Medical Evidence & Image Exchange Network
Disaster Recovery Model Transformation
Transforming the Disaster Recovery model from a colocation facility to Google Cloud Platform, solving the lifecycle management (LCM) problem of ageing DR-hosted infrastructure, and transitioning IT operations from CAPEX to OPEX without a paradigm change while helping the customer begin a meaningful cloud adoption journey.
A late-stage high-tech start-up with 100 full-time staff and a primary production facility in Boston, MA, asked PTS to help implement the phased transition of its client-server core application, beginning with its Disaster Recovery infrastructure, from a colo-hosted environment in Virginia to a strategically beneficial hyperscale partner, Google Cloud Platform. The primary factors driving the business decision were the $100,000 per annum cost of the colo-hosted facility and the significant technical debt in the Disaster Recovery environment: refreshing the end-of-life (EOL) storage and compute estate would have required an upfront CAPEX investment of $750,000 plus another $100,000 in vendor support fees over a three-year term.
PTS worked with the IT Operations team to document the technical elements of the existing Prod-DR implementation, consisting of:
- applications and their dependencies
- storage requirements
- encryption requirements
- Business Continuity requirements:
  - daily volume changes
  - SLAs: RPO and RTO
The existing DR environment consisted of a core set of Debian 8 and Windows Server 2012 servers responsible for auxiliary real-time logging and monitoring, backup, and IAM functions; Dell EMC VNX arrays serving CIFS filesystems as the primary storage for the application; SafeNet HSM security modules; and a managed firewall solution from Lumen.
PTS drew on its experience to position the way forward that would achieve maximum benefit:
- IaaS: Google Compute Engine for Debian and Windows VMs.
- IaaS: Google Migrate for Compute Engine for the RSA KMS virtual appliances.
- IaaS: physical BIG-IP LTMs replaced with virtual BIG-IP LTMs.
- Replace the VNX CIFS shared filestores with Google Cloud Storage buckets.
- Replace the VNX SSD CIFS shared filestore with Google Cloud Filestore.
- Implement logging/monitoring via Splunk Cloud SaaS.
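As an illustrative sketch of the storage replacement, the contents of a VNX CIFS share can land in a Cloud Storage bucket with lifecycle rules handling data ageing. The bucket name, region, storage classes, and age thresholds below are hypothetical assumptions, not the client's actual configuration:

```shell
# Create a regional bucket to replace a VNX CIFS share
# (bucket name and region are placeholders).
gsutil mb -l us-east4 -c standard gs://example-dr-app-data

# Lifecycle policy: demote objects to cheaper storage classes as they age,
# avoiding the manual tiering a hardware array refresh would have required.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
     "condition": {"age": 30}},
    {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
     "condition": {"age": 365}}
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://example-dr-app-data
```

Applying the policy to the bucket is a one-time operation; Cloud Storage then enforces the class transitions automatically as objects age.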
PTS selected an optimal colocation facility for the assets that could not be migrated: the Lumen managed firewall service and the SafeNet Luna HSM appliances were relocated to a scaled-down colo facility operated by Cyxtera. Proximity to the GCP DR region/zones ensured minimal latency (<1 ms).
PTS implemented site-to-site connectivity between the production datacentre and the GCP-hosted DR environment.
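A site-to-site tunnel of this kind can be sketched with gcloud using Classic Cloud VPN; every name, IP address, CIDR range, and the region below are placeholder assumptions, not the client's actual values:

```shell
# Gateway in the DR VPC (names and region are hypothetical).
gcloud compute target-vpn-gateways create dr-vpn-gw \
    --network dr-vpc --region us-east4

# Reserve a static IP and create the forwarding rules the gateway
# needs for IPsec traffic (ESP, UDP 500, UDP 4500).
gcloud compute addresses create dr-vpn-ip --region us-east4
VPN_IP=$(gcloud compute addresses describe dr-vpn-ip \
    --region us-east4 --format='value(address)')
gcloud compute forwarding-rules create dr-vpn-esp \
    --region us-east4 --address "$VPN_IP" \
    --ip-protocol ESP --target-vpn-gateway dr-vpn-gw
gcloud compute forwarding-rules create dr-vpn-udp500 \
    --region us-east4 --address "$VPN_IP" \
    --ip-protocol UDP --ports 500 --target-vpn-gateway dr-vpn-gw
gcloud compute forwarding-rules create dr-vpn-udp4500 \
    --region us-east4 --address "$VPN_IP" \
    --ip-protocol UDP --ports 4500 --target-vpn-gateway dr-vpn-gw

# Tunnel to the production datacentre's VPN endpoint, plus a route
# sending on-prem-bound traffic through it (peer IP and range are examples).
gcloud compute vpn-tunnels create dr-tunnel-1 \
    --region us-east4 --peer-address 203.0.113.10 \
    --shared-secret "$SHARED_SECRET" --ike-version 2 \
    --local-traffic-selector 0.0.0.0/0 \
    --remote-traffic-selector 10.0.0.0/8 \
    --target-vpn-gateway dr-vpn-gw
gcloud compute routes create dr-to-onprem \
    --network dr-vpc --destination-range 10.0.0.0/8 \
    --next-hop-vpn-tunnel dr-tunnel-1 \
    --next-hop-vpn-tunnel-region us-east4
```

A matching IPsec configuration is required on the production datacentre's firewall or VPN device for the tunnel to establish.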
Key benefits of this transformation included the discovery of application dependencies on infrastructure elements and encryption services. The Client was able to achieve its objectives in the following areas:
- Reduce the DR colocation footprint and power and cooling consumption by 95% by migrating all the storage and all the VMware compute to GCP.
- Keep the same trusted managed firewall service by moving it to a colocation facility within 5 miles of the GCP region hosting the newly deployed availability zones.
- Keep the expensive HSM appliances in that same colocation facility, avoiding a very costly investment in application refactoring to take advantage of cloud-native encryption capabilities.
- Take advantage of cloud-native storage and data lifecycle management capabilities to avoid a costly CAPEX investment in a VNX hardware refresh.
- Create a robust logging and alerting capability for the DR environment by integrating with Splunk Cloud SaaS.
- Stop recurring charges for EVPL network links by configuring site-to-site VPN tunnels between the production datacentre and the new environment in GCP.
- Take advantage of gsutil rsync to configure and automate multithreaded/parallel data transfers from the production datacentre to GCP.
- Eliminate recurring licence and support costs for the VMware infrastructure by migrating the Debian and Windows VMs to the native GCP platform.
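The gsutil rsync automation mentioned above might look like the following; the source path, bucket name, and schedule are hypothetical examples:

```shell
# -m enables multithreaded/parallel transfers; -r recurses into
# subdirectories; -d mirrors deletions from the source to the bucket,
# so use it with care.
gsutil -m rsync -r -d /export/app-data gs://example-dr-app-data

# A cron entry (example schedule: 02:00 daily) automating the sync:
# 0 2 * * * /usr/bin/gsutil -m rsync -r -d /export/app-data gs://example-dr-app-data >> /var/log/dr-sync.log 2>&1
```

Because gsutil rsync compares checksums or modification times and transfers only changed objects, the recurring window scales with the daily change volume rather than the full dataset, which is what makes the documented RPO achievable over a VPN link.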