Scientific Research Institute
Single Point of Failure Data Centre Assessment
PTS undertakes an independent Single Point of Failure assessment of a business-critical Data Centre facility owned and operated by a leading scientific research institute.
The Institute leads ambitious collaborations across the globe, creating the networks of experts and facilities needed to support excellent research and transformative healthcare innovations worldwide. The Data Centre holds in excess of 50 petabytes of storage, which supports the work of over 2,500 staff distributed across 77 countries, making the continued uptime of the Data Centre critical to the successful operation of the Institute’s business.
Having recently experienced an issue with the on-site Data Centre infrastructure that affected the provision of service to the business, the Institute engaged PTS to perform an independent assessment of the current Data Centre. The study was to focus specifically on Single Points of Failure, and to achieve this a set of technical and operational Due Diligence services was undertaken. These services assessed the design, installation, commissioning and ongoing operations of the facility. The study would also produce prioritised commentary on the remediation of any areas of concern, together with a view on future roadmap activities for the Institute to consider. These results would enable the Client to plan strategically how to enhance its delivery of IT services to the wider business.
PTS opened the engagement with an initial Discovery Phase. This included a desktop review of as-built drawings and documentation that had been specifically requested from the client. Reviewing and understanding the configuration and layout of the Data Centre before setting foot on-site helped contextualise the facility when PTS arrived to view it physically for the first time. This approach also minimised the demand on the busy on-site client resources who gave their time to support the study. A series of subsequent visits was scheduled to engage with key resources from across the client’s Data Centre teams. These meetings, together with time set aside to review the physical areas of both the IT white space and the support plant areas, accelerated further information gathering and investigation.
The Discovery Phase progressed seamlessly into the Analysis and Assessment Phase of the project, in which the data gathered, combined with growing site knowledge, showed where gaps were evident and additional information was required. A formal log of these requests was progressed with the client, and the gaps also directed more detailed questioning of staff and additional physical reviews of the as-built infrastructure.
Validation of Findings
Determining exactly what constitutes a Single Point of Failure is critical to providing clarity on any of the findings. Once it is explicitly defined, each element of the Data Centre’s design and operation can be reviewed and assessed to understand the impact of an unexpected component failure or a planned maintenance activity. Single Points of Failure can then be clearly identified and differentiated from failures such as Cascade Events, whereby multiple issues would be needed to result in an actual impact on a facility. The value that PTS brings to this type of assessment lies in the long-term experience and knowledge of its consultants, who work logically through these Data Centre specific scenarios.
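The distinction drawn above can be illustrated with a minimal sketch. It is not the method PTS used, and the power topology, component names and function names are all hypothetical. It assumes a facility can be simplified to a directed "feeds" graph, and that the load is served while at least one source can still reach it. A component is flagged as a Single Point of Failure if its loss alone cuts off the load; pairs of components that only cause an outage together are listed separately, as a rough stand-in for the multiple-issue scenarios described above.

```python
from itertools import combinations

# Hypothetical topology (an assumption for illustration only):
# each key feeds the components listed against it.
FEEDS = {
    "utility": ["switchboard"],
    "generator": ["switchboard"],
    "switchboard": ["ups_a", "ups_b"],
    "ups_a": ["pdu_a"],
    "ups_b": ["pdu_b"],
    "pdu_a": ["rack"],
    "pdu_b": ["rack"],
}
SOURCES = ["utility", "generator"]
LOAD = "rack"


def load_is_served(failed):
    """Return True if any working source still reaches LOAD."""
    stack = [s for s in SOURCES if s not in failed]
    seen = set(stack)
    while stack:
        node = stack.pop()
        if node == LOAD:
            return True
        for nxt in FEEDS.get(node, []):
            if nxt not in failed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def components():
    """All components except the load itself, in sorted order."""
    nodes = set(FEEDS) | {n for outs in FEEDS.values() for n in outs}
    return sorted(nodes - {LOAD})


def single_points_of_failure():
    """Components whose failure alone leaves the load unserved."""
    return [c for c in components() if not load_is_served({c})]


def double_failure_outages():
    """Pairs that cause an outage only in combination (no SPOF involved)."""
    spofs = set(single_points_of_failure())
    return [
        pair
        for pair in combinations(components(), 2)
        if not spofs & set(pair) and not load_is_served(set(pair))
    ]


if __name__ == "__main__":
    print("Single Points of Failure:", single_points_of_failure())
    print("Outages needing two failures:", double_failure_outages())
```

In this invented layout the shared switchboard is the only Single Point of Failure, while losing both UPS units or both supply sources takes two simultaneous failures. A real assessment would also have to consider cooling, controls, maintenance states and operational procedures, which this sketch ignores.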
A final report was produced which gave a thorough assessment of the Data Centre infrastructure elements currently in use by the Client. The report included PTS commentary on the installed infrastructure that explained the reasoning behind each finding and discussion point. It also included high-level tables containing not only the specific Single Points of Failure but also additional Risks and Issues identified in a number of other areas of the Data Centre. This information helps build a wider picture of the overall environment. The possible remediation solutions put forward by PTS ranged from the Strategic to the Tactical and these, in conjunction with a graphic Road Map, helped the client understand how various elements could be prioritised and remediated when entering the planning phase of any future transformation.
As part of the collaborative approach that PTS takes with its clients, face-to-face presentations were also held, enabling detailed and frank discussion with client personnel at varying levels within the organisation. These meetings, primarily aimed at presenting the study’s findings, also provided an opportunity for questions and answers, which helped both parties ensure that the findings were interpreted correctly and clearly understood. Not all personnel within the Institute were Data Centre focussed, so PTS’s ability to engage with the audience and pitch the material to its understanding of these critical environments increased overall comprehension of the identified issues and the challenges they pose to the wider business.
The Client was keen to mitigate any impact on the provision of IT service to its wider business, and employing the Professional Data Centre Services expertise offered by PTS resulted in clear and tangible benefits. A single document that clearly defined what was in place, how it was operated and what would need to change to bring the infrastructure in line with current Data Centre Best Practice gave the Client a strong tool with which to drive change within the organisation. Calling on the expertise of the PTS resource to help present these findings back to the business also increased staff understanding and clarity on particular subjects. Because PTS was not directly linked to the Client’s organisation, it was able to provide a frank, honest and independent assessment of the existing environment. This brought the benefit of not only identifying specific areas of concern but also offering a strategic view of the overall provision of service.
An overarching view of the future of IT service delivery can sometimes become clouded or lost when remediation of day-to-day issues starts to consume the attention and focus of an expanding business. Having the opportunity to pause, assess the current environment and then develop a cohesive plan for moving forward ensures that the provision of critical IT services can continue to be the backbone of the client’s important ongoing work.