
Lessons from the Recent CrowdStrike IT Outage

The recent global IT outage, triggered by a faulty CrowdStrike software update, is a potent reminder of the vulnerabilities inherent in our increasingly complex digital world and of the importance of a comprehensive release management strategy.

Global Impact


  • Over 8.5 million Windows devices were affected.
  • The cost of this disruption to Fortune 500 companies is forecast to exceed $5 billion.

 

While the event caused understandable concern among stakeholders and the general public, it also provided invaluable lessons that will continue to strengthen our resilience and response capabilities.

Addressing Stakeholder Concerns


Stakeholders have every right to be concerned when a significant IT outage occurs. The recent disruption highlighted several critical issues, including operational interruptions, data accessibility, and potential security breaches.

 

For IT leaders in the built environment and commercial real estate sectors, the CrowdStrike outage presents specific lessons and action points:

 

  • Smart Building Systems and IoT Security: With the increasing adoption of smart building technologies and IoT devices, the potential attack surface has expanded significantly. Ensuring the security of these interconnected systems is critical. Regularly updating firmware, using robust encryption, and monitoring network traffic can help prevent breaches.
  • Building Management System (BMS) Resilience: Building management systems control critical infrastructure like HVAC, lighting, and security. Implementing redundant BMS servers and ensuring regular data backups can minimise the impact of IT outages on these essential services.
  • Tenant Data Protection: Commercial real estate often involves handling sensitive tenant data. Ensuring the security and privacy of this data through strong encryption, access controls, and regular security audits is crucial.
  • Remote Monitoring and Management: The ability to remotely monitor and manage building systems is essential for operational efficiency. Ensuring secure remote access protocols and multi-factor authentication can safeguard against unauthorised access.
  • Disaster Recovery and Business Continuity for Facilities: Developing and testing disaster recovery and business continuity plans specifically tailored to facility management can ensure quick recovery of operations. These plans should enhance the scheduled ‘black building tests’ and consider physical security, emergency power supplies, and alternative communication methods.
  • Collaborative Risk Assessment: Working closely with other stakeholders, including facility managers, tenants, and security teams, to conduct comprehensive risk assessments can identify potential vulnerabilities and address them proactively.

Future Implications


Because future IT crises are inevitable, we should all take a proactive approach to capacity-building:

 

  • Investing in Redundant Systems: Implementing redundant and failover systems can prevent complete outages. These systems should be regularly tested to ensure they function correctly during an actual crisis.
  • Enhancing Incident Response Training: Regular training and simulations for IT teams can improve their readiness to handle real-world incidents. These exercises should include scenario-based training to cover a wide range of potential crises.
  • Developing Comprehensive Continuity Plans: Business continuity plans must be comprehensive, detailing steps for maintaining operations during various types of disruptions. These plans should be reviewed and updated regularly to adapt to evolving threats.
  • Leveraging Advanced Monitoring Tools: Utilising advanced monitoring and diagnostic tools can help detect and address issues before they escalate into full-blown outages. Continuous monitoring ensures early detection and swift response.
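
As an illustration of the first and last points above, the following is a minimal sketch of a health-checked failover selection together with a simple failover drill. It assumes a priority-ordered list of redundant servers, each with a caller-supplied health probe; the names Endpoint, select_active and run_failover_drill are hypothetical and do not refer to any particular product or vendor API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class Endpoint:
    """A hypothetical redundant server, e.g. a primary or standby BMS host."""
    name: str
    is_healthy: Callable[[], bool]  # health probe supplied by the caller


def probe(endpoint: Endpoint) -> bool:
    """Run the health probe, treating any exception as a failed check."""
    try:
        return bool(endpoint.is_healthy())
    except Exception:
        return False


def select_active(endpoints: Sequence[Endpoint]) -> Optional[Endpoint]:
    """Return the first healthy endpoint in priority order, or None."""
    for endpoint in endpoints:
        if probe(endpoint):
            return endpoint
    return None


def run_failover_drill(endpoints: Sequence[Endpoint]) -> Dict[str, Optional[str]]:
    """Simulate the loss of each endpoint in turn and record what takes over."""
    results: Dict[str, Optional[str]] = {}
    for failed in endpoints:
        survivors = [e for e in endpoints if e is not failed]
        active = select_active(survivors)
        results[failed.name] = active.name if active else None
    return results
```

In a drill, any entry that maps to None marks a single point of failure: losing that server leaves no healthy alternative. In practice the probes would be real checks (for example, a network request with a timeout) rather than the simple callables assumed here.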

Identifying Longer-Range Implications and Remedies


The events of 19 July 2024 also highlighted several longer-range implications that must be addressed to ensure sustained resilience.

 

  • Strengthening Cybersecurity Posture: Although the outage was caused by a faulty update rather than a cyber-attack, it is a reminder of how disruptive a security incident on the same scale could be. Organisations must continually invest in advanced cybersecurity measures, including AI-driven threat detection and response systems.
  • Encouraging Industry Collaboration: Collaboration among industry peers can lead to the sharing of best practices and the development of standardised response protocols. Industry consortiums can play a pivotal role in fostering this collaboration.
  • Regulatory Compliance and Standards: Adhering to regulatory standards and compliance requirements ensures that organisations maintain a baseline level of preparedness and resilience. Regular audits and assessments can help identify and address potential vulnerabilities.
  • Promoting a Culture of Resilience: Building a culture that prioritises resilience involves educating employees at all levels about the importance of cybersecurity and crisis preparedness. This cultural shift can enhance the overall security posture of the organisation.

 

CrowdStrike’s global technology outage was a significant event that tested the resilience of many organisations. The lessons learned from it are invaluable: by addressing stakeholder concerns, building better crisis management capacities, and implementing long-term remedies, we can ensure that we are better prepared for the inevitable next IT crisis.

 

Get in touch to explore how PTS can help you optimise your infrastructure to protect against future global outages.