On July 19, 2024, CrowdStrike pushed a flawed update for Falcon, its popular endpoint detection and response (EDR) software, to endpoints around the world. The update caused an endless reboot cycle and blue screen error on an estimated 8.5 million Windows computers, completely shutting down all functionality.1 The full impact of this event will continue to play out, but the incident has left many organizations considering how to avoid a similar one. Here are a few takeaways IT leaders should consider:
Software Due Diligence
Software implementation projects will need to develop a more in-depth risk assessment. What level of system access is being granted to this software? Are there systems in the environment that allow changes to occur without change control, as malware detection updates typically do? In this circumstance, a malformed CrowdStrike configuration file update, which was not subject to mandated change control, resulted in full operating system crashes on Windows devices due to interactions between the Falcon software and the Windows kernel.2 Modern system hardening standards then made it difficult for non-administrator end users to apply any fixes sent to them by their IT departments. IT teams may claim patches can just be rolled back, but a better understanding of what could go wrong with new software will be critically important for industries with high availability requirements. This will require IT teams to understand the level of interaction between prospective software and the operating systems in their environments.
Rethinking & Rescoping Patch Management Programs
Many organizations have developed a cavalier approach to the update testing and rollout process, especially for security-related updates and for updates that are treated as approved outside of a formal change control or QA process, such as malware signature updates or minor config file changes. They argue that security risks outweigh bad patch risks, but that calculus must now be reassessed. Imagine if there were some way airlines could have ensured one-third of their PC and server fleet received Falcon updates on a few-hour delay, allowing them to stop the rollout once they realized something had gone awry. Business interruption issues would still have occurred, but the impact would likely not have been as far-reaching. It should be noted, however, that any delay in signature updates must be balanced against the risk of zero-day malware attacks. IT teams now more than ever need to weigh the costs and benefits of early update implementation and create a common-sense testing and/or rollout plan. In addition, outsourced, cloud-based updates to organization devices must now ultimately be considered the responsibility of the organization, not the vendor. If IT isn’t comfortable with the vendor’s update process, further work needs to be done.
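The delayed-rollout idea above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any vendor's update mechanism: the one-third fraction comes from the example above, while the four-hour delay, the ring names, and the functions `ring_for` and `may_install` are hypothetical choices made for this sketch. It assumes the organization has some point of control where it can decide whether a given host should accept an update yet.

```python
import hashlib
from datetime import datetime, timedelta

# Hypothetical policy values chosen for illustration only.
DELAYED_FRACTION = 1 / 3      # share of the fleet held back
DELAY = timedelta(hours=4)    # how long the held-back ring waits


def ring_for(host_id: str) -> str:
    """Deterministically place a host in the 'early' or 'delayed' ring.

    Hashing the host ID gives a stable assignment without keeping a
    separate inventory of which host belongs to which ring.
    """
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "delayed" if bucket < DELAYED_FRACTION else "early"


def may_install(host_id: str, released_at: datetime,
                now: datetime, halted: bool) -> bool:
    """Return True if this host is allowed to take the update right now."""
    if halted:
        # Someone has stopped the rollout; no further hosts take the update.
        return False
    if ring_for(host_id) == "early":
        return now >= released_at
    return now >= released_at + DELAY


if __name__ == "__main__":
    released = datetime(2024, 7, 19, 4, 0)
    fleet = [f"host-{i}" for i in range(3000)]
    one_hour_in = released + timedelta(hours=1)
    updated = sum(may_install(h, released, one_hour_in, halted=False) for h in fleet)
    print(f"{updated} of {len(fleet)} hosts updated one hour after release")
```

If the early ring starts failing, setting `halted` to true before the delay expires keeps the delayed ring on the previous version. The trade-off noted above still applies: hosts in the delayed ring also wait longer for legitimate signature updates.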
In-Depth Recovery Strategies
EDR products like CrowdStrike aren’t going away, and we can’t completely eliminate the risk of a similar widespread event. Instead, IT teams will have to find ways to push effective workarounds faster. Business continuity teams will also need more training in how to keep businesses going after a critical systems failure. If you experienced an outage from CrowdStrike, please take the opportunity to perform a formal, written debrief with your crisis management team to find out what could have gone better with the right technology, training, and offline procedures in place.
Vendor Management
In our current outsourced environment, the technologies and patch management programs our critical vendors use are just as important as our own. Even an organization that didn’t have CrowdStrike may still have been materially impacted because a critical vendor was out of commission. While vendors may not be willing to give insight into the security software used in their environment, companies should still ask questions to understand critical technology providers’ patch management programs and how they evaluate new software.
Third-Party Technology Providers
Woe to software providers who do not learn from the CrowdStrike outage! You have a duty to develop a consistent and reliable testing methodology. This is a hard-learned lesson, and one the industry has not been reminded of so dramatically in quite some time. Facebook founder Mark Zuckerberg’s motto, “Move fast and break things,” should not be the motto of critical software developers. Evaluate and re-evaluate your testing and rollout processes to help reduce the risk that you end up in a similar situation.
Professionals at Forvis Mazars can help you evaluate many aspects of your IT practices, make recommendations, and help implement, enhance, and augment your Business Continuity Management and Third-Party Risk Management programs. If you have any questions, please reach out to one of our professionals.