8 Ways to Remain Vigilant and Ensure Business Continuity with Disaster Recovery

Vigilant Testing for Potential Vulnerabilities Has Never Been More Critical

In light of recent events such as the CrowdStrike outage, it has become increasingly important for businesses to prioritize disaster recovery planning. With the growing reliance on interconnected systems and services, vigilant testing for potential vulnerabilities has never been more critical.

This article will delve into why businesses struggled to recover from the CrowdStrike outage, the importance of a solid disaster recovery (DR) plan and eight ways your business can remain vigilant to ensure business continuity with DR.

Why Did Businesses Struggle to Get Back Up and Running After the CrowdStrike Outage?

The short answer is process. Most companies build a golden image that has all their agents and software preinstalled. They also tend not to plan for scenarios in which instances go down, or to consider how many could be down at once and how that would affect the business.

In the hospitality industry, data is backed by databases. The data is stored in a specific location, and a system retrieves it to generate web pages, reports and other documents. Without a way to control which agents are installed, you cannot remove an agent such as CrowdStrike and temporarily run without it. In this situation, systems were broken and wouldn't reboot, and there was no way to fix them in place.

At that point, the proper disaster recovery would be to get another system without the offending process on it, reconnect it and conduct business as usual. But you must have the flexibility to do that. Many companies are still very monolithic, meaning system X can access database Z, and that's the only way it works. If system X is gone, you're down until it's back, especially if you're dealing with physical hardware and only one system can connect to the database.

Many companies can bypass these problems by leveraging virtualization and the cloud. This would allow you to deploy a new instance, attach the disk from the old instance to it, remove the faulty CrowdStrike component from that disk and bring the system back up. However, it requires knowing what the problem is to begin with.

In addition, up-to-date hardware and systems are a necessity in today's technological landscape. A lack of organization and disaster planning makes it far harder to get back up and running quickly.

Understanding the Significance of Good Disaster Recovery Plans Is Crucial

It's not just about having a plan in place but also about being prepared for the unexpected.

Despite the risks that businesses knowingly face daily, many organizations do not have solid DR plans. Cost and limited team bandwidth are often the leading reasons why. Unfortunately, if you wait until something happens, you could find yourself having to rebuild your environment from scratch.

A comprehensive disaster recovery plan accounts for potential threats to your systems, such as external factors or outdated operating systems. You have to be able to compensate for external agents and operating systems during recovery, or you don't have a disaster plan. Planning to put your old system back when disaster strikes is a good start, but you also need to plan for the case where that system is no longer available.

CDW supports organizations with disaster planning by evaluating your systems and the interactions between them. For example, you may have one web server that accesses an application server, a database server and a file share, which together deliver what the application does. CDW maps out how these systems are interconnected, what each one requires and what could go wrong between them.
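As an illustration only, a dependency map like the one described above can be written down as data and queried for the impact of a single failure. This is a minimal sketch, not CDW's tooling: the system names come from the example in the text, while the `DEPENDS_ON` structure and the `impacted_by` function are hypothetical.

```python
from collections import deque

# Hypothetical dependency map: each system lists the systems it calls directly.
DEPENDS_ON = {
    "web-server": ["app-server", "file-share"],
    "app-server": ["database-server"],
    "database-server": [],
    "file-share": [],
}


def impacted_by(failed, depends_on):
    """Return every system that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from a system to the systems that call it.
    dependents = {name: [] for name in depends_on}
    for system, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, []).append(system)

    # Breadth-first walk over the inverted map, starting at the failed system.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, []):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return sorted(impacted)


print(impacted_by("database-server", DEPENDS_ON))  # ['app-server', 'web-server']
```

Even a small map like this shows which failures take down the whole application and which ones affect a single component.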

We also investigate your applications to understand their function and installation process. Installation is critical to review because most customers install an application, replicate it in another region and think they have a disaster recovery plan.

Ninety percent of the time, that's fine, realistically. However, the CrowdStrike outage fell into the remaining 10%, and most DR plans were not ready for it. Today, businesses rely heavily on Software as a Service (SaaS), and if that fails, IT teams struggle to determine a course of action because the integration is so tight. This is why it's imperative to deploy your systems through a true pipeline, so you can modify that pipeline when needed to exclude the components you know are breaking your systems and bring them back up.
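The idea of a modifiable pipeline can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a real deployment tool: the step names, the `PIPELINE` list and the `build_system` function are all hypothetical, and each step only records what it would install.

```python
# Hypothetical build steps. Each one records what it would add to the system.
def install_base_os(system):
    system.append("base-os")


def apply_settings(system):
    system.append("standard-settings")


def install_security_agent(system):
    system.append("security-agent")


def install_application(system):
    system.append("application")


# The pipeline is an ordered list of named steps, so any step can be skipped.
PIPELINE = [
    ("base-os", install_base_os),
    ("settings", apply_settings),
    ("security-agent", install_security_agent),
    ("application", install_application),
]


def build_system(pipeline, exclude=()):
    """Run each step in order, skipping any step whose name is in `exclude`."""
    system = []
    for name, step in pipeline:
        if name in exclude:
            continue  # leave out a component known to be breaking systems
        step(system)
    return system


# Normal build, then an emergency rebuild without the offending agent.
print(build_system(PIPELINE))
print(build_system(PIPELINE, exclude={"security-agent"}))
```

Because the agent is a separate step rather than part of the image, the emergency rebuild keeps the normal settings and application while leaving the broken component out.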

Extensive Business Interconnectivity Requires DR Vigilance and Testing

When we examine the hospitality industry crisis during the CrowdStrike outage, we see that broken interconnected systems prevented businesses from viewing or modifying reservations and from accepting credit cards. Interconnectivity between businesses is now so widespread that some airports may not even run their own terminal software. If your organization relies on a third-party business and that business causes a pipeline to break, the failure cascades to every system that depends on it.

Here are 8 ways to remain vigilant:

Test before patching: Testing before patching in production is critical. Rigorous testing of software updates before deployment helps identify potential issues. Continuous integration and continuous delivery (CI/CD) pipelines exist to deliver software accurately and seamlessly. When you create something new, such as a website or web application, and only a handful of people test it, you don't know how it will hold up when thousands of users try to access it simultaneously. With load tests built into a CI/CD pipeline, you can check whether it stays up under a large number of users, whether it works and whether there are issues with speed or loading, then adjust and retest.
Automate deployments and have clean golden images: If you're looking to mitigate DR issues, automated deployments and clean golden images are essential. Too many IT teams add everything to their golden image and end up in deep water when they cannot bypass a faulty agent. However, if you add agents as you go through the pipeline, you can turn that step off and rebuild the system without the agent, keep all your normal settings and still have that automation while eliminating the problem.
Invest resources in your scalability: There are different types of scaling: monolithic and autoscaling. Autoscaling allows you to replace dead systems with fixed ones more easily, whereas monolithic scaling requires rebuilding the entire infrastructure, getting the service back up, restoring interconnectivity and so on. Businesses relying on monolithic scaling took longer than those with autoscaling to get back up and running after the CrowdStrike outage because recovery was a heavier lift.
Have a backup and recovery plan in place: Backup and recovery plans ensure quick restoration in case of failures. Automated backups taken before software updates, combined with rehearsed recovery procedures, help you resolve issues that arise from planned updates.
Conduct staged rollouts: Staged rollouts to a subset of users allow for monitoring and early detection of problems. Rolling out updates incrementally minimizes the impact of any unforeseen issues.
Implement monitoring tools: Monitoring and alerts allow quicker response to anomalies. Implementing monitoring tools helps detect issues early and trigger immediate action.
Keep an open line of communication with affected parties: Open and collaborative communication with affected parties will facilitate timely awareness and recovery during incidents. Establishing direct lines of communication helps coordinate responses effectively.
Validate: Post-deployment validation is essential to verify the update's impact on systems.
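
The staged rollout, monitoring and validation items above can be combined in one small sketch. It is an illustration under stated assumptions rather than a production tool: the host names, the stage fractions and the `apply_update` and `is_healthy` callbacks are hypothetical stand-ins for your own deployment and monitoring systems.

```python
def staged_rollout(hosts, stages, apply_update, is_healthy):
    """Update hosts in growing batches and stop after the first unhealthy batch.

    `stages` holds cumulative fractions of the fleet, for example (0.1, 0.5, 1.0).
    Returns a tuple: (list of hosts that received the update, completed flag).
    """
    updated = []
    for fraction in stages:
        # Number of hosts that should be updated once this stage finishes.
        target = max(1, int(len(hosts) * fraction))
        batch = hosts[len(updated):target]
        for host in batch:
            apply_update(host)
            updated.append(host)
        # Post-deployment validation: halt before the update spreads further.
        if not all(is_healthy(host) for host in batch):
            return updated, False
    return updated, True


# Hypothetical fleet in which the update breaks one host.
hosts = [f"host-{i}" for i in range(10)]
versions = {}


def apply_update(host):
    versions[host] = "v2"


def is_healthy(host):
    return host != "host-3"  # simulate a host that fails after the update


updated, completed = staged_rollout(hosts, (0.1, 0.5, 1.0), apply_update, is_healthy)
print(len(updated), completed)  # 5 False
```

In this run the rollout stops after the second stage, so half of the fleet never receives the faulty update and remains available while the problem is investigated.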

CDW offers comprehensive managed services to keep your business running while ensuring you have a solid DR plan that gives you peace of mind. To learn more about how CDW Managed Services can support your organization, visit our webpage or call 800-800-4239.

Sean Scott

Director Cloud Managed Services

As the Sr. Director of Managed Cloud Services, Sean drives innovation and growth with deep cloud expertise. Before his time at Sirius and CDW, Sean served as CIO for both a software development company and a law firm for 15 years. Sean’s passion for cloud technologies and deep understanding of customer needs help empower organizations to unlock the full potential of the cloud.

Don DeHamer

Chief Technical Architect for CDW Managed Services Cloud Lifecycle Services

Don DeHamer has nearly three decades of experience in IT and is the chief technical architect for CDW Managed Services Cloud Lifecycle Services.