7 critical steps in testing a disaster recovery plan
It’s crucial for a healthcare organization to keep its crisis plan up to date, especially as the entity grows and changes.
There’s a tension with disaster recovery plans: healthcare organizations have them but hope they never use them. No one wants to discover that a disaster recovery plan is over-resourced, with too much time, money and energy spent on maintaining it, yet the plan needs regular stress testing to ensure it will do its job. Eric Dynowski, chief technology officer at ServerCentral Turing Group, suggests regular stress tests to ensure a disaster recovery plan doesn’t become outdated; in fact, the plan needs to evolve as an organization changes. “At a previous firm, we tested quarterly and found changes and updates during every test,” he says. Here are seven steps to ensure that a disaster recovery plan fits a healthcare organization’s current needs.
Plan for failure
“Things will fail during stress testing,” he says. “That’s the point of doing it.” Those involved in the stress test need to prepare for this reality—if they’re not, “people may feel incentivized to make everything look good rather than make sure that everything actually works.” Dynowski suggests that IT execs “need to be champion of failure; when a DR plan doesn’t go as you hoped, remember that you’re actually a hero.” These discoveries enable providers to get misalignments corrected to ensure failures don’t happen during a real calamity.
Put someone in charge
A disaster recovery stress test won’t happen if no one is explicitly assigned to lead it, Dynowski contends. Responsibilities for the person tasked with leading it include keeping track of changing business needs as they relate to DR; scheduling and overseeing regular tests; updating the DR solution as needed; and updating the DR plan to accommodate the new solution. “That last item is crucial: the whole point of testing is to identify parts of the plan that no longer fit your business needs,” he says. “It’s easy for a ‘failure’ during a test to feel like a personal failure. In reality, though, testing failures are wins, because they let you prevent real-world failures.”
Look at the current disaster recovery plan
Before anything else, the current disaster recovery plan needs to be reviewed, especially if it’s more than a year old. A plan that doesn’t align with current needs requires an update. The DR lead will want to collaborate with leaders outside IT to determine which applications and functionality need to run; which hardware must be online; and what dependencies flow from those two requirements. “Ideally, you’ll walk away from these conversations with recovery time objectives and recovery point objectives that, in the event of a disaster, accommodate everyone’s needs,” he says.
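As a rough illustration of how agreed recovery time objectives (RTOs) and recovery point objectives (RPOs) might be checked against stress-test results, here is a minimal Python sketch. The system names, the objective values, and the names RecoveryObjective, TestResult and find_gaps are all hypothetical, chosen only for this example; they do not come from Dynowski or any specific DR product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjective:
    system: str
    rto: timedelta  # longest tolerable time to restore the system
    rpo: timedelta  # longest tolerable window of lost data


@dataclass
class TestResult:
    system: str
    recovery_time: timedelta     # how long restoration took in the test
    data_loss_window: timedelta  # how much data the test showed would be lost


def find_gaps(objectives, results):
    """Return a message for each objective that was missed or never tested."""
    by_system = {r.system: r for r in results}
    gaps = []
    for obj in objectives:
        res = by_system.get(obj.system)
        if res is None:
            gaps.append(f"{obj.system}: not tested")
            continue
        if res.recovery_time > obj.rto:
            gaps.append(f"{obj.system}: RTO missed")
        if res.data_loss_window > obj.rpo:
            gaps.append(f"{obj.system}: RPO missed")
    return gaps


# Hypothetical objectives and one hypothetical test run.
objectives = [
    RecoveryObjective("ehr", rto=timedelta(hours=1), rpo=timedelta(minutes=15)),
    RecoveryObjective("billing", rto=timedelta(hours=8), rpo=timedelta(hours=1)),
]
results = [
    TestResult("ehr", recovery_time=timedelta(hours=2),
               data_loss_window=timedelta(minutes=10)),
]
gaps = find_gaps(objectives, results)
print(gaps)
```

In this made-up run, the list of gaps is what the DR lead would take back to the plan: each entry marks a place where the plan no longer matches what the business says it needs.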
Get the C-suite on board
Getting leadership on board with the criticality of disaster recovery is crucial—without such support, it will be a struggle to get other leaders in the organization to prioritize testing. Spell out the importance of having a plan that enables care delivery to continue in any eventuality. “If your C-suite already understands the importance of having a disaster recovery plan, make sure they’re on board with the importance of testing,” Dynowski notes.
Stick a dollar sign on DR and testing
In talking to C-suite executives about stress testing, frame it in terms of dollars and cents: how long can your organization afford to have IT and other systems down? It’s important to help leadership calculate potential revenue losses from each minute of downtime for various scenarios, not to mention the potential reputational hit that even relatively minor downtime can cause if handled wrong. DR leads also should provide insight about the many types of disaster recovery solutions and the various costs of each.
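The per-minute framing above lends itself to a simple back-of-the-envelope calculation. The sketch below assumes a flat revenue loss per minute of downtime; the dollar figure and the scenario durations are placeholders invented for illustration, not data from the article, and a real estimate would also need to account for the reputational costs mentioned above.

```python
def downtime_cost(revenue_per_minute: int, minutes_down: int) -> int:
    """Estimate lost revenue, assuming a flat loss for every minute of downtime."""
    if revenue_per_minute < 0 or minutes_down < 0:
        raise ValueError("inputs must be non-negative")
    return revenue_per_minute * minutes_down


# Hypothetical figure: replace with the organization's own estimate.
REVENUE_PER_MINUTE = 500

# Hypothetical outage scenarios, in minutes of downtime.
scenarios = {
    "short outage": 30,
    "half-day outage": 240,
}

estimates = {
    name: downtime_cost(REVENUE_PER_MINUTE, minutes)
    for name, minutes in scenarios.items()
}

for name, cost in estimates.items():
    print(f"{name}: ${cost:,}")
```

Setting numbers like these beside the cost of each candidate DR solution gives leadership a concrete basis for comparison.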
Establish guidelines for real-life disaster recovery
A solid disaster recovery plan must include the conditions that trigger it. For example, a team might get five hours to fix a problem before the DR plan is invoked, or the plan might be invoked only when certain parts of an application go down; the question is whether the affected functions are non-essential ones that an organization can temporarily live without. Guidelines should be based on larger business needs, so triggers will need validation from other stakeholders in the company. Non-IT employees should be enlisted to help during stress tests: someone from each department should test the functionalities they depend on, and someone should be in charge of testing from a user’s perspective.
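Trigger conditions like these can be written down unambiguously so that no one has to interpret them mid-crisis. The following Python sketch shows one possible reading, which is an assumption rather than the article's rule: the DR plan is invoked only if an essential function is affected and the outage has lasted at least as long as the fix window. The five-hour default mirrors the example above; the function name should_invoke_dr and the function labels are hypothetical.

```python
from datetime import timedelta


def should_invoke_dr(outage, affected, essential, fix_window=timedelta(hours=5)):
    """Decide whether to invoke the DR plan (one possible policy, not the only one).

    outage:     how long the problem has lasted so far
    affected:   set of functions currently down
    essential:  set of functions the organization cannot live without
    fix_window: time the team gets to fix the problem first
    """
    # Outages that touch only non-essential functions never trigger the plan.
    if not (affected & essential):
        return False
    # Essential functions are down: invoke once the fix window has elapsed.
    return outage >= fix_window


# Hypothetical function labels for illustration.
essential = {"ehr", "e-prescribing"}

print(should_invoke_dr(timedelta(hours=6), {"cafeteria-menu"}, essential))
print(should_invoke_dr(timedelta(hours=2), {"ehr"}, essential))
print(should_invoke_dr(timedelta(hours=6), {"ehr"}, essential))
```

Which functions count as essential, and how long the fix window should be, are exactly the decisions that need validation from stakeholders outside IT.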
Plan the stress test strategically
Scheduled downtime can be an ideal time to conduct DR stress tests; when something goes wrong, the execs in charge can easily roll back to production and re-evaluate, recalibrate and update the plan. Disaster recovery is a whole-business effort, and any plan must work for the whole business. The whole business must participate in stress testing, and the plan and testing schedule should depend on the needs, budget and risk tolerance of the organization as a whole.