September 7, 2009
Disaster Recovery Planning
By Kathy Hardy
Radiology Today
Vol. 10 No. 16 P. 6
The medical community learned valuable lessons from Hurricane Katrina in August 2005, including how to keep medical records safe and available to physicians who need them to treat the patients seeking care during a catastrophe.
Knowing now what they didn’t know then, imaging department directors and PACS administrators are looking at their disaster recovery plans, going beyond system failure and recovery to consider the entire hospital and its day-to-day operations. In the imaging environment, disaster recovery typically addresses getting patient medical data and images back up and running. However, ordinary workflow (eg, admissions, test scheduling, and billing) also grinds to a halt when a hospital loses its communications system.
“Katrina made it easier to build the case for needing a disaster plan,” says Tony Jones, PACS administrator for the University of Utah School of Medicine in Salt Lake City.
What many disaster planners may not have learned, however, is what actually qualifies as a potentially disastrous event. Meteorological activities are somewhat predictable, but what about glitches within a facility’s network? Or what happens when a simple software upgrade turns into 10 hours of downtime? Any uncontrolled event, natural or manmade, that negatively affects business operations for any length of time is a disaster.
Defining Disaster
“There are situations where the disasters you anticipate don’t turn out to be disasters,” says Fred M. Behlen, PhD, founder and president of Laitek, Inc, a Homewood, Ill., data migration company.
Behlen recounts a situation he encountered in a Houston hospital following Hurricane Ike in September 2008. The hospital lost power, but its data center continued to run on one diesel generator. However, when the generator was refueled, water got into the tank and the generator stopped running, leaving no source of power for the servers. “The real disasters are where you don’t expect them,” he says.
That’s where disaster recovery planning plays a role. It covers Mother Nature’s fury, as well as what man can cause himself. Oftentimes, recovery plans are designed for the natural disaster “worst-case scenarios” that decimate infrastructure and computer networks. These disasters require enterprisewide disaster recovery strategies that often don’t address the short-term needs of individual departments. The smaller situations—such as a communication line being accidentally disconnected or a piece of hardware failing—are becoming more critical, particularly as more hospitals and imaging centers implement RIS and/or PACS.
“You need to determine the different types of disasters that could occur and create plans that are appropriate for each,” Jones says. “A plan that completely replicates your business center is not necessary for eight hours of downtime.”
How Much Downtime?
With the University of Utah’s business continuation system, Jones explains that a live server is connected to the hospital’s PACS, and all images are duplicated to that system a few minutes after they are taken. While the hospital’s main off-site duplication center is located about one mile from the hospital, the business continuation system is situated in-house.
“We can store three to four weeks’ worth of exams on the business continuation system,” he says. “That’s suitable for 24 to 48 hours of downtime.”
To protect against more far-reaching disasters, all imaging studies at Utah are mirrored to an off-site data center located “states away,” Jones says.
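Mirroring of this kind can be approximated with very little machinery. The sketch below is illustrative only, assuming studies arrive as files under a local archive directory and the off-site mirror is reachable as a mounted path; a production PACS would replicate at the DICOM or database level instead:

```python
import shutil
from pathlib import Path

def mirror_new_studies(primary: Path, mirror: Path) -> list[str]:
    """Copy any study files present in the primary archive but missing
    from the mirror. Returns the relative paths that were copied."""
    copied = []
    for src in primary.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(primary)
        dst = mirror / rel
        # Skip files already replicated with the same size.
        if dst.exists() and dst.stat().st_size == src.stat().st_size:
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(str(rel))
    return copied
```

Run on a schedule (Jones describes a lag of a few minutes), the same loop keeps the mirror within one sync interval of the primary.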
Behlen stresses that hospitals should not rely on a single backup copy of any data and that for truly critical data, “two copies aren’t enough.” For example, a site with a tape-based backup for its PACS can experience “substantial losses because both primary and backup tapes are bad. It’s not as rare as you think.”
“The error on the original would be saved on the backup tape,” Behlen adds.
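One illustrative defense against this failure mode is to record a checksum when the data is first written and verify every copy against it on a schedule, so corruption is caught rather than faithfully propagated to the backups. A minimal sketch, assuming the recorded digest is stored separately from the copies it protects:

```python
import hashlib
from pathlib import Path

def record_checksum(path: Path) -> str:
    """Compute a SHA-256 digest at write time, to be stored apart
    from both the primary copy and its backups."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def verify_copies(expected: str, copies: list[Path]) -> list[Path]:
    """Return the copies whose current digest no longer matches the
    digest recorded when the data was first written."""
    return [c for c in copies
            if hashlib.sha256(c.read_bytes()).hexdigest() != expected]
```

Because the digest was taken at write time, a bad primary and a bad backup both fail the check independently, which is exactly the case Behlen warns about.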
“In terms of a natural disaster,” Jones says, “make sure your off-site data center lives one natural disaster away, so that it’s not affected by the same hurricane or earthquake as your hospital.”
With the primary natural disaster indigenous to Utah being earthquakes, the hospital worked with the university’s own geological surveyors to determine the best location for its off-site storage. “Our off-site storage is located on a separate tectonic plate,” Jones says.
In other geographic areas more prone to hurricanes or tornados, disaster recovery planners meet with meteorologists for long-range forecasts when selecting their secure, off-site data backup locations.
Paper to Digital
The need for a data disaster recovery plan starts with the move from paper-based to digital records. Behlen warns that costs can escalate unnecessarily for large data storage and recovery systems with multiple data centers, possibly more than a facility needs. Instead of assuming bigger is better, planners should think about their medical data recovery priorities and about taking care of patients locally, he says.
“You need to look at it from a clinical perspective,” Behlen says. “You really need to work out what kind of services you may have to recover within the first 24 hours. First, do you have a way of looking at images that were captured today? Then, consider a way of viewing images taken a few weeks ago. Each perspective has a different clinical picture associated with it.”
Planners can then consider services that may not be needed until 72 hours after the disaster, then a week later, and then a month later, should the downtime last that long.
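The tiered priorities Behlen describes can be written down as a simple recovery-time table. The service names below are hypothetical placeholders, not a prescription:

```python
# Recovery deadlines (hours after the disaster) mapped to the services
# that must be back by then; service names are illustrative only.
RECOVERY_PRIORITIES = {
    24:  ["view today's images", "scanner-to-screen routing"],
    72:  ["view recent prior images", "report distribution"],
    168: ["full archive queries", "billing integration"],
}

def due_within(hours: int) -> list[str]:
    """Services that must be restored within the given window,
    ordered from the most urgent tier outward."""
    return [svc
            for deadline, svcs in sorted(RECOVERY_PRIORITIES.items())
            if deadline <= hours
            for svc in svcs]
```

A table like this makes the triage explicit: whoever executes the plan can see at a glance what has to be working by hour 24 versus what can wait a week.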
When in the triage mode of a disaster, Behlen notes that the most important data to access are most likely going to be images taken at the time, not a patient’s past clinical record.
“In terms of a natural disaster, you should be concerned about getting images from the scanner to the screen when everything else has broken down,” he says. “Worry about that first, then consider how to access more recent images as you can get to them.”
Systems and Plans
A common error made by PACS administrators is assuming that purchasing a disaster recovery system is sufficient to satisfy a hospital’s emergency needs. When making the transition from paper to digital, however, it’s important to make sure the solution provides not only disaster recovery features but also a disaster recovery plan.
“Production servers are fairly off-the-shelf purchases,” Jones says. “They’re not that hard to find in a disaster, but the data is.”
While data recovery is the ultimate goal, nothing can be recovered without a plan in place. Oftentimes, creating that plan is a challenge all its own, according to Jones.
“It’s easy to go to a PACS vendor to buy a backup system,” he says, “but you still need the plan. The system is no good if no one knows how to restore the data. Just buying a solution doesn’t mean you have a disaster recovery plan.”
System Testing
Another challenge is system testing. Obviously, hospital personnel need round-the-clock access to data. With that in mind, what’s the best way to find out whether your disaster plan works? Simulate a disaster and see what happens. Jones notes that it takes buy-in from hospital administrators to take this type of drastic, yet necessary, step to ensure that the facility’s data will remain safe and accessible during downtime.
“No one wants to have testing because it means downtime,” he says. “We’ve done a few mock tests and have taken smaller areas offline. You can also conduct testing overnight.”
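A mock failover of the kind Jones describes boils down to taking the primary offline and confirming that clients land on the standby. A minimal sketch, assuming the systems are reachable as TCP endpoints (hostnames and ports hypothetical):

```python
import socket

def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def choose_active(primary: tuple, standby: tuple):
    """Return the first endpoint that answers. During a drill, the
    primary is deliberately taken offline, and this check should
    come back with the standby."""
    for endpoint in (primary, standby):
        if reachable(*endpoint):
            return endpoint
    return None  # both down: fall back to the manual downtime plan
```

The drill itself is then a matter of blocking the primary (or, as Jones suggests, taking a small area offline overnight) and verifying that this selection, and the staff procedures around it, behave as the plan predicts.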
Despite its risks, Jones considers testing a key step in the disaster planning process. “If nothing else, you need to be willing to test your downtime plan,” he says. “That lets you know what it will do. You need to know how it will work with patients.”
Testing is a necessary evil, Behlen adds, which can also come with its own issues. He cites an example of a hospital that conducts monthly failover tests of its backup system. In one instance, the test itself was going fine, but nearby construction workers accidentally cut the main communication system’s fiber optic line, creating confusion during the test.
“They didn’t expect their disaster to happen while they were testing,” he says.
Multiple Scenarios
According to Jones, one common mistake is testing for the wrong scenario or for only one scenario. Natural disasters are events that people hear about most but are not the most frequent reasons for downtime. He recommends playing out a variety of scenarios and running through the plans to see whether they work.
Other areas where an eye for detail is important are in the duplication and storage of the actual disaster recovery plan. Part of the plan should include how and when to update the procedure, in what format backup copies should be created—disks, CDs, or even paper—and where those duplicate copies should be stored. The University of Utah updates its plan twice each year, according to Jones.
“There’s a joke among system administrators that says you shouldn’t store your disaster recovery plan on your server,” Jones says. “But obviously, copies of the plan need to be located where everyone can access them in a timely manner.”
Another factor to consider is how much of the plan should be public knowledge and how much should remain confidential, particularly information regarding the location of your backup site. However, there needs to be a trusted network of individuals who are on the same page when it comes to the facility’s disaster plan procedure. In addition, planners need to build redundancy in personnel by cross-training staff in disaster recovery tactics.
“In the case of a significant natural disaster, you may not have all the people you need to get the back-up system up and running,” Jones says. “They may be off handling other emergency-related tasks throughout the hospital.”
Barring a natural disaster, Jones and Behlen agree that hospital network users should be protected from the complexity of systems management and recovery. Recovery should be seamless, with no disruptions in day-to-day operations.
During downtime, basic hospital business operations can continue manually, but at significant cost in time and expense. These include registering new patients and checking whether they have been treated at the hospital previously, what medications they were prescribed at that time, and what imaging they may have undergone. Costs are a factor in disaster planning, Behlen says, not only in purchasing recovery systems but also in operational time lost. In addition, downtime means idle radiologists and technologists and unused modalities for routine exams. There’s also the public relations hit a hospital or imaging center would take if services were not available due to “technical difficulties.”
In addition, disaster recovery needs to be cost-effective, Behlen says, noting that with rising healthcare costs, facilities may need to do the best they can in the areas where recovery is most vital.
“They need to balance costs with needs,” he says. “There’s always the question, ‘How much do you spend to prevent a problem?’”
He concludes that hospitals and imaging centers need to first look at how much time and money were spent implementing their PACS and then think about how much it would take to recover the data shared by that system. The struggle continues to be how to plan for the disaster everyone hopes will never come.
— Kathy Hardy is a freelance writer based in Phoenixville, Pa., and a frequent contributor to Radiology Today.