June 2012
Split Decisions — Separate Disaster Recovery and Business Continuity Plans Make Sense
By David Yeager
Radiology Today
Vol. 13 No. 6 P. 6
Whether caused by a natural disaster or something less dramatic, interruption of imaging services can quickly compromise patient care. Considering the vital role of medical images, limiting such interruptions is essential, so radiology departments need to have disaster recovery and business continuity plans in place.
To ensure the development of a comprehensive plan, R. L. “Skip” Kennedy, MSc, CIIP, technical director of imaging informatics for Kaiser Permanente medical centers in northern California, recommends imaging informatics professionals split disaster recovery and business continuity into two plans and, if possible, assign two teams to work on them. Disaster recovery tends to utilize tried-and-true methods that were developed during the 1960s and ’70s. Business continuity is a broader plan that accounts for less dramatic disruptions, which are far more likely to happen. Because business continuity practices are evolving at a rapid pace, the Society for Imaging Informatics in Medicine (SIIM) rewrote the chapter devoted to them in its book Practical Imaging Informatics; the new version was released at this year’s SIIM meeting.
Disaster recovery still relies, in large part, on backing up imaging servers with tape. The tape is updated on a regular schedule and is stored off site in a secure location. Since tape capacity has increased at roughly the same rate as disk capacity, it still offers a viable medium for storing large quantities of data. If a natural disaster occurs, the data can be retrieved and used to restore the system. Kennedy says this model still works quite well for many facilities.
Business Interruption
A more complicated issue is what to do if there’s a service disruption that doesn’t force the hospital to shut down completely. An interruption of even one hour can wreak havoc, and longer outages can seriously affect the quality of patient care. Kennedy says the first step in addressing this issue is for hospitals and radiology departments to be realistic about what imaging downtime actually means for a facility. He notes that many facilities’ policies and procedures for responding to a service interruption still contain statements such as “if the PACS fails, we’ll go back to film.”
“How are you going to do that [when you] got rid of the last printers four years ago?” he asks. “We haven’t kept pace with the fact that we’re now completely dependent on digital imaging.”
Dealing with this reality requires workflow reengineering. As long as the imaging data are intact, the main problem is maintaining business continuity. Kennedy says the most useful models for business continuity are those used by Internet commerce companies. Rather than moving all the data through a single pipeline, the data are broken into packets that can be routed more easily. Distributing data through multiple outlets allows them to flow around offline areas and reach their destination with fewer disruptions.
“When a server fails at Google, Google doesn’t put a business continuity plan in action—the systems themselves compensate for that—and that’s where we need to be,” Kennedy says. “When you go to Google now, you have no idea how many servers you’re touching. You’re touching dozens every single time you make a Google transaction. PACS is slowly evolving with a business continuity model relying on small, granular hardware solutions rather than big iron.”
Although PACS is moving in this direction, Kennedy believes PACS vendors could utilize this approach more fully. DICOM objects lend themselves to this sort of model because they already exist as discrete data elements and can be rerouted as necessary. The relative ease of redirecting them allows many small systems to be linked together, reducing the risk of service disruptions caused by a single server.
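To illustrate the idea, and not any particular vendor’s implementation, the short sketch below shows how a sending system might try several small DICOM receivers in turn rather than depending on a single archive. The destination names and the use of the open-source pydicom/pynetdicom libraries are assumptions made for this example.

```python
# Minimal failover sketch (hypothetical): try several small DICOM
# receivers in turn instead of depending on one central archive.
from pydicom import dcmread
from pynetdicom import AE
from pynetdicom.sop_class import CTImageStorage

# Hypothetical list of receivers; any one of them can accept the study.
DESTINATIONS = [
    ("pacs-primary.example.org", 11112),
    ("pacs-backup.example.org", 11112),
    ("ed-cart.example.org", 11112),
]

def store_with_failover(path):
    """Send one DICOM object to the first destination that accepts it."""
    ds = dcmread(path)
    ae = AE(ae_title="SENDER")
    ae.add_requested_context(CTImageStorage)
    for host, port in DESTINATIONS:
        assoc = ae.associate(host, port)
        if assoc.is_established:
            status = assoc.send_c_store(ds)
            assoc.release()
            if status and status.Status == 0x0000:  # 0x0000 = success
                return host
    raise RuntimeError("No destination accepted the study")
```

Because each DICOM object is self-contained, whichever receiver happens to be reachable can accept it, which is what makes this kind of rerouting straightforward.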
The guiding principle is to eliminate single points of failure. As an example, Kennedy cites a Kaiser Permanente employee who replicated an entire PACS on a cart-mounted server. The cart could be hooked in via cables, allowing the emergency department to function in the event of an outage. Even something as simple as an open-source CD copy of a PACS program that can be loaded onto a computer can work in a pinch. Although neither solution is optimal, both can provide alternate pathways that allow clinical image distribution to continue until the main PACS is back online.
There and Back Again
If data are lost, however, they will need to be restored. Because imaging files tend to be large and there are many of them, mass recovery of an image database can take far longer than other types of data recovery, with a corresponding effect on patient care. Moving 1,000 CT studies is not as simple as multiplying the time it takes to move one study by 1,000; there is an aggregate effect that has to be considered.
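A rough, hypothetical calculation makes the aggregate effect concrete. The figures below (study size, link speed, overhead) are illustrative assumptions, not numbers from the article: a single study might come back in seconds, while 1,000 studies queued over the same link could take well over an hour.

```python
# Back-of-the-envelope sketch of the aggregate effect (illustrative
# assumptions only: study size, link speed, and efficiency are made up).
study_size_gb = 0.5   # assumed average CT study size
link_gbps = 1.0       # assumed shared network link back from the backup
efficiency = 0.7      # assumed protocol/disk overhead factor

seconds_per_study = (study_size_gb * 8) / (link_gbps * efficiency)
print(f"One study:     {seconds_per_study:.0f} s")                 # roughly 6 s
print(f"1,000 studies: {seconds_per_study * 1000 / 3600:.1f} h")   # roughly 1.6 h
```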
In the event of a significant data loss, the backup options available to a facility depend on its needs and resources. A facility may back up everything on site, which offers reliability but requires significant resources and may be problematic in a natural disaster, or it may back up off site. The off-site options include a hot site, a mirror image of the facility that can be accessed relatively quickly but is resource intensive; a cold site, a less resource-intensive option that is updated every few hours or daily and takes longer to access; and cloud storage, which may offer convenience and flexibility for data retrieval but is still fairly untested for large outages.
That’s not to say that cloud services can’t be the answer, but the vendors’ ability to deliver in an emergency depends largely on their infrastructure. If a vendor has contracted with a large service provider to supply the pipeline for the data, the facility is essentially dependent on the service provider to move its data. For this reason, Kennedy recommends looking very carefully at the contracts before signing up for a cloud service.
“It’s all a matter of data pipes. It comes down to a question of how fast you can get yourself there and back. And the service-level agreements tell you the story,” Kennedy says. “What are my actual times going to be? What are my aggregate delivery times? It’s one thing to have one person asking for something and getting it back in 15 seconds; it’s another thing to have 500 people asking for something, almost concurrently, and have the aggregate performance drive their clinical outcomes.”
No matter which recovery method is chosen, Kennedy recommends working with vendors to develop business continuity and disaster recovery plans. Some vendors will be more helpful than others, but either way, facilities should devote close attention to their plans because they are ultimately responsible for ensuring that all bases are covered. The plans should also be tested on a regular basis because a system outage is the worst possible time for surprises. Ideally, there should be a backup plan for the backup plan.
— David Yeager is a freelance writer and editor based in Royersford, Pennsylvania. He is a frequent contributor to Radiology Today.