Before Trouble Strikes

By Elizabeth M. Ferrarini

In 1992 Hurricane Andrew put 39 major data centers out of commission. And in 1993 the World Trade Center bombing caused 21 data centers to shut down. While you don't like to think about it, every organization, regardless of its size, runs the risk of a major systems outage, such as a tornado demolishing a data center or a building fire destroying the facility and everything in it. A study by the University of Texas found that 85 percent of businesses depend totally or heavily on information technology systems to stay in business, and that a loss of those systems would cost businesses up to 40 percent of their daily revenues.

Disaster can strike at any time. In fact, there are more than 35 types of disasters, ranging from the most common, such as power outages, to the most catastrophic, such as earthquakes. In essence, a disaster includes any type of interruption of service that results from some force beyond the organization's control. Disaster recovery provides systematic procedures for reacting to and recovering from that ominous external or internal force. Disaster recovery planning, which complements business continuity and contingency planning, ensures that the organization can function effectively if an unforeseen event severely disrupts normal operations.

The following checklist will help the key individuals in your organization prepare a disaster recovery plan. The objective is to restore all critical business functions, not just isolated functions such as the data center.

Gather Information

Organize the Project
A successful initiative of this magnitude requires support from the organization's senior management, a dedicated disaster recovery team whose members have knowledge of critical business systems, and a well-thought-out planning and testing strategy.

Senior executives responsible for disaster recovery planning will perform the first two steps. The disaster recovery coordinator, working with the appropriate team leaders, should perform steps 3 to 7.

  1. Determine which senior executive(s) will have overall responsibility for disaster recovery.
  2. Have this executive appoint a disaster recovery coordinator.
  3. Appoint a disaster recovery team leader for each operational unit, such as server backup or telephone system.
  4. Convene the disaster recovery planning team and sub-teams as appropriate.
  5. Working with senior executives responsible for disaster recovery, the disaster recovery coordinator should identify the following:
  6. Set project timetable and draft project plan, including assignment of task responsibilities.
  7. Obtain senior management's approval for scope, assumptions, and project plan.

Conduct Business Impact Analysis
The disaster recovery planning team should perform this step to identify which business departments, functions, or systems are most vulnerable to potential threats, what types of threat are possible, and what effect each identified threat would have on each of the vulnerable areas within the organization.

  1. Identify functions, processes, and systems.
  2. Interview information systems support personnel.
  3. Interview business unit personnel.
  4. Analyze results to determine critical systems, applications, and business processes.
  5. Prepare an impact analysis of interruptions to critical systems.
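
As a rough illustration of steps 4 and 5, the sketch below ranks systems by estimated daily loss. The system names, the revenue figures, and the `dependency` fraction are hypothetical values chosen for the example, not data from any real assessment, and the ranking rule is only one of many a team might choose.

```python
from dataclasses import dataclass


@dataclass
class SystemImpact:
    name: str             # hypothetical system name
    daily_revenue: float  # revenue that flows through the system per day
    dependency: float     # assumed fraction of that revenue lost in an outage (0..1)

    def daily_loss(self) -> float:
        """Estimated revenue lost for each day the system is down."""
        return self.daily_revenue * self.dependency


def rank_by_impact(systems):
    """Return systems ordered from highest to lowest estimated daily loss."""
    return sorted(systems, key=lambda s: s.daily_loss(), reverse=True)


# Hypothetical interview results for three systems.
systems = [
    SystemImpact("payroll", 20_000.0, 0.10),
    SystemImpact("order entry", 50_000.0, 0.40),
    SystemImpact("email", 50_000.0, 0.05),
]

ranked = rank_by_impact(systems)
for s in ranked:
    print(f"{s.name}: estimated daily loss {s.daily_loss():.2f}")
```

A ranking like this is only a starting point for discussion; interviews with business unit personnel may reveal dependencies that revenue figures alone do not capture.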

Conduct Risk Assessment
The disaster recovery planning team should work with the organization's technical and security personnel to determine the probability of each functional business unit's critical systems becoming severely disrupted and to document the amount of risk the business unit can tolerate. For each critical system, complete the following steps:

  1. Review physical security, e.g., secure offices, off-hours building access, etc.
  2. Review backup systems and data security.
  3. Review policies on personnel termination and transfer.
  4. Identify systems supporting mission critical functions.
  5. Identify vulnerabilities to threats such as physical attacks and to acts of God such as floods.
  6. Assess probability of system failure or disruption.
  7. Prepare risk and security analysis.
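
One common way to carry out steps 6 and 7 is to score each threat as probability times impact and compare the score with the risk the business unit says it can accept. The sketch below assumes that approach; the 1-to-5 impact scale, the threat names, and the threshold are hypothetical choices for illustration and are not prescribed by this checklist.

```python
def risk_score(probability: float, impact: int) -> float:
    """Score a threat as annual probability (0..1) times impact (1..5)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if not 1 <= impact <= 5:
        raise ValueError("impact must be between 1 and 5")
    return probability * impact


def exceeds_tolerance(threats, acceptable_risk):
    """Return the names of threats whose score is above the acceptable risk."""
    return [
        name
        for name, (probability, impact) in threats.items()
        if risk_score(probability, impact) > acceptable_risk
    ]


# Hypothetical estimates: (annual probability, impact on a 1..5 scale).
threats = {
    "power outage": (0.5, 3),
    "flood": (0.25, 5),
    "building fire": (0.125, 5),
}

flagged = exceeds_tolerance(threats, acceptable_risk=1.0)
print(flagged)
```

Threats that land above the threshold are the ones to address first in the risk and security analysis; the rest are documented as accepted risk.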

Develop Strategic Outline for Recovery
The steps outlined here provide all of the components necessary to perform a recovery. These steps will help pull together information about the operations of all systems, especially those owned or managed by non-technical managers with help from technical support personnel. Steps one through four mainly apply to functional business units that manage technology systems to process critical functions. The disaster recovery planning team and the functional business unit may wish to appoint other appropriate individuals to perform subsequent tasks.

  1. Assemble groups as appropriate for the following:
  2. For each system/process above quantify the following processing requirements.
  3. Detail all the steps in your workflow for each critical business function. (For example, for payroll processing, include each step that must be completed and the order in which to complete them.)
  4. Identify systems and applications.
  5. Identify all vital records.
  6. Identify the minimum requirements for, or a replacement of, the critical function should a severe disruption occur.
  7. Identify whether alternative methods of processing either exist or could be developed, quantifying their effect on processing (include manual processes).
  8. Identify person(s) who support the system or the application.
  9. Identify primary person to contact if system or application cannot function as normal.
  10. Identify secondary person to contact if system or application cannot function as normal.
  11. Identify all vendors associated with the system or application.
  12. Document business unit strategy during recovery (conceptually how will the unit function?).
  13. Quantify resources required for recovery by time frame.
  14. Develop and document recovery strategy, including priorities for recovering system/function components, and recovery schedule.
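
The information gathered in the steps above can be kept as one record per system, which makes it easy to spot gaps and to draft a recovery order. The following is a minimal sketch under that assumption; the `RecoveryRecord` fields, the convention that priority 1 is recovered first, and all names in the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryRecord:
    system: str
    priority: int              # 1 = recover first (assumed convention)
    primary_contact: str
    secondary_contact: str
    vendors: list = field(default_factory=list)
    manual_fallback: str = ""  # alternative process, if one exists


def recovery_schedule(records):
    """Order records by priority, then by system name for a stable listing."""
    return sorted(records, key=lambda r: (r.priority, r.system))


def missing_contacts(records):
    """Return systems that lack a primary or a secondary contact."""
    return [
        r.system
        for r in records
        if not r.primary_contact or not r.secondary_contact
    ]


# Hypothetical records; the secondary contact for payroll is still unassigned.
records = [
    RecoveryRecord("telephone system", 2, "A. Example", "B. Example", ["Example Telecom"]),
    RecoveryRecord("payroll", 1, "C. Example", "", manual_fallback="paper checks"),
    RecoveryRecord("file server", 1, "D. Example", "E. Example"),
]

schedule = [r.system for r in recovery_schedule(records)]
print("Recovery order:", schedule)
print("Missing contacts:", missing_contacts(records))
```

A spreadsheet serves the same purpose; the point is that every critical system has the same fields filled in before the strategy is documented.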

Review On-site and Off-Site Backup and Recovery Procedures
The disaster recovery planning team should perform this task to provide for a current backup of critical programs and data that can be used in the event of a disaster. To this end, the disaster recovery planning team can reduce downtime and speed recovery.

  1. Review current records (operating systems, code).
  2. Review current off-site storage facility or arrange for one.
  3. Review backup and off-site backup storage policy or create one.
  4. Present to functional business unit leader for approval.
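
A backup policy usually states how old the most recent backup of each system may be. The sketch below shows one way to check a list of backup timestamps against such a limit; the 24-hour limit, the system names, and the timestamps are hypothetical, and a real review would read them from the backup software's own logs.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup, max_age, now):
    """Return, in alphabetical order, systems whose latest backup is older than max_age."""
    return sorted(
        name for name, taken in last_backup.items() if now - taken > max_age
    )


# Hypothetical review time and last-backup timestamps.
now = datetime(2024, 1, 10, 12, 0)
last_backup = {
    "file server": datetime(2024, 1, 10, 2, 0),  # 10 hours old
    "payroll": datetime(2024, 1, 7, 2, 0),       # more than 3 days old
    "email": datetime(2024, 1, 9, 11, 0),        # 25 hours old
}

overdue = stale_backups(last_backup, timedelta(hours=24), now)
print(overdue)
```

Any system that appears in the result falls outside the assumed policy and is worth raising when the policy is presented for approval.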

Select Alternate Facility
The disaster recovery planning team should look for a location, other than the normal facility, that can be used to process data and/or conduct business in the event of a disaster.

  1. Determine resource requirements.
  2. Assess platform uniqueness of unit systems (Macintosh, IBM, Oracle, etc.).
  3. Identify alternative facilities.
  4. Review cost/benefit.
  5. Evaluate and make recommendation.
  6. Present to business unit leader for approval.
  7. Make selection.

Plan Development and Testing

Develop Recovery Plan
This document defines the resources, actions, tasks, and data required to manage the recovery in the event of an interruption. The plan is designed to assist in restoring the business process within the stated recovery goals. The disaster recovery coordinator should perform these steps, assisted by the disaster planning committee as needed.

  1. Objective -- This may have been documented in the Information Gathering phase. Establish information for each business unit.
  2. Plan Assumptions
  3. Criteria for invoking the plan:
  4. Role Responsibilities and Authority
  5. Procedures for operating in contingency mode
  6. Resource plan for operating in contingency mode
  7. Criteria for returning to normal operating mode
  8. Procedures for returning to normal operating mode
  9. Testing and Training
  10. Plan Maintenance
  11. Appendices for inclusion

Test the Plan
Testing the plan enables the disaster recovery planning team to see how their recovery plan and procedures work in practice. It enables everyone to get a reasonable assurance that a plan will make the grade when it really counts -- in an actual disaster.

  1. Develop test strategy.
  2. Develop test plans.
  3. Conduct tests.
  4. Modify the plan as necessary.

On-going Maintenance

Maintain the Plan
Disaster recovery plans can have a shelf life of six to 12 months, depending on the changes in the organization's procedures, systems, and personnel. Having a program in place to maintain the plan will ensure that everyone, especially the disaster recovery planning team, will be ready if a real emergency occurs.
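
The shelf life mentioned above can be turned into a simple date check that flags a plan for review. This is a minimal sketch: the function name `review_due` is hypothetical, and the default of 180 days is an assumed stand-in for the six-month end of the range.

```python
from datetime import date, timedelta


def review_due(last_reviewed: date, today: date, shelf_life_days: int = 180) -> bool:
    """Return True when the plan has reached or passed its assumed shelf life."""
    return today - last_reviewed >= timedelta(days=shelf_life_days)


# Hypothetical dates: a plan last reviewed on January 1 and checked on July 1.
print(review_due(date(2024, 1, 1), date(2024, 7, 1)))
```

An organization with slower-changing systems might pass `shelf_life_days=365` instead, matching the 12-month end of the range.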

The senior management executive responsible for disaster recovery, assisted by the disaster recovery coordinator, should oversee this step:

  1. Review changes in the environment, technology, and procedures.
  2. Develop maintenance triggers and procedures.
  3. Submit changes for systems development procedures.
  4. Modify unit change management procedures.
  5. Produce plan updates and distribute.
  6. Establish periodic review and update procedures.

Elizabeth M. Ferrarini is a freelance writer based in Arlington, Mass. This story first appeared in CrossNodes, an internet.com site.