Operational Risk Management: Reaping the Benefits

By John Bennett

Operational risk management is now a fact of life for companies and IT departments of every size. CIOs face a fluctuating roster of unpredictable external threats, including natural disasters, e-mail viruses, and pandemics, and they must also address regulatory requirements worldwide, such as Sarbanes-Oxley and Basel II.

Four Steps to Better Risk Management

Whether your company is just starting to build a business continuity and availability solution or evaluating an existing one, a comprehensive process is required to understand where you are now and what steps you need to take to build a resilient infrastructure.

This process should start from the perspective of the business users and include four main components:

Step 1: Define business requirements. If the goal of a business continuity and availability solution is to support the business, the logical place to start is defining the business needs.

Evaluate the requirements of all business processes and applications across the enterprise in regard to regulatory compliance, availability, security and business continuity. Measure the impact of downtime for each business application and process to determine how much downtime is acceptable.

Step 2: Assess and prioritize risk. Once an understanding of the business requirements has been established, it’s time to factor in the risk.

Comprehensive, in-depth availability, security and continuity assessments help to identify areas of risk and guide strategies for protecting the IT environment and improving IT service. Best-practice frameworks such as the IT Infrastructure Library (ITIL) should be used to evaluate existing processes.

This is the step for identifying gaps and leveraging what you learned in the first step to prioritize the risks according to business impact.
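
As a rough illustration of how the first two steps feed each other, the sketch below ranks applications by the expected yearly cost of downtime beyond what the business has said it can accept. The application names, the figures, and the scoring rule (likelihood times excess hours times hourly cost) are hypothetical assumptions chosen for the example, not part of ITIL or of any method described in this article.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    cost_per_hour: float          # estimated impact of one hour of downtime (step 1)
    acceptable_hours: float       # downtime the business can tolerate (step 1)
    expected_outage_hours: float  # assessed outage length for this risk (step 2)
    annual_likelihood: float      # assessed chance of that outage per year (step 2)


def exposure(app):
    """Expected yearly cost of downtime beyond the acceptable level."""
    excess = max(0.0, app.expected_outage_hours - app.acceptable_hours)
    return app.annual_likelihood * excess * app.cost_per_hour


def prioritize(apps):
    """Order applications from highest to lowest exposure."""
    return sorted(apps, key=exposure, reverse=True)


# Hypothetical portfolio used only to exercise the functions.
apps = [
    Application("order-entry", 50_000, 1, 6, 0.5),
    Application("payroll", 5_000, 24, 8, 0.5),
    Application("email", 2_000, 4, 10, 0.9),
]

for app in prioritize(apps):
    print(f"{app.name}: {exposure(app):,.0f}")
```

In this made-up portfolio, payroll drops to the bottom because its assessed outage fits inside the downtime the business already accepts, which is the kind of gap analysis step 2 is meant to surface.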

From an internal perspective, the situation isn’t much brighter. Day-to-day issues related to people and technology cause too much downtime.

In 2006, HP commissioned a survey about organizations’ business continuity and availability plans. An astonishing 90% of downtime was reportedly due to network/telecommunications issues, hardware or software failures, or operator error.

Business continuity and availability is a major priority today as companies place increasing emphasis on protecting the business and achieving the right levels of IT availability on a daily basis.

The cost of not focusing on this can be extremely high. According to Infonetics Research, IT downtime costs large U.S. businesses an average of 3.6% of their revenues per year, with manufacturing organizations losing 9% and financial services organizations losing an even greater 16% of revenue.

While all companies need to be aware of what risks they face and how best to address them, the best solutions are built around each company’s individual characteristics. Successfully managing operational risk comes down to having a business continuity and availability strategy that:

  • Defines business requirements;
  • Includes an up-to-date profile of your risks;
  • Drives design and implementation of the right solutions for the specific business requirements;
  • Balances risk against cost; and
  • Allocates the right resources to continually re-evaluate, test and improve business continuity plans and solutions.

    Following the criteria outlined above will help yield a solution that is tailored to each company’s particular risks, needs and appropriate level of investment.

    Integration with the Business

    Business continuity, availability and security are interdependent, requiring an integrated, systemic approach to planning, design, implementation and management. As outlined above, the starting point for building a resilient IT environment should be a thorough understanding of the business requirements, the risks and threats the organization faces, and the impact of downtime on each of its critical business processes.

    However, recent history has shown that most businesses move through a predictable series of stages of business continuity preparedness. Most begin with nothing (no disaster recovery plan or, if there is a plan, one that is not tested or actionable) and end with a comprehensive business continuity and availability solution that’s focused on process integration, end-to-end planning that includes partner integration, and continuous improvement and best practices.

    By starting with a comprehensive, integrated approach, a company can build a reliable infrastructure where the business’ required service levels can be maintained through adjustment of IT availability and performance. The earlier a company adopts such an approach, the easier it is to bake into its culture the integration between people, processes and technology required to reach IT operational excellence.

    The importance of non-technical elements in a successful business continuity and availability solution cannot be stressed enough. The traditional view of business continuity ignores the contribution that IT best practices make to achieving service levels (especially availability and performance).

    Yet today’s savvy IT practitioners are showing increased interest in leveraging the best-practices frameworks provided by IT Infrastructure Library (ITIL), and IT Service Management (ITSM) elements in particular, which help IT organizations achieve excellence in Service Delivery and Service Support to ensure their business continuity and availability solution adequately supports the business’ needs, even as they change rapidly.

    Four Steps to Better Risk Management

    Step 3: Design and implement solutions. This is the step where the rubber meets the road. The design of the solution should be guided by findings from the first two steps and encompass the entire IT environment, including storage, databases, applications, systems and networks.

    Evaluate hardware, software and services needs in terms of established priorities to create an incremental implementation plan that addresses the most critical needs first. Include a continual service improvement plan.

    Step 4: Monitor, manage and evolve. As the solution is implemented, ongoing monitoring and management are essential to maintaining a solution that meets the business’ needs.

    Software management tools can aid in this process, but it is also critical to establish IT service management policies and training that align people and processes with best practices. This is where solid ITIL/ITSM practices can provide enormous value.

    For example, a change management board can be used to trigger episodic or regular reviews and tests of continuity plans. Finally, as the business evolves, it’s important to reassess strategy, plans and solutions to ensure they are continuing to meet the business requirements and addressing new threats that may arise.

    Business continuity and availability planning, data center and IT infrastructure operations and the implementation of IT-supported business processes are typically three distinct and disconnected processes. The challenge to integrate these processes also represents an enormous opportunity.

    Business continuity and availability planning is typically focused on identifying and managing business risks, covering people, process and technology. This is the process that establishes recovery time objectives (RTO) and recovery point objectives (RPO), and tests and keeps plans up to date. It does not focus on IT (operational) risks.

    When processes become integrated, business continuity and availability planning can play a valuable role in defining requirements for new processes, aligning service level agreements (SLAs) with RTO/RPO, updating plans for new business processes and ensuring compliance.

    Similarly, with the traditional disconnected approach, data center and IT infrastructure operations usually react to downtime, and most IT managers complain that they spend too much time on maintenance and too little on improvements.

    An integrated, proactive approach will see data center and IT infrastructure operations connecting RTO/RPO to SLAs and adopting best practices to help keep business continuity and availability plans up to date and reduce operational costs.
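
    One way to picture the connection between RTO/RPO and SLAs is as a simple consistency check. The sketch below converts an availability percentage into a yearly downtime budget and tests whether a recovery that takes the full RTO would fit inside it. It also compares a backup interval against the RPO. The function names, the sample targets, and the use of a backup interval as the measure of worst-case data loss are assumptions made for illustration, not definitions taken from ITIL or from any SLA described here.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def annual_downtime_budget_hours(availability_pct):
    """Downtime per year that an availability SLA still permits."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def rto_fits_sla(rto_hours, availability_pct):
    """True if one recovery that uses the full RTO stays within the yearly budget."""
    return rto_hours <= annual_downtime_budget_hours(availability_pct)


def rpo_fits_backups(rpo_hours, backup_interval_hours):
    """True if the worst-case data loss (time since last backup) is within the RPO."""
    return backup_interval_hours <= rpo_hours


# Hypothetical targets for a single business process.
print(round(annual_downtime_budget_hours(99.9), 2))  # 8.76
print(rto_fits_sla(4, 99.9))    # True
print(rto_fits_sla(24, 99.9))   # False
print(rpo_fits_backups(1, 24))  # False
```

    A 99.9% availability target leaves under nine hours of downtime a year, so a 24-hour RTO for the same process signals that the SLA and the continuity plan were set independently of each other.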

    The implementation of IT-supported business processes undergoes the same kind of transformation when looked at with an eye for creating a holistic solution. This is the process that typically defines SLA criteria for availability and performance.

    Usually, it lacks the institutional connection needed to integrate with and update business continuity and availability plans, lacks a connection to RTO/RPO, and does not focus on IT operational practices and their impact on these processes. However, it can instead play a critical role in delivering on SLA and RTO/RPO requirements while reducing operational costs to support new business processes.

    The Resiliency Spectrum

    The majority of companies do have some sort of business continuity/disaster recovery plan and solution in place, and with good reason. In the May 2006 survey, 90% reported that they have a disaster recovery plan in place, and 65% reported that they had experienced outages of an hour or more, with the average being 10 hours. At an average cost of $90,000 per hour of downtime, this translates to a loss of about $900,000 per outage.
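
    The per-outage figure follows directly from the two survey averages quoted above. The short sketch below reproduces that arithmetic so the same calculation can be rerun with a company's own numbers. The function and constant names are illustrative.

```python
AVERAGE_OUTAGE_HOURS = 10   # average outage length reported in the survey
COST_PER_HOUR_USD = 90_000  # average cost per hour of downtime


def outage_cost(hours, cost_per_hour):
    """Cost of a single outage: duration multiplied by hourly cost."""
    return hours * cost_per_hour


cost = outage_cost(AVERAGE_OUTAGE_HOURS, COST_PER_HOUR_USD)
print(f"${cost:,}")  # $900,000
```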

    The question is usually not whether a company needs a business continuity and availability solution, but whether what’s currently in place is sufficient. To get a quick indication, see how you measure up to the resiliency spectrum, which categorizes five different levels of resiliency and productivity:

    Fragile – In a fragile environment, unreliable IT reduces productivity. This type of environment is plagued by downtime issues and lacks measurement, security policies, continuity plans and key performance indicators (KPIs). It typically involves some incident management, backups and reactive reporting.

    Delicate – In this type of environment, when things don’t go as planned, IT reduces productivity. Characteristics of a delicate environment include off-site disaster recovery and backup, incident management, mitigation of downtime risk and defined security policies. This environment incorporates some IT process training, but largely relies on having the right people in the right place at the right time.

    Stable – A stable environment features mostly reliable IT that is mostly productivity neutral. Business needs and processes are understood and incorporated into the security and disaster recovery plans that are established. In addition, monitoring and reporting tools, including post-incident reports (PIRs), and change management policies and procedures are in place.

    Durable – In a durable environment, business can count on IT, as IT can scale up if required. The solution is reliable, ITIL practices are mature and key IT services have been identified. SLAs and measurement are in place and security and disaster recovery plans are regularly tested. This environment is further characterized by proactive measures such as planning for change and isolated process improvement activities.

    Resilient – The ideal environment is a resilient environment where business productivity is very high due to IT. IT is transformed into a business differentiator with best-in-class IT service management and cost structure. There are regular business continuity plan rehearsals and regular testing of security plans. There are SLAs established for all key IT services and ITIL/ITSM practices adhere to ISO/IEC 20000 standards.

    Budget – How Much is Enough?

    One of the most challenging aspects of operational risk management is balancing risk against the cost of protecting the business. The key is taking a methodical approach that addresses each business process in terms of the level of availability needed, how much data the business can afford to lose, how long the business process can afford to be down and the level of protection that’s needed.

    Once the acceptable level of downtime is determined for each business process, goals for recovery time, data loss and security and availability can be set and used as the basis for developing a solution that requires an appropriate level of investment.
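
    A minimal way to frame the trade-off is to compare the expected yearly loss from downtime before and after a proposed solution with what the solution costs. The sketch below uses a simple expected-loss calculation, a common risk-quantification convention rather than a method prescribed in this article. The 10-hour outage and $90,000 hourly cost come from the survey cited earlier. The outage frequency, the improved recovery time, and the solution cost are hypothetical values chosen for the example.

```python
def annual_expected_loss(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly downtime loss for one business process."""
    return outages_per_year * hours_per_outage * cost_per_hour


def net_benefit(loss_before, loss_after, annual_solution_cost):
    """Positive when the reduction in expected loss exceeds the yearly spend."""
    return (loss_before - loss_after) - annual_solution_cost


# Assumed: two outages a year, recovery cut from 10 hours to 2 hours.
before = annual_expected_loss(2, 10, 90_000)  # 1,800,000
after = annual_expected_loss(2, 2, 90_000)    # 360,000

print(net_benefit(before, after, 500_000))    # 940000
```

    A negative result would suggest that the proposed level of protection costs more than the downtime it prevents for that process, which is a cue to revisit the recovery goals or the design.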

    The Benefits

    Adopting a holistic, proactive approach to business continuity and availability can be a complex process as it requires integration not just of technology, but of people and processes, too.

    But once the IT environment has been assessed and stabilizing actions taken, IT staff are more productive and have more time to focus on improvement projects that can benefit many areas of the business because they’re no longer consumed with putting out fires. Also, the availability of critical applications becomes more reliable and more automatic, creating greater trust in IT’s ability to deliver.

    Then, as the IT environment becomes optimized, IT performance against KPIs improves, resulting in improved output to business and greater processing capability, which improves staff productivity.

    For example, a manufacturing company that grapples with an unreliable supply chain that results in loss of business can optimize the environment through business continuity and availability solutions, reducing or eliminating operational risk until it becomes a differentiator.

    Finally, once the necessary resilience is achieved, IT is more valued as a competitive advantage. Benefits at this stage include improved business financial performance, corporate reputation and share price.

    Achieving operational excellence has a powerful effect on increasing customer loyalty and strengthening supplier relationships. In today’s world, the benefits of superior operational risk management don’t stop at what might be saved, but extend into what might be gained.

    John Bennett leads the worldwide Business Continuity & Availability (BC&A) solutions group for HP, which is focused on helping customers reduce operational risk and ensure continuous operation of critical business processes.