Improvement Starts with Your Darkest Hours

Mar 23, 2012

Jason Druebert

by Jason Druebert of AT&T Professional Services

When IT folks look to improve, they most often take the approach of “Let’s do a project to fix this list of items.” The assumption, of course, is that this will result in a big one-time improvement.

As a consultant, I make this type of work my bread and butter, but it isn’t an approach I usually recommend, nor one I adopted when I was in operations.

The approach I like to take is this: look at what you are doing, make incremental adjustments, hold people accountable, and repeat. In my experience, incremental improvements made over time cost less and do a lot more good than one-time projects.

The best place to start this incremental approach is almost always with major incidents (MIs). They cost the business money, are highly visible, cut across IT, and by definition represent areas you need to improve. You may be saying, “We do a root cause investigation after every MI.” If so, that is a good start, but what I’m talking about is a comprehensive review that considers all aspects of a failure. Here is the basic process I like to start with:

1)    Executive leadership selects one or two incidents per month for review based on their importance.

2)    Everyone involved meets to discuss the incident; err on the side of inclusion.

3)    The meeting is run by a neutral facilitator who doesn’t have a dog in the fight.

4)    Detailed notes from the meeting are used to create a report for executive leadership that contains recommendations and action items.

5)    Executive leadership holds people accountable for these action items.

6)    The output is vigorously communicated throughout the organization.

The key to these meetings is to keep them positive and to focus on the process versus individual actions. Everyone makes mistakes. In organizations where people are afraid to make mistakes, not much gets done. The goal is to eliminate mistakes that occur over and over so you can move on to making exciting new ones!

When you do find that an individual made a mistake, you are very likely to find a corresponding lack of training and management guidance. Focus on those shortcomings rather than on the actions of the individual (unless, of course, their actions are egregious, in which case they need to be handled outside of the meeting anyway).

Here are some good starter questions to guide your reviews:

  • How was the incident detected?
  • Were there symptoms before it was detected? Could we have detected it sooner?
  • What did we do once it was detected? Was it recognized and routed quickly enough?
  • What was the actual impact on the business? Did we understand that at the time? Do we understand it now?
  • How did we restore service? Could it have been restored more quickly? Were there workarounds we could have used, or used better?
  • How well did the IT groups involved work together? Were there any issues related to communication or ticket transfers?
  • Were roles and responsibilities clear? Who was in charge?
  • How did we inform stakeholders throughout the incident lifecycle? Could we have done a better job? (Hint: If there are people in the meeting who didn’t know what was going on during the outage, that is a good indication that communication was not effective.)
  • Do we know what caused the outage? Can we prevent it from occurring again?
  • Was the incident ticket coded appropriately? The same goes for any related change and problem records.

The role of executive leadership is critical in this process. Leaders have to insist that the reviews occur, that all parts of IT participate, that the process is fair and positive, and that people are held accountable for action items. The involvement of senior management also signals to the organization that this is an important activity.

Once the process is up and running, you can add more incidents or, better yet, develop similar processes to review completed releases/changes and projects. As the saying goes, those who don’t learn from history are doomed to repeat it. So learn from your biggest failures and use them to drive improvement across your organization.

Jason Druebert is a consultant specializing in process improvement with AT&T Professional Services, where he also contributes to the AT&T Networking Exchange Blog.

Tags: IT management, CIO Leadership, AT&T Professional Services
