Yet Another Business Case For Proactive IT Capacity Planning

By Marcia Gulesian


Aligning IT and business goals is a widely understood mandate for today's CIO. Toward that end, not just maintaining but increasing business process efficiency is de rigueur. To achieve this efficiency, ERP, CRM, and similar packages from vendors such as SAP, PeopleSoft, and Microsoft are pieced together with an array of tools and technologies with names like iViews, Web Parts, and so on.

The idea, in large part, is to save time for senior decision makers and/or their subordinates by automating repetitive tasks that used to involve many manual operations supplemented by face-to-face meetings. So anything that renders these new business process systems unavailable undermines the very efficiencies they are meant to deliver.

If you're lucky, your time-saving system might be unavailable for no more than a few seconds each day (because of, e.g., a Web portal bottleneck when too many end users request its services at once, or a printer that's simply out of paper), and knocked out by a real disaster (e.g., a flood, fire, or earthquake) for no more than a day or so once every decade or two.

The shorter, sometimes daily kind of interruption is seen as an inevitable annoyance but not a material threat to the core business. As often as not, these brief interruptions are due to inadequate investment in capacity planning and are only remedied from time to time, when upgrades to hardware or software are funded.

In contrast, business continuity systems (which run in parallel with your business process systems) are maintained continuously to protect against the downtime that a disaster could cause. The budgeted investment in these costly systems is often justified with the help of calculations such as:

  • Availability = 100% x reached uptime / planned uptime


  • Reliability = 100% x MTBF / (MTBF + MTTR)

    [MTBF = Mean Time Between Failures] [MTTR = Mean Time To Recover]
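The two formulas can be written as small functions for a quick budgeting check. This is a minimal Python sketch, not a standard tool: the function names `availability_pct` and `reliability_pct` are hypothetical, and the inputs (a decade of 24/7 operation, 87,600 hours with leap days ignored, and a single 24-hour outage) are illustrative assumptions rather than measured data. In reliability engineering the second ratio is also commonly called steady-state or inherent availability.

```python
def availability_pct(reached_uptime: float, planned_uptime: float) -> float:
    """Availability = 100% x reached uptime / planned uptime (same time units)."""
    if planned_uptime <= 0:
        raise ValueError("planned_uptime must be positive")
    return 100.0 * reached_uptime / planned_uptime


def reliability_pct(mtbf: float, mttr: float) -> float:
    """Reliability = 100% x MTBF / (MTBF + MTTR), using the article's label.

    MTBF and MTTR must be expressed in the same time unit (e.g. hours).
    """
    if mtbf < 0 or mttr < 0 or (mtbf + mttr) == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return 100.0 * mtbf / (mtbf + mttr)


# Hypothetical example: one 24-hour outage in a decade of 24/7 operation.
decade_hours = 24 * 365 * 10  # 87,600 hours, leap days ignored
outage_hours = 24

print(f"Availability: {availability_pct(decade_hours - outage_hours, decade_hours):.4f}%")
# Treat MTBF as the operating time between failures (the decade minus the outage).
print(f"Reliability:  {reliability_pct(decade_hours - outage_hours, outage_hours):.4f}%")
```

With MTBF taken as the operating time between failures, both calls print the same figure (about 99.97%), which is why the MTBF ratio can stand in for measured availability when projecting costs.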

    When used in the normal budgeting process, these calculations rely on estimates of the substantial interruptions that could be caused by a rarely occurring disaster.

    However, the losses that most users experience far more often last only a very brief period of time. So, in practice, the formulas are seldom applied to the aggregate of these brief interruptions over the same span of time, typically a decade or more, as the interval between disasters.

    Let's say, for purposes of discussion, that these frequent inconveniences total, on average, only 20 seconds a day, a number that's a good deal smaller than my personal experience would warrant. Assuming roughly 250 working days a year, this means something like 50,000 seconds (about 14 hours) over one decade and 100,000 seconds (about 28 hours) over two decades.
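The arithmetic behind these totals is easy to reproduce. The sketch below assumes about 250 working days a year, which is what makes 20 seconds a day come out to 50,000 seconds per decade; the helper name `aggregate_downtime_seconds` is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def aggregate_downtime_seconds(seconds_per_day: int, days_per_year: int, years: int) -> int:
    """Total downtime from small daily interruptions over a multi-year interval."""
    return seconds_per_day * days_per_year * years


# Assumption: 20 seconds lost on each of ~250 working days per year.
for years in (10, 20):
    total = aggregate_downtime_seconds(20, 250, years)
    print(f"{years} years: {total:,} s = {total / SECONDS_PER_HOUR:.1f} h")
# 10 years: 50,000 s = 13.9 h
# 20 years: 100,000 s = 27.8 h
```

If every calendar day counts instead (365 days a year), the same 20 seconds a day adds up to 73,000 seconds, or roughly 20 hours, per decade.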

    Losing only 20 seconds a day may seem inconsequential to senior management when compared with the potential downtime caused by a real disaster. But that's not necessarily the case when you recall why they invested enormous sums of money in ERP, CRM, and similar systems in the first place.

    Virtually all of us experience these minor interruptions, while few of us will experience the substantial loss of a business process system following a full-blown disaster. Nonetheless, when the formulas above are applied even-handedly to both situations over the same ten- to twenty-year interval, they yield roughly equivalent amounts of downtime: on the order of a day in each case. This gives one yet another reason to focus on capacity planning.
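To see the even-handed comparison in numbers, the sketch below applies the availability formula to both kinds of loss over the same interval. It rests on hypothetical assumptions: 24/7 planned uptime (leap days ignored), 20 seconds of interruptions on each of 250 working days a year, and a single 24-hour disaster somewhere in the interval. The names `compare` and `DowntimeComparison` are illustrative only.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # assumption: 24/7 planned uptime, leap days ignored


@dataclass
class DowntimeComparison:
    years: int
    brief_hours: float            # aggregate of the small daily interruptions
    disaster_hours: float         # one hypothetical disaster in the interval
    brief_availability: float     # percent
    disaster_availability: float  # percent


def availability_pct(downtime_hours: float, planned_hours: float) -> float:
    # Availability = 100% x reached uptime / planned uptime
    return 100.0 * (planned_hours - downtime_hours) / planned_hours


def compare(years: int, glitch_seconds_per_day: float = 20.0,
            working_days_per_year: int = 250,
            disaster_hours: float = 24.0) -> DowntimeComparison:
    planned = HOURS_PER_YEAR * years
    brief = glitch_seconds_per_day * working_days_per_year * years / 3600.0
    return DowntimeComparison(
        years=years,
        brief_hours=brief,
        disaster_hours=disaster_hours,
        brief_availability=availability_pct(brief, planned),
        disaster_availability=availability_pct(disaster_hours, planned),
    )


for years in (10, 20):
    c = compare(years)
    print(f"{years} years: brief {c.brief_hours:.1f} h ({c.brief_availability:.4f}%), "
          f"disaster {c.disaster_hours:.1f} h ({c.disaster_availability:.4f}%)")
```

Under these assumptions the brief interruptions add up to roughly 14 hours per decade and 28 hours per two decades, bracketing the single day lost to the hypothetical disaster, while every availability figure stays above 99.97%. The headline percentages therefore hide how comparable the downtime totals are.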

    Paradoxically, this observation is not usually reflected in the budgeting process. There, a great deal of money is spent up front to ensure high system availability and guaranteed business continuity should disaster strike. At the same time, incremental improvements in the capacity of the non-disaster-recovery parts of your systems are usually not pursued with the same sense of urgency.

    Now, I don't mean to suggest that these daily "slings and arrows" are the same as the rare disasters that some of us are "heir" to. However, proactively upgrading to a more robust server, simply replacing a printer's paper tray with a larger-capacity one, and so on can sometimes prevent as much downtime over a 20-year interval as expensive business continuity systems can.

    You can't avoid the costs of insuring your organization against the threat of a disaster. But by attending to capacity planning sooner rather than later, you can reduce your overall downtime at little or no additional cost and, in so doing, exploit your business process systems in full.

    Finally, one could argue that employees spend a lot more time in friendly conversation around the proverbial water cooler than they do waiting for their business applications to come back online each day. But the former, done in moderation, is likely to increase productivity, while the latter usually has the opposite effect.

    Marcia Gulesian has served as software developer, project manager, CTO, and CIO. She is the author of well over 100 feature articles on IT, its economics, and its management, many of which appear on CIO Update.