Don't Patch and Pray
It could be argued that the ease with which patches can be distributed has fostered an emphasis on features rather than on mature development practices that inherently promote stability and security. With today's constant stream of patches coming from so many sources, however, and the tandem pressures of security and stability, organizations can no longer afford to patch and pray.
Why Are There So Many Patches?
Despite all of the advances over the years, software development is still immature. There are dozens of well-thought-out methodologies to assist in defining requirements, module interactions, code re-use, testing, and so on. But let's think about this for a moment: which parts of a computer have code in them? Let's make a quick list: the CPU, BIOS, storage system, graphics card, network card, other hardware with on-board firmware, the operating system, device drivers, and security applications (including anti-virus and personal firewalls), not to mention all of the in-house and third-party user applications. If you were to take a typical desktop PC, for example, the list of software/firmware written by various groups can quickly number in the hundreds, and yet it must all co-exist and many of the applications must work together in varying degrees.
The point is this: different teams using different personal styles, methodologies, tools, and assumptions generate all of this code, often with little to no interaction. When you combine the various pieces of software (i.e., all compiled or interpreted code, be it embedded in firmware or run in an OS environment), the results aren't always readily predictable due to the tremendous number of independent variables. As a result, issues arise; and when development groups attempt to fix the issues, they generate software patches with all of the best intentions.
If we return to our basic principle — as software becomes increasingly complex, the number of errors in the code rises as well — it follows that the number of potential errors in the patches themselves will correspondingly rise. Furthermore, patches often contain third-party code or ancillary libraries that are not directly designed, coded, compiled, and tested by the development team in question. Simply put, patches introduce many variables.
To be explicit, for the purpose of this article, a patch is defined as a focused subset of code released in a targeted manner, as opposed to the release of an entire application through a major or minor version code drop. A patch may fix a bug, improve security, or even update the application from one version to another in order to address issues and provide new features. These days, of course, security patches get the lion's share of media attention, but security is not the only reason patches are released.
Regardless of the intent of a patch, the problem is that introducing a patch into an existing system introduces unknown variables that can adversely affect the very systems the patch was, in good faith, supposed to help. Organizations that apply patches in an ad hoc manner (i.e., with little or no planning prior to deployment) are said to "patch and pray." The slang reflects the fact that once patches are applied, IT can only hope for the best.
Interestingly, in reaction to the often-unknown impact of patching, there appears to be one school of thought that holds all patches should be applied and another that argues patches should never be applied. It is unrealistic to view the application of patches as an all-or-nothing issue. What groups need to focus on is the managed introduction of patches to production systems based on sound risk analysis.
It's All About Risk Management
In a perfect world, everyone would have exactly the same hardware and software. That way, any new patch would install cleanly, without issue. This uniform view is nearly impossible to attain on a macro/global scale, but it does serve as an interesting thought experiment. The fact is that organizations will almost always have different environments than their vendors, peers, competitors, and so on. Thus, any patch applied to existing systems carries a degree of risk.
Likewise, there are risks associated with not patching. What organizations need to do is assess the level of risk of each patch, define mitigation strategies to manage the identified risks, and formally decide whether or not the risk is acceptable. To put this in the proper context, let's define a basic process for patching: risk management is a pervasive concern through the whole process, but risk management by itself does not define a process.
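To make the trade-off concrete, here is a minimal sketch of weighing the risk of applying a patch against the risk of leaving the vulnerability open. The scoring scales, weights, and thresholds below are illustrative assumptions, not an established model, and the patch name is only an example identifier.

```python
from dataclasses import dataclass

# Hypothetical scoring model: weigh the risk of applying a patch against
# the risk of leaving the vulnerability unpatched. All scales, weights,
# and thresholds are illustrative assumptions.

@dataclass
class Patch:
    name: str
    vuln_severity: int   # 1 (low) .. 5 (critical): the risk of NOT patching
    change_impact: int   # 1 (isolated fix) .. 5 (touches core components)
    tested_in_lab: bool  # has the patch passed initial testing?

def patch_decision(p: Patch) -> str:
    """Return a coarse recommendation based on relative risk."""
    risk_of_waiting = p.vuln_severity
    # Untested patches carry extra application risk (penalty of 2 is arbitrary).
    risk_of_applying = p.change_impact + (0 if p.tested_in_lab else 2)
    if risk_of_waiting >= 5:
        return "expedite"   # emergency fast-track, but still reviewed
    if risk_of_applying > risk_of_waiting:
        return "hold"       # mitigate and test further before deploying
    return "schedule"       # fold into the next planned patch batch

print(patch_decision(Patch("MS04-011", 5, 3, False)))  # expedite
```

The point of even a toy model like this is that the decision is recorded and repeatable, rather than being made differently by each administrator.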
A Basic Software Patch Process
The patching process does not need to be complicated, but it must be effective for the organization and its adoption must be formalized. Furthermore, it is absolutely critical that people be made aware that the process is mandatory. The intent is to codify a process that manages risk while allowing systems to evolve. By creating a standard process that everyone follows, best practices can also be developed over time and the process refined. With all of this in mind, here is a simple high-level process that organizations can use as a starting point in discussions over their own patch management process:
1. Patch Identification

There must be active mechanisms that alert administrators when new patches exist. These methods range from monitoring vendor e-mails and talking to support groups all the way to using automated tools, such as the Microsoft Baseline Security Analyzer, to actively scan systems for missing patches. Identified patches must be added to a list of potential patches for each system.
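At its core, the identification step is a comparison of what each system has installed against what the vendor has published, producing a per-system backlog. The sketch below illustrates the principle; the host names and patch identifiers are fabricated for the example.

```python
# Sketch of the identification step: compare each system's installed
# patch list against the vendor's published list and build the backlog.
# Host names and patch IDs are made up for illustration.

def missing_patches(installed, published):
    """Return the published patches a system has not yet applied."""
    return sorted(set(published) - set(installed))

published = ["KB835732", "KB828741", "KB837001"]
fleet = {
    "web01": ["KB835732"],
    "db01":  ["KB835732", "KB828741", "KB837001"],
}

backlog = {host: missing_patches(inst, published) for host, inst in fleet.items()}
print(backlog)  # db01 is current; web01 needs two patches
```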
2. Planning and Prioritization

Depending on the volume of patches, it may help to sort patches by priority and identify whether each patch is to proceed, be placed on hold, or be cancelled. Think of it as a form of triage: IT resources are always limited and decisions must be made early on. This does assume, however, that the people making the decisions are adequately informed about the risks involved.
Patches should be reviewed by system, priority, and category, and then grouped. As opposed to installing each patch as it comes in, organizations should strongly consider a policy of grouping patches and deploying them periodically in batches following a solid testing process. For example, a policy might be to apply patches only on a bi-weekly schedule. This grouping and delayed-application approach need not apply to all situations.
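The batching policy can be reduced to simple date arithmetic: each approved patch is assigned to the next deployment window on or after its approval date. The two-week cadence matches the example in the text; the cycle start date is an assumption for illustration.

```python
from datetime import date, timedelta

# Illustrative batching: assign each approved patch to the next
# bi-weekly deployment window on or after its approval date.
# The two-week cadence and cycle start date are example values.

def next_window(approved: date, cycle_start: date, period_days: int = 14) -> date:
    """First deployment window on or after the approval date."""
    elapsed = (approved - cycle_start).days
    cycles = -(-elapsed // period_days)  # ceiling division
    return cycle_start + timedelta(days=cycles * period_days)

w = next_window(date(2004, 5, 3), cycle_start=date(2004, 4, 26))
print(w)  # 2004-05-10
```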
In the case of high-priority patches, where the associated risks demand immediate patching, there must be a means to handle emergency exceptions in an accelerated fashion while maintaining effective controls. In other words, yes, hot patches will come in and demand immediate installation. However, rather than bypass all review and testing steps, there still needs to be a way to review the hot patches and make an informed decision about their expedited installation.
Part of the planning process should also define how the appropriate stakeholders would be notified about an upcoming series of patches. The communication plan should outline how the stakeholders will be updated of issues, progress, and completion, as well as any post-implementation reviews. The degree of communication depends on what the patch is, the level of risk, and the stakeholders in question.
3. Initial Testing
Ideally, all patches will be reviewed on segregated test systems that mirror the production environment as closely as possible. The intent, of course, is to test and discover problems prior to going into production, which allows time for issues to be investigated. Again, ideally, production systems would never be patched directly. However, as mentioned earlier, there are situations, such as Code Red, Nimda, and MSBlaster, wherein the security risks are so high that production systems may need to be patched directly. To reiterate, the risks must be identified and reviewed in order for an informed decision to be made.
Note that testing should not be ad hoc. In other words, testing of each system should follow a formal test plan that outlines the main applications, functionality, test process and expected results if the applications are performing as planned. Yes, this does take a while. However, if stable systems are desired, it is time well spent. If a flawed patch is erroneously approved, installed, and causes production systems to fail, the costs can skyrocket. A decision to bypass testing, or have poor testing, is a gamble that can have disastrous results.
4. Approval

The approval step must be formal. The intent is to take the list of patches, the implementation plan, and the test results, and present them to a governing body to gain approval to install. The governing body should have the technical knowledge to make an informed decision about the risks and the adequacy of the planning.
Even emergency patches must have a defined fast-track process that still requires approval to proceed. Never underestimate the value of review to catch potential issues.
Part of the planning step should be a deployment plan. It may prove beneficial to roll a patch out in phases starting with the least critical systems to see if there are unknown issues that unexpectedly appear in the production environment. In terms of actually installing the patches, there are manual methods and increasingly often, automated update tools that can be used to expedite the installation process. The key here is that installation should follow an approved plan. The actual installation of the patches in production is a relatively small part of the overall patching process.
6. Post Deployment Testing
The military has a saying that few plans survive contact with the enemy. In the context of patches, we must be sure that the deployed patches do not break the production systems. At this point, failures could result from the patches themselves, from issues with the deployment system, from keyboarding errors, and so on. Regardless of the cause, it is important that there be prior coordination with stakeholders so that systems can be quickly assessed to ensure they are still operating as planned.
7. Ongoing Monitoring

Once the patches have been deployed, there should be long-term automated monitoring in place to detect anomalies. Again, because so many variables are in play, even involved test plans may fail to identify a combination of events and values that causes a system to fail. Part of the patching process should be a review of any impacts to the monitoring systems. It may be that patches necessitate changes to production monitoring in order for it to remain effective.
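One simple form such monitoring can take is comparing a post-patch metric against its pre-patch baseline and flagging drift beyond a tolerance band. The metric, baseline, and 25% tolerance below are illustrative assumptions, not recommendations.

```python
# Toy anomaly check for post-deployment monitoring: flag a metric that
# drifts beyond a tolerance band around its pre-patch baseline.
# The tolerance value is an illustrative assumption.

def anomalous(baseline: float, observed: float, tolerance: float = 0.25) -> bool:
    """True if observed deviates from baseline by more than the tolerance."""
    return abs(observed - baseline) > tolerance * baseline

# Example: mean response time was 200 ms before the patch, 320 ms after.
print(anomalous(200.0, 320.0))  # True: a 60% jump exceeds the 25% band
```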
In the end, a patch process takes time and effort. As a result, some personnel may elect to attempt bypassing the process for one reason or another. To be successful, the process cannot be partially followed. Everyone must follow the formal patch process.
As a side note, there are automated configuration-integrity tools, such as Ecora and Tripwire, which should be used to detect changes. Detected changes must tie out to approved change orders; any that do not must be flagged as unauthorized changes. All unauthorized changes must be investigated to determine why they happened, and corrective action must be taken to prevent them from happening again. Bear in mind a simple auditing tenet — there is no such thing as an immaterial control violation. If a control is bypassed, then a weakness exists, and the next breach could be far worse if left uncorrected.
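The tie-out itself is another set comparison: every change the integrity monitor detects must match an approved change order, and whatever is left over is the unauthorized-change list to investigate. The file paths below are fabricated for the example.

```python
# Sketch of the reconciliation step: changes detected by an integrity
# monitor (e.g., Tripwire) must tie out to approved change orders;
# anything left over is an unauthorized change to investigate.
# File paths are fabricated for illustration.

def unauthorized_changes(detected, approved):
    """Detected changes with no matching approved change order."""
    return sorted(set(detected) - set(approved))

detected = ["/etc/passwd", "/usr/sbin/sshd", "/var/www/index.html"]
approved = ["/usr/sbin/sshd"]  # e.g., covered by an approved change order

print(unauthorized_changes(detected, approved))
```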
Software is complicated and there will continue to be issues that necessitate patching. As a result, organizations must develop processes that assess the risks associated with patching and make determinations about what to do and what not to do. Organizations can no longer afford to have a "patch and pray" mentality. Instead, they must view patching as a formal process that is going to be around for a long time.