Planning Data Protection Into Your Virtual Infrastructure
To begin, you need to understand what planning a virtual infrastructure involves and how to choose the appropriate data protection for it. One of the most critical tasks is identifying the capabilities and limitations of the data protection options available within your virtual infrastructure, and then selecting among them.
For simplicity, this article uses VMware ESX as its example virtualization platform. The process is the same for Microsoft Hyper-V, Virtual Iron and other platforms until the final step, when you have to determine the right implementation.
What applications should I virtualize?
With current virtualization technology, almost all applications can be virtualized. You just have to decide on a reasonable set of applications and then compile the following information for each of them:
It is absolutely critical that you characterize these applications under their heaviest expected load or you'll start running out of resources unexpectedly when you implement your virtual infrastructure.
Total memory footprint
The memory the application uses at peak load. If the application "leaks" memory (its memory footprint grows even under constant load), you'll need to allow room for that growth as well.
Total CPU utilization
The number of CPUs the application uses and their utilization percentage at peak load. Don't forget to note the type of CPU you used when you took your measurements.
Total disk space, including growth through the next budget cycle
Network bandwidth utilization
Network bandwidth used by this application at peak load. Remember to account for both directions of network traffic.
Storage network throughput (SCSI, FC, iSCSI, NAS) as both input and output
Measure this the same way you measured network bandwidth: at peak load, in both directions.
Disk reads and writes
The disk activity this application generates at peak load. Depending on the application, other disk load parameters may need to be characterized as well.
Memory bus utilization estimate (available memory bus bandwidth minus four times the total I/O, i.e., the remaining headroom)
Years of empirical data have upheld this useful rule of thumb. The estimate can be somewhat difficult to obtain, however, because it is not always easy to identify the memory bus speed of a particular system.
Is there a window during the day or night when the applications could reasonably be shut down and backed up?
Is there a window during the day or night when the total load on the physical ESX server is low enough that backups can be performed without hurting the running applications? If neither the applications nor the ESX server has such a window, you will need to select a proxy backup method.
Do you need to be able to recover individual files on a regular basis? If so, you will most likely need to run a backup agent directly within a virtual machine.
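The measurements in the list above can be collected into a simple per-application record and summed to see what a set of applications will demand from one ESX host. The sketch below is a minimal Python illustration, not part of any VMware tool: the `AppProfile` fields, the `summarize` helper and the choice of MB/s for every throughput figure are assumptions made for this example. It applies the four-times-I/O memory bus rule of thumb exactly as stated in the list, so the result is the bandwidth left over.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AppProfile:
    """Peak-load characterization of one application (hypothetical field names).

    All throughput fields are in MB/s so they can be compared with memory bus bandwidth.
    """
    name: str
    peak_memory_gb: float     # memory used at peak load
    leak_allowance_gb: float  # extra room if the footprint grows under constant load
    peak_cpu: float           # CPUs x utilization at peak (2 CPUs at 75% -> 1.5), on the measured CPU type
    disk_gb: float            # disk space including growth through the next budget cycle
    net_in_mbs: float         # network traffic in, at peak
    net_out_mbs: float        # network traffic out, at peak
    storage_in_mbs: float     # storage network (SCSI, FC, iSCSI, NAS) input, at peak
    storage_out_mbs: float    # storage network output, at peak
    disk_iops: float          # disk reads plus writes per second, at peak


def total_io_mbs(apps: List[AppProfile]) -> float:
    """Sum of network and storage network traffic in both directions."""
    return sum(a.net_in_mbs + a.net_out_mbs + a.storage_in_mbs + a.storage_out_mbs
               for a in apps)


def memory_bus_headroom_mbs(bus_bandwidth_mbs: float, apps: List[AppProfile]) -> float:
    """Rule of thumb from the text: available bus bandwidth minus four times total I/O."""
    return bus_bandwidth_mbs - 4 * total_io_mbs(apps)


def summarize(apps: List[AppProfile], bus_bandwidth_mbs: float) -> Dict[str, float]:
    """Aggregate peak requirements for a set of applications sharing one ESX host."""
    return {
        "memory_gb": sum(a.peak_memory_gb + a.leak_allowance_gb for a in apps),
        "cpu": sum(a.peak_cpu for a in apps),
        "disk_gb": sum(a.disk_gb for a in apps),
        "io_mbs": total_io_mbs(apps),
        "disk_iops": sum(a.disk_iops for a in apps),
        "memory_bus_headroom_mbs": memory_bus_headroom_mbs(bus_bandwidth_mbs, apps),
    }


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any real system.
    apps = [
        AppProfile("mail", 8, 1, 1.5, 200, 20, 20, 40, 30, 900),
        AppProfile("web", 4, 0, 0.5, 50, 30, 10, 10, 5, 200),
    ]
    for key, value in summarize(apps, bus_bandwidth_mbs=6400).items():
        print(f"{key}: {value}")
```

Compare the totals against the host you plan to use, measured on the same CPU type. Under this rule of thumb, a negative memory bus figure suggests the applications would saturate the bus at peak load.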
If you've designed and implemented a few data protection architectures, this requirements-gathering process will be familiar. It doesn't change much for virtual infrastructures.
Once you understand your application and data protection requirements, there are some simple decisions to make:
Agents in each virtual machine
This is the simplest decision, since it mirrors what you are already doing with your physical infrastructure. The strengths of this approach:
There are two significant weaknesses to this approach:
Agent in Hypervisor Service Console
This approach is also fairly simple: it requires only a single Red Hat Linux agent for each ESX server, installed in the service console.
1) VMware Consolidated Backup (VCB)
VCB gives you the ability to use a Windows proxy host to back up Windows virtual machines.
Strengths:
Almost entirely eliminates load on the virtual machines and the ESX server during backup
Enables hot virtual machine backup
Weaknesses:
Lack of non-Windows platform support
Some recovery limitations
VCB license cost
2) Storage server snapshots
If your storage provides snapshot functionality, this approach is quite simple to manage once it is implemented. You can connect another host to the storage to manage the snapshots for backup and recovery.
Strengths:
Low application server and ESX server overhead
Weaknesses:
Cost of snapshot-enabled storage
Complexity of initial deployment (varies widely depending on implementation)
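The backup window question and the options above can be combined into a rough decision aid. This Python sketch is a simplified reading of this article, not a vendor-supplied rule set: `find_backup_window`, `Requirements` and `suggest_approaches` are hypothetical names, and the hourly load samples, threshold and window length are inputs you would choose yourself. It scans a day of hourly ESX host load (wrapping past midnight) for a run of low-load hours and, when none exists, suggests only the proxy methods.

```python
from dataclasses import dataclass
from typing import List, Optional


def find_backup_window(hourly_load_pct: List[float], threshold_pct: float,
                       hours_needed: int) -> Optional[int]:
    """Return the first start hour of `hours_needed` consecutive hours (wrapping past
    midnight) during which host load stays below `threshold_pct`, or None if none exists."""
    n = len(hourly_load_pct)
    if not 0 < hours_needed <= n:
        raise ValueError("hours_needed must be between 1 and the number of samples")
    for start in range(n):
        if all(hourly_load_pct[(start + i) % n] < threshold_pct for i in range(hours_needed)):
            return start
    return None


@dataclass
class Requirements:
    has_backup_window: bool         # host load is low enough for in-place backups at some point
    needs_file_level_restore: bool  # individual files must be recovered regularly
    all_guests_windows: bool        # every virtual machine runs Windows (a VCB proxy requirement)
    has_snapshot_storage: bool      # the storage array provides snapshots


def suggest_approaches(req: Requirements) -> List[str]:
    """Simplified mapping of requirements to the options above.

    An empty list means none of the listed options fits the stated requirements."""
    suggestions: List[str] = []
    if req.needs_file_level_restore:
        # Regular file-level recovery points to an agent inside each virtual machine.
        suggestions.append("agent in each virtual machine")
    if req.has_backup_window:
        # A low-load window lets a single service console agent back up the host in place.
        suggestions.append("agent in hypervisor service console")
    else:
        # No window: use a proxy method that moves backup load off the ESX server.
        if req.all_guests_windows:
            suggestions.append("VMware Consolidated Backup (VCB)")
        if req.has_snapshot_storage:
            suggestions.append("storage server snapshots")
    return suggestions


if __name__ == "__main__":
    # Illustrative load profile: quiet from midnight to 6 a.m., busy the rest of the day.
    load = [20 if hour < 6 else 85 for hour in range(24)]
    start = find_backup_window(load, threshold_pct=30, hours_needed=4)
    req = Requirements(has_backup_window=start is not None,
                       needs_file_level_restore=False,
                       all_guests_windows=True,
                       has_snapshot_storage=False)
    print("window starts at hour:", start)
    print("suggested:", suggest_approaches(req))
```

Real choices also depend on cost, licensing and recovery needs, so treat the output as a starting point for the decisions above rather than an answer.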
What does implementing the right protection solution in a virtual environment do for you?
With virtualization, you can perform physical-to-virtual (P2V) machine conversion and, in some cases, take advantage of your existing backup images to migrate to a virtual infrastructure.
If you plan your data protection well, you may never need to perform a bare-metal disaster recovery again, because a virtual machine's disks are stored as ordinary files on the host's storage. Recovering an entire system can be as simple as restoring a single virtual disk file.
Site disaster recovery can be greatly simplified, because you can bring a site up quickly on lower-end physical systems and add capabilities as needed without interrupting operations. You will still need to develop a site disaster recovery plan, but many resources are available to help you do so. Clustering virtual machines with VMware Virtual Infrastructure is also much easier and less expensive than clustering physical servers.
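One way to check whether lower-end recovery hardware can bring a site up is to try placing each virtual machine's peak memory and CPU requirements onto the available hosts. The sketch below uses first-fit decreasing, a common bin-packing heuristic chosen here for illustration. It is not a VMware feature, and `place_vms`, the host names and the sizes are all hypothetical. VMs that do not fit are the ones that wait until capabilities are added.

```python
from typing import Dict, List, Tuple

# name -> (memory_gb, cpu) at peak load, e.g. taken from the application characterization
Sizes = Dict[str, Tuple[float, float]]


def place_vms(vms: Sizes, hosts: Sizes) -> Tuple[Dict[str, str], List[str]]:
    """First-fit decreasing by memory: put each VM on the first recovery host with room.

    Returns (placement of VM -> host, VMs that must wait until capacity is added)."""
    free = {host: [mem, cpu] for host, (mem, cpu) in hosts.items()}
    placement: Dict[str, str] = {}
    waiting: List[str] = []
    # Place the largest VMs first, since they are the hardest to fit.
    for vm, (mem, cpu) in sorted(vms.items(), key=lambda item: item[1][0], reverse=True):
        for host, room in free.items():
            if room[0] >= mem and room[1] >= cpu:
                room[0] -= mem
                room[1] -= cpu
                placement[vm] = host
                break
        else:
            # No host had room for this VM.
            waiting.append(vm)
    return placement, waiting


if __name__ == "__main__":
    recovery_hosts = {"dr1": (16, 4), "dr2": (8, 2)}
    vms = {"mail": (9, 1.5), "db": (12, 3), "web": (4, 0.5), "file": (6, 1)}
    placed, waiting = place_vms(vms, recovery_hosts)
    print("placed:", placed)
    print("waiting for added capacity:", waiting)
```

First-fit decreasing does not guarantee the tightest packing, but it is easy to reason about. In the example, the `mail` VM waits until a larger host is added.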
Virtual appliances can make purchasing, installing, configuring and updating applications much simpler. In some cases they can also help simplify site disaster recovery.
Brian Gardner serves as vice president of Product & Technology Management at Yosemite Technologies. He joined Yosemite from EMC, where he served in the CTO office for Information Management. Gardner graduated from Memorex's Advanced Development Center of Excellence program, run in partnership with Lehigh University, with the equivalent of a postgraduate degree in chemical engineering, specializing in magnetic coatings.