Tips on Disk-to-Disk Backup, Part V

By Jim McKinstry

IT managers have been struggling with the constantly shrinking backup window for years. A common way to address this issue is to purchase bigger, faster tape libraries.

This solution becomes large and expensive very quickly and can be quite complex to manage. My last four columns discussed disk-to-disk backup solutions as a way of reducing backup windows. While a disk-to-disk device can make a larger impact than a tape-based alternative, it is still just more of the same.

Adding disk-to-disk is, in effect, just adding a very fast tape library. While this may be good enough for many environments, there is a better way; there is no need to suffer through clunky, multi-hour backup windows.

The Problem

A typical nightly backup of a server (such as a database server) looks something like this:

1. At the scheduled time, the backup server executes a script on the database server that either shuts the database down or puts it in some sort of "hot backup" mode (seconds to minutes).

2. The backup server backs up the database server's data to a tape device (usually hours).

3. The database is restarted or taken out of hot backup mode (seconds to minutes).

The time needed to complete these three steps is called the backup window. Steps one and three take very little time. The majority of the backup, step two, can take many, many hours to complete.
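To make the window concrete, here is a minimal sketch in Python of the three-step flow, with timing around each step. The commands (db_begin_hot_backup.sh, backup_tool and so on) are hypothetical placeholders, not any particular product's CLI; a real environment would substitute its own database scripts and backup tooling.

```python
import subprocess
import time

# Hypothetical placeholder commands -- substitute your own database
# quiesce scripts and your backup product's CLI.
QUIESCE_CMD = ["ssh", "dbserver", "db_begin_hot_backup.sh"]
BACKUP_CMD  = ["backup_tool", "--source", "dbserver:/data", "--target", "tape0"]
RESUME_CMD  = ["ssh", "dbserver", "db_end_hot_backup.sh"]

def run_step(name, cmd):
    """Run one backup step and report how long it took."""
    start = time.time()
    subprocess.run(cmd, check=True)
    elapsed = time.time() - start
    print(f"{name}: {elapsed:.0f}s")
    return elapsed

# The backup window is the sum of all three steps; step two dominates.
window  = run_step("1. enter hot-backup mode", QUIESCE_CMD)   # seconds to minutes
window += run_step("2. copy data to tape", BACKUP_CMD)        # usually hours
window += run_step("3. resume normal operation", RESUME_CMD)  # seconds to minutes
print(f"total backup window: {window:.0f} seconds")
```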

Step two is the one IT departments throw hardware at, buying bigger, faster tape libraries in an attempt to minimize how long it takes.


A logical step in decreasing the backup window is to back up production data to disk. Backups finish far faster than they would to tape and, more importantly, recovering data is dramatically faster.

Tapes for offsite storage can still be created by copying the data to tape after the backups are complete. Virtual libraries, devices that present large pools of Serial ATA (SATA) disks as emulated tape libraries, are becoming popular because they are very fast and require nothing special on the backup server; they integrate seamlessly into the existing backup environment.

Another option is to add licensing or software to the backup server and direct the backups to a pool of disks. Both are excellent additions to a backup environment, but while they shorten step two, they do not eliminate it.


The only way to come close to eliminating step two is to implement some sort of snapshot technology.

A snapshot is a point-in-time copy of the original data. Snapshots can be considered a disk-to-disk backup solution because they are stored on disk. With a traditional backup, the database is in degraded mode (hot backup mode) or shut down completely for long periods, frequently several hours.

With snapshot technology, the database is only impacted for a few seconds. It usually takes longer for the database to shut down or enter hot-backup mode than it takes to perform the snapshot.

Snapshots are so powerful because a database can be snapped in seconds and returned to normal operation; the data can then be archived to tape at any time by simply mounting the snapshot directly on the backup server.

A backup that takes advantage of snapshots looks like this:

1. At the scheduled time, the backup server executes a script on the database server that either shuts the database down or puts it in some sort of hot backup mode (seconds to minutes).

2. A snapshot is taken of the database data (seconds).

3. The database is restarted or taken out of hot backup mode (seconds to minutes).

4. The snapshot is backed up to tape at any time.

Step two of the original process has been virtually eliminated. Backing up the data to tape may still take hours to complete, but the database is neither in hot backup mode nor shut down while that happens.
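Expressed in the same sketch style as before (again with hypothetical placeholder commands; a real snapshot would be created through your array's or volume manager's own CLI or API), the difference is that the database is only held for steps one through three, while the slow copy to tape happens afterward against the snapshot:

```python
import subprocess
import time

# Hypothetical placeholder commands, for illustration only.
QUIESCE_CMD  = ["ssh", "dbserver", "db_begin_hot_backup.sh"]
SNAPSHOT_CMD = ["array_cli", "snapshot", "create", "--volume", "db_vol"]
RESUME_CMD   = ["ssh", "dbserver", "db_end_hot_backup.sh"]

start = time.time()
subprocess.run(QUIESCE_CMD, check=True)   # step 1: seconds to minutes
subprocess.run(SNAPSHOT_CMD, check=True)  # step 2: seconds -- pointers, not a data copy
subprocess.run(RESUME_CMD, check=True)    # step 3: seconds to minutes
print(f"database impact window: {time.time() - start:.0f} seconds")

# Step 4 runs whenever convenient and never holds the database: mount the
# snapshot on the backup server and stream it to tape from there.
subprocess.run(["backup_tool",
                "--source", "backupserver:/mnt/db_vol_snap",
                "--target", "tape0"], check=True)
```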

There are, essentially, two types of snapshots:

Pointer-based. A pointer-based snapshot is not an exact copy of the data but a set of pointers to the original data. As the original data is written to, each changed block is first copied to a reserve area (another section of disk used to hold changes) and the snapshot's pointer is redirected to that preserved block; this process is called "copy on first write" (a toy sketch of the mechanism follows the two descriptions below).

Subsequent writes to the same block are not copied to the reserve area because the original data has already been preserved.

One of the most attractive aspects of a pointer-based snapshot is that the snapshot reserve area needs just a fraction of the original disk space, since only the changed blocks are copied. Because pointer-based snapshots require such a small amount of space, they can be taken more frequently at a low cost.

Since a pointer-based snapshot still reads unchanged blocks from the source data (which is why it is also called a dependent copy), there is potential for performance degradation while the snapshot is being backed up to tape. That degradation is still much less than if the database were in hot-backup mode and being backed up at the same time.

Clone-copy. A clone-copy, also called a full-copy, provides a complete copy of the data. When the copy is initiated, all of the original data is copied to another area of storage, which may take anywhere from a few minutes to several hours.

Each full-copy snapshot (also called an independent copy) requires enough storage to hold an exact copy of the original data; there is a 100% capacity overhead per full-copy snapshot.

The attractive part of the full-copy snapshot is that once the copy is complete, it resides on completely different disks and can be used with no performance impact on the original data. Not only can this full copy be used for a backup, it can also be used to mine data, test patches or upgrades, or support development on a server separate from the production server.
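The copy-on-first-write mechanism behind pointer-based snapshots is easy to model. Below is a toy Python sketch, with in-memory dictionaries standing in for disk blocks (nothing like a real array's implementation), showing why only the first write to a block consumes reserve space and why reads of unchanged blocks still touch the source:

```python
class PointerSnapshot:
    """Toy model of a pointer-based (copy-on-first-write) snapshot.

    The snapshot starts as nothing but pointers to the source blocks.
    The first write to a source block copies the original block into a
    small reserve area; later writes to the same block copy nothing.
    """

    def __init__(self, source):
        self.source = source   # live volume: block number -> data
        self.reserve = {}      # preserved original blocks, filled lazily

    def write(self, block, data):
        # Copy on first write: preserve the original only once per block.
        if block not in self.reserve:
            self.reserve[block] = self.source[block]
        self.source[block] = data

    def read_snapshot(self, block):
        # Changed blocks come from the reserve area; unchanged blocks are
        # still read from the source -- hence "dependent copy".
        return self.reserve.get(block, self.source[block])


volume = {0: "alpha", 1: "beta", 2: "gamma"}
snap = PointerSnapshot(volume)
snap.write(1, "BETA-v2")      # first write: "beta" is copied to the reserve
snap.write(1, "BETA-v3")      # second write to the same block: nothing copied
print(snap.read_snapshot(1))  # "beta"    -- the point-in-time view
print(volume[1])              # "BETA-v3" -- the live volume
print(len(snap.reserve))      # 1 -- only changed blocks consume reserve space
```

A clone-copy, by contrast, would amount to duplicating the whole volume up front (dict(volume) in this toy model): 100% capacity overhead, but completely independent of the source afterward.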


Like many storage-related functions, snapshots can be managed in three places: on the host, in an appliance or in the storage device.

Host-based snapshots have several drawbacks. A major one is the overhead they can place on the host.

Companies spend a lot of money on the servers that run their databases and, to keep those servers running efficiently, should not add functions that may hurt performance. Another issue is having to use different software for each operating system: while some vendors may support multiple operating systems, they probably won't support all of the ones you use. Many host-based solutions also don't support moving snapshots between hosts.

An appliance-based solution takes away operating system issues, but can be a major bottleneck in the IO path.

For example, most of these appliance-based solutions have roughly the processing power of a couple of servers, and two servers can barely push a high-end, mid-range storage device to its limit. If the appliances are managing two storage arrays, 50% or more of the arrays' performance will remain untapped.

Storage-based snapshots, where the snapshots are performed within the storage array, are the best solution to consider. There is no overhead on the hosts, snapshots work for any host attached to the array, and every host uses a common interface. A storage array is optimized for IO processing and can perform snapshots far more efficiently than the other options.

Solving backup needs used to be relatively easy: take the size of the backup window, the amount of data to be backed up, and the speed of the backup drives, and calculate how many drives were needed. Today, when even the smallest companies have applications that need to be up 7x24, it is much harder to solve backup needs by throwing bigger libraries with more tape drives at the problem.
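That old arithmetic is simple enough to show. The figures below are illustrative assumptions, not a sizing recommendation:

```python
# Back-of-the-envelope tape-drive sizing, the "old" way described above.
# All figures are illustrative assumptions, not recommendations.
data_tb = 4.0            # data to back up, in TB
window_hours = 8.0       # allowed backup window
drive_mb_per_sec = 80.0  # sustained throughput per tape drive

required_mb_per_sec = (data_tb * 1024 * 1024) / (window_hours * 3600)
drives = -(-required_mb_per_sec // drive_mb_per_sec)  # ceiling division
print(f"need {required_mb_per_sec:.0f} MB/s -> {drives:.0f} drive(s)")
# need 146 MB/s -> 2 drive(s)
```

The calculation says nothing about the hours the database spends held in hot-backup mode while those drives run, which is exactly the cost snapshots remove.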

My last four columns covered different ways to implement a disk-to-disk solution to improve the efficiency of a backup environment (faster backups and faster recoveries). This column discussed how to virtually eliminate the backup window by using snapshots. Take an existing backup environment, add a disk-to-disk device and disk-based snapshots, and you have the ultimate backup solution: backup windows drop from hours to seconds or minutes, and recoveries are dramatically faster than from tape.

Jim McKinstry is a senior systems engineer with the Engenio Storage Group of LSI Logic, an OEM of storage solutions for IBM, Teradata, Sun and others.