The Linux Business Case: Clusters

Jan 24, 2005

Tom Quinn

Anyone following the high performance computing industry has probably noticed the growing popularity of Linux cluster systems. The appeal is easy to understand.

Linux clusters cost a fraction of what traditional supercomputers do while delivering very high performance. Organizations ranging from national labs to Hollywood special-effects shops have reported performance that their previous systems couldn't touch.

With such widespread appeal and tangible benefits, the migration to cluster systems seems inevitable for many organizations.

However, before taking the leap to this new computing infrastructure, several items should be considered to ensure a Linux cluster is a good fit for your computing environment.

The Questions

So, why do you want a cluster? Does your current system run slowly and you need faster results? Is your current system at end-of-life and no longer supported by the manufacturer? Does your current "big iron" box simply cost too much money to operate and you need a way out from under its exorbitant maintenance fees?

If any of these questions strike a chord, a Linux cluster may be the right solution, but a thorough self-examination is needed to ensure a cluster is the right fit for your organization.

The Apps

What about applications? How can you tell if your application will work with a cluster? If your situation fits one or more of the following categories, there is a strong likelihood a Linux cluster system will be beneficial for your organization:

Monte Carlo analysis or parametric execution. In simple terms, are you running a program many times using different sets of input data? A cluster may prove an ideal platform for this type of application.

For example, a brokerage house runs derivatives analyses for its traders on a Sun SMP box. Traders provide a set of 100 different input files and data points to run against a set of models. The analysis program itself takes a long time to run and performs intensive number crunching. The end result is a small data set that brokers look at to determine how to hedge their trades.

If your current system bogs down under this load, a cluster could run the application faster and make it possible to perform more analyses, which could mean better trades, less risk, and more money.
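As a rough illustration of parametric execution, the sketch below runs the same analysis over 100 independent input sets. Everything in it is an assumption made for illustration: `run_analysis` is a hypothetical stand-in for the real number-crunching program, and a local thread pool stands in for the cluster nodes (on an actual cluster, a job scheduler would hand each input set to a different node).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_analysis(input_set):
    """Hypothetical stand-in for one long analysis run: a Monte Carlo mean estimate."""
    seed, n_trials = input_set
    rng = random.Random(seed)  # separate, reproducible random stream per input set
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n_trials))
    return seed, total / n_trials


def run_all(input_sets, workers=4):
    """Run every input set independently; no run needs data from any other run."""
    # The thread pool is only a local stand-in for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_analysis, input_sets))


# 100 input sets, mirroring the brokerage example; the values are made up.
input_sets = [(seed, 1_000) for seed in range(100)]
results = run_all(input_sets)
```

Because each run depends only on its own input set, the runs can be spread across as many nodes as are available, and only the small result set has to be collected at the end.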

Applications that are already parallelized or easily parallelized. Some applications are inherently parallel in the way they function. Others can be made to work in parallel very simply with minimal intrusion or algorithmic development.

A traditional example of such an application is image rendering. Image rendering typically refers to taking an image frame and running various image processing algorithms against the image data to modify it in some way, adding depth, changing lighting, etc.

This problem is easily parallelized by dividing an image up into a grid of blocks. Each data block can be sent to a node of a cluster to be rendered in parallel. After being rendered, the blocks are gathered from the cluster and re-assembled into a frame.
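The split, render, and re-assemble steps can be sketched roughly as follows. This is a minimal illustration, not a real renderer: the image is a plain list of pixel rows, `render_block` is a hypothetical brightening step, and the blocks are processed in a local loop where a cluster would send each one to a node.

```python
def split_into_blocks(image, block_h, block_w):
    """Cut a 2D image (list of rows) into a grid of blocks keyed by (top, left) offset."""
    blocks = {}
    for top in range(0, len(image), block_h):
        for left in range(0, len(image[0]), block_w):
            blocks[(top, left)] = [row[left:left + block_w]
                                   for row in image[top:top + block_h]]
    return blocks


def render_block(block):
    """Hypothetical per-block work: brighten every pixel, clamped to 255."""
    return [[min(255, pixel + 40) for pixel in row] for row in block]


def reassemble(blocks, height, width):
    """Place each rendered block back at its original offset in the frame."""
    frame = [[0] * width for _ in range(height)]
    for (top, left), block in blocks.items():
        for dy, row in enumerate(block):
            frame[top + dy][left:left + len(row)] = row
    return frame


# A small made-up 6x8 grayscale image.
image = [[(x * y) % 256 for x in range(8)] for y in range(6)]
blocks = split_into_blocks(image, block_h=3, block_w=4)
# On a cluster, each block would be rendered on a different node.
rendered = {offset: render_block(block) for offset, block in blocks.items()}
frame = reassemble(rendered, height=6, width=8)
```

This works cleanly here because the example operation touches each pixel on its own. Effects that read neighboring pixels, such as a blur, would need blocks that overlap at the edges.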

Now it is very possible that you are already running parallel apps on an SMP or vector-based parallel processing machine. If so, you probably know enough about your application to determine whether or not it will benefit from a migration to clustering.

The primary issue to consider is how often your application's processes communicate and how much data they exchange, compared with the performance you can expect from a cluster interconnect.

If your application requires a great deal of inter-process communication, relying on the fast memory bandwidth of an SMP, a cluster may not be the ideal solution given the limitation of the interconnect between nodes.

In particular, if you need to run many small processes that work together tightly, referred to as "tightly coupled," you should carefully weigh the resultant performance of running on a cluster.

However, most well-designed applications are structured to minimize inter-process communication and use local memory as much as possible, and such applications run very well on Linux clusters.
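A back-of-the-envelope model can help weigh this trade-off. The sketch below is a deliberately simple estimate, not a benchmark: it assumes perfectly balanced work, no overlap between computation and communication, and a fixed cost per message. The function name and all of the numbers are hypothetical.

```python
def estimated_runtime(serial_seconds, nodes, messages_per_node,
                      bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Estimate cluster runtime as evenly divided compute time plus message costs."""
    compute = serial_seconds / nodes
    per_message = latency_s + bytes_per_message / bandwidth_bytes_per_s
    return compute + messages_per_node * per_message


# Assumed interconnect: 50 microseconds latency, 100 MB/s bandwidth.
interconnect = dict(latency_s=50e-6, bandwidth_bytes_per_s=100e6)

# Loosely coupled: a one-hour serial job on 16 nodes, few large messages.
loose = estimated_runtime(3600, 16, messages_per_node=100,
                          bytes_per_message=1_000_000, **interconnect)

# Tightly coupled: the same job, but ten million small messages per node.
tight = estimated_runtime(3600, 16, messages_per_node=10_000_000,
                          bytes_per_message=1_000, **interconnect)

loose_speedup = 3600 / loose
tight_speedup = 3600 / tight
```

Under these made-up numbers, the loosely coupled job comes close to the ideal 16x speedup, while the tightly coupled one loses most of its gain to time spent on the interconnect. Measuring your own application's message counts and sizes is the only way to get figures worth acting on.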

An ideal case for migrating to a cluster platform is an application that is already written against a standard parallel API, such as the Message Passing Interface (MPI) or Parallel Virtual Machine (PVM), and that requires minimal inter-process communication. In these situations, migrating to a cluster should be easy and should provide performance gains that justify the migration effort.

Ability to modify the application. Getting the application to run on a cluster will, in almost all cases, require modification and/or re-compiling.

Commercial and free parallel cluster applications that can be run "out-of-the-box" without modification are becoming more common, but most applications run on clusters are still designed in-house.

If you are not sure, chances are you will need the source code to your application and will have to modify it. Professional services provided by some cluster vendors, as well as application providers, are available to help modify source code.

If your situation fits the above categories, then you are probably a strong candidate for clustering.

Other Considerations

So, you think your application may be a strong fit? But just like a recruit for the army, "Do you have what it takes?" Again, you want to make sure the proper steps are taken to ensure your cluster maximizes your ROI.

First off, do you have personnel with the right skill sets to make the cluster work? You will need the following: an understanding of the application and the ability to modify it, general Linux system administration skills, an understanding of networking and network topologies, and eventually, a familiarity with parallel computing and clustering.

What do you do if you don't have a clustering engineer at your disposal or your current sys-admin only knows Solaris?

Well, don't worry. Many commercial training courses and professional services are designed specifically to help round out the skills needed to make your cluster work.

Look for Part 2 of this discussion on Monday of next week.

Tom Quinn is director of Government Business Development at Linux Networx, a provider of Linux-based cluster computing systems.

