Data Warehousing Appliances Reaching Maturity

By Majid Abai


Years ago, I gave a lecture at a seminar on the topic of grids and databases. At that time, massively parallel processing (MPP) in database management and data warehousing was limited, and we were trying to preach its benefits (although unsuccessfully) to several IT executives.

Looking back, the hardware and associated software were not mature enough to support the needs of the majority of organizations. Fast forward to today’s data warehousing market, and we see it full of products that perform essentially that task, but in a much more mature way.

Today, we have a name for this class of products: data warehouse appliances (DWAs). A DWA is a hardware-software bundle that handles the massively parallel processing associated with a large-scale data warehouse. These products are designed to take advantage of the processing power of a large number of hardware nodes linked on a grid and to maximize the capabilities of the associated database management system to create ultra-efficient load and search capabilities. In short, terabytes of data can be loaded or searched in a relatively short time.

The concept is relatively simple. One of the nodes acts as the distributor, or administrator, node. When the invoking application issues a SQL statement, the distributor breaks it down into several physical sub-queries (based on the number of nodes in the system and the physical distribution of data across the nodes) and distributes the sub-queries across all nodes.

The nodes process their sub-queries in parallel and return the results to the distributor node. The distributor then collates the results, performs the final sort (if needed), and returns the results to the calling application.
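The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular appliance works: the sales data, the NODE_SLICES layout, and the function names run_subquery and distribute_query are all hypothetical, and threads stand in for physical nodes. The query being imitated is a grouped sum with a final sort.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of a "sales" table
# stored on one node, as (region, amount) rows.
NODE_SLICES = [
    [("east", 120), ("west", 75)],
    [("east", 30), ("north", 200)],
    [("west", 50), ("north", 10), ("east", 5)],
]


def run_subquery(rows):
    """Work done on one node: partial SUM(amount) GROUP BY region."""
    partial = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0) + amount
    return partial


def distribute_query(node_slices):
    """Distributor: fan out one sub-query per node, then collate and sort."""
    # Threads stand in for nodes here; each one sees only its own slice.
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        partials = list(pool.map(run_subquery, node_slices))

    # Collate the partial results returned by the nodes.
    totals = {}
    for partial in partials:
        for region, amount in partial.items():
            totals[region] = totals.get(region, 0) + amount

    # Final sort, equivalent to ORDER BY total DESC.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(distribute_query(NODE_SLICES))
    # prints [('north', 210), ('east', 155), ('west', 125)]
```

Note that each node returns only a small partial result, so the distributor's collate-and-sort step works on far less data than the nodes scanned.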

MPP systems have been around for a while and have been working successfully. For the most part, however, they have been expensive to implement, and only a select group of technologists has been able to get the most out of them. Meanwhile, DWA systems have matured to the point where they offer several key differentiators from their predecessors. These key differentiators are:

Low TCO: Compared with the previous generation of MPP systems, DWAs are cheap; their cost is a fraction of that of traditional MPP systems. This could enable an organization with a low budget and a great need to process large amounts of data to get in the game and truly show the power of a well-designed and well-run data warehouse.

The reasons for their low cost are that a) a number of these appliances can run on commodity hardware, which lets the client choose its preferred hardware supplier and operating system, and b) a number of them use open source database management systems, which removes the need to pay hefty DBMS license fees. All in all, the total cost of ownership is much lower than that of the alternatives.

Scalability: For my money, this is the most important differentiator for DWA systems. An organization could start small (to prove the worth of a data warehouse) with five to 10 nodes and add nodes and storage as requirements and budgets grow.

Black Box: As with most MPP systems, DWAs remove the need for the IT department to buy the hardware, buy the database management system separately, install the DBMS, and use the time of a qualified database administrator to tune the performance of the system across all nodes.

DWAs are called appliances because they are packaged as one unit. The IT organization should not have to worry about anything except the physical design of the database and its most efficient implementation across the designated nodes. As such, DWAs should not be considered a hardware solution. They are truly hybrid appliances.
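To make "physical design across the designated nodes" concrete, here is a minimal sketch of one common design decision: which column to use for spreading rows over the nodes. Everything in it is an assumption for illustration: the table, the column names, the eight-node count, and the SHA-256-based node_for helper are hypothetical, and real appliances use their own distribution schemes.

```python
import hashlib
from collections import Counter


def node_for(key, node_count):
    """Map a distribution-key value to a node number with a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def rows_per_node(rows, key_column, node_count):
    """Count how many rows land on each node for a given distribution key."""
    counts = Counter(node_for(row[key_column], node_count) for row in rows)
    return [counts.get(node, 0) for node in range(node_count)]


# Hypothetical table: 10,000 orders, each belonging to one of two regions.
ROWS = [
    {"order_id": i, "region": "east" if i % 2 else "west"}
    for i in range(10_000)
]

# A high-cardinality key spreads rows over all nodes fairly evenly.
by_order = rows_per_node(ROWS, "order_id", 8)

# A two-value key can populate at most two nodes; the rest sit idle.
by_region = rows_per_node(ROWS, "region", 8)

if __name__ == "__main__":
    print("distributed by order_id:", by_order)
    print("distributed by region:  ", by_region)
```

In a parallel system the slowest node sets the pace, so a key that piles rows onto a couple of nodes gives up much of the benefit of having the others.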

Massive Data: DWA systems are designed to handle multiple terabytes of data easily. If you have large amounts of data and not as many dollars, this might be the right option for you.

Flexibility: You’d like to build an enterprise warehouse? Fine. You already have an enterprise DW and want to build small data marts? Fine. You have no DW solutions and want to try your hand at building new ones? Fine. In any case, it is not going to cost you much.

Real-Time Data Warehousing: DWAs also support the current trend toward real-time and near-real-time data warehousing. Thanks to their price structure, scalability, and flexibility, data warehouse appliances provide the tools to support operational applications easily and quickly.

These benefits are clear, but there are always two sides to every story. As much as I love the concept and think that there are great gems in the market, DWAs are not a silver bullet for every need. The main drawback, of course, is the physical and power capacity of the organization’s data center to house and support the nodes. Some organizations are running out of room, energy, or both in their data centers, and providing the space, power, and cooling to outfit 40, 60, or more nodes could be a major problem.

As always, we need to analyze the short- and long-term needs of our organization carefully and decide on the right set of tools to support both.

If you decide to implement a data warehouse appliance solution, I strongly recommend researching several products on the market and performing due diligence on each. After narrowing the search to a few finalists, establish real-life proofs of concept for each. Only then will an organization be able to select the DWA product that can support it as a long-term partner.

Formerly the CEO of Seena Technologies, Majid Abai is now the EVP and director of Information Management and SOA practices for the technology consultancy Crescent Enterprise Solutions, which bought Seena in August. During the past 24 years, he has focused on providing enterprise IT & data strategies as well as implementation of major business intelligence (BI), knowledge management (KM), master data management (MDM), and data integration (DI) solutions to Fortune 2000 organizations.