Does Your Data Warehouse Need a Fix?

By Esther Soloveichik

(Back to article)

Your first data warehouse has been up and running for two years. You've developed many reports, aggregated pertinent data, and your users are content.

Or so you think.

Now there's a forecast system that's going to be built. It will use the transaction data in your data warehouse for its baseline, and visions of happy prognosticators dance across your mind.

The forecast team takes a look at your data and declares that it's unfit for conjecture!

"You are not partitioning history correctly," intones the lead architect. "You are not conforming business descriptions across the organization. Your hierarchies are rigid, and your dimensions are snow-flaked."

Suddenly, you have a big headache.

"I've got 150 reports that are being used weekly," you counter. "It can't be that bad."

"For us to get what we need," croons the architect, "you would have to redesign your data warehouse, and we can't afford to wait."

"Here's what we'll do," he adds mournfully. "We'll take the data we need from your warehouse, add to it, and load it into ours.

"Good," you say. "I don't have to change a thing."

This scenario is not unusual. For most organizations, the first data warehouse initiative is a laborious expedition into the unknown. So much effort has gone into analyzing, extracting, transforming and loading the data, that a huge sense of accomplishment is felt when the data lands in the hands of some users.

Once users get hold of this data, they clamor for more attributes, more reports and more ways to slice and dice the information. The rush is on to deliver without upsetting what's already there.

Before long, there may be several data warehouses that use a core set of the same data, and you cannot ask a question that will easily navigate across all of them.

To prevent this hydra-like growth (or an early death), take the time to give your data warehouse a thorough checkup. This assessment is crucial. It will either validate your strategy for handling the escalating needs of your core and expanding constituencies, or it will point you in the right direction for corrective action.

The key areas to examine are:

  • Data warehouse governance
  • Data warehouse usage
  • Data model composition
  • Technical architecture and infrastructure
  • Data Warehouse Governance

    The goal of a data warehouse is to integrate and conform the data from across the organization.

    To conform means to bring into agreement, or to make similar. Conforming data entails having business units agree on terms that describe the important entities in the business. For example, a bank that originates and services mortgage loans, and then sells them on the secondary market will often have a discrete application for each function in the mortgage lending lifecycle.

    Each application has its own set of codes and descriptions for the property and for the loan product. One application may call the property a town home, while another may call it a single-family attached dwelling. In order to evaluate or segment loans by type of property and product, a core set of descriptions must be agreed to by the people in origination, servicing and capital markets.

    In turn, the codes of the respective operational systems must be mapped to this agreed-to set of descriptions when the data moves into the data warehouse.

    In addition, the users of a data warehouse will often tell you that consistency means knowing exactly what they are getting from the data warehouse. If the users ask for revenue or cost totals, they want to know what the components of revenue and cost are, and they want those components to be the same for all queries.

    The first thing your checkup must do is determine whether your data warehouse strategy is committed to the philosophy of making data consistent. Then you must examine how well your governance process for maintaining and growing the data warehouse functions in light of this commitment:

  • What is the method by which new attributes, new dimensions and new facts are released to the user community?
  • How do you determine the business impact and feasibility of getting new data into the data warehouse?
  • Do you have people from various business units dedicated to specific dimensions in your data warehouse, e.g., customers, products, locations, promotions?
  • What is the procedure for maintaining the consistency of existing dimensions as new products and codes are added in respective source applications?
  • Simply put, conforming dimensions means bringing consistency to the textual attributes of an organization. These are the attributes by which users slice and dice information -- customer demographics, product categories, organizational hierarchies, time periods, and many more.

    Conforming facts means bringing consistency to the measures of an organization -- the number of customers, total revenue, gross profit return on inventory, end-of-month balances, and a host of others.

    Examine your commitment to and process for conforming data. Unless data is conformed across the organization, the full value of the data warehouse will never be achieved.

    Data Warehouse Usage

    The real purpose for a data warehouse is to enable the business users to make decisions. To find out how well the data warehouse is assisting the decision making process, ask the users.

    Since the data warehouse has been functioning for some time, tackle the user interviews from three directions: the intended users, the neglected potential users, and the entrepreneurs.

    The data warehouse was built with a set of users in mind, and it was designed to answer their questions. Ask the intended users who are currently using the data warehouse:

  • What works well and what doesn't?
  • What questions do you ask consistently and which decisions do they affect?
  • Do the answers provide adequate information?
  • Do you always understand the information you are getting from the data warehouse?
  • What are the most useful slice-and-dice capabilities?
  • What other slice-and-dice capacity would you add?
  • How do you rate the quality of the data in the data warehouse?
  • How would you improve the quality?
  • What questions would you like to ask?
  • How easy is it to formulate a query?
  • How quick is the response?
  • What forum would you like to see for adding new information?
  • Ask the intended users who are not using the data warehouse what prevents them from using it, and what would make them change their minds.

    Next, turn to acknowledged potential users of the data warehouse whose needs were not addressed in initial releases. These neglected information consumers will give you a better understanding of future wants: the key data elements to be migrated to the data warehouse, the data sources, the volume and frequency of migration, and the impact the information will have on making decisions.

    This will help shape the conforming committees and the governance process by which new information is released to current and new consumers.

    Finally, engage those select individuals who are using the data warehouse to do great things. Find out what hidden stories they unearth in the data. The entrepreneurs on the periphery are critical change agents, and they will help make your data warehouse a success.

    If there are no entrepreneurs to be found, consider whether everyone is asleep, or whether your data warehouse is too cumbersome to use, contains no useful information, or both.

    Data Model Composition

    Now we turn to the underpinnings of the data warehouse.

    The data model of a data warehouse should be the optimal model for delivering data to the end users. The model must be easy to understand and provide speedy query performance. Anything else is beside the point.

    Dimensional modeling is the best way to model decision support data, and a well-designed dimensional data model is the prerequisite to a high-performing data warehouse.

    The data model for your data warehouse should be checked by a data architect or data modeler versed in dimensional data modeling techniques.

    This assessment looks at: data mart alignment with business functions; fact table granularity; and dimensional structure.

    Data Mart Alignment with Business Functions: Each data mart or family of fact tables should focus on a business theme that's important to the business, possibly tracking that theme through various phases of the business lifecycle.

    For example, managers and analysts who worry about quality in the order fulfillment process would be interested in knowing about:

  • The quality of goods received into inventory, inventory turns and replenishment times.
  • Machine down time, throughput, scrap, and breakage during the manufacturing activities.
  • Picking, packing, and shipping execution times.
  • Customer complaints and returns.
  • The data modeler examines each family of fact tables or individual facts to ascertain how well they align with the business requirements.

    Fact Table Granularity: The cause of much user heartburn and complaints regarding data warehouse queries can often be laid to the granularity of the individual fact table record.

    The design of a data mart begins with a declaration of the grain. The grain might be an end-of-day snapshot of individual account balances or every parking ticket issued by a parking garage.

    If the fact table contains every segment of a telephone call as it is routed through the system, then each call segment is the grain. Every question tied to a business theme depends on the grain of the fact table.

    Very often, not enough consideration has been given to visualizing and declaring the correct grain. The data modeler studies the impact the granularity of each fact table has on user queries. Changing the granularity can have a substantial impact on the choice of attributes in both dimension and fact tables.

    Dimensional Structure: The data modeler examines the schema for its adherence to good dimensional modeling techniques and for its potential response to customer requirements.

    The following is a sampling of things considered:

  • Is the schema a de-normalized dimensional model or a normalized entity relationship diagram?
  • Do the schema and its metadata support a commitment to conformed dimensions and conformed facts?
  • Does each dimension attribute appear once, and only once?
  • Do the dimensions contain a good mix of hierarchical and nonhierarchical attributes that the business considers important?
  • Are the textual dimension attributes verbose and of high quality?
  • Are all dimension keys, with the exception of bridge tables, single-part surrogate keys?
  • How are slowly changing dimensions handled?
  • How are many-to-many dimensions handled?
  • What snow-flaked dimensions exist, and are they appropriate?
  • Are most facts additive?
  • What semi-additive and non-additive or 'text' facts exist, and how are they manipulated in the queries?
  • Are the existing transaction and snapshot schemas appropriate for responding to customer queries?
  • Are the aggregates suitable?
  • Technical Architecture and Infrastructure

    A checkup of your technical architecture concentrates on how you accomplish what you set out to do, while an examination of your infrastructure looks at where you are doing it.

    This part of the checkup may require the services of several technical specialists.

    Back Room Architecture: Assess your back room architecture by looking at how you extract, clean, conform and deliver data from the source systems to the presentation servers.

    Note the capabilities needed and scrutinize whether your standards, development methodology and products provide the desired results within the prescribed timeframes.

    As part of the examination of how you clean and conform the data, make sure to focus on data accuracy. A data warehouse that contains incorrect, confusing, or partially populated data will be abandoned.

    Front Room Architecture: To the end users, the query reports are the data warehouse. In talking with the users, you will have observed who needs the reports, how they are distributed, and what types of analysis is required.

    The report templates should contain the appropriate specifics to meet customer needs, the users must be sufficiently trained, the query tool ought to be easy to use, and performance must be fast. Match the needed capabilities with reality and see how close you come.

    Infrastructure: Specify what hardware, system-wide operating capabilities and storage capacities you need to grow your data warehouse and detail what currently exists. Determine the gap between wants and actuality, and identify who is responsible for narrowing the gap.

    Analyze Findings and Plan the Future

    Organize the checkup findings within the four areas of analysis. Separate the things done well from things done poorly. Determine what you should be doing more of within the items you are doing well.

    For the problem items, indicate their impact on the well-being of the data warehouse. The impact is measured in terms of what will happen if you do not fix the problem.

    You can quantify the impact by using high/medium/low rankings, 1-5 marks, or more colorful measures such as "fatal flaw," "fix within a year," "would be nice, but not crucial" and "do no harm."

    For fatal flaws and items requiring remediation before long, note what other areas will be affected and what the prerequisites are to affect a fix.

    Next, determine the complexity or feasibility of remediation. Assign a "low" feasibility rank or a grade of 1 (or even zero) for a complete rewrite, and assign a "high" feasibility rank or a grade of 10 where the remediation is an extension of a natural process, such as adding another report template.

    Taken together, the impact and feasibility rankings should point the way to your data warehouse therapy regimen.

    Look carefully at your high-impact items. These are the items that can disable your data warehouse. If you have fatal flaws, you will have to do some heavy thinking about your next steps. If the checkup basically validates your overall data warehouse strategy, then turn your attention to doing more great things and tuning your data warehouse so it can fly.

    In the end, remember, all data warehouse decisions are driven by user requirements. The user community evolves over time. The decisions they make and their information needs metamorphose in response to their customers, suppliers, competitors, and market events.

    Data is knowledge. Check up on your users and find out how well the data warehouse is evolving along with their needs and you will have provided them with a used and useful resource.

    Esther Soloveichik is with Intrasphere Technologies, a premier technology consulting firm with a core focus on Life Sciences. Intrasphere provides end-to-end technology services and has successfully implemented large-scale projects for some of the world's leading global companies, including Pfizer, Schering-Plough, Novartis and Eli Lilly, among others.