Testing is Key to Data Warehouse Success
But even wise IT managers, who follow the old Russian proverb, "trust, but verify," need to maintain their vigilance. There are pitfalls in the testing process, too.
Here are 10 challenges to dodge:
Leaving out the users. Business users are the reason the data warehouse exists. If users are not involved in testing the system, it ultimately will fail on some level. Why? Here are the usual reasons: lack of buy-in on the project, lack of knowledge on using the system and lack of appropriate customization to user needs and preferences.
Testing reports - not data. Testing the system's reports is not enough. The fundamental rule of data processing still applies: garbage in, garbage out.
Extract, transform and load (ETL) testing should be a priority, accounting for as much as 60% of testing.
Skipping the reconciliation of warehouse data with source system data. Business users will not "believe" what they see in the data warehouse until they feel comfortable that the information is accurate. Users will cling to their comfort zone, i.e., they will continue to rely on legacy system reports until they are convinced that the new data warehouse provides information of the same or better quality.
Set aside time to develop reports that tie new data warehouse reports to legacy system reports. Every new business intelligence system brings an element of cultural change to an organization, and cultural change takes time.
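As a rough illustration of what a tie-out check can look like, the sketch below totals one measure per business key on each side and lists the keys that disagree. The field names ("region", "sales") and the sample rows are invented for the example; a real reconciliation would pull both sides from the actual legacy and warehouse systems.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by_key(rows, key, amount):
    """Sum the `amount` field per `key` over an iterable of dict rows."""
    totals = defaultdict(Decimal)
    for row in rows:
        # Go through str() so floats do not carry binary rounding noise.
        totals[row[key]] += Decimal(str(row[amount]))
    return dict(totals)


def reconcile(source_rows, warehouse_rows, key="region", amount="sales"):
    """Return {key: (source_total, warehouse_total)} for keys that disagree."""
    src = totals_by_key(source_rows, key, amount)
    dwh = totals_by_key(warehouse_rows, key, amount)
    mismatches = {}
    for k in sorted(set(src) | set(dwh)):
        s = src.get(k, Decimal(0))
        w = dwh.get(k, Decimal(0))
        if s != w:
            mismatches[k] = (s, w)
    return mismatches


# Hypothetical sample data: the legacy system holds detail rows,
# the warehouse holds one aggregated row per region.
legacy = [
    {"region": "East", "sales": "100.50"},
    {"region": "East", "sales": "20.00"},
    {"region": "West", "sales": "75.25"},
]
warehouse = [
    {"region": "East", "sales": "120.50"},
    {"region": "West", "sales": "70.25"},
]

mismatches = reconcile(legacy, warehouse)
for region, (src_total, dwh_total) in mismatches.items():
    print(f"{region}: legacy={src_total} warehouse={dwh_total}")
```

Here East ties out and only West is flagged, which gives users a short, concrete list to investigate rather than a general sense of unease.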
Underestimating the testing workload. Thanks to ETL tools, the project may involve only a minimal amount of hand-written code. But that can be very misleading when gauging how much time testing will take.
The amount of code can't predict the number of data issues testing will uncover. Data issues are the bulk of the problems revealed by testing ETL processes, and the fixes can be time-consuming.
Failing to set proper expectations. While data validation tests can be very thorough, they typically cannot address every possible data issue that will occur.
Be sure testers have a grasp of the general level of data quality that will be in the data warehouse, and be sure that expectation is communicated to business users well in advance of the testing phase. Many business processes, such as those in financial services, have very low tolerances for error.
Know the target before shooting the arrow.
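One way to make the target explicit is to express it as a maximum error rate and measure test results against it. The sketch below is only an illustration: the validity rule, the sample rows and the thresholds are all assumptions, and the real tolerance has to come from the business users.

```python
def error_rate(rows, is_valid):
    """Fraction of rows that fail the validity rule (0.0 for no rows)."""
    rows = list(rows)
    if not rows:
        return 0.0
    failures = sum(1 for row in rows if not is_valid(row))
    return failures / len(rows)


def meets_target(rows, is_valid, max_error_rate):
    """True when the measured error rate is within the agreed tolerance."""
    return error_rate(rows, is_valid) <= max_error_rate


# Hypothetical rule: every row must carry a non-negative amount.
def has_valid_amount(row):
    return row.get("amount") is not None and row["amount"] >= 0


sample_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},   # fails the rule
    {"id": 3, "amount": 5.5},
    {"id": 4, "amount": 0.0},
]

rate = error_rate(sample_rows, has_valid_amount)
print(f"error rate: {rate:.2%}")
print("within 1% tolerance:", meets_target(sample_rows, has_valid_amount, 0.01))
```

Agreeing on the number before testing starts means a failed check is a known gap against a stated target, not a surprise.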
"Fast tracking." The initial load of a data warehouse is a very time-consuming process. Make sure there is a certain level of comfort with the quality of data being loaded into the warehouse before investing several days, in some cases weeks, executing ETL processes.
Surprises erode the credibility of the system and starting over is expensive in terms of time and resources.
Underestimating the ETL process. Make sure that the ETL load processing time fits within the existing batch window.
If the daily refresh process takes 25 hours to complete, testers may need to go back to the drawing board. With some careful thought and simulation in the test environment, estimating time is not as challenging as it may seem.
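A back-of-the-envelope estimate can be sketched as follows. It assumes load time scales linearly with row count, which is a simplification (indexes, lookups and contention can make real loads scale worse), and every number in it is hypothetical.

```python
def estimated_load_hours(test_rows, test_hours, production_rows):
    """Scale a measured test run linearly up to production volume."""
    rows_per_hour = test_rows / test_hours
    return production_rows / rows_per_hour


def fits_batch_window(load_hours, window_hours, safety_margin=0.2):
    """True if the load, padded by a safety margin, fits in the window."""
    return load_hours * (1 + safety_margin) <= window_hours


# Hypothetical measurements: 2 million rows loaded in half an hour in test,
# 50 million rows expected in the production refresh.
hours = estimated_load_hours(
    test_rows=2_000_000, test_hours=0.5, production_rows=50_000_000
)
print(f"estimated load time: {hours:.1f} hours")
print("fits an 8-hour window:", fits_batch_window(hours, window_hours=8))
```

With these figures the estimate is 12.5 hours, so an 8-hour batch window would already be too small before any safety margin is applied.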
Choosing the wrong testers. The impulse to cut costs is strong, especially in the final stretch. But it is easy to trip at the finish line by delegating the testing responsibilities to resources with limited technology and business experience.
The people who designed the system, including user representatives, should continue to guide the testing. Handing it off to inexperienced testers all but guarantees failure.
Skipping proper sign-offs. The paper trail is a bit like the police force. Everyone avoids it until it is needed. Get sign-offs on all phases of testing or stop the project.
Using "made up" test data. Use subsets of production data for all tests, especially system and user acceptance testing.
If real data isn't available for testing, delay the implementation of the data warehouse. It is that important.
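One possible way to cut a subset, sketched below, is to select rows by hashing a business key, so the same keys are chosen in every table and related rows stay together. The table layouts and the "customer_id" field are assumptions made for the example, not part of any particular tool.

```python
import hashlib


def in_sample(key, percent):
    """True for a stable, roughly `percent`% slice of keys, chosen by hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < percent


def subset(rows, key_field, percent):
    """Keep only the rows whose key falls inside the sampled slice."""
    return [row for row in rows if in_sample(row[key_field], percent)]


# Hypothetical production extracts: customers and their orders.
customers = [{"customer_id": i} for i in range(1000)]
orders = [{"order_id": i, "customer_id": i % 1000} for i in range(3000)]

sample_customers = subset(customers, "customer_id", 10)
sample_orders = subset(orders, "customer_id", 10)

print(len(sample_customers), "customers and", len(sample_orders), "orders sampled")
```

Because selection depends only on the key, every sampled order belongs to a sampled customer, and rerunning the extract picks the same customers again.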
Mark Robinson is a business intelligence practice manager at Greenbrier & Russel. During his more than 20 years working with business technology, Robinson has been a consultant for companies in the financial services, retail, manufacturing, healthcare, software and professional services sectors. He can be reached at firstname.lastname@example.org.