The Next Wave of Information Overload is Here

By Allen Bernard


According to a first-ever IDC/EMC Corp. survey released in early March, the information explosion we've all become so accustomed to is about to take off so fast that new mathematical terms (think exabyte, which is equal to a billion gigabytes) will have to enter the popular lexicon to account for it all.

During 2006, the "digital universe" (all the bits and bytes created by everyone, everywhere) totaled 161 billion gigabytes, or 161 exabytes. IDC predicts that number will grow roughly six-fold over the next four years, a compound annual growth rate (CAGR) of 57%.
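The arithmetic behind those figures is easy to check. The short Python sketch below (the function names are illustrative, not taken from IDC) backs out the annual rate implied by a six-fold increase over four years, then compounds the 2006 base of 161 exabytes at 57% a year.

```python
# Minimal sketch: relate total growth over a period to a compound annual
# growth rate (CAGR). The 161 EB base, six-fold growth and four-year span come
# from the article; the function names are illustrative only.


def implied_cagr(growth_multiple: float, years: int) -> float:
    """Return the annual rate r such that (1 + r) ** years == growth_multiple."""
    return growth_multiple ** (1 / years) - 1


def project(base: float, rate: float, years: int) -> list[float]:
    """Return the projected value at the end of each year, compounding annually."""
    return [base * (1 + rate) ** y for y in range(1, years + 1)]


if __name__ == "__main__":
    rate = implied_cagr(6, 4)  # six-fold over four years
    print(f"Implied CAGR: {rate:.1%}")
    for year, size in zip(range(2007, 2011), project(161, 0.57, 4)):
        print(f"{year}: ~{size:,.0f} EB")
```

Run as a script, it reports an implied CAGR of about 56.5%, which rounds to IDC's 57%, and a 2010 figure of roughly 978 exabytes, about six times the 2006 total. The same function shows why the growth must be read as six-fold over the whole period: a six-fold jump every single year would correspond to a 500% annual rate.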

To get an idea of just how much information is flowing around the globe right now, 161 exabytes is "approximately three million times the information in all the books ever written," according to IDC.

"I don't think the number in and of itself is really a surprise," said Steve Minton, VP of Worldwide IT Markets and Strategies within IDC's Global Research Organization and co-author of the report.

"In a sense, this is really just putting a number on what people would have told you before. The bigger discussion is really around all the implications around this and what it means and what they need to do about dealing with all this data."

And that's the rub. While most information (some 70%) is created by individuals, 85% of it is touched in some form or another by organizations, corporations and government bodies. That raises a whole host of questions about responsibility for the information and, therefore, how to deal with it.

"If it touches the organization's network anywhere, then they can be held liable from a compliance point of view, from a legal standpoint," said Minton. "If that information is crossing their network … then they're responsible for it."

The Growth of ILM & BI

In an attempt to get a handle on all this data, corporations have turned to information lifecycle management (ILM) concepts and tools and to business intelligence (BI) suites, which accounts for the rapid growth of those two market segments over the past couple of years, said Minton.

But these two technologies are still, relatively speaking, in their infancy, said James Short, research director at the Information Storage Industry Center (ISIC) of the University of California, San Diego.

"ILM is the first step" towards a solution, said Short. "It's essentially a collection of a series of hardware and software innovations in storage that allow for a more economic utilization of both the hardware and storage medium as well as a first phase attempt at defining policies that will allow you a more policy-based storage approach in your IT organization."

Also, the biggest problem with so much data isn't necessarily storing it or the cost of storing it (Moore's Law has seen to that); it's deciding what information is valuable and what is not, said Roger Bohn, ISIC's director. The sheer volume only exacerbates this problem.

"Somewhere along the way … classification of the value of that information has to be reconciled before the IT organization can make an economic system for storing that information," said Bohn.

Tech Solutions

While the vendor community works feverishly to develop the automation needed to tackle such a vast problem, its attempts so far have only begun to scratch the surface, said Short.

Basically, today's solutions work on the back third of the ILM process, after information has been classified. What's needed is automation that works on the front end, classifying data as valuable to the business, required for compliance, or trash.
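To make front-end classification concrete, here is a deliberately naive, rule-based Python sketch that sorts documents into the three buckets named above. The `Document` record, the keyword lists and the one-year staleness cutoff are all hypothetical; the kind of automation Short describes would need far richer metadata and content analysis than simple keyword matching.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    COMPLIANCE = "required for compliance"
    VALUABLE = "valuable to the business"
    TRASH = "trash"


@dataclass
class Document:
    # Hypothetical minimal record; real tools would use far more metadata.
    name: str
    text: str
    days_since_last_access: int


# Hypothetical keyword lists, for illustration only (naive substring matching).
COMPLIANCE_TERMS = ("invoice", "contract", "audit", "litigation hold")
BUSINESS_TERMS = ("customer", "forecast", "design", "proposal")


def classify(doc: Document, stale_after_days: int = 365) -> Category:
    """Assign a document to one of three buckets using simple, ordered rules."""
    text = doc.text.lower()
    # Compliance is checked first: wrongly discarding such data carries legal risk.
    if any(term in text for term in COMPLIANCE_TERMS):
        return Category.COMPLIANCE
    # Recently used content that mentions business topics is treated as valuable.
    recent = doc.days_since_last_access <= stale_after_days
    if recent and any(term in text for term in BUSINESS_TERMS):
        return Category.VALUABLE
    # Everything else is only a candidate for deletion; a person should review it.
    return Category.TRASH


if __name__ == "__main__":
    docs = [
        Document("q3.xls", "Q3 customer forecast", 30),
        Document("vendor.pdf", "Signed contract with supplier", 900),
        Document("lunch.txt", "Pizza on Friday?", 400),
    ]
    for doc in docs:
        print(f"{doc.name}: {classify(doc).value}")
```

The sample run labels the forecast as valuable, the contract as required for compliance and the stale lunch note as trash. Checking compliance first reflects the liability concern Minton raises: misfiling a contract as trash is far costlier than keeping a stale file.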

This is becoming increasingly difficult because most newly created data (95%, according to IDC) is unstructured. Inside corporations, transactional data, while also growing rapidly as digital transactions replace paper-based systems, accounts for only 20% of data growth.

Historically, classification has been a manual process and mostly remains so today, said EMC Corp.'s Ken Steinhardt, CTO of Customer Operations. It is also a process most companies have not spent a lot of time on. Only about 10% of all information today is classified.

As the avalanche continues unabated, legal obligations (such as the newly enacted changes to the Federal Rules of Civil Procedure regarding eDiscovery) and regulatory obligations (think Sarbanes-Oxley) surrounding that information are increasing. This does not bode well for many companies.

"A lot of information that is digitally created really doesn't need to be stored, but (the study) is sort of a wake-up call for just how big the need for better management, better controls and better classification of information truly will be for an organization going forward," said Steinhardt.

There are three main areas into which information falls: availability/recoverability, performance, and compliance. How important is it that this information be available or recoverable if there is a catastrophe? How important is this information to the performance of the business or its operations? And how important is this information for compliance purposes?
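One way to turn those three questions into something a storage system can act on is to rate each piece of information on all three and derive a storage policy from the ratings, in the spirit of the policy-based storage approach Short describes. The Python sketch below assumes a simple low/medium/high scale; the `StoragePolicy` fields, the tier names and the mapping rules are hypothetical, not any vendor's product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Importance(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class InfoProfile:
    """Ratings for the three questions: recoverability, performance, compliance."""
    availability: Importance  # must it survive a catastrophe?
    performance: Importance   # how much do operations depend on fast access to it?
    compliance: Importance    # is it needed for regulatory or legal purposes?


@dataclass(frozen=True)
class StoragePolicy:
    # Hypothetical policy fields and tier names, for illustration only.
    tier: str
    replicate_offsite: bool
    write_once: bool


TIERS = {
    Importance.HIGH: "fast disk",
    Importance.MEDIUM: "capacity disk",
    Importance.LOW: "archive",
}


def choose_policy(profile: InfoProfile) -> StoragePolicy:
    """Derive a storage policy from the three importance ratings."""
    return StoragePolicy(
        # Performance importance decides how fast (and costly) the storage is.
        tier=TIERS[profile.performance],
        # Medium or high recoverability importance earns an offsite copy.
        replicate_offsite=profile.availability >= Importance.MEDIUM,
        # Data that is highly important for compliance is kept unalterable.
        write_once=profile.compliance == Importance.HIGH,
    )


if __name__ == "__main__":
    old_contract = InfoProfile(Importance.MEDIUM, Importance.LOW, Importance.HIGH)
    print(choose_policy(old_contract))
```

For an old contract that is rarely read but must be preserved, the sketch prints `StoragePolicy(tier='archive', replicate_offsite=True, write_once=True)`: cheap storage, an offsite copy, and protection against alteration.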

Once these three things are decided, you are on your way. The next step, turning information into knowledge, makes things a bit more difficult, however. Constraining information too early can quickly strip it of its value: only as information flows through a company, and is worked with and added to, does it truly become valuable.

But that value is itself a moving target, said ISIC's Bohn.

"And it has proven very difficult for automation to be moved to the front end of the information lifecycle, because your incentive at that point isn't to classify it, it's to actually allow information to be broadly disseminated … so you take out of it the knowledge component before you then automate the process, you move off the components that are not knowledge related," he said.

For now, the best most companies can do is implement policies that control the retention of data, especially unstructured data such as email, which is becoming increasingly critical to classify because of its importance both to the business and in litigation. They can also put classification efforts in place around all the data the company generates or comes into contact with.
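A retention policy of this kind can be expressed as a table of retention periods plus a rule for when a record may be deleted. In the Python sketch below, the record types and periods are purely hypothetical placeholders (actual schedules depend on the regulations and litigation risks a company faces), and a legal hold, such as one issued in anticipation of eDiscovery, always blocks deletion.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods; real schedules come from counsel and the
# regulations that apply to the business, not from this table.
RETENTION = {
    "email": timedelta(days=3 * 365),
    "financial_record": timedelta(days=7 * 365),
    "scratch_file": timedelta(days=30),
}


@dataclass
class Record:
    kind: str
    created: date
    legal_hold: bool = False  # e.g. preserved because litigation is anticipated


def may_delete(record: Record, today: date) -> bool:
    """Return True only if the record is past its retention period and not on hold."""
    if record.legal_hold:
        return False  # a hold overrides any retention schedule
    period = RETENTION.get(record.kind)
    if period is None:
        return False  # unclassified data is kept until someone classifies it
    return today - record.created > period


if __name__ == "__main__":
    today = date(2007, 3, 15)
    records = [
        Record("email", date(2002, 6, 1)),
        Record("email", date(2002, 6, 1), legal_hold=True),
        Record("financial_record", date(2003, 1, 1)),
        Record("unknown", date(1999, 1, 1)),
    ]
    for r in records:
        print(r.kind, r.created, "delete" if may_delete(r, today) else "keep")
```

Keeping unclassified records by default is a deliberately conservative choice: it ties deletion to classification, which is exactly the step the article says most companies have yet to take.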

There are tools to make the job easier, but it's not going to be an easy job.

"The fundamental starting point, dare I say it, is the recognition of the fact … the world of IT has focused too much on the tech side and not the information side, which is where IT needs to be pointed going forward," said EMC's Steinhardt.