This is becoming increasingly difficult because most newly created data (95%, according to IDC) is unstructured. Inside corporations, transactional data, while also growing rapidly as digital transactions take over from paper-based systems, accounts for only 20% of data growth.
Historically, classification has been a manual process and mostly remains so today, said Ken Steinhardt, CTO of Customer Operations at EMC Corp. It is also a process most companies have not spent much time on. Only about 10% of all information today is classified.
"A lot of information that is digitally created really doesn't need to be stored, but (the study) is sort of a wake-up call for just how big the need is for getting better management, better controls and better classification of information truly will be for an organization going forward," said Steinhardt.
There are three main areas into which information falls: availability/recoverability, performance, and compliance. How important is it that this information be available and recoverable after a catastrophe? How important is this information to the performance of the business or its operations? And how important is this information for compliance purposes?
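As a rough illustration, the three questions above could be recorded as one rating per area. The sketch below is only an assumption about how such a scheme might look: the `Importance` scale, the `Classification` type, and the `highest` helper are hypothetical names, not anything described by EMC or IDC.

```python
from dataclasses import dataclass
from enum import IntEnum


class Importance(IntEnum):
    """Hypothetical three-level rating scale (an assumption, not from the article)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Classification:
    """One rating per question: recoverability, performance, compliance."""
    recoverability: Importance  # must it be available after a catastrophe?
    performance: Importance     # does the business depend on it to operate?
    compliance: Importance      # is it needed for compliance purposes?

    def highest(self) -> Importance:
        """Return the strictest of the three ratings."""
        return max(self.recoverability, self.performance, self.compliance)


# Example: an email thread that matters little to daily operations
# but must be kept for compliance reasons.
email = Classification(
    recoverability=Importance.MEDIUM,
    performance=Importance.LOW,
    compliance=Importance.HIGH,
)
print(email.highest().name)  # prints HIGH
```

Taking the strictest rating is just one possible way to decide how carefully a piece of information should be handled; a company could equally weigh the three areas separately.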
Once these three things are decided, you are on your way. The next step, however, turning information into knowledge, makes things a bit more difficult. Constricting information too early can drain its value very quickly. Only as information flows through a company and is worked with and added to does it truly become valuable.
But that value is itself a moving target, said ISIC's Bohn.
"And it has proven very difficult for automation to be moved to the front end of the information lifecycle, because your incentive at that point isn't to classify it, it's to actually allow information to be broadly disseminated, so you take out of it the knowledge component before you then automate the process; you move off the components that are not knowledge related," he said.
For now, the best most companies can do is implement policies that control the retention of data, especially unstructured data such as emails, which are becoming increasingly critical to classify because of their importance to the business and to litigation. Companies should also put in place classification efforts around all the data they generate or come into contact with.
There are tools to make the job easier, but it's not going to be an easy job.
"The fundamental starting point, dare I say it, is the recognition of the fact that the world of IT has focused too much on the tech side and not the information side, which is where IT needs to be pointed going forward," said EMC's Steinhardt.