How to Extract Information from the Sea of Big Data – Part I

May 30, 2012

CIOUpdate Contributor

by Ian Rowlands, senior director of Product Management for ASG

A convergence of technologies is shifting the balance of attention in the information management world. From the beginning of business IT, which, tellingly, started out in many organizations as “data processing,” the focus has been on so-called “structured” data. Yet it has long been recognized that enterprises have far more “unstructured” data than structured, matched by a perception that mining it would cost more than the effort would justify.

The cloud, Big Data, semantic and metadata technologies

The cloud, Big Data, semantic and metadata technologies are causing a rethink of feasibility, just as the value of customer interactions, clickstream data and other sources is being reevaluated and the risks inherent in email, contracts, and other documents are being reassessed. Put all of this together with the emergence of cool new consumer apps driving expectations, and suddenly the sea of unstructured data is full of sunken treasures.

Information overload. It’s not a new idea. In fact, Alvin Toffler probably first put the notion into general parlance in his book “Future Shock,” published back in 1970. Even then, before the Internet, email, or social media, there was a perception that too much information was making decisions harder, not easier, to make.

At around the same time, Peter Drucker was coining the term “knowledge worker” to identify the shift from manual and service work to knowledge work as a main driver of business value.

So here we are, about 40 years after Drucker and Toffler identified the seminal social and business shifts implied by the information explosion, and we are swimming in an ocean of information and data. In fact, according to the well-respected research firm IDC, the amount of information created and replicated in 2011 will have passed 1.8 zettabytes (a zettabyte is a trillion gigabytes), and it’s anticipated that the pile is going to keep doubling roughly every two years.
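That doubling compounds quickly. As a purely illustrative back-of-envelope sketch (the only inputs are IDC’s figures cited above), projecting 1.8 zettabytes forward at a doubling every two years gives:

```python
# Project IDC's estimate forward: ~1.8 ZB created/replicated in 2011,
# doubling roughly every two years (figures cited in the article above).
volume_zb = 1.8
for year in range(2011, 2021, 2):
    print(f"{year}: ~{volume_zb:.1f} ZB")
    volume_zb *= 2  # double at each two-year step
# By this simple projection, 2019 lands at ~28.8 ZB.
```

This is napkin math, not a forecast, but it shows why “wait and see” is not a storage strategy.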

How are we going to cope now? So far, it’s not looking good. According to another distinguished research house, Forrester, firms use only five percent of the data available to them, while created data is growing at 40 percent to 50 percent annually and only 25 percent to 30 percent of that total is being captured.

That, of course, is data, not information. And therein lies another ugly truth: Information technology started as data processing. Its business was dealing with a special kind of information, information that resided in fixed locations in defined data structures.

Over time, this increasingly has come to mean tabular data, with rows and columns, and keys to support analysis. Increasingly sophisticated, high-performing tools have been developed to capture, process and analyze this kind of information. The bad news is that the majority of information no longer fits that satisfyingly simple model. The majority (I’ve seen estimates as high as 80 percent to 90 percent!) is what is now lumped together as “unstructured data.”

Unstructured data is a misnomer

Unstructured data is really a shorthand term that lumps together many different things. There’s no such thing as truly unstructured data (or information); no sharing is possible without some agreement on structure. The unstructured moniker is really a hangover from when information was stored in ways that made it impossible (or extremely difficult) to get at with data processing capabilities. It’s probably better now for us to think about three kinds of information:

  • Information structured and stored in ways that support business processes and analysis like traditional structured data in databases;
  • Information that can be massaged to support business processes and analysis, such as clickstream data, emails, and documents, that has typically been thought of as “unstructured data”; and
  • Digital assets: Information that cannot be processed or analyzed unless structured information has been attached. This includes things like movies, sound files and pictures, which have generally not been thought of as part of the “data” world at all, but which new technologies may make increasingly accessible to search.
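To make the three-way split concrete, here is a toy sketch in Python. The extension lists and the `classify` helper are my own hypothetical illustration, not anything from the article, but they capture the idea of triaging content into the three buckets above:

```python
# Hypothetical extension lists illustrating the article's three kinds of
# information: natively structured data, "massageable" content, and
# digital assets that need attached metadata before they can be analyzed.
STRUCTURED = {".csv", ".sql", ".parquet"}
MASSAGEABLE = {".eml", ".doc", ".txt", ".log", ".html"}
DIGITAL_ASSET = {".mp4", ".wav", ".jpg", ".png"}

def classify(filename: str) -> str:
    """Bucket a file into one of the three information kinds by extension."""
    ext = filename[filename.rfind("."):].lower() if "." in filename else ""
    if ext in STRUCTURED:
        return "structured"
    if ext in MASSAGEABLE:
        return "massageable"
    if ext in DIGITAL_ASSET:
        return "digital asset"
    return "unknown"

print(classify("orders.csv"))   # structured
print(classify("meeting.eml"))  # massageable
print(classify("demo.mp4"))     # digital asset
```

In practice the triage would look at content and context, not just file extensions, but even this crude first pass shows why the catch-all label “unstructured” obscures more than it reveals.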

What’s the appropriate response to the rising tide of information? Standing on the edge and trying to push it back, King Canute-style, isn’t going to work. What’s more, it ignores the treasures to be found in the sea of information.

The trick is finding ways of identifying and responding to the significant information, while ignoring and discarding the less valuable flotsam and jetsam. What we might call “riding the unstructured data wave.” The key skill is going to be surfing, not fishing!

In part II of this two-part series, Ian will look at the cloud, semantic and Big Data technologies that may hold the key to extracting knowledge from this sea of raw information.

As Senior Director of Product Management, Ian Rowlands is responsible for the direction of ASG’s applications, service support and metadata management technologies including ASG-MetaCMDB, ASG-Rochade and ASG-Manager Products. He has also served as vice president of ASG’s repository development organization. Prior to joining ASG, Rowlands served as director of Indirect Channels for Viasoft, a leading enterprise application management vendor that was later acquired by ASG. He was responsible for relationships with Viasoft’s distributor partners outside North America.

Tags: Cloud, big data, Semantic Technology, ASG Software, information and knowledge
