How to Extract Information from the Sea of Big Data – Part II

Jun 15, 2012

CIOUpdate Contributor

by Ian Rowlands, senior director of Product Management for ASG

Three of the current hot tech topics -- Big Data, semantic technologies and the cloud -- are shaping some of the response to the challenges posed by information overload.

Big Data technologies at play

Like unstructured data, Big Data is really another collective term. Loosely it might be defined as “the challenges and technological responses to a volume and pace of data movement which has outstripped the capabilities of conventional technologies.” For much of the last 40 years, the majority of structured data analysis (which, in practice, meant the majority of data analysis) was built on a relational database platform. Most structured data has come from what we might think of as conventional business activities -- order entry, supply management, order processing and so on.

By contrast, much of the Big Data phenomenon is fuelled by sources such as traffic and climate sensors, clickstream data, GPS signals, social media sites and utility meters. There is potential for vast amounts of data. It’s important to know, however, that much of the data may well be useless!

Knowing that the weather has been perfect for the last 20 days, that a million customers are happy, and that traffic is not actually approaching critical limits may not add a lot of business value. What will add value is the ability to deal with relevance deficit disorder -- to spot the unusual, relevant and significant information. The role of analytics shifts from a focus on summarization to a focus on selection, knowing where the valuable catch is in the ocean and what information is actually relevant and useful.
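The shift from summarization to selection can be illustrated with a minimal sketch. It assumes a simple z-score rule, which is only one of many possible ways to define "unusual"; the function name `select_unusual` and the threshold of 3.0 are hypothetical choices for illustration, not something the article prescribes.

```python
from statistics import mean, stdev

def select_unusual(readings, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold`
    sample standard deviations from the mean of `readings`.

    Assumption: a plain z-score is a good enough notion of "unusual".
    Real feeds often need rolling windows or more robust statistics.
    """
    if len(readings) < 2:
        return []  # not enough data to estimate spread
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        return []  # every reading is identical, so nothing stands out
    return [
        (i, x)
        for i, x in enumerate(readings)
        if abs(x - mu) / sigma > threshold
    ]

# Nineteen ordinary sensor readings and one spike: only the spike is selected.
readings = [20] * 19 + [95]
unusual = select_unusual(readings)
```

A summary such as the average would bury the single spike; the selection step surfaces it and discards the twenty days of "perfect weather."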

Understanding meaning

Semantic technologies use meaning as their starting point: the seemingly magical ability to understand what a word means in the context in which it is used. Semantic technologies are able to tell that the “lead” in “I took the lead in the race” is something completely different from the “lead” in “the ancient Romans used lead pipes to supply water.”

With semantic tools, meaning becomes less subjective, though it’s still a matter for negotiation. There is a wide range of semantic technologies, with different levels of sophistication and different capabilities. The way that these technologies focus on meaning makes them a natural enabler of relevance.

Many on-line providers already offer alert capabilities when keywords are detected in titles, descriptions or metadata. Semantics go beyond simple keyword matching to identify relevance by matching concepts. It’s a shift from “what were you looking for” to “what were you thinking of.”
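The difference between keyword matching and concept matching can be shown with a toy sketch. Everything in it is an assumption made for illustration: the hand-built `CONCEPTS` table stands in for the ontologies or language models that real semantic tools use, and the function names are hypothetical. It does not attempt the context-dependent disambiguation described in the “lead” example.

```python
# Hypothetical term-to-concept table; a real system would derive this
# from an ontology, thesaurus or statistical model.
CONCEPTS = {
    "outage": "service_disruption",
    "downtime": "service_disruption",
    "blackout": "service_disruption",
    "refund": "billing",
    "invoice": "billing",
}

def tokens(text):
    """Lower-case the text and strip surrounding punctuation from each word."""
    return [w.strip(".,!?;:\"'()").lower() for w in text.split()]

def keyword_match(query, document):
    """True if the query and document share at least one literal word."""
    return bool(set(tokens(query)) & set(tokens(document)))

def concept_match(query, document):
    """True if the query and document share at least one concept."""
    q = {CONCEPTS[t] for t in tokens(query) if t in CONCEPTS}
    d = {CONCEPTS[t] for t in tokens(document) if t in CONCEPTS}
    return bool(q & d)

comment = "Two hours of downtime last night."
by_keyword = keyword_match("outage", comment)   # no shared word
by_concept = concept_match("outage", comment)   # shared concept
```

An alert on the keyword “outage” misses the customer comment, while the concept-based check links it to the same underlying idea.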

Beyond this, semantic technologies provide a vital bridge between data and content allowing, for instance, customer comments in social networking applications to be linked to product information and trawled for trends to correlate to business performance.

Cloud isn't a technology

Cloud, it might be argued, is not so much a set of technologies (though, of course, technology is the key enabling factor) as the wholesale reworking of the way information technology is applied to business problems. The U.S. National Institute of Standards and Technology (NIST) defines cloud computing as “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”

Cloud computing is essentially a utility model for delivering IT services. At its best it is scalable on demand, vendor agnostic, and accessible by consumers without any need for a sophisticated understanding of the infrastructure providing the facility. The common analogy -- and it’s a good one -- is with an electricity utility. In principle, it gives the consumer of computing power the ability to consume resources as and when needed, without paying for them to sit idle when not used.

Of course, without a well-designed set of facilities with comprehensive reporting and accounting processes, there is an associated risk to the business that costs will be incurred uncontrollably. But there is obvious value in being able to scale rapidly to deal with unstructured data volumes and to experiment with technologies delivered by cloud providers without making a major capital investment.

Finding business value in the sea of data

The three technology areas each make contributions to managing the flood of Big Data. The real value, however, emerges out of combination. One area in which new technologies may combine is in the federation of multiple clouds to provide a single information base. Leading cloud management providers have solved the problem of interconnecting clouds, so that a user’s presence can be accessed through multiple cloud services.

A logical next step will be to provide a semantic layer that allows consolidated views of topics about which information is dispersed across many clouds. There is already discussion of standards that enable rationalized access to diverse content sources, which could reasonably expand to support the cloud environment, and there are already interesting applications aggregating content from social networking sites. Another interesting combination might be Big Data with cloud, extending existing archiving approaches to provide tiered access to information over a long period of time.

Over time, the convergence of Big Data, semantic and cloud technologies will provide an excellent framework for exploiting unstructured data. Does this mean you should dive into the water today? Maybe not dive. A toe in the water might be a better approach. The issues that come with dealing with unstructured data and exploiting new technologies overlap, and are part of the broader issues of enterprise architectures and data governance.

There is risk in the unconstrained exploitation of rapidly changing capabilities. If you already have a solid data governance policy, you may already have a sense of who in your organization needs access to semantic and Big Data capabilities, and who can reasonably be allowed to self-provision cloud-based capabilities. If you don’t have data governance solidly established, you will need to seek a secure way of incubating advanced capabilities for exploiting unstructured data. The challenge lends itself naturally to a portfolio management approach, with selective use of leading-edge technologies where the balance of risk and return justifies the use of scarce resources.

New technologies, new opportunities

New technologies bring new opportunities. New opportunities demand new technologies. It seems the cycle is never-ending. Information management is at an early stage of a new cycle that promises to take us a step closer to the silver bullet of artificial intelligence. The convergence of cloud, semantic and Big Data technologies offers new ways to extract value from the ocean of unstructured data. We are exploring new seas, and diving to greater depths. There are wondrous pearls to be found … as long as we take care not to be swamped by the ever larger waves!

As Senior Director of Product Management, Ian Rowlands is responsible for the direction of ASG’s applications, service support and metadata management technologies including ASG-MetaCMDB, ASG-Rochade and ASG-Manager Products. He has also served as vice president of ASG’s repository development organization. Prior to joining ASG, Rowlands served as director of Indirect Channels for Viasoft, a leading enterprise application management vendor that was later acquired by ASG. He was responsible for relationships with Viasoft’s distributor partners outside North America.

Tags: stratus, big data
