Making Sense of Enterprise Search

By Jeff Vance

Why is it so much easier to find things on the web than within corporate applications? With the web being so vast and unruly, shouldn’t the opposite be true?

Not necessarily. The web has a few things working in its favor, such as the knowledge of crowds, the motivations of marketers, and the evolution of consumer-facing search engines that has culminated in Google (www.google.com).

Corporate data, on the other hand, has just as many factors working against it, such as proprietary applications, security concerns, and the fact that until recently most enterprise search platforms were prohibitively expensive for all but the largest organizations.

For many mid-tier organizations, the biggest obstacle to a workable search strategy is the last item in that list: price. According to Timothy Hickernell, a senior research analyst with Info-Tech Research Group, this is starting to change. Competition among the major vendors is driving down prices, while at the same time new vendors seeking to crack this market are basing their search technology on open-source search projects like Lucene. The end result is even more downward pressure on cost.

Even if the price is right, though, there will still be the struggle to make search work across disparate data sets and applications. “There’s an entrenched misconception in the enterprise that corporate information is chaotic,” said Matt Eichner, VP of Strategic Development and Strategic Marketing at enterprise search vendor Endeca. “If it doesn’t conform to my needs, it must be a mess.”

Getting to Authoritative Data

However, Eichner argues, there is a lot of information in any given data repository. While the web may have links and crowd wisdom, corporate data has vastly more inherent authority than what is on the web, and it often carries with it information from a parent application.

“Take email, for instance,” Eichner said. “It’s unstructured, but you have a bunch of information at your disposal – subject lines, attachments, the sender, and even domain information.” He noted that this kind of information, or metadata, will vary from search to search. “If you’re doing a financial audit, for instance, you might look for specific transactions. An organization that requires approval over a certain financial threshold, such as $10,000, will want to flag any $9,999 transactions.” In this case, organizational policy adds meaning to a specific set of data.
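The audit example can be sketched in a few lines of Python. This is only an illustration, not any vendor's method: the $10,000 threshold comes from the example above, while the `Transaction` record, the `flag_near_threshold` function, and the 1% margin are assumptions made for the sketch.

```python
from dataclasses import dataclass

APPROVAL_THRESHOLD = 10_000  # approval threshold from the example above
MARGIN = 0.01  # assumption: "just under" means within 1% below the threshold


@dataclass
class Transaction:
    """Hypothetical record; real systems would carry far more metadata."""
    tx_id: str
    amount: float


def flag_near_threshold(transactions, threshold=APPROVAL_THRESHOLD, margin=MARGIN):
    """Return transactions that fall just below the approval threshold."""
    lower = threshold * (1 - margin)
    return [t for t in transactions if lower <= t.amount < threshold]


transactions = [
    Transaction("t1", 9_999.00),   # just under the threshold: flagged
    Transaction("t2", 10_500.00),  # over the threshold: goes through approval
    Transaction("t3", 4_200.00),   # well under: ignored
    Transaction("t4", 9_950.00),   # just under: flagged
]

flagged = flag_near_threshold(transactions)
print([t.tx_id for t in flagged])
```

The point of the sketch is that the policy, not the data itself, decides what counts as interesting: change the threshold or the margin and a different set of transactions surfaces.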

Most end users, however, won’t care much about audit trails, which is something more specific to legal tangles associated with e-discovery. What end users do care about, though, is having an enterprise search platform that can help them with their day-to-day workflows. However, those workflows, and the data requirements associated with them, vary greatly from role to role.

One approach to this problem is customization. “Some search vendors have decided to give customers a toolkit that allows them to create their own customized search experiences,” said Matt Glozbach, product management director for the enterprise for Google. The idea is that search can match anyone’s needs, so long as you can figure out how to tweak the algorithms.

“The trouble is that searches are tricky, and tuning and tweaking algorithms takes a massive amount of effort and a lot of know-how,” Glozbach added. “Search isn’t a set-it-and-forget-it tool. Data changes over time, as do end users’ needs.”

What needs to change isn’t the algorithm, Glozbach and other search experts argue, but rather how information is gathered and presented. “Inside the firewall, it’s important to resist the one-size-fits-all search engine,” Hickernell said. He advises organizations to scale down search strategies, applying them to specific departmental or process needs. “A search box can be tied to a narrow topic. If an HR employee is searching for information, there is no reason they should see press releases in the results.”

In other words, a better approach than mucking around with algorithms is to simply give certain departments or users the ability to promote one set of data over other sets. The relevancy is tied to the data store.
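A rough Python sketch of this idea follows. It assumes the underlying engine already returns a base relevance score for each hit and simply reweights hits by data store for each department. The store names, boost values, and the `rank_for_department` function are all hypothetical, chosen to mirror Hickernell's HR example.

```python
# Hypothetical per-department boosts, keyed by data store.
# A boost of 0.0 hides a store entirely; stores not listed keep weight 1.0.
SOURCE_BOOSTS = {
    "hr": {"hr_policies": 2.0, "press_releases": 0.0},
    "marketing": {"press_releases": 2.0},
}


def rank_for_department(results, department):
    """Reorder (title, source, base_score) hits using the department's boosts.

    The base score is assumed to come from the search engine unchanged;
    only the weighting by data store differs between departments.
    """
    boosts = SOURCE_BOOSTS.get(department, {})
    scored = []
    for title, source, base_score in results:
        score = base_score * boosts.get(source, 1.0)
        if score > 0:
            scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


results = [
    ("Q3 earnings press release", "press_releases", 0.9),
    ("Parental leave policy", "hr_policies", 0.6),
    ("Office map", "intranet_wiki", 0.7),
]

hr_view = rank_for_department(results, "hr")
marketing_view = rank_for_department(results, "marketing")
print(hr_view)
print(marketing_view)
```

The same query and the same engine produce different result lists for HR and marketing, and the only thing IT has to maintain is a small table of boosts rather than a tuned algorithm.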

This raises another set of problems, though, mainly that if an organization customizes search too much, search will become a complex tangle demanding a lot of IT attention. “One of the disservices enterprise search vendors have done is distance enterprise search from web search,” Google’s Glozbach said. “While certain parameters will differ in the enterprise, the user experience should be the same. Other vendors say that within the enterprise search can be slower, or more complex. We disagree. A simple interface that returns fast and useful results should be a given.”

Eichner at Endeca agrees that a simple interface is a must, but he also believes that there is one important place where enterprise search should differ from the web: the displaying of results. “Enterprise users need more than a few random lines telling them what the data represents. Summaries are a better way to return results, giving users a fuller understanding of the data.”

Beyond Basic Search

A summary, though, is just a start. With large data repositories, it’s important to organize the results in a meaningful way. Endeca has worked with university libraries, where a high degree of organization is already in place. Results come back in categories, such as history or fiction, giving a fine-grained view of the information.
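Returning results in categories can be illustrated with a short Python sketch. It assumes each hit already carries a category label, as a library catalog record would. The titles and the `group_by_category` function are made up for the example and do not represent Endeca's implementation.

```python
from collections import defaultdict


def group_by_category(results):
    """Group (title, category) hits so they can be shown under category headings."""
    groups = defaultdict(list)
    for title, category in results:
        groups[category].append(title)
    return dict(groups)


# Hypothetical hits; the category labels are assumed to exist in the catalog.
results = [
    ("The Guns of August", "history"),
    ("Moby-Dick", "fiction"),
    ("SPQR", "history"),
]

grouped = group_by_category(results)
counts = {category: len(titles) for category, titles in grouped.items()}
print(grouped)
print(counts)
```

The per-category counts are what let an interface show the user how many hits sit under each heading before they click into one.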

The typical enterprise doesn’t have a card catalog at its disposal, however. Search vendors realize this, and as they come up with methods for adding organization to various data stores, they are realizing that search is simply a starting point to the larger issue of “information access.”

In its “Magic Quadrant” series, Gartner says that Endeca, Autonomy (www.autonomy.com), and Fast Search & Transfer (www.fastsearch.com) are the leaders defining this space. Google, Microsoft (www.microsoft.com), and IBM (www.ibm.com) can’t be discounted, though. All have deep pockets and proven track records.

What, though, is the difference between search and information access? Basic search is really little more than hunting for keywords, whereas information access helps organizations find, store, organize, classify, present, and share data – and since these are enterprise settings, not the public Internet, having hooks into authentication and other security systems is a given.

Information access vendors are also seeking better ways to represent information once it is retrieved. This could involve summaries, as with Endeca; tabbed searches of narrow applications and data stores, which Google already offers to enterprise customers; or even visual representations that show how data is linked across applications.

“When we succeed in making search as valuable to business as it is to the web, we’ll change how people work,” said Glozbach. “You’ll see search driving data sharing, collaboration, and any number of new processes and technologies.”