HP adds MapReduce-like functions to Vertica database
Direct analysis now possible from outside application
By Joab Jackson | Published: 10:43, 21 June 2011
HP's Vertica has updated its flagship parallel columnar database so other programs can instigate analysis directly on its database clusters.
For Vertica Analytic Database 5.0, HP has included an SDK (software development kit) that developers can use to have their programs make direct method calls to the Vertica database.
"We've opened up our platform so you can not only execute the SQL you want to execute, but you can write your own custom methods," said Colin Mahony, HP Vertica vice president of product and business development. "We're exposing the same environment that our developers get to work with when they add new features and functions. You get the ease of use and flexibility with SQL with the performance and extensibility of a programming language."
Initially, the SDK supports C++, though support for other languages will be added in the future.
"An SDK is an essential feature of an analytic platform," said Curt Monash of Monash Research. He said this feature is valuable because typically, in order to perform complex analysis, the data must be moved out of the database and into a data warehouse. With the SDK, programmers can have their programs probe the data directly within the database itself.
The SDK could also minimise the headache of programming for parallel environments, where data is scattered across multiple servers. The Vertica database is a grid-based, column-oriented database, developed specifically for large-scale data analysis that can be carried out across a cluster of servers.
"Vertica's optimiser and execution engine automatically parallelises jobs," Mahony said. "All [developers] have to do is write their method, and we handle how it gets broken up and parallelised."
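To illustrate the division of labour Mahony describes, here is a minimal sketch in Python. It is not the Vertica SDK, which supports C++ and whose interfaces are not shown in this article. The names `run_partitioned` and `total_sales` are hypothetical. The developer writes only the per-partition method, and a stand-in "engine" decides how the rows are split up and run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def run_partitioned(rows, user_fn, n_partitions=4):
    """Hypothetical stand-in for an engine: split rows, run user_fn on each part in parallel."""
    # Round-robin split; a real engine would pick partitions based on data layout.
    partitions = [rows[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # pool.map preserves partition order in its results.
        return list(pool.map(user_fn, partitions))


def total_sales(partition):
    """The developer's method: it sees one partition and knows nothing about parallelism."""
    return sum(row["amount"] for row in partition)


rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 7},
    {"region": "east", "amount": 5},
    {"region": "west", "amount": 3},
    {"region": "east", "amount": 1},
]

# Partial results from each partition are combined into a final answer.
partials = run_partitioned(rows, total_sales)
grand_total = sum(partials)
```

The user-written function never refers to threads or servers, so the same method could in principle be run against one partition or many without changes.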
This feature could be particularly valuable in running jobs coded in MapReduce against the structured data within Vertica. MapReduce is a framework increasingly used for processing unstructured data, usually in Hadoop clusters, across multiple servers. By speaking the MapReduce language, Vertica offers users the ability to query both structured and unstructured data within a single operation.
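For readers unfamiliar with the model, the sketch below shows the general shape of a MapReduce job in plain Python, applied to structured rows of the kind a database holds. It is an illustration of the programming model only, not of Vertica's or Hadoop's actual APIs. The function names and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a single-process map, shuffle and reduce over an iterable of records."""
    groups = defaultdict(list)
    # Map phase: each record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def sales_mapper(row):
    """Emit the sale amount keyed by region (hypothetical column names)."""
    yield row["region"], row["amount"]


def sum_reducer(key, values):
    """Add up all amounts seen for one region."""
    return sum(values)


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 7},
    {"region": "east", "amount": 5},
]

totals = map_reduce(sales, sales_mapper, sum_reducer)
```

Because the mapper and reducer are ordinary functions with no knowledge of where the data lives, a framework is free to run them on whichever servers hold the relevant records.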
"A lot of our customers have taken fairly large MapReduce libraries and converted them to run inside Vertica without much effort," Mahony said. "People can seamlessly move data and analytics back and forth between Vertica and Hadoop."
The new version of Vertica has a number of other improvements and enhancements as well. It has an expanded set of SQL analytic functions, including the abilities to execute basic geo-spatial queries, event-series pattern matching, event-series joins and advanced aggregate statistical and regression algorithms.
The company has also tightened the core code base to run faster subqueries, database statistics and other routine database operations. The backup capabilities have been expanded. And this release features cluster-cloning capabilities, or the ability to break off a piece of the database and run it in its own sandboxed environment.
"So if you have a 200-node cluster and want to spin off a sandbox for a separate data mart, you can point Vertica at the new server cluster, hit a button and it will automatically ship the data off" to this new cluster, he said.
The new release "plays to the strengths that Vertica already has in terms of scalability, parallelism and rapid provisioning," said James Kobielus, an IT industry analyst focusing on data warehousing for Forrester Research. Forrester has predicted that by the end of this year, "pretty much every data warehouse vendor will have native support for the MapReduce APIs (application programming interfaces) within their core products," he said, noting that Teradata's Aster Data and EMC's Greenplum units both offer MapReduce support.
"Everyone is going down these lines fairly quickly," he said. Because MapReduce is an open framework, an organisation could in theory build a MapReduce model for one data warehouse and have it run on another data warehouse from a different vendor.
The Vertica platform was the flagship product of Vertica, a company started in 2005. HP acquired Vertica earlier this year in an effort to expand its analytics portfolio.
The software is available either as a download, or packaged in an appliance. Pricing will be based on the amount of data analysed.