Microsoft rejects Hadoop for databases

Redmond sticking with SQL

The three leaders of the relational database market are responding to the sudden mania for the data processing technology Hadoop in three very different ways.

While startups and established data warehousing vendors such as Sybase and Teradata are embracing Hadoop and its Google-developed progenitor, MapReduce, Microsoft is resisting it.

"We'd never bring Hadoop code into one of our products," said Microsoft technical fellow and University of Wisconsin-Madison professor David J. DeWitt.

DeWitt's lack of interest is not surprising. DeWitt is an academic expert in parallel SQL databases, having co-invented three of them. He co-authored a paper this spring that argued that SQL databases still beat MapReduce at most tasks. He hasn't changed his mind.

"Every database vendor wants to claim that they're doing Hadoop because it's the popular thing," he said. "There's too much FUD. SQL databases still work pretty well." DeWitt leads a database research lab at Madison that is helping Microsoft with R&D for its upcoming Parallel Data Warehousing version of SQL Server 2008 R2, formerly known as Project Madison.

He said the new edition of SQL Server will add some analytic functions that roughly mimic some of the features of MapReduce/Hadoop. Those additions come from technology Microsoft gained by acquiring DATAllegro, not from Hadoop, DeWitt said.

He does acknowledge, however, that MapReduce/Hadoop is better than SQL at keeping long-running queries from crashing. Because of that, Microsoft may eventually try to incorporate those capabilities into future data warehousing-oriented versions of SQL Server, he said.

That would likely be a Microsoft-led effort, rather than a licensing of Hadoop's open source code, which is managed by the Apache Software Foundation.

IBM is the leading corporate supporter of Apache. Perhaps unsurprisingly, it is also "very bullish on Hadoop," said Anant Jhingran, CTO of IBM's information management division in the software group. "I'm not saying that mind-melding Hadoop with a database is the answer for everything," Jhingran said. "But in the end, I think every enterprise will want Hadoop. I'm just not sure in what form."

Questions remain about whether enterprises want Hadoop integrated into their SQL databases, as a separate data warehousing appliance, or as a web-only service where Hadoop is hidden underneath, as with IBM's experimental M2 service.

To determine this, IBM is running pilots with a dozen enterprise customers, as well as doing R&D work in the lab, Jhingran said. He declined to comment on the likelihood of Hadoop functionality making it into the next version of DB2 or Informix.

One thing is for certain, says Jhingran: Hadoop is best used to solve emerging problems such as web analytics, fraud, and analysis of unstructured and semi-structured data, rather than the problems that relational databases have already proven to excel at.

"For those vendors who simply want to use Hadoop to build a database replacement, I think they will fall flat on their faces," he said. SQL technology "supports a $300 billion ecosystem. It's extremely robust. I'm not that young [at 46], but I'll be retired before SQL is retired."

Oracle Database stands to lose the most if MapReduce/Hadoop takes off, critics say.

That's not just because of Oracle's longtime lead in the relational database market, but also because of its database's poor reputation for scale-out, a MapReduce/Hadoop strength.

Oracle did not respond to a request for comment. But in October, it published a blog post that argued, in the words of independent analyst Curt Monash, that "actually, we've been doing MapReduce all along."

A senior product manager at Oracle, Jean-Pierre Dijcks, said parallel processing of large data sets has been possible with Oracle Database using features first introduced with Oracle 9i back in 2001. He describes in detail how to implement it in the blog post.

"MapReduce in the end is a programming construct... SQL will allow for massive parallel processing as well. It is all a matter of looking beyond hype and finding a solution you are comfortable with," Dijcks wrote.





