Apache aims Hadoop update at larger distributed clusters

Ten thousand concurrent jobs is the goal for the 0.23 release

With a planned upgrade to its Hadoop distributed data processing technology, the Apache Software Foundation intends for the platform to run across much larger clusters and take on larger workloads, an Apache official said.

A key goal for the upcoming 0.23 release of Hadoop, which could eventually be called version two or three, is to have it run across 6,000-node clusters. So far it has run on 4,000-node clusters, said Arun Murthy, vice president of Apache Hadoop and a founder of Hortonworks, which offers Hadoop technologies and services. Release 0.23 is currently alpha quality; a more formal release is due later this year.

Hadoop has become popular for mining large data sets. Plans call for Hadoop 0.23 to run across clusters of 6,000 machines, each machine with 16 or more cores, and to process 10,000 concurrent jobs. Users will get more work done, Murthy said in a presentation at the O'Reilly Strata conference. Performance, he stressed, is something users "can never have enough of".

Other capabilities eyed for the upgrade include HDFS (Hadoop Distributed File System) federation as well as high availability for HDFS. MapReduce, the programming model and software framework in Hadoop, will be improved as well. The MapReduce upgrade, called Yarn, "is the first to take Hadoop and make it a much more general data processing system," Murthy said.

Yarn is "a high performance rewrite of MapReduce," with twice the throughput on large clusters, said Eric Baldeschwieler, Hortonworks CTO. In addition, the wire protocol compatibility planned for the 0.23 release will allow servers and clients to be upgraded independently.

Also at Strata on Thursday, MarkLogic and Hortonworks announced integration between Hortonworks Data Platform and MarkLogic's operational database platform.

The integration will allow users to combine MapReduce with MarkLogic's real-time interactive analysis and indexing on a single, unified platform, MarkLogic said. MarkLogic will certify its Connector for Hadoop against Hortonworks Data Platform.

