
Posted to mapreduce-user@hadoop.apache.org by Pere Ferrera <fe...@gmail.com> on 2012/03/06 11:35:11 UTC

Pangool: Easier Hadoop, same performance

Hi,
I'd like to introduce Pangool <http://pangool.net/>, an easier
low-level MapReduce API for Hadoop. I'm one of the developers, and we
open-sourced it yesterday.

Pangool is a low-level MapReduce API for Java with the same flexibility and
performance as the plain Java Hadoop MapReduce API. The difference is
that it makes a lot of things easier to code and understand.

A few of Pangool's features:
- Tuple-based intermediate serialization (allowing easier development).
- Built-in, easy-to-use group by and sort by (removing boilerplate code for
things like secondary sort).
- Built-in, easy-to-use reduce-side joins (which are quite hard to
implement in Hadoop).
- Augmented Hadoop API: Built-in multiple inputs / outputs, configuration
via object instance.
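
To illustrate what "group by" combined with "sort by" (secondary sort)
means, here is a minimal plain-Java sketch. It does not use Pangool or
Hadoop at all: the Event class, its fields and the groupAndSort method are
hypothetical names made up for this illustration, and the in-memory sort
only mimics what the shuffle phase would do across a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {

    /** Hypothetical intermediate tuple: (user, timestamp, action). */
    static final class Event {
        final String user;
        final long timestamp;
        final String action;

        Event(String user, long timestamp, String action) {
            this.user = user;
            this.timestamp = timestamp;
            this.action = action;
        }

        @Override
        public String toString() {
            return action + "@" + timestamp;
        }
    }

    /**
     * Sorts by (user, timestamp) but groups by user only, so every
     * group's values arrive already ordered by timestamp. This is the
     * effect a secondary sort is meant to achieve.
     */
    static Map<String, List<Event>> groupAndSort(List<Event> events) {
        List<Event> sorted = new ArrayList<>(events);
        sorted.sort(Comparator.comparing((Event e) -> e.user)
                .thenComparingLong(e -> e.timestamp));

        // LinkedHashMap keeps the groups in sorted key order.
        Map<String, List<Event>> groups = new LinkedHashMap<>();
        for (Event e : sorted) {
            groups.computeIfAbsent(e.user, k -> new ArrayList<>()).add(e);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Event> input = Arrays.asList(
                new Event("bob", 30L, "logout"),
                new Event("alice", 20L, "click"),
                new Event("bob", 5L, "login"),
                new Event("alice", 10L, "login"));

        for (Map.Entry<String, List<Event>> g : groupAndSort(input).entrySet()) {
            System.out.println(g.getKey() + " -> " + g.getValue());
        }
    }
}
```

In plain Hadoop the same effect typically requires a composite key plus a
custom partitioner, sort comparator and grouping comparator, which is the
boilerplate the feature above refers to.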

Pangool aims to make Hadoop's steep learning curve a lot smoother while
retaining all of its features, power and flexibility. It differs from
high-level tools like Pig or Hive in that it can be used as a replacement
for the low-level API. There is no performance or flexibility penalty for
using Pangool.

We ran an initial benchmark <http://pangool.net/benchmark.html> to back up
this claim.

I'd be very interested in hearing your feedback, opinions and questions
about it.

Cheers,

Pere.