Posted to mapreduce-issues@hadoop.apache.org by "Pere Ferrera Bertran (JIRA)" <ji...@apache.org> on 2012/12/13 18:26:12 UTC

[jira] [Updated] (MAPREDUCE-4876) Adopt a Tuple MapReduce API instead of classic MapReduce one

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pere Ferrera Bertran updated MAPREDUCE-4876:
--------------------------------------------

    Description: 
After using MapReduce for many years, we have noticed that it lacks some important features: compound records, easy intra-reduce sorting and join capabilities. We have developed a slightly modified MapReduce foundation to overcome these problems: Tuple MapReduce. A full paper describing it, published at ICDM 2012, is available at http://pangool.net/TupleMapReduce.pdf
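
To make the idea concrete, here is a minimal in-memory sketch of the model in plain Python. It is not Pangool's API: the function name tuple_map_reduce, its parameters and the sample data are all hypothetical, and the sketch assumes that the group-by fields must be a prefix of the sort-by fields. Records are compound (named fields), the reducer is called once per group, and inside each group the tuples arrive already sorted by the extra sort fields.

```python
from itertools import groupby


def tuple_map_reduce(records, mapper, reducer, group_by, sort_by):
    """Simulate one Tuple MapReduce job in memory (illustrative only).

    mapper(record)       -> iterable of dicts (compound records)
    reducer(key, tuples) -> iterable of output values
    group_by             -> field names that define a reduce group
    sort_by              -> field names for the full sort order;
                            assumed here to start with group_by
    """
    if list(sort_by[:len(group_by)]) != list(group_by):
        raise ValueError("group_by must be a prefix of sort_by")

    # Map phase: every input record may emit any number of tuples.
    emitted = [t for record in records for t in mapper(record)]

    # A single sort on all sort_by fields stands in for the shuffle.
    emitted.sort(key=lambda t: tuple(t[f] for f in sort_by))

    # Reduce phase: groups are contiguous because group_by is a sort prefix.
    output = []
    for key, group in groupby(emitted, key=lambda t: tuple(t[f] for f in group_by)):
        output.extend(reducer(key, list(group)))
    return output


# Hypothetical sample data: (user, timestamp, page) visits in arbitrary order.
visits = [
    ("ana", 3, "/cart"),
    ("bob", 1, "/home"),
    ("ana", 1, "/home"),
    ("ana", 2, "/item"),
    ("bob", 2, "/help"),
]

paths = tuple_map_reduce(
    visits,
    mapper=lambda v: [{"user": v[0], "ts": v[1], "page": v[2]}],
    reducer=lambda key, tuples: [(key[0], [t["page"] for t in tuples])],
    group_by=("user",),
    sort_by=("user", "ts"),
)
```

With the classic key/value API, getting each user's pages in timestamp order usually means writing a composite key plus a custom partitioner and grouping comparator. In the sketch, the same effect comes from declaring group_by and sort_by.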

The good news is:
1) No architectural changes to Hadoop are needed to embrace Tuple MapReduce.
2) Indeed, we have proven that it is possible to implement it on top of Hadoop. See the Pangool open source project ( http://pangool.net/ ).
3) It performs very efficiently ( http://pangool.net/benchmark.html ).
4) It is compatible with the whole Hadoop stack: Writables, Serializers, Input/OutputFormats, etc.

We believe the Hadoop community could benefit from it in two ways:
1) By getting ideas for a future API redesign.
2) By adopting Pangool inside Hadoop. Of course, we would help and contribute whatever is needed, including any adaptation changes (not many, because, as I said, everything is compatible with existing MapReduce).

Obviously, we prefer the second option. But at the very least, we believe some good ideas can be obtained by looking at Tuple MapReduce and Pangool.

There are also other improvements in Pangool that would improve the Hadoop API:
1) Configuration by instance: passing parameters by constructor. For example, Pangool Input/OutputFormats can be configured by providing values to the constructor.
2) Stateful serialization. What is requested in https://issues.apache.org/jira/browse/MAPREDUCE-1462 is already supported by Pangool.
3) First-class multiple inputs and multiple outputs.
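
As a rough illustration of point 1, the plain-Python sketch below contrasts the two configuration styles. Both classes and the "input.delimiter" key are hypothetical stand-ins, not Hadoop or Pangool classes. The first class mimics a format that is created with no arguments and then reads its parameters from a shared key/value configuration; the second receives its parameters through the constructor.

```python
class KeyConfiguredInput:
    """Classic style: built with no arguments, then configured by string keys."""

    def configure(self, conf):
        # Every instance of this class reads the same shared key.
        self.delimiter = conf["input.delimiter"]

    def parse(self, line):
        return line.split(self.delimiter)


class InstanceConfiguredInput:
    """Configuration by instance: parameters are passed to the constructor."""

    def __init__(self, delimiter):
        self.delimiter = delimiter

    def parse(self, line):
        return line.split(self.delimiter)


# Classic style: one shared configuration, so one delimiter per job.
conf = {"input.delimiter": ","}
classic = KeyConfiguredInput()
classic.configure(conf)
classic_fields = classic.parse("ana,/home")

# By instance: two inputs of the same class with different settings
# can coexist, because each object carries its own state.
csv_input = InstanceConfiguredInput(",")
tsv_input = InstanceConfiguredInput("\t")
csv_fields = csv_input.parse("ana,/home")
tsv_fields = tsv_input.parse("bob\t/help")
```

The second style also fits naturally with multiple inputs (point 3): each input path can be paired with its own configured object instead of competing for the same configuration keys.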

In any case, we are open to discussion and happy to contribute.

  was:
After using MapReduce for many years, we have notice that it lacks some important features: compound records, easy intra-reduce sorting and join capabilities. We have elaborated slightly modified MapReduce foundation to overcome these problems: Tuple MapReduce. You can see a full paper published at the ICDM 2012 that describes it at http://pangool.net/TupleMapReduce.pdf 

The good news are:
1) That it is not needed any architectural change on Hadoop to embrace Tuple MapReduce
2) Indeed, we have proven that it is possible to implement it on top of Hadoop. See the Pangool Open Source project ( http://pangool.net/ ). 
3) It performs very efficiently ( http://pangool.net/benchmark.html )
4) It is compatible with all Hadoop stack: Writables, Serializers, Input/OutputFormats, etc. 

We believe Hadoop community could benefits from it in different ways:
1) By getting ideas for future API redesign
2) By adopting Pangool inside Hadoop. Of course, we would be helping and contributing with anything needed, including by doing any adaptation changes needed (too few, because as I told, everything is compatible with existing MapReduce)

Obviously, we prefer the second. But at least, we believe some good ideas can be obtained by looking at Tuple MapReduce and Pangool.  

There are also other improvements in Pangool that would improve Hadoop API:
1) Configuration by instance: passing parameters by constructor. For example, Pangool Input/OutputFormats can be configured by providing values to the constructor
2) Stateful serialization. What is requested in  https://issues.apache.org/jira/browse/MAPREDUCE-1462 is supported by Pangool
3) First-class multipleinput/multipleoutput

Well, we are open to the discussion and to contribute.

    
> Adopt a Tuple MapReduce API instead of classic MapReduce one
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-4876
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4876
>             Project: Hadoop Map/Reduce
>          Issue Type: Wish
>            Reporter: Iván de Prado
>            Priority: Minor
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira