
Posted to mapreduce-issues@hadoop.apache.org by "Johannes Zillmann (JIRA)" <ji...@apache.org> on 2012/09/04 11:50:09 UTC

[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

    [ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447584#comment-13447584 ] 

Johannes Zillmann commented on MAPREDUCE-207:
---------------------------------------------

Currently, in our Hadoop applications, we calculate the splits before we submit the job to the client (the client then simply looks up the existing splits). We do that mainly to set the reducer count based on the number of splits/map tasks.
If Hadoop does the splitting on the cluster (which makes sense), it would be nice to have a hook for adjusting the job configuration once the splits are known!
Sometimes it also makes sense for us to decide on the map-reduce assembly only after we know the splits (different join strategies for different data constellations).
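
To make the request concrete, here is a minimal sketch of the kind of split-dependent decisions such a hook would need to support. Everything in it is an assumption for illustration: the class name SplitAwarePlanner, the parameters splitsPerReducer, maxReducers and mapSideThreshold, and the heuristics themselves are hypothetical and are not part of Hadoop or of the attached patch. In a real job the split count would come from the InputFormat, and the chosen reducer count would be applied with Job.setNumReduceTasks.

```java
public class SplitAwarePlanner {

    /** Hypothetical join strategies a job might pick between once the splits are known. */
    enum JoinStrategy { MAP_SIDE, REDUCE_SIDE }

    /**
     * Hypothetical heuristic: one reducer per splitsPerReducer map tasks,
     * clamped to the range [1, maxReducers].
     */
    static int reducerCount(int numSplits, int splitsPerReducer, int maxReducers) {
        if (numSplits < 0 || splitsPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("invalid split or reducer parameters");
        }
        // Ceiling division, so a partially filled group of splits still gets a reducer.
        int wanted = (numSplits + splitsPerReducer - 1) / splitsPerReducer;
        return Math.max(1, Math.min(maxReducers, wanted));
    }

    /**
     * Hypothetical rule: use a map-side join only when the smaller input
     * consists of at most mapSideThreshold splits.
     */
    static JoinStrategy chooseJoin(int smallerInputSplits, int mapSideThreshold) {
        return smallerInputSplits <= mapSideThreshold
                ? JoinStrategy.MAP_SIDE
                : JoinStrategy.REDUCE_SIDE;
    }

    public static void main(String[] args) {
        int numSplits = 10; // assumed value; normally the size of the computed split list
        System.out.println("reducers: " + reducerCount(numSplits, 4, 100));
        System.out.println("join: " + chooseJoin(1, 2));
    }
}
```

If the splits were computed on the cluster, logic like this could no longer run before submission, which is why a callback between split computation and task scheduling would be needed.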

Just dumping some ideas here...

                
> Computing Input Splits on the MR Cluster
> ----------------------------------------
>
>                 Key: MAPREDUCE-207
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: applicationmaster, mrv2
>            Reporter: Philip Zeyliger
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-207.patch
>
>
> Instead of computing the input splits as part of job submission, Hadoop could have a separate "job task type" that computes the input splits, therefore allowing that computation to happen on the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira