Posted to mapreduce-issues@hadoop.apache.org by "Harsh J (JIRA)" <ji...@apache.org> on 2011/06/19 09:29:47 UTC

[jira] [Commented] (MAPREDUCE-318) Refactor reduce shuffle code

    [ https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051653#comment-13051653 ] 

Harsh J commented on MAPREDUCE-318:
-----------------------------------

A question about why the MAX_MAPS_AT_ONCE limit was chosen to be 20. Would anyone have an answer to this?

On Sun, Jun 19, 2011 at 4:01 AM, Shrinivas Joshi <jshrinivasatttttgmaildottttttcom> wrote:
> We see the following type of lines in our reducer log files. Based on my
> understanding it looks like the target map host has 53 map outputs that are
> ready to be fetched. The shuffle scheduler seems to be allowing only 20 of
> them to be fetched at a time. This is controlled by MAX_MAPS_AT_ONCE
> variable in ShuffleScheduler class. Is my understanding of this log output
> correct? If so, why is MAX_MAPS_AT_ONCE set to 20?
>
> Thanks for your time.
>
> -Shrinivas
>
> INFO org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
> Assiging hostname:50060 with 53 to fetcher#16
> INFO org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
> assigned 20 of 53 to hostname:50060 to fetcher#16
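
For anyone reading along, here is a minimal, self-contained sketch of the behavior being asked about, assuming a simple per-host cap. It is not the actual ShuffleScheduler code; the class, method, and attempt names are made up for illustration.

import java.util.ArrayList;
import java.util.List;

public class PerHostFetchCapDemo {

    // Hypothetical stand-in for ShuffleScheduler's MAX_MAPS_AT_ONCE constant.
    static final int MAX_MAPS_AT_ONCE = 20;

    // Hand a fetcher at most MAX_MAPS_AT_ONCE of the map outputs pending on
    // one host; the remainder stay queued for a later fetch cycle.
    static List<String> assignMapsForHost(List<String> pendingMapOutputs) {
        int n = Math.min(MAX_MAPS_AT_ONCE, pendingMapOutputs.size());
        List<String> assigned = new ArrayList<>(pendingMapOutputs.subList(0, n));
        pendingMapOutputs.subList(0, n).clear(); // leave the rest for the next pass
        return assigned;
    }

    public static void main(String[] args) {
        // 53 pending map outputs on one host, mirroring the log lines above.
        List<String> pending = new ArrayList<>();
        for (int i = 0; i < 53; i++) {
            pending.add("attempt_" + i);
        }
        List<String> firstBatch = assignMapsForHost(pending);
        System.out.println("assigned " + firstBatch.size() + " of "
            + (firstBatch.size() + pending.size())
            + " (remaining " + pending.size() + ")");
        // prints: assigned 20 of 53 (remaining 33)
    }
}

Under a cap like this, the remaining 33 outputs stay queued and get assigned on a later fetch pass, which is consistent with the "assigned 20 of 53" log line quoted above.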

> Refactor reduce shuffle code
> ----------------------------
>
>                 Key: MAPREDUCE-318
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-318
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch, mapred-318-14Aug.patch, mapred-318-20Aug.patch, mapred-318-24Aug.patch, mapred-318-3Sep-v1.patch, mapred-318-3Sep.patch, mapred-318-common.patch
>
>
> The reduce shuffle code has become very complex and entangled. I think we should move it out of ReduceTask and into a separate package (org.apache.hadoop.mapred.task.reduce). Details to follow.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira