Posted to mapreduce-issues@hadoop.apache.org by "Ravi Prakash (JIRA)" <ji...@apache.org> on 2013/03/06 00:52:14 UTC

[jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594110#comment-13594110 ] 

Ravi Prakash commented on MAPREDUCE-4842:
-----------------------------------------

Hi Mariappan,

bq. This is a tangent to point 1. The mergeFactor is set to the configured value for IntermediateMemoryToMemoryMerger but to Integer.MAX_VALUE for InMemoryMerger and OnDiskMerger. We have to find out the rationale behind these choices.

Thanks for all your work on the MergeManager. It is soooooo much cleaner now! Thanks much.

Anyway, since you have been working in this area of the code, could you please review MAPREDUCE-3685? The mergeFactor for the OnDiskMerger was wrong. For the InMemoryMerger it seems to be correct, because io.sort.factor is defined as "The number of streams to merge at once while sorting files. This determines the number of open file handles.", so it is about file handles rather than in-memory segments. Besides, I wonder whether we really want to go down to the level of counting fetched cache lines, rather than simplifying by assuming constant-time access to all memory. Please consider continuing the discussion there.
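
To make the merge-factor point concrete, here is a rough sketch in Java of what the factor trades off. It is a simplified model and not the actual Hadoop Merger logic: the class MergeFactorSketch and the helpers mergePasses and maxOpenStreams are hypothetical names made up for illustration, and the model assumes every pass merges as many streams as the factor allows.

```java
// Simplified model of a multi-pass k-way merge. NOT Hadoop's Merger code;
// all names here are hypothetical and exist only for illustration.
class MergeFactorSketch {

    // Number of passes needed to reduce `segments` sorted runs to one,
    // assuming at most `mergeFactor` runs are merged at once in each pass.
    static int mergePasses(int segments, int mergeFactor) {
        if (segments < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need segments >= 1 and mergeFactor >= 2");
        }
        int passes = 0;
        long remaining = segments; // long avoids overflow when mergeFactor is Integer.MAX_VALUE
        while (remaining > 1) {
            // Each pass turns every group of up to mergeFactor runs into one run.
            remaining = (remaining + mergeFactor - 1) / mergeFactor;
            passes++;
        }
        return passes;
    }

    // Upper bound on streams open at the same time in the first pass.
    // For an on-disk merge this corresponds to open file handles.
    static int maxOpenStreams(int segments, int mergeFactor) {
        return Math.min(segments, mergeFactor);
    }

    public static void main(String[] args) {
        int segments = 1000;
        int[] factors = {10, 100, Integer.MAX_VALUE};
        for (int factor : factors) {
            System.out.println("factor=" + factor
                + " passes=" + mergePasses(segments, factor)
                + " maxOpenStreams=" + maxOpenStreams(segments, factor));
        }
    }
}
```

Under this model, 1000 segments with a factor of 10 take three passes with at most 10 streams open, while Integer.MAX_VALUE collapses the merge into a single pass with all 1000 streams open at once. That is harmless for in-memory segments but matters on disk, where each stream is an open file handle.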

Thanks


                
> Shuffle race can hang reducer
> -----------------------------
>
>                 Key: MAPREDUCE-4842
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Mariappan Asokan
>            Priority: Blocker
>             Fix For: 2.0.3-alpha, 0.23.6
>
>         Attachments: MAPREDUCE-4842-2.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, mapreduce-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  It looked similar to the problem described in MAPREDUCE-3721, where the fetchers were all being told to WAIT by the MergeManager but no merge was taking place.
