Posted to mapreduce-issues@hadoop.apache.org by "Rahul Jain (JIRA)" <ji...@apache.org> on 2012/08/16 01:34:38 UTC

[jira] [Created] (MAPREDUCE-4560) Job can get stuck in a deadlock between mappers and reducers for low values of mapreduce.job.reduce.slowstart.completedmaps (<<1)

Rahul Jain created MAPREDUCE-4560:
-------------------------------------

Summary: Job can get stuck in a deadlock between mappers and reducers for low values of mapreduce.job.reduce.slowstart.completedmaps (<<1)
Key: MAPREDUCE-4560
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4560
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Rahul Jain
Fix For: 2.0.0-alpha

We have seen this issue with MapReduce v2 in our lab systems, but never with MapReduce v1.

The parameter mapreduce.job.reduce.slowstart.completedmaps was set to 0.05 (the default value).

We found the ApplicationMaster stuck in a deadlock between mappers and reducers, with the job making no progress. The sequence appears to be:

1. The initially available map/reduce slots were all allocated to mappers.
2. Once a few mappers had completed, reducers started occupying some of the slots because of the low value of the above config param.
3. The scheduler does not appear to give mappers priority over reducers; after a while, all slots in our system were occupied by reducers.
4. Because some mapper tasks had not yet been assigned a slot, the map phase never completed.
5. The system entered a deadlock: reducers occupy all available slots while waiting for the mappers to complete, and the remaining mappers cannot run because no slot is available.
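
The sequence above can be reproduced in a toy model. The following is only a sketch under stated assumptions, not the actual scheduler or ApplicationMaster logic: the function simulate, the one-tick map duration, and the rule that reducers are handed free slots before mappers once the slow-start threshold is reached are all hypothetical simplifications chosen to mirror steps 1 to 5.

```python
import math

def simulate(total_maps, total_reduces, slots, slowstart, max_ticks=1000):
    """Toy model of the reported sequence; returns 'completed', 'deadlock' or 'timeout'."""
    # Assumption: reducers become eligible once this many maps have completed.
    threshold = math.ceil(slowstart * total_maps)
    pending_maps, running_maps, done_maps = total_maps, 0, 0
    pending_reduces, running_reduces = total_reduces, 0

    for _ in range(max_ticks):
        # Assumption: every running mapper finishes after one tick and frees its slot.
        done_maps += running_maps
        running_maps = 0

        if done_maps == total_maps:
            return "completed"  # the map phase is over, so reducers can finish

        # Reducers keep their slots until the map phase is over.
        free = slots - running_reduces

        # Assumption: no priority for mappers; eligible reducers get free slots first.
        if done_maps >= threshold:
            started = min(free, pending_reduces)
            running_reduces += started
            pending_reduces -= started
            free -= started

        started = min(free, pending_maps)
        running_maps += started
        pending_maps -= started

        # No mapper is running, some are still waiting, and no slot will ever free up.
        if running_maps == 0 and pending_maps > 0:
            return "deadlock"
    return "timeout"

print(simulate(total_maps=100, total_reduces=20, slots=10, slowstart=0.05))
print(simulate(total_maps=100, total_reduces=20, slots=10, slowstart=1.0))
```

In this model the deadlock needs at least as many reducers as slots; with fewer reducers some slots stay available to mappers and the job finishes.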

The workaround in our system was to set mapreduce.job.reduce.slowstart.completedmaps=1, after which the issue was no longer seen.
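
To illustrate why the value 1 avoids the problem: the parameter is the fraction of a job's maps that should complete before reducers are scheduled. The helper below is a hypothetical sketch that assumes the threshold is the fraction times the number of maps, rounded up; the name maps_before_reducers is not part of Hadoop.

```python
import math

def maps_before_reducers(total_maps, slowstart):
    """Number of completed maps required before reducers may be scheduled.

    Assumption: the threshold is ceil(slowstart * total_maps).
    """
    return math.ceil(slowstart * total_maps)

# With a fraction of 1, every map must finish before any reducer takes a slot,
# so reducers can never hold slots that unfinished mappers still need.
for fraction in (0.05, 0.5, 1.0):
    print(fraction, maps_before_reducers(200, fraction))
```

The trade-off is that reducers can no longer start fetching map output while the map phase is still running.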

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4560) Job can get stuck in a deadlock between mappers and reducers for low values of mapreduce.job.reduce.slowstart.completedmaps (<<1)

Posted by "Rahul Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469511#comment-13469511 ] 

Rahul Jain commented on MAPREDUCE-4560:
---------------------------------------

Yes, this issue was found with the FIFO scheduler; we can mark it as a duplicate of MAPREDUCE-4299 once we verify that its fix resolves the issue.
                

[jira] [Commented] (MAPREDUCE-4560) Job can get stuck in a deadlock between mappers and reducers for low values of mapreduce.job.reduce.slowstart.completedmaps (<<1)

Posted by "nemon lou (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435848#comment-13435848 ] 

nemon lou commented on MAPREDUCE-4560:
--------------------------------------

Are you using the FIFO scheduler?
If so, have a look at MAPREDUCE-4299.
                