Posted to mapreduce-issues@hadoop.apache.org by "Peng Zhang (JIRA)" <ji...@apache.org> on 2015/09/16 09:47:45 UTC

[jira] [Resolved] (MAPREDUCE-6445) Shuffle hang

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peng Zhang resolved MAPREDUCE-6445.
-----------------------------------
    Resolution: Cannot Reproduce

> Shuffle hang
> ------------
>
>                 Key: MAPREDUCE-6445
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6445
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>
> Our scale cluster has been running 2.6.0 for months.
> 2 of 200 reduces hung during shuffle.
> The log of instance 1 appears to loop on 1 map output:
> {noformat}
> 2015-08-06 21:54:14,649 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#1
> 2015-08-06 21:54:14,651 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=20&map=attempt_1438689528746_10193_m_000013_0,attempt_1438689528746_10193_m_000020_0 sent hash and received reply
> 2015-08-06 21:54:14,651 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager returned status WAIT ...
> 2015-08-06 21:54:14,651 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#1 in 2ms
> 2015-08-06 21:54:14,651 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-132.bj:22408 with 2 to fetcher#5
> 2015-08-06 21:54:14,651 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#5
> 2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=20&map=attempt_1438689528746_10193_m_000013_0,attempt_1438689528746_10193_m_000020_0 sent hash and received reply
> 2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 - MergeManager returned status WAIT ...
> 2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#5 in 4ms
> 2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-132.bj:22408 with 2 to fetcher#5
> 2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#5
> 2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=20&map=attempt_1438689528746_10193_m_000013_0,attempt_1438689528746_10193_m_000020_0 sent hash and received reply
> 2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 - MergeManager returned status WAIT ...
> 2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#5 in 5ms
> 2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-132.bj:22408 with 2 to fetcher#5
> 2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#5
> {noformat}
> The log of instance 2 appears to loop on 5 map outputs:
> {noformat}
> 2015-08-06 21:43:33,626 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-172.bj:22408 with 1 to fetcher#5
> 2015-08-06 21:43:33,626 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to node-172.bj:22408 to fetcher#5
> 2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000013_0,attempt_1438689528746_10193_m_000020_0 sent hash and received reply
> 2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#3 - MergeManager returned status WAIT ...
> 2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#3 in 5ms
> 2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-179.bj:22408 with 1 to fetcher#3
> 2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to node-179.bj:22408 to fetcher#3
> 2015-08-06 21:43:33,627 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000084_0,attempt_1438689528746_10193_m_000046_0 sent hash and received reply
> 2015-08-06 21:43:33,627 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#4 - MergeManager returned status WAIT ...
> 2015-08-06 21:43:33,627 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-71.bj:22408 freed by fetcher#4 in 5ms
> 2015-08-06 21:43:33,627 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-71.bj:22408 with 2 to fetcher#4
> 2015-08-06 21:43:33,627 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-71.bj:22408 to fetcher#4
> 2015-08-06 21:43:33,628 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000092_0 sent hash and received reply
> 2015-08-06 21:43:33,628 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#2 - MergeManager returned status WAIT ...
> 2015-08-06 21:43:33,628 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-167.bj:22408 freed by fetcher#2 in 3ms
> 2015-08-06 21:43:33,628 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-132.bj:22408 with 2 to fetcher#2
> 2015-08-06 21:43:33,628 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#2
> 2015-08-06 21:43:33,629 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000097_0 sent hash and received reply
> 2015-08-06 21:43:33,629 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager returned status WAIT ...
> 2015-08-06 21:43:33,629 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-174.bj:22408 freed by fetcher#1 in 3ms
> 2015-08-06 21:43:33,629 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-174.bj:22408 with 1 to fetcher#1
> 2015-08-06 21:43:33,629 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to node-174.bj:22408 to fetcher#1
> 2015-08-06 21:43:33,629 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000093_0 sent hash and received reply
> 2015-08-06 21:43:33,629 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 - MergeManager returned status WAIT ...
> 2015-08-06 21:43:33,630 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-172.bj:22408 freed by fetcher#5 in 3ms
> 2015-08-06 21:43:33,630 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-172.bj:22408 with 1 to fetcher#5
> 2015-08-06 21:43:33,630 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to node-172.bj:22408 to fetcher#5
> 2015-08-06 21:43:33,630 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000089_0 sent hash and received reply
> 2015-08-06 21:43:33,630 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#3 - MergeManager returned status WAIT ...
> {noformat}
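
The excerpts above repeat the same cycle (assign host, fetch, "MergeManager returned status WAIT", free host) within milliseconds. As a rough aid for confirming that pattern in a full reduce task log, the following minimal sketch tallies the WAIT lines per fetcher thread. The function name `count_waits` and the regular expression are illustrative assumptions based only on the log format shown in this report; they are not part of Hadoop.

```python
import re
from collections import Counter

# Assumed pattern, derived from the log lines quoted in this report:
#   ... INFO [fetcher#N] ...reduce.Fetcher: fetcher#N - MergeManager returned status WAIT ...
WAIT_RE = re.compile(
    r"\[(fetcher#\d+)\] \S+Fetcher: fetcher#\d+ - MergeManager returned status WAIT"
)


def count_waits(lines):
    """Return a Counter mapping each fetcher thread name to its number of WAIT lines."""
    counts = Counter()
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

To use it on a saved task log (the file name here is hypothetical), pass the open file object, for example `count_waits(open("reduce-task.log"))`. A fetcher whose WAIT count keeps growing while no merge progress is logged would be consistent with the hang described in this issue, though the count alone does not identify the cause.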



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)