Posted to mapreduce-issues@hadoop.apache.org by "Peng Zhang (JIRA)" <ji...@apache.org> on 2015/08/06 15:58:05 UTC

[jira] [Created] (MAPREDUCE-6445) Shuffle hang

Peng Zhang created MAPREDUCE-6445:
-------------------------------------

             Summary: Shuffle hang
                 Key: MAPREDUCE-6445
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6445
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Peng Zhang


Our scale cluster has been running 2.6.0 for months.
2 of 200 reduces hang in the shuffle phase.

The log of instance 1 appears to loop on 1 map output:
{noformat}
2015-08-06 21:54:14,649 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#1
2015-08-06 21:54:14,651 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=20&map=attempt_1438689528746_10193_m_000013_0,attempt_1438689528746_10193_m_000020_0 sent hash and received reply
2015-08-06 21:54:14,651 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager returned status WAIT ...
2015-08-06 21:54:14,651 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#1 in 2ms
2015-08-06 21:54:14,651 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-132.bj:22408 with 2 to fetcher#5
2015-08-06 21:54:14,651 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#5
2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=20&map=attempt_1438689528746_10193_m_000013_0,attempt_1438689528746_10193_m_000020_0 sent hash and received reply
2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 - MergeManager returned status WAIT ...
2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#5 in 4ms
2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-132.bj:22408 with 2 to fetcher#5
2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#5
2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=20&map=attempt_1438689528746_10193_m_000013_0,attempt_1438689528746_10193_m_000020_0 sent hash and received reply
2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 - MergeManager returned status WAIT ...
2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#5 in 5ms
2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-132.bj:22408 with 2 to fetcher#5
2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#5
{noformat}

The log of instance 2 appears to loop on 5 map outputs:
{noformat}
2015-08-06 21:43:33,626 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-172.bj:22408 with 1 to fetcher#5
2015-08-06 21:43:33,626 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to node-172.bj:22408 to fetcher#5
2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000013_0,attempt_1438689528746_10193_m_000020_0 sent hash and received reply
2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#3 - MergeManager returned status WAIT ...
2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#3 in 5ms
2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-179.bj:22408 with 1 to fetcher#3
2015-08-06 21:43:33,627 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to node-179.bj:22408 to fetcher#3
2015-08-06 21:43:33,627 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000084_0,attempt_1438689528746_10193_m_000046_0 sent hash and received reply
2015-08-06 21:43:33,627 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#4 - MergeManager returned status WAIT ...
2015-08-06 21:43:33,627 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-71.bj:22408 freed by fetcher#4 in 5ms
2015-08-06 21:43:33,627 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-71.bj:22408 with 2 to fetcher#4
2015-08-06 21:43:33,627 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-71.bj:22408 to fetcher#4
2015-08-06 21:43:33,628 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000092_0 sent hash and received reply
2015-08-06 21:43:33,628 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#2 - MergeManager returned status WAIT ...
2015-08-06 21:43:33,628 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-167.bj:22408 freed by fetcher#2 in 3ms
2015-08-06 21:43:33,628 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-132.bj:22408 with 2 to fetcher#2
2015-08-06 21:43:33,628 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 2 of 2 to node-132.bj:22408 to fetcher#2
2015-08-06 21:43:33,629 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000097_0 sent hash and received reply
2015-08-06 21:43:33,629 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager returned status WAIT ...
2015-08-06 21:43:33,629 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-174.bj:22408 freed by fetcher#1 in 3ms
2015-08-06 21:43:33,629 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-174.bj:22408 with 1 to fetcher#1
2015-08-06 21:43:33,629 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to node-174.bj:22408 to fetcher#1
2015-08-06 21:43:33,629 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000093_0 sent hash and received reply
2015-08-06 21:43:33,629 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 - MergeManager returned status WAIT ...
2015-08-06 21:43:33,630 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-172.bj:22408 freed by fetcher#5 in 3ms
2015-08-06 21:43:33,630 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning node-172.bj:22408 with 1 to fetcher#5
2015-08-06 21:43:33,630 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to node-172.bj:22408 to fetcher#5
2015-08-06 21:43:33,630 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=22408/mapOutput?job=job_1438689528746_10193&reduce=85&map=attempt_1438689528746_10193_m_000089_0 sent hash and received reply
2015-08-06 21:43:33,630 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#3 - MergeManager returned status WAIT ...
{noformat}
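
To check whether other reduces show the same pattern, the WAIT replies can be counted per fetcher in a reduce task log. The following is only a rough sketch, not part of Hadoop: the function name count_wait_by_fetcher and the regular expression are my own assumptions, derived from the log lines quoted above.

```python
import re
from collections import Counter

# Assumed pattern, derived from the Fetcher lines quoted in this report, e.g.
# "... INFO [fetcher#1] ...reduce.Fetcher: fetcher#1 - MergeManager returned status WAIT ..."
WAIT_RE = re.compile(
    r"\[(fetcher#\d+)\] \S+Fetcher: \S+ - MergeManager returned status WAIT"
)


def count_wait_by_fetcher(lines):
    """Return a Counter mapping each fetcher thread to its number of WAIT replies."""
    counts = Counter()
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Four lines copied from the instance 1 log above.
sample = [
    "2015-08-06 21:54:14,651 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager returned status WAIT ...",
    "2015-08-06 21:54:14,651 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: node-132.bj:22408 freed by fetcher#1 in 2ms",
    "2015-08-06 21:54:14,656 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 - MergeManager returned status WAIT ...",
    "2015-08-06 21:54:14,660 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 - MergeManager returned status WAIT ...",
]

if __name__ == "__main__":
    for fetcher, waits in sorted(count_wait_by_fetcher(sample).items()):
        print(fetcher, waits)
```

On a full task log, a WAIT count that keeps growing for every fetcher while no fetch completes would be consistent with the loop shown above, though it does not by itself identify the cause.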


